I my effort to beat the SentiStrength text sentiment analysis algorithm by Mike Thelwall I came up with a low-hanging fruit killer approach, — I thought. Using the standard movie review data set of Bo Pang available in NLTK (used in research papers as a benchmark data set) I would train an NTLK classifier and compare it with my valence-labeled wordlist AFINN and readjust its weights for the words a little.
What I found, however, was that for a great number of words the sentiment valence between my AFINN word list and the classifier probability trained on the movie reviews were in disagreemet. A word such as ‘distrustful’ I have as a quite negative word. However, the classifier reports the probability for ‘positive’ to be 0.87, i.e., quite positive. I examined where the word ‘distrustful’ occured in the movie review data set:
$ egrep -ir "\bdistrustful\b" ~/nltk_data/corpora/movie_reviews/
The word ‘distrustful’ appears 3 times and in all cases associated with a ‘positive’ movie review. The word is used to describe elements of the narrative or an outside reference rather than the quality of the movie itself. Another word that I have as negative is ‘criticized’. Used 10 times in the positive moview reviews (and none in the negative) I find one negation (‘the casting cannot be criticized’) but mostly the word in a contexts with the reviewer criticizing the critique of others, e.g., ‘many people have criticized fincher’s filming […], but i enjoy and relish in the portrayal’.
The top 15 ‘misaligned’ words using my ad hoc metric are listed here:
It seems that reviewers are interested in movies that have a certain amount of ‘melancholy’, ‘anger’, distrustfulness and (further down the list) scandal, apathy, hoax, struggle, hopelessness and hindrance. Whereas smile, amusement, peacefulness and gratefulness are associated with negative reviews. So are movie reviewers unempathetic schadefreudians entertained by the characters’ misfortune? Hmmm…? It reminds me of journalism where they say “a good story is a bad story”.
So much for philosophy, back to reality:
The words (such as ‘hurrah’) that have a classifier probability on 0.25 and 0.75 typically occure each only once in the corpus. In this application of the classifier I should perhaps have used a stronger prior probability so ‘hurrah’ with 0.25 would end up on around the middle of the scale with 0.5 as the probability. I haven’t checked whether it is possible to readjust the prior in the NLTK naïve Bayes classifier.
The conclusion on my Thelwallizer is not good. A straightforward application of the classifier on the movie reviews gets you features that look on the summary of the narrative rather than movie per se, so this simple approach is not particular helpful in readjustment of the weights.
However, there is another way the trained classifier can be used. Examining the most informative features I can ask if they exist in my AFINN list. The first few missing words are: slip, ludicrous, fascination, 3000, hudson, thematic, seamless, hatred, accessible, conveys, addresses, annual, incoherent, stupidity, … I cannot use ‘hudson’ in my word list, but words such as ludicrous, seamless and incoherent are surely missing.
(28 January 2012: Lookout in the code below! The way the features are constructed for the classifier is troublesome. In NLTK you should not only specify the words that appear in the text with ‘True’ you should also normally specify explicitely the words that do not appear in the text with ‘False’. Not mentioning words in the feature dictionary might be bad depending on the application)