The Facebook emotion contagion experiment, Experimental evidence of massive-scale emotional contagion through social networks, has caused quite a stir.
I have commented on the study and collected material about it on the Brede Wiki, where there are pointers to news and blog articles as well as related research and critique.
During the brouhaha I was contacted by a Wired writer, Marcus Wohlsen. Apparently he had run into Jonty Waering, who had made a Chrome browser extension as a sarcastic comment on the claims in the study. Waering had used my sentiment analysis word list for his browser extension. I discovered Wohlsen's email too late for the deadline of his article, but Waering and his browser extension brought the claims and the experiment well down to earth.
I made a few notes to Wohlsen, here slightly edited:
Simple sentiment analysis looks at individual words and surely does not necessarily capture the 'true' expressed emotion, ignoring, say, context, negation and sarcasm. Nevertheless, many studies have shown that this simple word-based approach can to some degree determine the sentiment of a text. There have been efforts toward more sophisticated sentiment analysis handling emoticons, negation and booster words, such as VERY happy, as well as combining the many sentiment-labeled word lists that now exist. Such methods generally improve the performance of the sentiment analysis. If you have access to a data set where texts are labeled for sentiment, machine learning can boost the performance further.
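To make the word-based approach concrete, here is a minimal sketch of a scorer with negation and booster handling. The tiny word list is illustrative only, not my actual word list, and real implementations handle tokenization, emoticons and longer negation scopes more carefully:

```python
# Minimal sketch of word-based sentiment scoring with negation and
# booster handling. The word list below is a made-up toy example.
VALENCES = {"happy": 3, "sad": -2, "great": 3, "awful": -3}
NEGATIONS = {"not", "no", "never"}
BOOSTERS = {"very": 1.5, "extremely": 2.0}

def sentiment(text):
    """Sum word valences, flipping sign after a negation and
    scaling after a booster word."""
    words = text.lower().split()
    score = 0.0
    for i, word in enumerate(words):
        valence = VALENCES.get(word)
        if valence is None:
            continue
        weight = 1.0
        if i >= 1 and words[i - 1] in BOOSTERS:
            weight = BOOSTERS[words[i - 1]]
            if i >= 2 and words[i - 2] in NEGATIONS:
                weight = -weight  # "not very happy"
        elif i >= 1 and words[i - 1] in NEGATIONS:
            weight = -1.0  # "not happy"
        score += weight * valence
    return score
```

For example, `sentiment("i am happy")` gives 3.0, `sentiment("not happy")` gives -3.0, and `sentiment("very happy")` gives 4.5, showing how negation and boosting modify the base word valence.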
It should be noted that a text does not necessarily have a definite sentiment. For short texts, such as Twitter messages, even humans may not agree on the sentiment.
A remaining question is how well the expressed emotion in a status message, as determined by sentiment analysis, corresponds to the internal emotional state of the writer. Some researchers take a one-to-one correspondence for granted and argue that you can measure 'happiness'. But is it so? I think we are on shakier ground here. There are research projects on suicide prediction through social media monitoring. The Durkheim Project, which focuses on predicting military and veteran suicide risk, works with Facebook for recruitment and has full IRB approval and opt-in. I have seen no results reported from it yet.
If we look outside social media, a pair of researchers from Montclair State University collected song lyrics from suicide and non-suicide lyricists and built a classifier for predicting whether a song was associated with a suicide lyricist. With my word-based sentiment analysis as one of the features, the researchers achieve reasonably good predictions. So text analytics seems to be able to predict 'real' emotions, yet again with the qualifier 'to some degree' and certainly not with certainty.
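As a rough sketch of how a word-based sentiment score can serve as one feature among others in such a classifier, consider turning each lyric into a small feature vector. The word list and the extra features here are hypothetical; the Montclair study used its own feature set and learning algorithm:

```python
from collections import Counter

# Hypothetical feature extraction for a lyric classifier. The valence
# list is a toy example, not a real sentiment word list.
WORD_VALENCES = {"happy": 3, "love": 3, "dead": -3, "pain": -2, "alone": -2}

def features(lyric):
    """Return [mean word-based sentiment, word count, type-token ratio]."""
    words = lyric.lower().split()
    n = max(len(words), 1)
    counts = Counter(words)
    sentiment = sum(WORD_VALENCES.get(w, 0) * c for w, c in counts.items())
    return [sentiment / n, len(words), len(set(words)) / n]
```

Each lyric's vector could then be fed to any standard classifier (logistic regression, SVM, etc.) trained on lyrics labeled by whether the lyricist died by suicide.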
I would also like to note a problem I see with the Facebook study: the issue of word burstiness. Texts are correlated. To me it is unclear whether the study merely shows the contagiousness of words, regardless of their emotionality.