Reviews are back from the Making Sense of Microposts workshop, where they accepted my 6-page position paper "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs". The workshop is part of the ESWC 2011 conference. For an introduction to my paper, read my previous blog post.
The reviews for my article were not surprising. They stated something like: a bit low on originality, with ok methodology. The editors requested an example that illustrated the difference between my approach and other approaches. In the table above I show an example with one of the very early tweets, apparently among the first 10,000. The example I looked at turned out to illustrate the issues nicely. The tweeter has an ear infection and uses a couple of words with valence: “infection”, “impossible” and “fucking”. It turns out that my word list (AFINN) has only “fucking”, while General Inquirer (GI) has only “infection” and OpinionFinder (OF) only “impossible”. Interestingly, OF has “infected” and “infectious”. ANEW has “infection” and also “sleep”, which it scores as mostly positive. I wonder if the best way forward is to assemble all the word lists into one big one. The example is now in the newest version of the paper. (If you examine the paper, I hope you appreciate the great amount of illegal tweaking I did in the LaTeX document preparation system to get the paper under the 6-page limit while adding the table.)
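The word-list approach itself is simple: tokenize the tweet and sum the valences of the words found in the list. A minimal sketch in Python, using a hypothetical three-word excerpt of an AFINN-style list (the entries and valences here are only illustrative, not the exact values from any of the published lists):

```python
import re

# Hypothetical excerpt of an AFINN-style word list mapping words to valences.
# The real lists have hundreds or thousands of entries.
word_list = {"fucking": -4, "impossible": -2, "infection": -2}

def sentiment(text):
    """Sum the valences of the listed words occurring in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(word_list.get(token, 0) for token in tokens)

print(sentiment("This fucking ear infection makes sleep impossible"))  # -8
```

With coverage differing so much between lists (AFINN matching only “fucking”, GI only “infection”, OF only “impossible”), the same tweet can get quite different scores depending on which dictionary backs the lookup.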
One reviewer noted an imbalance between “normal” and obscene words, e.g., between “hell” (my present valence: -4) and “tits” (my present valence: -2). The first word can be quite ambiguous and the other is mostly used positively or neutrally, e.g., “It feel good as hell outside” and “yoo her tits look like candy corns”. I somewhat agree. A better mean valence for “tits” could be +1. With “hell” I am not sure. Though ambiguously used, it is most frequently used fairly negatively. Maybe -3 or -2 would be better.
One emailing researcher asked about the accuracy. It didn’t quite fit in the six pages I had available for the paper, but the confusion matrix is listed below. From that it is possible to compute the total accuracy: (277+5+299)/1000 = 58.1%. If the calculation of the accuracy is restricted to positive and negative tweets I get: (277+299)/(277+61+123+299) = 75.8%. The one-off accuracy is (1000-123-61)/1000 = 81.6%.
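The three accuracies can be computed directly from the confusion matrix. In the sketch below the diagonal (277, 5, 299) and the positive/negative confusions (61 and 123) are the numbers quoted above; the cells involving the neutral class are hypothetical placeholders that only need to sum to the remaining 235 tweets, and the three results do not depend on how they are split:

```python
# Confusion matrix: rows = true class, columns = predicted class,
# in the order negative, neutral, positive. The neutral-related
# off-diagonal cells (50, 100, 35, 50) are illustrative placeholders.
confusion = [
    [277,  50,  61],   # true negative
    [100,   5,  35],   # true neutral
    [123,  50, 299],   # true positive
]
NEG, POS = 0, 2

total = sum(sum(row) for row in confusion)               # 1000 tweets
correct = sum(confusion[i][i] for i in range(3))

# Total accuracy: all three classes count.
total_accuracy = correct / total                         # (277+5+299)/1000

# Accuracy restricted to tweets that are both truly and predictedly
# positive or negative.
restricted_correct = confusion[NEG][NEG] + confusion[POS][POS]
restricted_total = (restricted_correct
                    + confusion[NEG][POS] + confusion[POS][NEG])
restricted_accuracy = restricted_correct / restricted_total

# One-off accuracy: only positive<->negative confusions count as errors.
one_off_accuracy = (total - confusion[NEG][POS] - confusion[POS][NEG]) / total

print(round(total_accuracy * 100, 1))       # 58.1
print(round(restricted_accuracy * 100, 1))  # 75.8
print(round(one_off_accuracy * 100, 1))     # 81.6
```

Treating a one-step mistake (positive or negative classified as neutral, or vice versa) as acceptable is what lifts the one-off figure to 81.6%.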
The titles of the accepted papers for the workshop are available. Nine out of 19 submissions were accepted. They all seem quite interesting. I see there are two papers on political tweets. I look forward to hearing about them, as the Danish media tell us that Danish politicians are at war on Twitter. Denmark will hold an election for the “Folketing” parliament later this year, and it will likely be the first election where we see politicians use social media seriously.
There is a Facebook group for the workshop, and in the conference registration you could enter LinkedIn and Twitter accounts as well as a blog address. The last time I went to the ESWC conference (in 2009), some of the participants/organizers distributed RFID tags we could wear. With data mining at the end of the conference, they could determine which participants had spent the most time together. If I remember correctly, Sebastian Schaffert and his teddy bear won.