Making sense of microposts

Example tweet scoring. 5 has been subtracted from the AMT and ANEW scores. AMT is Alan Mislove’s data, scored through Amazon Mechanical Turk.

Word          AFINN   ANEW   GI   OF   SS   AMT
ear               0      0    0    0
infection         0  -3.34   -1    0
making            0      0    0    0
it                0      0    0    0
impossible        0      0    0   -1
2                 0      0    0    0
sleep             0    2.2    0    0
headed            0      0    0    0
2                 0      0    0    0
the               0      0    0    0
doctors           0      0    0    0
2                 0      0    0    0
get               0      0    0    0
new               0      0    0    0
prescription      0      0    0    0
so                0      0    0    0
fucking          -4      0    0    0
early             0      0    0    0
Sum              -4  -1.14   -1   -1   -2  -3.4

Reviews are back from the Making Sense of Microposts workshop, where they apparently accepted my six-page position paper, A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. The workshop is part of the ESWC 2011 conference. For an introduction to my paper, read my previous blog post.

The reviews for my article were not surprising. They stated something like: a bit low on originality and OK methodology. The editors requested an example that illustrated the difference between my approach and other approaches. In the table above I have an example with one of the very early tweets, apparently among the first 10,000. The example I looked at turned out to illustrate the issues nicely. The tweeter has an ear infection and uses a couple of words with valence: “infection”, “impossible” and “fucking”. It turns out that my word list (AFINN) has only “fucking”, while General Inquirer (GI) has only “infection” and OpinionFinder (OF) only “impossible”. Interestingly, OF has “infected” and “infectious”. ANEW has “infection” and also “sleep”, which it rates as mostly positive. I wonder if the best way is to assemble all the word lists into one big one. The example is now in the newest version of the paper. (If you examine the paper, I hope you appreciate the great amount of illegal tweaking in the LaTeX document preparation system I did to get the paper under the 6-page limit while adding the table.)
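The word-by-word scoring in the table can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the paper: the dictionaries hold just the handful of entries that appear in the table (the real word lists are far larger), the ANEW values are the already shifted ones, and the names `WORD_LISTS`, `score_tweet` and `coverage` are made up for this example.

```python
# Toy excerpts of the word lists: only the entries visible in the table.
# ANEW values already have 5 subtracted. All names here are hypothetical.
WORD_LISTS = {
    "AFINN": {"fucking": -4},
    "ANEW": {"infection": -3.34, "sleep": 2.2},
    "GI": {"infection": -1},
    "OF": {"impossible": -1},
}

TWEET = ("ear infection making it impossible 2 sleep headed 2 the doctors "
         "2 get new prescription so fucking early")


def score_tweet(text, word_list):
    """Sum the valence of each token; tokens not in the list count as 0."""
    return sum(word_list.get(token, 0) for token in text.lower().split())


def coverage(text, word_lists):
    """Map each scored token to the names of the word lists that contain it."""
    found = {}
    for token in text.lower().split():
        names = [name for name, wl in word_lists.items() if token in wl]
        if names:
            found[token] = names
    return found


if __name__ == "__main__":
    for name, word_list in WORD_LISTS.items():
        print(name, round(score_tweet(TWEET, word_list), 2))
    print(coverage(TWEET, WORD_LISTS))
```

The `coverage` output makes the point of the example visible: each list picks up a different subset of the valenced words, and only “infection” is shared by two of them.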

One reviewer noted an imbalance between “normal” and obscene words, e.g., between “hell” (my present valence: -4) and “tits” (my present valence: -2). The first word can be quite ambiguous, and the other is mostly used positively or neutrally, e.g., “It feel good as hell outside” and “yoo her tits look like candy corns”. I somewhat agree. A better mean valence for “tits” could be +1. With “hell” I am not sure. Though its use is ambiguous, it is most frequently used fairly negatively. Maybe -3 or -2 would be better.

One researcher asked by email about the accuracy. It didn’t quite fit in the six pages I had available for the paper, but the confusion matrix is listed below. From it, the total accuracy can be computed: (277+5+299)/1000 = 58.1%. If the calculation is restricted to positive and negative, I get (277+299)/(277+61+123+299) = 75.8%. The one-off accuracy, which counts only confusions between positive and negative as errors, is (1000-123-61)/1000 = 81.6%.

                   AMT
                   Positive  Neutral  Negative
AFINN  Positive         277        5        61
       Neutral          157        5        67
       Negative         123        6       299
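The three accuracy figures can be recomputed directly from the confusion matrix. The sketch below assumes the layout shown in the table (rows are the AFINN classification, columns the AMT label, both ordered positive, neutral, negative); the function names are chosen for this example only.

```python
# Rows: AFINN classification; columns: AMT label.
# Order in both directions: positive, neutral, negative.
CONFUSION = [
    [277, 5, 61],
    [157, 5, 67],
    [123, 6, 299],
]


def total_accuracy(m):
    """Share of tweets on the diagonal (all three classes)."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


def polar_accuracy(m):
    """Accuracy restricted to tweets both sources call positive or negative."""
    correct = m[0][0] + m[2][2]
    return correct / (correct + m[0][2] + m[2][0])


def one_off_accuracy(m):
    """Only positive/negative confusions count as errors."""
    total = sum(map(sum, m))
    return (total - m[0][2] - m[2][0]) / total


if __name__ == "__main__":
    print(f"total:   {total_accuracy(CONFUSION):.1%}")
    print(f"polar:   {polar_accuracy(CONFUSION):.1%}")
    print(f"one-off: {one_off_accuracy(CONFUSION):.1%}")
```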

The titles of the accepted papers for the workshop are available. Nine out of 19 submissions were accepted. They all seem quite interesting. I see there are two papers on political tweets. I look forward to hearing about them, as Danish media tell us that Danish politicians are at war on Twitter. Denmark is going to have an election for the “Folketing” parliament later this year, and it will likely be the first election where we see politicians use social media seriously.

There is a Facebook group for the workshop, and in the conference registration you could enter LinkedIn and Twitter accounts as well as a blog address. Last time I went to the ESWC conference (in 2009), some of the participants/organizers distributed RFID tags we could wear. With data mining at the end of the conference they could determine which participants had spent the most time together. If I remember correctly, Sebastian Schaffert and his teddy bear won.
