Simplest sentiment analysis in Python with AFINN


I have previously blogged about sentiment analysis. Code for simple sentiment analysis with my AFINN sentiment word list is also available in the appendix of the paper “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs”, and it is ready for download as well. That code can be a little difficult to navigate, so here I have made the simplest Python example of sentiment analysis with AFINN that I could think of.

(2012-12-01: Updated link to new gist at github)
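
The embedded gist does not appear in this copy of the post, so what follows is a minimal Python 3 sketch of the idea and not the original afinn.py. It assumes that the score of a text is the sum of the word valences divided by the square root of the number of words, which is consistent with the outputs quoted in the comments below. TOY_AFINN is a three-word stand-in for the real AFINN-111 list, with valences assumed for illustration.

```python
import math
import re

# Toy stand-in for the AFINN word list (assumption: these valences are
# chosen to be consistent with the scores quoted in the comments).
TOY_AFINN = {"stupid": -2, "idiotic": -3, "great": 3}


def sentiment(text, afinn=TOY_AFINN):
    """Return the summed word valence divided by sqrt(number of words)."""
    # Split on runs of non-word characters and drop empty tokens.
    words = [w for w in re.split(r"\W+", text.lower()) if w]
    if not words:
        return 0.0
    total = sum(afinn.get(word, 0) for word in words)
    return total / math.sqrt(len(words))


if __name__ == "__main__":
    text = "Finn is stupid and idiotic"
    print("%.2f %s" % (sentiment(text), text))  # -2.24 Finn is stupid and idiotic
```

Dropping empty tokens is a choice made in this sketch; a version that keeps them would count one extra word for texts that end in punctuation.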


30 thoughts on “Simplest sentiment analysis in Python with AFINN”

    Michael said:
    March 2, 2012 at 11:01 am

    I can help you make money from sentiment analysis. I am not a programmer, but I know someone who will pay for a specific application of this. I understand their problem and the solution. Would you like to work with me?

    Mayank Amencherla said:
    April 7, 2014 at 5:02 am

    Hi, I used this very same code to understand the given example (“Finn is stupid and idiotic”) on python and it didn’t print the sentiment rating. Is there something that needs to be fixed?

      Finn Årup Nielsen responded:
      April 7, 2014 at 12:25 pm

      Dear Mayank,

      Did you run it as a script, rather than as a module? Try to import the file, then run afinn.sentiment('good') and see what comes out.

        Mayank Amencherla said:
        April 10, 2014 at 5:58 pm

        I copy pasted the code into my python editor and ran it.
        You said there was an error and that I needed to add a line of code to fix the pedagogic error?

        What do you mean by importing the file? You mean importing the AFINN-111 file?

        Finn Årup Nielsen responded:
        April 10, 2014 at 6:16 pm

        You should be able to download the afinn.py file via the link https://gist.githubusercontent.com/fnielsen/4183541/raw/001acc714d2aa4dd8146d7b9e5c0ad2478a489b5/afinn.py

        If the AFINN-111 file is installed in a subdirectory AFINN as AFINN/AFINN-111.txt, then you should be able to start python and do:
        >>> import afinn
        >>> afinn.sentiment("Finn is stupid and idiotic")
        -2.23606797749979

        I see that the script makes an error because it cannot connect to Twitter since they changed the API, but you should be able to see the first part of the print out:
        python afinn.py
        -2.24 Finn is stupid and idiotic
        -1.58 Finn is only a tiny bit stupid and not idiotic

        Finn Årup Nielsen responded:
        April 10, 2014 at 6:23 pm

        The error is related to Unicode handling in Python 2. I believe it is only a problem for the word “naïve”.

    Mayank Amencherla said:
    April 13, 2014 at 10:12 pm

    I really don’t know what I am doing wrong as I did exactly what you asked me to without getting the same results. When I run the code, it doesn’t give me the results.
    Also, the github link you gave me just contains code without a link to download the file.

    Mayank Amencherla said:
    April 13, 2014 at 10:45 pm

    Here’s something funny: when I run it from the command prompt with regular Python instead of camopy, my regular editor, it works well.

      Finn Årup Nielsen responded:
      April 14, 2014 at 12:32 pm

      I guess you mean Enthought Canopy. The script is meant to be run from the command prompt. I am not familiar with Canopy, but I suggest you try moving the text and print statements outside the if __name__ == '__main__' block.

        Mayank Amencherla said:
        April 16, 2014 at 10:14 pm

        Ok thanks!

        I have a couple of other questions:

        I have a huge database of tweets in a csv file. I am using data.split to split the line into the tweet text. But when I use f.readline() iteratively, it says that I cannot iterate through the lines and read them. Is there a way to run through the file line by line and output the sentiment rating of each tweet?

        Finn Årup Nielsen responded:
        April 17, 2014 at 4:24 am

        See the short example at https://docs.python.org/2/library/csv.html – the documentation for the csv module

        Finn Årup Nielsen responded:
        May 5, 2014 at 1:42 pm

        You might also want to look into the Pandas library. It has a convenient read_csv function.
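
A minimal Python 3 sketch of the csv approach, for illustration only. It assumes the tweets sit in a column named "text" (a hypothetical name), uses an in-memory string in place of the real file, and scores with a toy three-word list instead of the full AFINN list. The helper score_rows is made up for this example.

```python
import csv
import io
import math
import re

# Toy stand-in for the AFINN word list (assumed valences).
TOY_AFINN = {"stupid": -2, "idiotic": -3, "great": 3}


def sentiment(text, afinn=TOY_AFINN):
    """Summed word valence divided by sqrt(number of words)."""
    words = [w for w in re.split(r"\W+", text.lower()) if w]
    if not words:
        return 0.0
    return sum(afinn.get(w, 0) for w in words) / math.sqrt(len(words))


def score_rows(csv_file, text_column="text"):
    """Yield (score, text) for every row of an open CSV file."""
    for row in csv.DictReader(csv_file):
        text = row[text_column]
        yield sentiment(text), text


# In-memory stand-in for open("tweets.csv", newline="", encoding="utf-8"),
# where "tweets.csv" is a hypothetical file name.
sample = io.StringIO(
    'id,text\n'
    '1,"Finn is stupid and idiotic"\n'
    '2,"This is just so great"\n'
)
for score, text in score_rows(sample):
    print("%.2f %s" % (score, text))
```

The csv module takes care of quoted commas inside a tweet, which a plain split on "," would break.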

    Gerald said:
    July 25, 2014 at 10:40 pm

    Hello Finn, can you tell me anything about the performance of your algorithm? Given a test set, what percentage does it classify correctly?

      Finn Årup Nielsen responded:
      July 28, 2014 at 4:03 pm

      There are a few links to evaluations of AFINN at http://neuro.compute.dtu.dk/wiki/AFINN#Evaluation. I myself did a correlation study, but there are a few other evaluations with classification. The result depends on whether it is 2-class or 3-class classification. It seems to be around 75% for 2-class classification.

        anuj shah said:
        October 27, 2014 at 11:41 pm

        afinn = dict(map(lambda (w, s): (w, int(s)), [ws.strip().split('\t') for ws in open(filenameAFINN)]))
        SyntaxError: invalid syntax

        I keep getting this error. Not sure what I am doing wrong. Using Python 3.4. Please help.

        Finn Årup Nielsen responded:
        October 28, 2014 at 2:15 pm

        This is related to a change between Python 2 and Python 3: tuple parameter unpacking in lambda (and def) signatures was removed. See http://www.diveintopython3.net/porting-code-to-python-3-with-2to3.html#tuple_params

        One workaround is this:

        afinn = dict(map(lambda ws: (ws[0], int(ws[1])), [ws.strip().split('\t') for ws in open(filenameAFINN)]))
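
Another option in Python 3 is a dict comprehension, where unpacking in the for clause is still allowed. The sketch below is one possible rewrite, not the code from the gist. The function name load_afinn is made up for illustration, and an in-memory string stands in for the word list file.

```python
import io


def load_afinn(lines):
    """Build {word: score} from tab-separated 'word<TAB>score' lines."""
    return {
        word: int(score)
        for word, score in (
            line.strip().split("\t") for line in lines if line.strip()
        )
    }


# In-memory stand-in for open(filenameAFINN, encoding="utf-8").
sample = io.StringIO("stupid\t-2\nidiotic\t-3\ngreat\t3\n")
afinn = load_afinn(sample)
print(afinn["idiotic"])  # -3
```

Passing an explicit encoding when opening the real file avoids depending on the platform default.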

    saimadhu said:
    February 2, 2015 at 8:08 am

    Good one, thanks for sharing.

    Abiud Leal said:
    November 3, 2015 at 7:41 pm

    Hello friend. What do these scores mean?
    -2.24 Finn is stupid and idiotic
    -1.58 Finn is only a tiny bit stupid and not idiotic

      Finn Årup Nielsen responded:
      November 3, 2015 at 9:31 pm

      These numbers are floating point values representing the overall valence/sentiment of the text.

    Rudolf said:
    February 10, 2016 at 1:37 am

    Hello. This is great; thanks for putting this up. By the way, what is the logic behind weighting the individual word sentiments by sqrt(len(sentiments))? I tried making sense of it but to no avail. Thanks!

      Finn Årup Nielsen responded:
      February 10, 2016 at 10:31 am

      There are usually two ways to normalize the sum of the individual word sentiments: by ‘N’, so the result becomes the mean, or by 1 (no normalization), so the result becomes the sum. These may be OK, but often they are not particularly good. Consider a short post with a single word, “Great”, and a longer post, “This is just so great”. With the mean, short texts will often score extreme values (here 3) while longer texts will score closer to zero (here 3/5), even though the two texts may be said to have similar overall sentiment. On the other hand, summing with no normalization will tend to give longer texts extreme values compared to short texts. The square root normalization is a compromise between the two, somewhat resembling the normalization in the Student t-test.
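
To make the comparison concrete, here is a small sketch (not part of the original script) that applies the three normalizations to the two example posts. It assumes “great” has valence 3 and every other word has valence 0. The function name normalizations is made up for illustration.

```python
import math


def normalizations(valences):
    """Return the sum, mean and sqrt-normalized score of a list of valences."""
    n = len(valences)
    if n == 0:
        return {"sum": 0, "mean": 0.0, "sqrt": 0.0}
    total = sum(valences)
    return {"sum": total, "mean": total / n, "sqrt": total / math.sqrt(n)}


short = [3]               # "Great"
longer = [0, 0, 0, 0, 3]  # "This is just so great"

for name, valences in [("short", short), ("longer", longer)]:
    scores = normalizations(valences)
    print("%-6s sum=%d mean=%.2f sqrt=%.2f"
          % (name, scores["sum"], scores["mean"], scores["sqrt"]))
```

With the mean the two posts score 3 and 0.6, while with the square root they score 3 and about 1.34, so the longer post is pulled toward zero less strongly.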

        Rudolf said:
        March 25, 2016 at 6:50 am

        Hi Finn. Thanks for the clarification. That makes a lot of sense. By the way, I am going to use your Python code for my senior thesis project. How do I properly cite it?

        Finn Årup Nielsen responded:
        March 25, 2016 at 9:28 am

        The canonical citation is: ‘Finn Årup Nielsen, “A new ANEW: evaluation of a word list for sentiment analysis in microblogs”, Proceedings of the ESWC2011 Workshop on ‘Making Sense of Microposts’: Big things come in small packages. Volume 718 in CEUR Workshop Proceedings: 93-98. 2011 May. Matthew Rowe, Milan Stankovic, Aba-Sah Dadzie, Mariann Hardey (editors)’

        See also https://github.com/fnielsen/afinn and http://neuro.compute.dtu.dk/wiki/AFINN

    Prayson Daniel said:
    November 26, 2016 at 9:24 pm

    Hi Finn,

    I love your script. I am thinking of updating it to take context into account. E.g., as of today, it gives two opposite sentences the same score:

    >>> afinn.score('Jeg elsker dig ikke')
    2.0
    >>> afinn.score('Jeg elsker dig')
    2.0

    Let me know if you are already working on it.

      Finn Årup Nielsen responded:
      November 28, 2016 at 11:21 am

      The word list approach is somewhat limited. Proper handling of context such as negation would probably need a machine learning approach like that of Richard Socher (“Semi-supervised recursive autoencoders for predicting sentiment distributions” / “Recursive deep models for semantic compositionality over a sentiment treebank”). I do not believe that “simple” ways of handling negation will suffice.

    Prayson Daniel said:
    November 28, 2016 at 11:50 am

    How about detecting the presence of negations (e.g. ikke, aldrig, etc.) and multiplying the score of the preceding or following word, depending on the context, by -1? A POS tagger would be handy here. Are you aware of any POS tagger for the Danish language?
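
As a rough illustration of this proposal, and not of anything in the afinn package, the sketch below flips the sign of a scored word when a negation appears within a small window on either side. It uses no POS tagger, a one-word toy list with “elsker” assumed to score 2, and an unnormalized sum. The names score_with_negation, NEGATIONS and window are hypothetical.

```python
import re

# Toy Danish stand-in for the word list (assumed valence).
TOY_AFINN_DA = {"elsker": 2}
# Negation words taken from the comment above.
NEGATIONS = {"ikke", "aldrig"}


def score_with_negation(text, afinn=TOY_AFINN_DA, window=2):
    """Sum word valences, negating a word if a negation is within `window` words."""
    words = re.findall(r"\w+", text.lower())
    total = 0
    for i, word in enumerate(words):
        valence = afinn.get(word, 0)
        if valence == 0:
            continue
        # Look at the neighbours before and after the scored word.
        nearby = words[max(0, i - window): i + window + 1]
        if any(w in NEGATIONS for w in nearby):
            valence = -valence
        total += valence
    return float(total)


print(score_with_negation("Jeg elsker dig ikke"))  # -2.0
print(score_with_negation("Jeg elsker dig"))       # 2.0
```

A fixed window like this will misfire on longer sentences, double negations and negations that belong to another clause, which is the kind of limitation raised in the reply above.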
