Sentiment analysis of Wikipedia pages on Danish politicians


[Figure: sentiment for Ellen Trane Nørby’s Danish Wikipedia article through time (Nielsen2013python_ellentrane)]

We are presently analyzing company articles on Wikipedia with simple sentiment analysis to see whether any interesting patterns emerge, e.g., whether the Wikipedia sentiment correlates with real-world attitudes and events related to the company. Such analyses can uncover, for example, that there was a small edit war over the Lundbeck articles at the beginning of December 2012. We are also able to see that the Arla Foods article was affected by the Muhammad cartoon crisis and the 2008 Chinese milk scandal.

In Denmark, at the beginning of January 2013, there was media buzz about Danish politicians and their staff making biased edits on the Danish Wikipedia. The story, carried forward by journalist Lars Fogt, initially focused on Ellen Trane Nørby.

It is relatively easy to apply the methods we use for companies to Danish politicians. The sentiment analysis works by matching words against a word list labeled with “valence”. The initial word list worked only for English, but I have translated it to Danish and continue to extend it. So now one only needs to download the relevant Wikipedia history for a page and run the text through the sentiment analysis using the computer code I have already developed.
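The word-matching step can be sketched roughly as follows. This is a minimal illustration and not the code used for the analysis: the tiny `TOY_VALENCE` dictionary and its scores are made up for the example, and the function name `sentiment` is hypothetical. The real word list is much larger.

```python
import re

# Made-up toy word list: word -> valence score (illustrative values only,
# not taken from the actual Danish word list).
TOY_VALENCE = {
    "imponerende": 3,   # "impressive"
    "svag": -2,         # "weak"
    "udsatte": -2,      # "exposed"
    "trafficking": -3,
}


def sentiment(text, valence=TOY_VALENCE):
    """Sum the valence of every word in `text` that appears in the word list."""
    # \w+ matches Unicode letters in Python 3, so Danish æ, ø and å are kept.
    words = re.findall(r"\w+", text.lower())
    return sum(valence.get(word, 0) for word in words)


print(sentiment("Et imponerende valg"))
```

Words that are not in the list simply contribute zero, so the score depends entirely on how well the list covers the vocabulary of the article.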

The figure shows the sentiment for Ellen Trane Nørby’s Danish Wikipedia article through time. The largest positive jump in sentiment (the way that I measure it) comes from a user inserting content on 2 March 2011. This revision inserts, e.g., “great international commitment” and “impressive election”. Journalist Lars Fogt identified the user as belonging to Ellen Trane Nørby’s staff.

Of course, the simple word list approach does not work well all the time. The second largest positive jump in sentiment arises when a user deletes a part of the article for POV reasons. That part contained negative words such as svag (weak), trafficking and udsatte (exposed). The simple word list detects the deletion of these words as a positive event. However, the context in which they appeared was actually positive, e.g., “… Ellen Trane Nørby is a socially committed politician, who also fights for the weak and exposed in society, …”.
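To make the deletion effect concrete, here is a small sketch of how jumps between consecutive revisions could be computed. Everything in it is assumed for illustration: the revision labels and texts are invented, the word scores are made up, and `sentiment_jumps` is a hypothetical helper, not part of the code behind the figure.

```python
import re

# Made-up toy scores (illustrative only).
TOY_VALENCE = {"imponerende": 3, "svage": -2, "udsatte": -2, "trafficking": -3}


def score(text):
    """Sum of toy valence scores over the words in `text`."""
    return sum(TOY_VALENCE.get(w, 0) for w in re.findall(r"\w+", text.lower()))


def sentiment_jumps(revisions):
    """Given chronological (label, text) pairs, return (label, change) pairs,
    where change is the score difference from the previous revision."""
    scores = [score(text) for _, text in revisions]
    return [
        (revisions[i][0], scores[i] - scores[i - 1])
        for i in range(1, len(revisions))
    ]


# Invented revision history: rev3 deletes a sentence that used negative
# words in a positive context.
revisions = [
    ("rev1", "Politiker."),
    ("rev2", "Politiker med et imponerende valg, som kæmper for de svage og udsatte."),
    ("rev3", "Politiker med et imponerende valg."),
]

jumps = sentiment_jumps(revisions)
largest = max(jumps, key=lambda jump: jump[1])
print(jumps)
print(largest)
```

In this toy history the largest positive jump belongs to the revision that only deleted text, which mirrors the artifact described above: the word list sees negative words disappear and cannot tell that they were used sympathetically.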

As far as I understand, journalist Lars Fogt used the Danish version of the Wikipedia Scanner provided by Peter Brodersen; see the list generated for Ellen Trane Nørby. Brodersen’s tool does not (yet?) provide an automated sentiment score, but it does a good job of providing an overview of the edit history.

(2013-01-16: typo correction)

8 thoughts on “Sentiment analysis of Wikipedia pages on Danish politicians”

      Anders Boje Larsen (@anders_boje) said:
      May 14, 2013 at 9:33 am

      How did you make the translation into Danish? I am doing a thesis at CBS about sentiment analysis and need a Danish word list for analyzing comments from Facebook and Twitter.

        Finn Årup Nielsen responded:
        May 14, 2013 at 9:38 am

        I have translated my word list into Danish. I have not yet made any form of validation or put it online, but you can contact me and I can send you the word list.

    Klara Sørensen said:
    January 9, 2019 at 2:31 pm

    How many words does the Danish word list contain?
