wikipedia

Sentiment colored sequential collaboration network

Nielsen2013realtime

Sentiment-colored sequential collaboration network of some of the Wikipedians editing the Wikipedia articles associated with the Lundbeck company. Red indicates negative sentiment, green positive.

The “sequential collaboration network” is inspired by Analyzing the creative editing behavior of Wikipedia editors: through dynamic social network analysis. Brian Keegan has also done a similar kind of network visualization.

Sentiment analysis is based on the AFINN word list.
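
A minimal sketch of how such a network could be assembled. It assumes (this is not spelled out above) that an edge runs from the editor of one revision to the editor of the next revision, and that each edge accumulates the sentiment change introduced by the later revision. The function names, the input format and the coloring rule are hypothetical.

```python
from collections import defaultdict


def build_sequential_network(revisions):
    """Build directed edges between editors of consecutive revisions.

    `revisions` is a list of (editor, sentiment_change) tuples, oldest
    first. This input format is an assumption made for illustration.
    Returns a dict mapping (previous_editor, editor) to the summed
    sentiment change of the later revisions on that edge.
    """
    edges = defaultdict(int)
    for (previous_editor, _), (editor, change) in zip(revisions, revisions[1:]):
        if previous_editor == editor:
            # Consecutive edits by the same editor give no collaboration edge.
            continue
        edges[(previous_editor, editor)] += change
    return dict(edges)


def edge_color(sentiment_change):
    """Map a summed sentiment change to a color (hypothetical rule)."""
    if sentiment_change > 0:
        return "green"
    if sentiment_change < 0:
        return "red"
    return "gray"


history = [("A", 0), ("B", -2), ("A", 3), ("A", 1), ("B", -1)]
network = build_sequential_network(history)
colors = {edge: edge_color(value) for edge, value in network.items()}
```

With the toy history above, the edge from A to B sums to a negative value and would be drawn red, while the edge from B to A would be drawn green.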

Jean-Pierre Hombach and Amazon.com: Large-scale Wikipedia copyright infringers?

An entity calling itself “Jean-Pierre Hombach” presents itself with “I’m a German writer Comedian and short filmmaker. I’m studying media at the University of Vic.”

The profile on Twitter states “Jean-Pierre Hombach: I’m a German Hobbie writer Comedian and short filmmaker. I’m studying media at the University of Vic. Jean-Pierre speaks fluent English. Rio de Janeiro · http://goo.gl/bFdsV”. There are also a Google Plus account and a Facebook account linked.

The shortlink leads to Amazon.com, which lists 17 works. All of these were published in the first part of 2012. What a prolific writer!

If you go to the Justin Bieber book on Google Books you will find “Copyright (C) Jean-Pierre Hombach. All rights reserved. ISBN 978-1-4710-8069-2”. So apparently this Hombach claims the copyright for the work.

If you go a bit further in the book you will read as the first line “Justin Drew Bieber is a Canadian pop/R&B singer, songwriter and actor.” That sounds awfully wikipedish and an examination of the book quickly reveals that this is Wikipedia! “Hombach” has simply aggregated a lot of Wikipedia articles together. If you go all the way to page 505 you will even see that the Jakarta Wikipedia article has been included in the Justin Bieber book… ehhh…?

I leave it as an exercise to the reader to examine the rest of the books of Mr. “Hombach”. You may, e.g., begin with the Bob Marley book.

Obviously the copyright does not belong to “Hombach”, but to the Wikipedia contributors. The text is licensed under CC BY-SA, and the license requires that this is stated (and that the work is re-licensed under the same license). Otherwise it is not even copyfraud; it is simply copyright infringement.

Amazingly, a book by “Jean-Pierre” reached number 16 on the music biographies bestseller list according to the Los Angeles Times. In that book the contributors are listed in the back and there might also be the CC license, although the page is not available to me on Google Books. Maybe he has read a bit about the CC license.

Amazon.com will gladly sell that to you for $23.90 without telling you that the author is not Jean-Pierre.

One interesting issue to note is that “Hombach” copied Wikipedia hoaxster Legolas2186’s material on Lady Gaga. Initially it confused me, as the “Hombach” book was stated to be copyrighted in 2010 while Legolas2186 first added the segment to Wikipedia in the summer of 2011.

To me the wrongful attribution, the lack of proper attribution and the obfuscation (with respect to the copyright year) seem illegal. Wikipedia contributors to the respective works should be able to sue Hombach and Amazon.com for selling their copyrighted works without appropriate licensing.

Update 2013-01-25: A Google search on Jean-Pierre Hombach reveals that the works of Jean-Pierre have been used at least five times as a source in Wikipedia, i.e., we have a citation circle! Once in Belieber and once in Decca Records.

Update 2013-01-25: Apparently Wikipedia has a page for everything: http://en.wikipedia.org/wiki/Wikipedia:Republishers. Thanks to Gadget850 (Ed).

More on automated sentiment analysis of Danish politicians on Wikipedia

Earlier today I put up a sentiment analysis of Danish politician Ellen Trane Nørby based on the text of the Danish Wikipedia.

Unfortunately, I could not resist the temptation to spend a bit of time on also running the analysis for some other Danish politicians. I did it for Prime Minister Helle Thorning-Schmidt, former Prime Minister Lars Løkke Rasmussen, former Foreign Minister Lene Espersen and former Minister Ole Sohn.

For Rasmussen’s article we see a neutral, factual biographical article until 2008, though with a slight increase at the end of 2007 when he became Minister of Finance. Then in May 2008 we see a drop in sentiment with the introduction of a paragraph mentioning an “issue” related to his use of county funds for private purposes. Since then the article has been extended and is now generally positive. There are some spikes in the plots. These spikes are typically vandalism that persists for a few minutes until reverted.

For Helle Thorning-Schmidt we see a gradual drop leading up to the election she wins, and after that her article gains considerable positivity. I haven’t checked up much on this in the history, but I believe the drop is related to the tax issue she and her husband, movie star Stephen Kinnock, had and a number of other issues. As I remember, there was concern or discussion on the Danish Wikipedia about whether these “issues” should fill up so large a portion of the article, and on 3 December 2011 a user moved the content to another page.

I believe I am one of the major perpetrators behind both the Lene Espersen and Ole Sohn articles. Both of the articles have large sections which describe negative issues (I really must work on my positivity, these politicians are not that bad). However, the sentiment analysis shows the Ole Sohn article as more positive. Maybe this is due to the “controversy” section, which describes that he paid “tribute” to East Germany and that his party received “support” from Moscow, i.e., my simple sentiment analysis does not understand the controversial aspect of support from communist Moscow and just thinks that “support” is positive.

When writing articles about politicians on Wikipedia I find it somewhat difficult to identify good positive articles that can be used as sources. The sources used for the encyclopedic articles usually come from news articles, and these often have a negative bias with a focus on “issues” (political problems). Writing the Lene Espersen article I found that even the book “Bare kald mig Lene”, which I have used as a source, has a negativity bias. If I remember correctly, Espersen did not want to participate in the development of the book, presumably because she already had the notion that the writers would focus on the problematic “issues” in her career.

Nielsen2013python_llr, Nielsen2013python_hts, Nielsen2013python_leneespersen, Nielsen2013python_olesohn

(2013-01-10: spell correction)

Sentiment analysis of Wikipedia pages on Danish politicians

Nielsen2013python_ellentrane

We are presently analyzing company articles on Wikipedia with simple sentiment analysis to determine whether we can see any interesting patterns, e.g., whether the Wikipedia sentiment correlates with real-world attitudes and events related to the company. Such analyses might uncover that there was a small edit war in relation to the Lundbeck articles in the beginning of December 2012. We are also able to see that the Arla Foods article was affected by the Muhammad cartoon crisis and the 2008 Chinese milk scandal.

In Denmark in the beginning of January 2013 there has been media buzz about Danish politicians and their staff making biased edits in the Danish Wikipedia. The story, put forward by journalist Lars Fogt, focused initially on Ellen Trane Nørby.

It is relatively easy to apply the methods we employ for companies to Danish politicians. The sentiment analysis works by matching words to a word list labeled with “valence”. The initial word list worked only for English, but I have translated it to Danish and continuously extend it. So now one only needs to download the relevant Wikipedia history for a page and run the text through the sentiment analysis using the computer code I have already developed.
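
The word-list matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code actually used: the miniature word list and its valence values are made up for the example, and the score is assumed to be a plain sum of the valences of the matched words. Downloading the page history is left out; the revisions are simply given as (timestamp, text) pairs.

```python
import re

# Hypothetical miniature word list in the style of a valence-labeled list.
# The entries and values are illustrative assumptions only.
WORD_VALENCES = {
    "great": 3,
    "impressive": 3,
    "support": 2,
    "weak": -2,
    "scandal": -3,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def sentiment(text, valences=WORD_VALENCES):
    """Sum the valences of all words that occur in the word list."""
    return sum(valences.get(word, 0) for word in tokenize(text))


def sentiment_history(revisions, valences=WORD_VALENCES):
    """Score every (timestamp, text) revision, oldest first."""
    return [(timestamp, sentiment(text, valences))
            for timestamp, text in revisions]


revisions = [
    ("2011-03-01", "An election."),
    ("2011-03-02", "An impressive election and great international commitment."),
]
history = sentiment_history(revisions)
```

Words that are not in the list count as zero, so the score depends entirely on how well the list covers the vocabulary of the article.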

The figure shows the sentiment for Ellen Trane Nørby’s Danish Wikipedia article through time. The largest positive jump in sentiment (the way that I measure it) comes from a user inserting content on 2 March 2011. This revision inserts, e.g., “great international commitment” and “impressive election”. Journalist Lars Fogt identified the user as a member of Ellen Trane Nørby’s staff.

Of course, the simple word list approach does not work well all the time. The second largest positive jump in sentiment arises when a user deletes a part of the article for POV reasons. That part contained negative words such as svag (weak), trafficking and udsatte (exposed). The simple word list detects the deletion of the words as a positive event. However, the context in which they appeared was actually positive, e.g., “… Ellen Trane Nørby is a socially committed politician, who also fights for the weak and exposed in society, …”.
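
The effect is easy to reproduce in a small sketch. The valence values below are assumptions chosen for illustration, the texts are English stand-ins for the Danish article, and the score is again assumed to be a plain sum over matched words.

```python
import re

# Hypothetical valences for two of the words mentioned above.
VALENCES = {"weak": -2, "exposed": -2}


def score(text):
    """Sum the valences of the words in the text that are in the list."""
    return sum(VALENCES.get(word, 0)
               for word in re.findall(r"\w+", text.lower()))


def sentiment_jumps(texts):
    """Return the score change between each pair of consecutive revisions."""
    scores = [score(text) for text in texts]
    return [after - before for before, after in zip(scores, scores[1:])]


before_deletion = ("She is a socially committed politician, who also "
                   "fights for the weak and exposed in society.")
after_deletion = "She is a politician."

jumps = sentiment_jumps([before_deletion, after_deletion])
```

Here the first revision scores below zero although the sentence is favorable, and removing the sentence registers as a positive jump. A word list alone cannot see that the negative words were used in a positive context.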

As far as I understand, journalist Lars Fogt used the Danish version of the Wikipedia Scanner provided by Peter Brodersen, see the list generated for Ellen Trane Nørby. Brodersen’s tool does not (yet?) provide an automated sentiment score, but does a good job of providing an overview of the edit history.

(2013-01-16: typo correction)

NumPy beginner’s guide: Date formatting, stock quotes and Wikipedia sentiment analysis

Nielsen2012numpy

Last year I acted as one of the reviewers on a book from Packt Publishing: The NumPy 1.5 Beginner’s Guide (ISBN-13: 978-1-84951-530-6) about the numerical programming library in the Python programming language. I was “blinded” by the publisher, so I did not know that the author was Ivan Idris before the book came out. For my reviewing effort I got a physical copy of the book, an electronic copy of another book and some new knowledge of certain aspects of NumPy.

Among the things I did not know before I came across them while reviewing the book were the date formatter in the plotting library (matplotlib) and the ability to download stock quotes via a single function in matplotlib’s finance module (there is an example starting on page 171 in the book). There is a ‘candlestick’ plot function that goes well with the return value of the quotes download function.

The plot shows an example of the use of date formatting with stock quotes downloaded from Yahoo! via matplotlib, together with sentiment analysis of Wikipedia revisions of the article on the Pfizer company.

WikiViz: create the most insightful visualization of Wikipedia’s impact

The WikiViz challenge has now been officially announced. The challenge is to create the most insightful visualization of Wikipedia’s impact.

“The main goal of this competition is to improve our understanding of how Wikipedia is affecting the world beyond the scope of its own community.”

There are more details here.

Any inspiration? One of the organizers is Dario Taraborelli from the Wikimedia Foundation (he is one of the guys behind the not necessarily useful but ridiculously aesthetic Wikipedia deletion discussion visualization). He has made the Readermeter Mendeley readership analysis and visualization website. So maybe you could do something similar with the Wikipedia reader statistics at http://stats.grok.se/ or the raw data here? hmmm… It may be a good idea to take a look at the First Monday article Visualizing the overlap between the 100 most visited pages on Wikipedia for September 2006 to January 2007. See also Google image search for Wikipedia visualization.

Who knows Who knows? You can now play a Facebook application while doing research

Whoknows

At the 8th Extended Semantic Web Conference researchers from Potsdam showed a Facebook application. It is a quiz game called Who Knows?

The special thing about it is that the questions are automatically generated from Wikipedia via DBpedia. As the users’ interaction with the game is recorded, the result may be used to improve the ranking of triple data in Semantic Web applications as well as to find errors in Wikipedia/DBpedia.

The background scientific paper is WhoKnows? – Evaluating Linked Data Heuristics with a Quiz that Cleans Up DBpedia. Last author Harald Sack is presently at the top of the high score list.

Another of their Facebook quiz applications is Risq.