Month: January 2013

Jean-Pierre Hombach and Large-scale Wikipedia copyright infringers?


An entity calling itself “Jean-Pierre Hombach” presents itself with “I’m a German writer Comedian and short filmmaker. I’m studying media at the University of Vic.”

The Twitter profile states “Jean-Pierre Hombach: I’m a German Hobbie writer Comedian and short filmmaker. I’m studying media at the University of Vic. Jean-Pierre speaks fluent English. Rio de Janeiro ·“. A Google Plus account and a Facebook account are also linked.

The shortlink leads to a page that lists 17 works. All of them were published in the first part of 2012. What a prolific writer!

If you go to the Justin Bieber book on Google Books you will find “Copyright (C) Jean-Pierre Hombach. All rights reserved. ISBN 978-1-4710-8069-2”. So apparently this Hombach claims the copyright for the work.

If you go a bit further into the book you will read as the first line “Justin Drew Bieber is a Canadian pop/R&B singer, songwriter and actor.” That sounds awfully Wikipedia-ish, and an examination of the book quickly reveals that this is Wikipedia! “Hombach” has simply aggregated a lot of Wikipedia articles. If you go all the way to page 505 you will even see that the Jakarta Wikipedia article has been included in the Justin Bieber book… ehhh…?

I leave it as an exercise to the reader to examine the rest of the books of Mr. “Hombach”. You may, e.g., begin with the Bob Marley book.

Obviously the copyright does not belong to “Hombach”, but to the Wikipedia contributors. The text is licensed under CC BY-SA, and the license requires that this be stated and that the work be re-licensed under the same license. Otherwise it is not even copyfraud; it is simply copyright infringement.

Amazingly, a book by “Jean-Pierre” reached number 16 on the music biographies bestseller list according to the Los Angeles Times. In that book the contributors are listed at the back, and the CC license might also be stated there, although the page is not available to me on Google Books. Maybe he has read a bit about the CC license. will gladly sell that to you for $23.90 without telling you that the author is not Jean-Pierre.

One interesting thing to note is that “Hombach” copied material on Lady Gaga written by the Wikipedia hoaxer Legolas2186. This initially confused me, as the “Hombach” book was stated to be copyrighted in 2010, while the hoaxer Legolas2186 first added the segment to Wikipedia in the summer of 2011.

To me, the wrongful attribution, the lack of proper attribution and the obfuscation (with respect to the copyright year) seem illegal. Wikipedia contributors to the respective works should be able to sue Hombach and for selling their copyrighted works that are not appropriately licensed.

Update 2013-01-25: A Google search on Jean-Pierre Hombach reveals that the works of Jean-Pierre have been used at least five times as a source in Wikipedia, i.e., we have a citation circle! For example, once in Belieber and once in Decca Records.

Update 2013-01-25: Apparently Wikipedia has a page for everything. Thanks to Gadget850 (Ed).

More on automated sentiment analysis of Danish politicians on Wikipedia


Earlier today I put up a sentiment analysis of the Danish Wikipedia article on Danish politician Ellen Trane Nørby.

Unfortunately, I could not resist spending a bit of time running the analysis for some other Danish politicians as well. I did it for Prime Minister Helle Thorning-Schmidt, former Prime Minister Lars Løkke Rasmussen, former Foreign Minister Lene Espersen and former Minister Ole Sohn.

For Rasmussen’s article we see a neutral, factual biographical article until 2008, though with a slight increase at the end of 2007 when he became Minister of Finance. Then in May 2008 we see a drop in sentiment with the introduction of a paragraph mentioning an “issue” related to his use of county funds for private purposes. Since then the article has been extended and is now generally positive. There are some spikes in the plots. These spikes are typically vandalism that persists for a few minutes until it is reverted.
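Such short-lived spikes could in principle be flagged automatically. The following is a minimal, hypothetical Python sketch, not the method behind the plots: it assumes each revision has already been reduced to a (timestamp, sentiment score) pair, and the function name `find_spikes` and the thresholds (10 minutes, a jump of 3) are illustrative assumptions.

```python
from datetime import datetime, timedelta

def find_spikes(revisions, max_age=timedelta(minutes=10), min_jump=3):
    """Flag revisions that look like short-lived vandalism spikes (hypothetical heuristic).

    revisions: chronologically ordered list of (timestamp, score) pairs.
    Revision i is flagged when it (a) was replaced within max_age,
    (b) moved the score by at least min_jump from the previous revision,
    and (c) the next revision brought the score back near the old level.
    """
    spikes = []
    for i in range(1, len(revisions) - 1):
        (_, prev_score), (cur_time, cur_score), (next_time, next_score) = revisions[i - 1:i + 2]
        lifetime = next_time - cur_time  # how long revision i stayed live
        if (lifetime <= max_age
                and abs(cur_score - prev_score) >= min_jump
                and abs(next_score - prev_score) < min_jump):
            spikes.append(i)
    return spikes

# Toy data: revision 1 is vandalism that is reverted three minutes later.
revisions = [
    (datetime(2012, 1, 1, 12, 0), 5),
    (datetime(2012, 1, 2, 9, 0), -10),
    (datetime(2012, 1, 2, 9, 3), 5),
    (datetime(2012, 1, 5, 8, 0), 6),
]
print(find_spikes(revisions))  # [1]
```

Requiring the score to return near its old level keeps ordinary quick follow-up edits from being flagged as vandalism.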

For Helle Thorning-Schmidt we see a gradual drop leading up to the election she won, and after that her article gains considerable positivity. I haven’t checked this much in the history, but I believe it is related to the tax issue she and her husband, Stephen Kinnock, had and a number of other issues. As I remember, there was concern or discussion on the Danish Wikipedia about whether these “issues” should take up so large a portion of the article, and on 3 December 2011 a user moved the content to another page.

I believe I am one of the major perpetrators behind both the Lene Espersen and Ole Sohn articles. Both articles have large sections describing negative issues (I really must work on my positivity; these politicians are not that bad). However, the sentiment analysis shows the Ole Sohn article as more positive. Maybe this is because the “controversy” section describes how he paid “tribute” to East Germany and how his party received “support” from Moscow, i.e., my simple sentiment analysis does not understand the controversial aspect of support from communist Moscow and just thinks that “support” is positive.

When writing articles on politicians for Wikipedia, I find it somewhat difficult to identify good positive articles that can be used as sources. The sources used for the encyclopedic articles usually come from news articles, and these often have a negative bias with a focus on “issues” (political problems). Writing the Lene Espersen article, I found that even the book “Bare kald mig Lene”, which I have used as a source, has a negativity bias. If I remember correctly, Espersen did not want to participate in the development of the book, presumably because she already suspected that the writers would focus on the problematic “issues” in her career.


(2013-01-10: spell correction)

Sentiment analysis of Wikipedia pages on Danish politicians



We are presently analyzing company articles on Wikipedia with simple sentiment analysis to determine whether we can see any interesting patterns, e.g., whether the Wikipedia sentiment correlates with real-world attitudes and events related to the company. Such analyses can reveal, e.g., that there was a small edit war in relation to the Lundbeck articles at the beginning of December 2012. We are also able to see that the Arla Foods article was affected by the Muhammad cartoon crisis and the 2008 Chinese milk scandal.

At the beginning of January 2013 there was media buzz in Denmark about Danish politicians and their staff making biased edits in the Danish Wikipedia. The story, carried forward by journalist Lars Fogt, initially focused on Ellen Trane Nørby.

It is relatively easy to apply the methods we employed for companies to Danish politicians. The sentiment analysis works by matching words to a word list labeled with “valence”. The initial word list worked only for English, but I have translated it to Danish and am continuously extending it. So now one only needs to download the relevant Wikipedia history for a page and run the text through the sentiment analysis using the computer code I have already developed.
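To make the word-list idea concrete, here is a minimal Python sketch. It is not the code used for the analysis: the tiny `VALENCE` dictionary is a hypothetical stand-in for the translated Danish list, and the scoring rule (simply summing the valences of all matched words) is an assumption.

```python
import re

# Hypothetical toy word list: Danish words mapped to integer valences.
# The actual translated list is much larger and is not reproduced here.
VALENCE = {
    "imponerende": 3,   # impressive
    "engagement": 2,    # commitment
    "stort": 1,         # great
    "svag": -2,         # weak
    "udsatte": -2,      # exposed
    "trafficking": -2,
}

def tokenize(text):
    """Lower-case and split into word tokens; word characters include æ, ø and å."""
    return re.findall(r"\w+", text.lower())

def sentiment(text):
    """Assumed scoring rule: sum the valences of all words found in the list."""
    return sum(VALENCE.get(token, 0) for token in tokenize(text))

print(sentiment("Et imponerende valg og stort internationalt engagement"))  # 6
```

Words missing from the list contribute zero, so the score only reflects vocabulary the list happens to cover.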

The figure shows the sentiment for Ellen Trane Nørby’s Danish Wikipedia article through time. The largest positive jump in sentiment (the way that I measure it) comes from a user inserting content on 2 March 2011. This revision inserts, e.g., “great international commitment” and “impressive election”. Journalist Lars Fogt identified the user as a member of Ellen Trane Nørby’s staff.
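Finding the largest jump amounts to scoring every revision and ranking the differences between consecutive revisions. A minimal, hypothetical Python sketch follows; the toy valence list, the revision ids and the Danish revision texts are invented for illustration, and `largest_jumps` is an illustrative name, not part of the actual code.

```python
import re

# Hypothetical toy valence list, not the real translated word list.
VALENCE = {"imponerende": 3, "engagement": 2, "stort": 1}

def sentiment(text):
    """Sum of valences of the words found in the toy list."""
    return sum(VALENCE.get(t, 0) for t in re.findall(r"\w+", text.lower()))

def largest_jumps(revisions, n=1):
    """revisions: chronologically ordered (revision_id, text) pairs.

    Returns the n largest score changes, largest first, as
    (delta, revision_id) tuples with delta = score(rev) - score(previous rev).
    """
    scores = [(rev_id, sentiment(text)) for rev_id, text in revisions]
    deltas = [(cur - prev, rev_id)
              for (_, prev), (rev_id, cur) in zip(scores, scores[1:])]
    return sorted(deltas, reverse=True)[:n]

# Invented revision texts, for illustration only.
revisions = [
    ("r1", "Ellen Trane Nørby er politiker."),
    ("r2", "Ellen Trane Nørby er politiker med stort internationalt engagement "
           "og et imponerende valg."),
    ("r3", "Ellen Trane Nørby er politiker med stort internationalt engagement."),
]
print(largest_jumps(revisions, n=2))  # [(6, 'r2'), (-3, 'r3')]
```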

Of course, the simple word list approach does not always work well. The second largest positive jump in sentiment arises when a user deletes a part of the article for POV reasons. That part contained negative words such as svag (weak), trafficking and udsatte (exposed). The simple word list registers the deletion of these words as a positive event. However, the context in which they appeared was actually positive, e.g., “… Ellen Trane Nørby is a socially committed politician, who also fights for the weak and exposed in society, …”.
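The failure is easy to reproduce with any bag-of-words scorer. In this hypothetical Python sketch the valence list uses English glosses for readability, and the surrounding article text ("Danish politician.") is invented; only the shortened passage follows the quote above.

```python
import re

# Hypothetical toy valence list using English glosses for readability.
VALENCE = {"committed": 2, "weak": -2, "exposed": -2}

def sentiment(text):
    """Bag-of-words score: sum of valences, blind to word order and context."""
    return sum(VALENCE.get(t, 0) for t in re.findall(r"\w+", text.lower()))

# Shortened English gloss of the deleted passage quoted above.
passage = ("Ellen Trane Nørby is a socially committed politician, "
           "who also fights for the weak and exposed in society.")
before = "Danish politician. " + passage
after = "Danish politician."  # the passage removed for POV reasons

delta = sentiment(after) - sentiment(before)
print(sentiment(passage), delta)  # -2 2
```

The passage praises the politician, yet it scores negative, so deleting it shows up as a positive jump.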

As far as I understand, journalist Lars Fogt used the Danish version of the Wikipedia Scanner provided by Peter Brodersen; see the list generated for Ellen Trane Nørby. Brodersen’s tool does not (yet?) provide an automated sentiment score, but it does a good job of providing an overview of the edit history.

(2013-01-16: typo correction)