Coming Scholia, WikiCite, Wikidata and Wikipedia sessions

In the coming months I will give three talks on Scholia, WikiCite, Wikidata, Wikipedia et al.:

  • 3 October 2018 at DGI-byen, Copenhagen, Denmark, as part of the Visuals and Analytics that Matter conference, the concluding conference for the DEFF-sponsored project Research Output & Impact Analyzed and Visualized (ROIAV).
  • 7 November 2018 in Mannheim as part of the Linked Open Citation Database (LOC-DB) 2018 workshop.
  • 13 December 2018 at the library of the Technical University of Denmark as part of Wikipedia – a media for sharing knowledge and research, an event for researchers and students (still in the planning phase).

In September I presented Scholia as part of the Workshop on Open Citations. The slides, titled Scholia as of September 2018, are available here.


A viewpoint on a viewpoint on Wikipedia’s neutral point of view

I recently looked into what Wikipedia research we have from Denmark and discovered several papers that I did not know about. I have now added some of them to Wikidata, so that Scholia can show a list of them.

Among the papers was one from Jens-Erik Mai titled Wikipedian’s knowledge and moral duties. Starting from the English Wikipedia’s Neutral Point of View (NPOV) policy, he stresses a dichotomy between the subjective and the objective and argues for a rewrite of the policy. Mai claims the policy has an absolutistic center and a relativistic edge, corresponding to an absolutistic majority view and relativistic minority views.

As a long-time Wikipedia editor, I find Mai’s exposition too theoretical. I miss good examples, that is, cases where the NPOV fails, and I cannot see in what concrete way the NPOV policy should be changed to accommodate Mai’s critique. I am not sure that Wikipedians distinguish so much between the objective and the subjective; the key dichotomy is verifiable versus not verifiable, that is, whether the statements in Wikipedia are supported by reliable sources. In terms of center-edge, I came to think of events associated with conspiracy theories. Here the “center” view could be the conventional view, while the conspiracy views are the edge. It is difficult for me to accommodate a standpoint that conspiracy theories should be accepted as equal to the conventional view. Nor is it clear to me that the center is uncontested and uncontroversial. Wikipedia, like a newspaper, has the ability to represent opposing viewpoints. This is done by attributing each viewpoint to the reliable sources that express it. For instance, central in the description of the evaluation of films are quotations from reviews in major newspapers and by notable reviewers.

I don’t see the support for the claim that the NPOV policy assumes a “politically dangerous ethical position”. On the contrary, Wikipedia has now, after the rise of fake news, been called the “last bastion”. The example given in The Atlantic post is the recent social media fuss over Sarah Jeong, where Wikipedians arrive at a text built on “shared facts about reality.”

A question to Wikimedia Foundation and Wikimania 2014

An open question to Wikimedia Foundation and Wikimania 2014 and its organizing committee with Ed:

Almost everything at the Wikimania meeting in London in August 2014 went very well: people, talks, entertainment, organization, monkey, squirrel, etc. What I am confused about is what happened during my last hour at the meeting Sunday evening: After the buffet I had the experience of meeting two females, one of whom gave me a business card with a link to, and claimed to be behind the wiki web site. The females seemed not very old; in fact, when queried, one of them claimed to be 10 years old, and when queried further, she responded she had made the web site when 6 years old with a little help from a family member…

During the Wikimania meeting the documentary about Aaron Swartz, The Internet’s Own Boy, was shown. In that documentary we learned that Aaron Swartz was 12 years old when he created the Wikipedia-like site InfoBase. Thus prodigies can create wiki web sites when they are 12 years old. From that we can deduce that it is unlikely that a six-year-old female can produce a wiki web site. The closest explanation I can come up with for my extraordinarily vivid experience at Wikimania is then that it was a hallucination.

My question is then: How do I get rid of the hallucination? Have other Wikimania participants had similar hallucinations of meeting preteens claiming to make web sites? Or am I just getting old?

The hallucination has persisted for many days now because I still both see and feel the business card I got.

Sentiment colored sequential collaboration network


Sentiment colored sequential collaboration network of some of the Wikipedians editing the Wikipedia articles associated with the Lundbeck company. Red is negative sentiment, green is positive.

The “sequential collaboration network” is inspired by Analyzing the creative editing behavior of Wikipedia editors: through dynamic social network analysis. Brian Keegan has also done a similar kind of network visualization.

Sentiment analysis is based on the AFINN word list.
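
As a rough illustration of the idea, here is a minimal Python sketch of how such a network could be put together. It rests on assumptions of mine rather than on the exact procedure behind the figure: an edge is drawn from one editor to the next distinct editor in the revision history, and each editor is colored by the sum of the sentiment changes of their edits. The editor names and sentiment values are made up.

```python
from collections import Counter, defaultdict


def sequential_edges(editors):
    """Count directed edges between consecutive, distinct editors."""
    edges = Counter()
    for previous, current in zip(editors, editors[1:]):
        if previous != current:  # skip an editor following themselves
            edges[(previous, current)] += 1
    return edges


def node_colors(revisions):
    """Color each editor by the sign of their summed sentiment change."""
    totals = defaultdict(float)
    for editor, delta in revisions:
        totals[editor] += delta
    return {
        editor: "green" if total > 0 else "red" if total < 0 else "grey"
        for editor, total in totals.items()
    }


# Hypothetical revision history: (editor, sentiment change of the edit).
revisions = [
    ("Alice", 2.0),
    ("Bob", -3.0),
    ("Alice", 1.0),
    ("Alice", 0.5),
    ("Carol", 0.0),
]

edges = sequential_edges([editor for editor, _ in revisions])
colors = node_colors(revisions)
print(dict(edges))
print(colors)
```

The edge counts and colors can then be handed to any graph drawing tool.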

Jean-Pierre Hombach and Large-scale Wikipedia copyright infringers?

An entity calling itself “Jean-Pierre Hombach” presents itself with “I’m a German writer Comedian and short filmmaker. I’m studying media at the University of Vic.”

The profile on Twitter states “Jean-Pierre Hombach: I’m a German Hobbie writer Comedian and short filmmaker. I’m studying media at the University of Vic. Jean-Pierre speaks fluent English. Rio de Janeiro ·“. There are also a Google Plus account and a Facebook account linked.

The shortlink leads to that lists 17 works. All these have been published in the first part of 2012. What a prolific writer!

If you go to the Justin Bieber book on Google Books you will find “Copyright (C) Jean-Pierre Hombach. All rights reserved. ISBN 978-1-4710-8069-2”. So apparently this Hombach claims the copyright for the work.

If you go a bit further in the book you will read as the first line “Justin Drew Bieber is a Canadian pop/R&B singer, songwriter and actor.” That sounds awfully wikipedish and an examination of the book quickly reveals that this is Wikipedia! “Hombach” has simply aggregated a lot of Wikipedia articles together. If you go all the way to page 505 you will even see that the Jakarta Wikipedia article has been included in the Justin Bieber book… ehhh…?

I leave it as an exercise to the reader to examine the rest of the books of Mr. “Hombach”. You may, e.g., begin with the Bob Marley book.

Obviously the copyright does not belong to “Hombach”, but to Wikipedia contributors. The text is licensed under CC BY-SA, and this should be stated according to the license (and the work re-licensed under the same license). Otherwise it is not even copyfraud; it is simply copyright infringement.

Amazingly, a book by “Jean-Pierre” reached number 16 on the music biographies bestseller list according to Los Angeles Times. In that book the contributors are listed in the back and there might also be the CC license, although the page is not available to me on Google Books. Maybe he has read a bit about the CC license. will gladly sell that to you for $23.90 without telling you that the author is not Jean-Pierre.

One interesting issue to note is that “Hombach” copied material on Lady Gaga from the Wikipedia hoaxster Legolas2186. Initially it confused me, as the “Hombach” book was stated to be copyrighted in 2010 while Legolas2186 first added the segment to Wikipedia in the summer of 2011.

To me the wrongful attribution, lack of proper attribution and obfuscation (wrt. copyright year) seem illegal. Wikipedia contributors to the respective works should be able to sue Hombach and for selling their copyrighted works that are not appropriately licensed.

Update 2013-01-25: A Google search on Jean-Pierre Hombach reveals that the works of Jean-Pierre have been used at least five times as a source in Wikipedia, i.e., we have a citation circle! One time in Belieber and one time in Decca Records.

Update 2013-01-25: Apparently Wikipedia has a page for everything. Thanks to Gadget850 (Ed).

More on automated sentiment analysis of Danish politicians on Wikipedia

Earlier today I put up a sentiment analysis of Danish politician Ellen Trane Nørby based on the text of the Danish Wikipedia.

Unfortunately, I could not resist the temptation of spending a bit of time on also running the analysis for some other Danish politicians. I did it for Prime Minister Helle Thorning-Schmidt, former Prime Minister Lars Løkke Rasmussen, former Foreign Minister Lene Espersen and former Minister Ole Sohn.

For Rasmussen’s article we see a neutral, factual biographical article until 2008, though with a slight increase at the end of 2007 when he became Minister of Finance. Then in May 2008 we see a drop in sentiment with the introduction of a paragraph mentioning an “issue” related to his use of county funds for private purposes. Since then the article has been extended and is now generally positive. There are some spikes in the plots. These spikes are typically vandalism that persists for a few minutes until reverted.

For Helle Thorning-Schmidt we see a gradual drop up to the election she won, and after that her article gains considerable positivity. I haven’t checked much on this in the history, but I believe it is related to the tax issue she and her husband, movie star Stephen Kinnock, had and a number of other issues. As I remember, there was concern or discussion on the Danish Wikipedia about whether these “issues” should fill so large a portion of the article, and on 3 December 2011 a user moved the content to another page.

I believe I am one of the major perpetrators behind both the Lene Espersen and Ole Sohn articles. Both of the articles have large sections which describe negative issues (I really must work on my positivity, these politicians are not that bad). However, the sentiment analysis shows the Ole Sohn article as more positive. Maybe this is due to the “controversy” section describing that he paid “tribute” to East Germany and that his party received “support” from Moscow, i.e., my simple sentiment analysis does not understand the controversial aspect of support from communist Moscow and just thinks that “support” is positive.

Writing politicians’ articles on Wikipedia, I find it somewhat difficult to identify good positive articles that can be used as sources. The sources used for the encyclopedic articles usually come from news articles, and these often have a negative bias with a focus on “issues” (political problems). Writing the Lene Espersen article I found that even the book “Bare kald mig Lene”, which I have used as a source, has a negativity bias. If I remember correctly, Espersen did not want to participate in the development of the book, presumably because she already had the notion that the writers would focus on the problematic “issues” in her career.


(2013-01-10: spell correction)

Sentiment analysis of Wikipedia pages on Danish politicians


We are presently analyzing company articles on Wikipedia with simple sentiment analysis to determine whether we see any interesting patterns, e.g., whether the Wikipedia sentiment correlates with real-world attitudes and events related to the company. Such analyses might uncover that there was a small edit war in relation to Lundbeck articles in the beginning of December 2012. We are also able to see that the Arla Foods article was affected by the Muhammed Cartoon Crisis and the 2008 Chinese milk scandal.

In Denmark in the beginning of January 2013 there has been media buzz about Danish politicians and their staff making biased edits in the Danish Wikipedia. The story, carried forward by journalist Lars Fogt, focused initially on Ellen Trane Nørby.

It is relatively easy to turn the methods we employed for companies to Danish politicians. The sentiment analysis works by matching words to a word list labeled with “valence”. The initial word list worked only for English, but I have translated it to Danish and continuously extend it. So now one only needs to download the relevant Wikipedia history for a page and run the text through the sentiment analysis using the computer code I have already developed.
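
To make the word list matching concrete, here is a minimal Python sketch. It is not the code I actually run: the small `TOY_VALENCE` dictionary and its values are made up for illustration, and a real analysis would load the full English or Danish word list instead.

```python
import re

# Hypothetical toy word list: word -> integer valence (illustration only).
TOY_VALENCE = {
    "great": 3,
    "impressive": 3,
    "support": 2,
    "weak": -2,
    "scandal": -3,
}


def sentiment(text, valence=TOY_VALENCE):
    """Sum the valence of every word in the text that is on the word list."""
    words = re.findall(r"\w+", text.lower())
    return sum(valence.get(word, 0) for word in words)


print(sentiment("great international commitment"))
print(sentiment("His party received support from Moscow"))
```

The second example scores positive only because “support” is on the list; the method has no notion of who is supporting whom.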

The figure shows the sentiment for Ellen Trane Nørby’s Danish Wikipedia article through time. The largest positive jump in sentiment (the way that I measure it) comes from a user inserting content on 2 March 2011. This revision inserts, e.g., “great international commitment” and “impressive election”. Journalist Lars Fogt identified the user as a member of Ellen Trane Nørby’s staff.

Surely the simple word list approach does not work well all the time. The second largest positive jump in sentiment arises when a user deletes a part of the article for POV reasons. That part contained negative words such as svag (weak), trafficking and udsatte (exposed). The simple word list detects the deletion of the words as a positive event. However, the context in which they appeared was actually positive, e.g., “… Ellen Trane Nørby is a socially committed politician, who also fights for the weak and exposed in society, …”.

As far as I understand, journalist Lars Fogt used the Danish version of the Wikipedia Scanner provided by Peter Brodersen, see the list generated for Ellen Trane Nørby. Brodersen’s tool does not (yet?) provide an automated sentiment score, but does a good job of providing an overview of the edit history.
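
The effect can be reproduced with a small Python sketch that scores consecutive revisions and takes the difference. Everything in it is an assumption for illustration: the valence values are made up, the example sentences are English stand-ins for the Danish text, and summing word valences is just one simple way to score a revision.

```python
import re

# Hypothetical valence values, for illustration only.
TOY_VALENCE = {"committed": 1, "weak": -2, "exposed": -1}


def score(text, valence=TOY_VALENCE):
    """Sum the valence of the listed words found in the text."""
    return sum(valence.get(w, 0) for w in re.findall(r"\w+", text.lower()))


def jumps(revisions):
    """Return the change in score from each revision to the next."""
    scores = [score(text) for text in revisions]
    return [after - before for before, after in zip(scores, scores[1:])]


# Two made-up revisions: the second one deletes the praising clause.
revisions = [
    "She is a socially committed politician who fights for the weak and exposed.",
    "She is a politician.",
]

print(jumps(revisions))  # -> [2]
```

Deleting the clause removes more negative than positive valence, so the jump comes out positive even though the deleted text was praise.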

(2013-01-16: typo correction)