Wikipedia review of papers, tools and datasets


In April 2008 I wrote a small Danish overview of Wikipedia research called Wikipedia – nørdernes sejr over vandalerne? (Wikipedia – victory of the nerds over the vandals?). I translated parts of it into English and extended it, but soon found that making it exhaustive would turn it into a megaproject. In March 2011 I put the first version of a working draft on the web, and if you look at the present appendix on page 50 you will see that there are loads of references that I have only listed and noted but not yet examined.

A few others are attempting a Wikipedia review. Chitu Okoli, Mohamad Mehdi, Arto Lanamäki and Mostafa Mesgari are now making a major effort and in March 2011 called for additional scholarly works to add to their list of over 2,100 peer-reviewed studies found so far.

Lists of scholarly works have been collaboratively built on the Wikipedia/Wikimedia sites. The primary list seems to be Wikipedia:Academic studies of Wikipedia.

The four researchers also began to collect pointers to Wikipedia tools and datasets, with several others adding to the list, e.g., Torsten Zesch, Andrew Krizhanovsky and Paolo Massa.

I have now added many of the references to the Wikipedia page on the Brede Wiki. There is a fair number of data sets, tools and papers. Although I consider myself reasonably up to date on Wikipedia research, I was surprised by how many tools and data sets are available, for example:

  • WikiTeam is a project that archives wikis (not just Wikipedias). And there are many wikis, see WikiIndex.
  • Trending Topics is a Web site that apparently tracks trends based on Wikipedia page views. Strangely, the Reactive oxygen species article has been at the top of the “Rising (Last 24 hours)” list while I have been observing it. If you search Google News for the article title you don’t get very much, just a Nature article with the title TLR signalling augments macrophage bactericidal activity through mitochondrial ROS. Looking up Henrik’s Wikipedia article traffic statistics for Reactive oxygen species in April 2011, you will see little activity beyond “the usual” approximately 800 page views, so I don’t understand the Trending Topics statistics.
  • StatMediaWiki can give an overview in plots of the editing activity. After an email to the developer and a quick response I got the program running with the commands:
    svn checkout
    python statmediawiki/trunk/ --outputdir="/home/fnielsen/tmp" --sitename="Brede Wiki" --siteurl="" --dbname=wikidb

    Unfortunately, StatMediaWiki seems to be somewhat slow. Working on the Brede Wiki with 3,300 total pages, I have used over 6 hours of CPU time generating a statistics report and it is not yet finished. One of the plots the program generates is the ‘global content evolution in Brede Wiki’, which shows the number of bytes in the Brede Wiki as a function of time since it was begun in January 2009. This is the plot you see here.
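
To make concrete what a "content evolution" curve measures, here is a minimal Python sketch. It is not how StatMediaWiki computes its plot; the function name, the revision tuples and the sample data are all hypothetical, and I assume each revision records the full page size in bytes after the edit.

```python
from datetime import date


def content_evolution(revisions):
    """Return [(day, total_bytes)]: the wiki's total size after each day's edits.

    `revisions` is an iterable of (day, page_title, size_in_bytes), where the
    size is the page's full size after that revision (an assumption).
    """
    latest_size = {}  # page title -> size after its most recent revision
    total = 0
    series = []
    # Sort by day only; the sort is stable, so same-day edits keep input order.
    for day, page, size in sorted(revisions, key=lambda r: r[0]):
        total += size - latest_size.get(page, 0)
        latest_size[page] = size
        if series and series[-1][0] == day:
            series[-1] = (day, total)  # keep one point per day
        else:
            series.append((day, total))
    return series


# Hypothetical revision log, not real Brede Wiki data.
revisions = [
    (date(2009, 1, 5), "Main Page", 1200),
    (date(2009, 1, 5), "Brain", 300),
    (date(2009, 2, 1), "Brain", 900),
    (date(2009, 3, 10), "Main Page", 1000),
]

for day, total in content_evolution(revisions):
    print(day.isoformat(), total)
```

Plotting the resulting pairs with the date on the x-axis gives a curve of the same kind, and it can go down as well as up when pages are shortened.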

One thought on “Wikipedia review of papers, tools and datasets”

    Erkan_Yilmaz said:
    August 18, 2011 at 11:17 pm

    – the classic version (StatMediaWiki 1.1) did not work for me, so I used the interactive version 0.1.3: it took 50 mins to read 1123 pages, and most of the features were unfortunately not usable :-( see my report here:
