POLITICIAN! The occupation of politician is not very frequent among people in the Panama Papers. This may come as a surprise to those who have studied a bubble chart published in a post on my blog. A sizeable portion of blog readers, tweeters and probably also Facebook users seem to have seriously misunderstood it. The crucial problem with the chart is that it is made from data in Wikidata, which contains only a very limited selection of the persons in the Panama Papers. Let me give some background and detail the problem:
- Open Knowledge Foundation Danmark hosted a two-hour meetup at Cafe Nutid, organized by Niels Erik Kaaber Rasmussen, the day after the release of the Panama Papers. We were around 10 data nerds sitting with our laptops, and with the provided links most if not all of us started downloading the Panama Papers data files with the names and company information. Some tried installing the Neo4J database, which may help in querying the data.
- I originally spent most of my time at the cafe looking through the data by simple means. I used something like “egrep -i denmark” on the officers.csv file. This quick command will likely pull out most of the Danish people in the released Panama Papers. The result of the command is a small, manageable list of not more than 70 listings. Among the names I recognized NO politician, neither Danish nor international.
- The Danish broadcasting company DR has had priority access to the data. It is likely that they have examined the more complete data in detail. It is also likely that if there had been a Danish politician in the Panama Papers, DR would have focused on that and broken the story. NO such story came. Thus I think it is unlikely that there are any Danish politicians in the more complete Panama Papers dataset.
- Among the Danish listings in the officers.csv file from the released Panama Papers we found a couple of recognizable names. Among them was the name Knud Foldschack. Already on Monday, the day of the release, a Danish news site had run a story about that name. One Knud Foldschack is a lawyer who has involved himself in cases for left-wing causes. Having such a lawyer mentioned in the Panama Papers was a too-good-to-be-true media story, and it was. It turned out that Knud Foldschack has both a father and a brother with the same name, and the news site may now look forward to meeting one of the Foldschacks in court, as he wants compensation for being wrongly smeared. His brother seems to be some sort of businessman. René Bruun Lauritsen is another name within the Danish part of the Panama Papers. A person bearing that name has had unfavourable mention in Danish media. One of the stories was about his scheme of selling semen to women in need of a pregnancy. His unauthorized handling of semen with hand delivery got him a bit of a sentence. Another scheme involved outrageous stock trading. Whether Panama-Lauritsen is the same as Semen-Lauritsen I do not know, but one would be disappointed if such an unethical businessman was not in the Panama Papers. A third listing shares a fairly unique name with a Danish artist. To my knowledge Danish media have not run any story on that name. The overall conclusion from the small sample investigated is that politicians are not present, but that the names may be related to business persons and possibly an artist.
- Wikidata is a site in the Wikipedia family of sites. Though not well-known, the Wikidata site is one of the most interesting projects related to Wikipedia and, in terms of main namespace pages, far larger than the English Wikipedia. Wikidata may be characterized as the structured cousin of Wikipedia. Rather than editing free-form natural language as you do in Wikipedia, in Wikidata you only edit predefined fields. Several thousand types of fields exist. To describe a person you may use fields such as date of birth, occupation, authority identifiers such as VIAF, homepage, and sex/gender.
- So what is in Wikidata? Items corresponding to almost all Wikipedia articles appear in Wikidata, not just for the articles in the English Wikipedia but for every language version of Wikipedia. Apart from these items, which can be linked to Wikipedia articles, Wikidata also has a considerable number of other items. For instance, one Dutch user has created items for a great number of paintings in the National Gallery of Denmark, paintings which for the most part have no Wikipedia article in any language. Although Wikidata records an impressive number of items, it does not record everything. The number of persons in Wikidata is only 3,276,363 at the time of writing, and it rarely includes persons who have not made their mark in the media. The typical listing in the Panama Papers is a relatively unknown man. He is unlikely to appear in Wikidata, and no one adds such a person just because s/he is listed in the Panama Papers. Obviously Wikidata has an extraordinary bias towards famous persons: politicians, nobility, sports people, artists, performers of any kind, etc.
- Items for persons in Wikidata who also appear in the Panama Papers can indicate a link to the Panama Papers. There is no dedicated way to do this, but the ‘key event’ property has been used for that. It is apparently noted Wikimedian Gerard Meijssen who has made most of these edits. How complete this is with respect to persons in Wikidata I do not know, but Meijssen also added two Danish football players who I believe were only mentioned in Danish media. He could have relied on the English Wikipedia, which had an overview of people listed in the Panama Papers.
- When we have data in Wikidata, there are various ways to query the data and present them. One way is to use wiki whizkid Magnus Manske’s Listeria service with a query on any Wikipedia. Manske’s tool automagically builds a table with information. Wikimedia Danmark chairman Ole Palnatoke Andersen had apparently discovered Meijssen’s work on Wikidata, and Palnatoke used Manske’s tool to make a table with all people in Wikidata marked with the ‘key event’ “Panama Papers”. It generates only a fairly small list, as not that many people in Wikidata are actually linked to the Panama Papers. Palnatoke also let Manske’s tool show the occupation for each person.
- Back to the Open Knowledge Foundation meeting in Copenhagen Tuesday evening: I was a bit disappointed not to be able to data mine any useful information from the Panama Papers dataset. So after becoming aware of Palnatoke’s table I grabbed (stole) his query statement and modified it to count the number of persons per occupation. Wikimedia Foundation, the organization that hosts Wikipedia and Wikidata, has set up a so-called SPARQL endpoint and an associated graphical interface. It allows any Web user to make powerful queries across all of Wikidata’s many millions of statements, including the limited number of statements about the Panama Papers. The service is under continuous development and has in the past been somewhat unstable, but it is nevertheless a very interesting service. Frontend developer Jonas Kress has in 2016 implemented several ways to display the query result. Initially there was just a plain table view, but the interface now shows results on a map if the query result includes geocoordinates, and as a bubble chart if the query result includes numerical data. Other forms of output implemented later are timelines, multiview and networks. Making a bubble chart with counts of occupations with the SPARQL service takes nothing more than a couple of lines in the SPARQL language and a push on the “Run” button. So the Panama Papers occupation bubble chart should be seen as a demonstration of the capabilities of Wikidata and its associated services for quick queries and visualizations, not as a faithful representation of the occupations of people mentioned in the released Panama Papers.
- A sizeable portion of people misunderstood the plot and regarded it as evidence of the dark deeds of politicians. Lacking a good understanding of the technical details of Wikidata, people used their preconceived opinions about politicians to interpret the bubble chart. They were helped along the way by what was, in my opinion, a misleading title (“Panama Papers bubble chart shows politicians are most mentioned in document leak database”) and an incomplete explanation in an article in The Independent. On the other hand, Le Monde had a good critical article.
- I believe my own blog, where I published the plot, was not to blame. It does include the SPARQL command, so any knowledgeable person can see and modify the results himself/herself. Perhaps some people were confused by my blog describing me as a researcher and thought that this was a research result on the Panama Papers.
- My blog has in its several years of existence had 20,000 views. The single post with the Panama Papers bubble chart yielded a tenfold increase in the number of views over the course of a few days: my first experience with a viral post. Most referrals were from Facebook. The referral does not indicate which page on Facebook it comes from, so it is impossible to join the discussion and clarify any misunderstanding. A portion of the referrals also came from Twitter and Reddit, where I joined the discussion. I also tried to engage the social media users who used the WordPress comment feature on my blog. On Reddit I felt I got a good response, while Facebook I found irresponsible. Facebook boosts misconceptions and does not let me join the discussion to correct them.
- Is there anything I could have done? I could have erased my two tweets and modified my blog post to introduce a warning with a stronger explanation.
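For readers who want to repeat the quick look through officers.csv described above without egrep, here is a minimal Python sketch of the same idea: a case-insensitive substring search over every field of every row. The function name `rows_mentioning`, the column names and the sample rows are made up for illustration; they are not taken from the real file.

```python
import csv
import io


def rows_mentioning(lines, needle):
    """Yield CSV rows in which any field contains `needle`, ignoring case."""
    needle = needle.lower()
    for row in csv.reader(lines):
        if any(needle in field.lower() for field in row):
            yield row


# Tiny invented sample standing in for officers.csv (column names are assumptions).
sample = io.StringIO(
    "name,countries\n"
    "Example Person A,Denmark\n"
    "Example Person B,Panama\n"
    "Example Person C,DENMARK;Sweden\n"
)

matches = list(rows_mentioning(sample, "denmark"))
for row in matches:
    print(row)
```

With the real file one would pass an opened file object instead of the sample. Like egrep, a plain substring match also picks up rows where “Denmark” merely appears somewhere, for example in a street name, so the resulting list still needs to be read by a human.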
Summing up my experience with the release of the Panama Papers and the subsequent viral post, I find that our politicians turn out not to be corrupt and do not deal with shady companies, except for a few cases. Rather, it seems that loads of people have preconceived opinions about their politicians and are willing to spread their ill-founded beliefs to the rest of the world. They have little technical understanding and do not question data provenance. The problems may be augmented by Facebook.
And here is the now infamous plot:
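A query of roughly the following shape produces occupation counts of the kind shown in such a chart. This is a sketch and not the exact statement from the post: the property ids P793 (‘key event’) and P106 (‘occupation’) are given from memory, the item id is a placeholder that must be replaced with the Wikidata item for the Panama Papers, and the helper name `occupation_count_query` is made up. The second half repeats the counting locally on invented data, to make explicit that only people who are both in Wikidata and tagged with the event are ever counted.

```python
from collections import Counter


def occupation_count_query(event_qid):
    """Build a SPARQL query counting persons per occupation for one 'key event'.

    P793 and P106 are assumed property ids; check them on Wikidata before use.
    """
    return (
        "SELECT ?occupationLabel (COUNT(DISTINCT ?person) AS ?count) WHERE {\n"
        f"  ?person wdt:P793 wd:{event_qid} ;\n"
        "          wdt:P106 ?occupation .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        "GROUP BY ?occupationLabel\n"
        "ORDER BY DESC(?count)"
    )


# "Q_EVENT" is a placeholder, not a real Wikidata item id.
query = occupation_count_query("Q_EVENT")
print(query)

# The same counting done locally on invented data: each person counts once per
# occupation, and people absent from the tagged set are simply invisible.
tagged_people = {
    "person A": ["politician", "lawyer"],
    "person B": ["politician"],
    "person C": ["association football player"],
}
counts = Counter(
    occupation
    for occupations in tagged_people.values()
    for occupation in set(occupations)
)
print(counts.most_common())
```

The counts say something about who has been tagged in Wikidata, not about who appears in the leaked files.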
So Posterous has been acquired by Twitter. Great. And Posterous Spaces will remain up and running without disruption. Great.
“Twitter says that it will give users “ample notice” if it is going to make any changes to the service. We’ll take them at their word on this one, but if I was someone running a personal blog on Posterous, I would think about finding another place to host it soon.”
“So, in other words, Posterous will be available to you now, but we’ll let you know if we plan on shutting it down. That must be a fairly likely scenario to warrant that language being included in the initial announcement of the acquisition.”