Month: August 2014

A question to Wikimedia Foundation and Wikimania 2014


An open question to the Wikimedia Foundation, to Wikimania 2014 and to its organizing committee with Ed:

Almost everything about the Wikimania meeting in London in August 2014 went very well: people, talks, entertainment, organization, monkey, squirrel, etc. What I am confused about is what happened during my last hour at the meeting on Sunday evening. After the buffet I had the experience of meeting two females, one of whom gave me a business card with a link to, and claimed to be behind, the wiki web site. The two did not seem very old. In fact, when asked, one of them claimed to be 10 years old, and when asked further, she said she had made the web site when she was 6 years old, with a little help from a family member…

During the Wikimania meeting the documentary about Aaron Swartz, The Internet’s Own Boy, was shown. In that documentary we learned that Aaron Swartz was 12 years old when he created the Wikipedia-like site InfoBase. Thus prodigies can create wiki web sites when they are 12 years old. From that we can deduce that it is unlikely that a six-year-old female can produce a wiki web site. The most plausible explanation I can come up with for my extraordinarily vivid experience at Wikimania is therefore that it was a hallucination.

My question is then: How do I get rid of the hallucination? Have other Wikimania participants had similar hallucinations of meeting preteens claiming to make web sites? Or am I just getting old?

The hallucination has persisted for many days now, because I can still both see and feel the business card I got.

Big big data


Big data, one of recent years’ new buzzwords, has now gotten itself a book with that title. Viktor Mayer-Schönberger and Kenneth Cukier’s “Big data: a revolution that will transform how we live, work and think” focuses mostly on what businesses can do with big data, and as a technology-oriented data scientist you won’t find much material in it. The book is from 2013 and already seems dated in the light of the Snowden revelations. The authors’ critique of personal big data collection does not mention the dragnet operations of signals intelligence agencies, apart from an 8-line paragraph on William Binney.

The authors claim three features of big data (“three major shifts of mindset”): “more”, “messy”, and correlation rather than causality. I am not entirely convinced that these features distinguish big data. Interventional A/B testing seems, at least to some degree, to probe causality rather than just correlation, and such tests are continuously run by major Internet companies on unsuspecting users on a large scale. Thus I would say that big data processing is indeed probing causality. Nor do I agree that big data is messier than old-time small data. Anyone working seriously with small data will readily find that handling such data can be a considerable headache and requires some processing and ‘understanding’. Indeed, big data technologies have brought us means for handling messy data in a more structured way (JSON, NoSQL, Semantic Web, Wikidata). Small data may feel less messy because its clean-up can be done manually in a spreadsheet by a non-programmer, while big data requires automatic tools and probably a programmer.

The authors also claim that we will see the rise of a profession called ‘the algorithmist’, whose job will be to review algorithms. I do not think this is likely. The closest we will probably get is the Google advisory board on the ‘right to be forgotten’.

The authors also fail to give us a proper critique of the big data hype. Their opening example, Google Flu Trends, is already dated: a publication from March 2014 shows that Google Flu Trends estimated flu prevalence wrongly (see ‘The Parable of Google Flu: Traps in Big Data Analysis’). Zeo, the EEG big data company mentioned in the book, whose device was hailed back in 2013 as one of the “8 Best Sleep Tracking Apps and Devices”, has run out of money and is ‘out of business’, and you won’t find a response from

While the authors tell us that companies collect vast amounts of data and that “Companies may be powerful”, they assure us on page 156 that companies “don’t have the state’s powers to coerce”. Well, yes. But states have the power to coerce companies into handing over any personal data. Indeed, U.S. companies are coerced into handing over data stored overseas: Judge Loretta A. Preska of the United States District Court told Microsoft as much. And under the U.S. PRISM program, such handovers are decided in the secret FISA court.


Review also available on LibraryThing.