Month: July 2010

Wikisympediamania II

Posted on

I have a previous post on WikiSym and Wikimania in Gdansk 2010 which mostly touched upon WikiSym, and here are some notes on the Wikimania meeting. Another wikian, Felipe Ortega, has also reported from Wikimania and WikiSym.

Let me first tell you about the excellent side program: a party, a concert and a documentary film. The party took place in the old shipyard area known from the Solidarność movement. The concert was performed by the Polish Baltic Philharmonic playing Polish music from the 20th and 21st centuries, including two Gershwin- and Gustav Holst-like concertinos by the pianist and composer Władysław Szpilman (of “The Pianist” fame), presented by his son Andrzej Szpilman. A piano concertino had even been composed for the event! The concert also featured some good old modern music by Witold Lutosławski.

The next evening we got a full-length documentary: Truth in Numbers? Everything, According to Wikipedia. It had been in production for several years. My impression when I first heard about the project on the web was that it was a community effort. Apparently, it should have been community funded too. Well. It comes out more as the sole work of directors Scott Glosserman and Nic Hill and has more the touch of a “standard” documentary with a clear focus. That focus is the drama between youthful enthusiasts with evangelist Jimbo Wales on the one side and the old, grumpy, white experts, such as self-appointed anti-Wikipedia king Andrew Keen, on the other. I believe you have to take such a ‘focus’ shortcut if you want ‘ordinary’ people to get it, but I think the wikipedians in the audience weren’t completely captivated by that approach. During the question-and-answer session after the viewing, Witty lama expressed doubt (if I remember correctly). Through his work facilitating meetings between wikipedians and British Museum staff he has seen that expertise and Wikipedia go well together.

The budget seems to have grown so much (a lot of travel expenses) that the directors need to earn the money back, and they are not yet releasing the film for Creative Commons copying, but would rather sell DVDs. There are free leftover clips on the movie homepage.

Ubuntu is incredibly easy: Upgrading and Ubuntu Studio

Posted on

“Incredibly easy process”, we are told. You know, when such wording is used, that there is something... hmmm.

And oh yes, I was just about to update my old Ubuntu LTS to a newer LTS (the 10.04 edition) on my work laptop, which has now become stationary due to the Danish multimedia tax. Unfortunately, when I got the computer it was partitioned in a way that is often recommended, but which I have several times found suboptimal: one partition for the system, one partition for the home directory, one partition for dual booting Microsoft Windows, one swap partition, and one partition that is just there at the beginning of the hard disk. I find one big partition for the entire system (excluding the swap) the best, since I invariably run into system and home directories that grow. My system partition has no more than around 200 MB left, so it will be difficult to update directly. I still have many gigabytes left on the home partition, but with the present partitioning they are difficult to use. Instead I imagine installing a new system on the erased Windows partition.
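Before attempting an in-place upgrade it is worth checking the free space programmatically. A minimal Python sketch; the 2 GB threshold is my own rough assumption, not an official Ubuntu requirement:

```python
import shutil


def free_mb(path):
    """Free space, in megabytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


# With only ~200 MB left on the system partition, a release
# upgrade will almost certainly fail partway through.
if free_mb("/") < 2048:  # 2 GB is an assumed safety margin
    print("probably not enough room on / for a release upgrade")
```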

I already had a USB stick, partitioned with a live distribution on one partition and data on another, but unfortunately with an old Karmic (9.10) Ubuntu. So the idea was to download a new ISO image and put it on the USB stick: this is incredibly easy!

So I went like this:

  1. I look for “Create a USB startup disk” in the KDE menu. Can’t find it, and wonder if it only shows up in Gnome.
  2. Log into Gnome and see that it is not there either.
  3. Search the internet to find which program and package is behind “Create a USB startup disk”: It is called “usb-creator”.
  4. Seeing with “aptitude search usb-creator” that it is not there.
  5. Looking at the Ubuntu package site for usb-creator. Apparently in “hardy-backports”. So does that mean that the Ubuntu version I have is too old to have that program?
  6. Attempted something like sudo mount -o loop ubuntu-10.04-desktop-i386.iso …
  7. Giving up, and finding another computer which already has a recent version of Ubuntu.
  8. usb-creator-gtk reports that there is not enough space on the USB disk.
  9. Starts to erase.
  10. Discovering that usb-creator-gtk erased not only the partition with the old Ubuntu installation, but also the partition with my data files. Wow, what a blunder!
  11. Booting with the USB stick on the laptop computer at 17:20.
  12. Not entirely sure which partition to select. I select the choice with “Brug det største sammenhængende ledige område på disken” (use the largest continuous free space on the disk).
  13. Starting installation of Ubuntu 10.04 at 17:30.
  14. 17:43: Booting the computer (that was quick). Wrong display resolution. My old home directory is not mounted.
  15. Fixing the home directory. Getting error message related to .Xmodmap and “Could not update ICEauthority”. Problem with my user id.
  16. 18:43: Installed Emacs and LaTeX. Installed “brightside” for edge flipping, but it does not help. Found out that it is actually Compiz that may take care of that, so “sudo aptitude install compizconfig-settings-manager”. After that the “System” menu has a “CompizConfig Settings Manager” item, and this manager has a “Desktop wall” item where the edge flipping is set.
  17. My user name did not show up in the Gnome login screen. Fixed user id problem with the use of magic in /etc/login.defs: Set the UID_MIN to a lower value corresponding to the user id.
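In retrospect, the “burning” step that gave me so much trouble can also be done as a raw copy of the ISO image onto the device, dd-style, provided the image is a hybrid ISO that supports direct booting. A hedged Python sketch; the file and device paths are hypothetical examples, and note that the target device is wiped entirely:

```python
import shutil


def write_image(iso_path, device_path, chunk_size=4 * 1024 * 1024):
    """Raw-copy an ISO image onto a device node, like `dd if=... of=...`.

    WARNING: this overwrites everything on the target, which is exactly
    the kind of blunder described in step 10 above if you point it at
    the wrong disk.
    """
    with open(iso_path, "rb") as src, open(device_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        dst.flush()


# Hypothetical paths; triple-check the device node before running:
# write_image("ubuntu-10.04-desktop-i386.iso", "/dev/sdb")
```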

So the update was reasonably successful, but I don’t find it incredibly easy. I have had my problems with installing Ubuntu on another laptop; this time it went a bit more smoothly. There are still minor annoying problems that can take a ridiculous amount of time, e.g., “burning” a USB stick or figuring out how to set up edge flipping.

The installation on my other laptop, an Acer Aspire One N450, still has issues: plugging in an external screen might blacken both screens; the wireless is shaky – it sometimes drops the connection, and eduroam is a particular hassle; Skype has a funny the-balance-needs-to-be-on-the-left-for-the-microphone-to-work bug; and closing the lid for sleep does not necessarily mean that the machine wakes up nicely again.

And did I mention Ubuntu Studio? I am trying to get music production working on the small netbook, and that requires considerable effort. I am lost somewhere between real-time kernels, ALSA, JACK, PulseAudio and all the rest. Here and there I can get things to work.

I like ZynAddSubFX and the thick sounds you can get from it, but that software synthesizer does not seem to go well together with JACK. I have managed to construct a bit with the nice looping program SooperLooper. On my present Myspace account, “SG3” is made with SooperLooper and “Forsigtig” with ZynAddSubFX. SooperLooper works with JACK, and I believe I used effects from JACK Rack for the SG3 piece.

I would also like to get started with multitracking audio and MIDI. There are several programs for that in the Ubuntu distribution, and perhaps I will some day get used to them. Today I have run into a funny Ubuntu Ardour mute problem, an Ardour-just-lost-the-connection-in-the-JACK-control problem, and a why-is-there-no-input-from-the-microphone problem requiring magic in the alsamixer.

My tinwhistle does not suffer from these problems.

Open Access is bad, bad, bad

Posted on Updated on


Open Access is bad, bad, bad. At least so says Jørgen Burchardt, researcher and chairman of “Danske Videnskabsredaktører” (Danish Science Editors), about the threat of Open Access to popular science. In the Danish popular science magazine Aktuel Naturvidenskab he calls Open Access “taxpayer-paid ideological experiments” and declares that Open Access will mark the end of popular science when journals and magazines lose their income from direct sales and subscriptions. It is a repetition of his previous writings about science publishing in general.

Burchardt is reacting to the Librarian Lobby and their propaganda in a recent Danish report by the Open Access Committee, which did not include members from the publishing industry. Burchardt is on the publishing industry side. He has been editor of the Danish scholarly journal Tidsskrift for Arbejdsliv, which sells at 175 Danish Kroner per issue. I must admit I had never heard of that journal before.

One strong argument for Open Access is that if taxpayers pay for a scientist, then the work of the scientist should be directly accessible to the taxpayers. That is how it works in the US at the federal level. It means that, e.g., works of NIH and NASA are in the public domain, such as the famous Apollo 8 Earthrise photo.

Properly Open Access licensed science articles may be reused in a number of ways, e.g., translated, aggregated in course material and included in a wiki with wikilinks and semantic markup; see an example in my Brede Wiki.

Many of the arguments against Open Access have been countered by Open Access publisher BioMed Central in their (Mis)Leading Open Access Myths (thanks Iain Hrynaszkiewicz for the pointer). These pages argue on the matter in relation to the discussion that has taken place in the United Kingdom.

My own experience in regard to “Myth 2” (access is not a problem – virtually all [] researchers have the access they need): I sometimes find scientific articles difficult to get. I perform research in an interdisciplinary area with ties to medicine and business, and journals in these areas may not always be available from the library at my technical university.

One thing that bothers me with Open Access is: why do Open Access “Article Processing Charges” have to be so large (e.g., 1,800 Euros)? I have published in the electronic journal First Monday, and if I were less incompetent I might publish in the Journal of Machine Learning Research. Their articles are openly accessible and they have no article processing charges. If you want the printed volumes of the Journal of Machine Learning Research, you pay. One of my latest conference articles is archived in CEUR Workshop Proceedings, openly accessible and with no apparent cost to me (I paid for registering for the conference). One must remember that large author-side payments to the publisher also occur in the ‘ordinary’ subscription publishing model: fees for color pages and excess pages are prevalent and may be quite large.

The Danish librarians have got the Australian professor John Houghton to make a cost-benefit analysis of the publishing models. In the report (Danish summary) he estimates the production cost of a journal article at 125,000 Danish Kroner. Most of that cost is related to the writing. He finds (perhaps not surprisingly) that the Open Access publishing model is the cheaper one. Well, perhaps.

The publishing companies have got a lot of extra articles to publish in recent years (I know the journal NeuroImage has grown inches thick by now). Modern researchers have word processors, email and Internet browsers, so writing and submitting articles has become way easier. Publishers also need to work on IT infrastructure to support dual-mode print and electronic publishing and to scan old issues. These issues have probably meant a lot of extra cost for the publishers. Still, the criticism is that the subscription cost for the libraries has risen beyond the cost for the publisher. Back in 2001 The Guardian wrote:

Last year the most powerful journal publisher, the Anglo-Dutch firm Reed Elsevier, made a profit of £252m on a turnover of £693m in its science and medical business.

A section in the Wikipedia “Elsevier” article also gives some pointers on the controversy. It is really an unfortunate situation researchers, universities and libraries have gotten into, an asymmetric situation that would make an economist cry. Even if Danes opt in on Open Access, the libraries still need to subscribe to subscription journals, since Danish researchers rely on access to what other researchers publish. But one would imagine that the threat of Open Access would weigh in when Danish libraries negotiate subscription fees with the multinational subscription-based publishers.

I see Open Access as part of the larger notion of the free culture movement. The opposite stance may be characterized by a quotation from John Jarvis, Managing Director of Wiley Europe (one of the big subscription-based publishers), regarding patients reading scientific information:

Without being pejorative or elitist, I think that is an issue that we should think about very, very carefully, because there are very few members of the public, and very few people in this room, who would want to read some of this scientific information, and in fact draw wrong conclusions from it […] Speak to people in the medical profession, and they will say the last thing they want are people who may have illnesses reading this information, marching into surgeries and asking things. We need to be careful with this very, very high-level information.

The free culture movement sails under the Jimmy Wales Wikipedia quotation: “Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge.” The public should be enlightened, not roam in darkness. These ideas have been likened to the Folk high school movement.

I also see Open Access as part of Open Science. Open access to scientific journals is just one part of that issue. Open access to the methods of science and open access to the data are issues being discussed more and more. I often find the textual description in articles of the experimental setup and the results inadequate; e.g., in human brain mapping the result is often a large set of estimates for several hundred thousand measurement locations in the brain, while in the journal article we only get a small table with a few selected measurements reported. Journal articles are simply not suitable for reporting all the interesting results that come out of a modern scientific experiment. I believe open science databases are the solution. One of my recent reports describes such a system for personality genetics, where data, computational methods and results are openly available from a Web-based platform. As it was a contribution to the WikiSym 2010 meeting, I signed the copyright over to ACM and the article is now published by them. It is available in their digital library. You can purchase the article for 15 American Dollars. Yeah.

Hot or not or what: Data mining attractiveness

Posted on Updated on


From the media we hear that women are most attractive at 31. That fact is based on a “poll of 2,000 men and women, commissioned by the shopping channel QVC to celebrate its Beauty Month.” So this is the kind of science that is part of a company’s media effort. We also see such use of science in neuromarketing research. However, in this case the results are likely to be reasonably ok.

The web site Hot or Not has, according to Wikipedia, been an inspiration for both YouTube and Facebook. The site allows you to rate men and women based on their uploaded photos.

Back in 2009 I became aware of Hot or Not in a nerdish way: the computer programming book Programming Collective Intelligence uses the site as a real-life example of prediction based on annotation in the social web. Hot or Not has an API, so you can get some data from the site. You need an API key, and last time I checked you couldn’t obtain new keys, but I could use the one given in the book.

So I started to download data. You don’t get the individual ratings, only the average rating for each person as well as a bit of demographics, e.g., the age. So there is really not so much you can do. The programming book tries to predict the rating based on gender, age and location (US state).

I tried to see how the rating varied with age. I managed to make a plot of a sample of men and women from Hot or Not, and the result somewhat surprised me. I was expecting a decay in rating for women and men as a function of age, with around 31 years as a good candidate for the maximum rating. However, when I look at the ratings for women there is very little decay; in fact, if you fit a second-order polynomial you actually see a slight rise for older women. With unscrupulous extrapolation you would say that 100-year-old women are maximally attractive. Men have the ‘correct’ decay, with the highest rating somewhere around 30 or before. But there is considerable variance within each year of age compared to the variation in the average between years.
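The second-order fit can be done with numpy.polyfit, but to keep it self-contained here is a pure-Python sketch of the same least-squares quadratic fit via the normal equations (my actual data and plotting code are not reproduced here):

```python
def fit_quadratic(ages, ratings):
    """Least-squares fit of rating = a + b*age + c*age**2.

    Solves the 3x3 normal equations directly, so no numpy is needed;
    returns the coefficients (a, b, c).
    """
    # Moments sum(x^k) for k = 0..4 and sum(y * x^k) for k = 0..2.
    sx = [sum(x ** k for x in ages) for k in range(5)]
    sy = [sum(y * x ** k for x, y in zip(ages, ratings)) for k in range(3)]
    # Augmented normal-equation matrix: M[i][j] = sum(x^(i+j)).
    m = [[sx[i + j] for j in range(3)] + [sy[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        coef[r] = (m[r][3] - sum(m[r][c] * coef[c]
                                 for c in range(r + 1, 3))) / m[r][r]
    return tuple(coef)
```

A positive `c` for the women's sample is what produces the "slight rise for older women" mentioned above.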

One explanation for the effect seen among women could be that only beautiful older ladies “dare” to upload their image, while ugly young women are not afraid to. There is also the possibility that we really cannot trust the average ratings reported to us by Hot or Not. I have got an account myself and uploaded an image. Presently I have a rating of 7.7 based on 206 ratings (the scale goes from 1 to 10). Hot or Not reports that I am “hotter than 74% of men on this site!”. When I compare 7.7 with the data I can download, the percentage does not fit: around 90% of the males score higher than my 7.7. Yet another possibility is that the way I call the Hot or Not API does not give a fair sample of the people actually in the Hot or Not database.
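Checking a claimed percentile against a downloaded sample is a one-liner. A small sketch; the sample list is a made-up example, not my actual download:

```python
def percent_hotter_than(score, sample_scores):
    """Percentage of the sample that scores strictly below `score`."""
    below = sum(1 for s in sample_scores if s < score)
    return 100.0 * below / len(sample_scores)


# Hypothetical sample of average ratings pulled via the API:
sample = [6.5, 7.2, 7.9, 8.4, 8.8, 9.1, 9.3, 9.6]
print(percent_hotter_than(7.7, sample))  # -> 25.0
```

If the site claims "hotter than 74%" but this comes out near 10% on the downloadable sample, the two numbers cannot both describe the same population.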

Hot or Not data has been used in a few scientific reports; see, e.g., Economic principles motivating social attention in humans, which made its own ratings, and If I’m Not Hot, Are You Hot or Not?, which has Hot or Not employees on the author list and thereby gained access to the site’s unique data.

NemID: Danes get difficult easy new login system

Posted on

So in Denmark we have got a new authentication system: NemID. As a state employee through the university, I get my salary information in an e-box, and logging into that system is one of the uses of NemID. Netbanking also opts in. Entry of taxation information. Quite a lot. Premiered at the beginning of July 2010, NemID is supposed to reach 3 million Danes within the next half year [1]. NemID means ‘easy identifier’, and it promises an easy system; according to the Minister of Research, Charlotte Sahl-Madsen, it “gives the Danes one secure code, which can be used by all to almost everything everywhere”. I wouldn’t bet my right hand on that, though it seems fine in some respects. Sahl-Madsen mentions that the system has been usability tested on the elderly and a handful of young students. Furthermore, an organization of the blind has been involved in the testing.

For the user, NemID consists of a user-id (e.g., our Danish personal registration number, the CPR number, what the Americans call the Social Security number), one self-selected password, and a paper card with one-time codes (a one-time pad). You log in with all three items. Behind the scenes is a centralized keystore.

The good news is that the paper one-time codes make it difficult for an attacker to use your credentials if he ‘only’ has control over your hacked computer or hacked smartphone. He would usually need physical access to the paper card to make the attack complete. One-time codes are difficult to break. There are 148 one-time codes on the card.

And now for the bad news:

  1. On the first day of operation a bottleneck arose, since the company taking care of the CPR numbers, CSC, did not have enough capacity to handle the requests from DanID (the company operating the NemID system)! It basically meant a denial-of-service-like situation, and since the police and hospitals also use the CPR system, they too were affected.
  2. In the first days of operation, the web-based system greeted users with the message “The security is compromised” when they emailed the support staff. At one point DanID didn’t know why the error message was triggered!
  3. Two weeks later, an error in a certificate resulted in half a day during which new NemIDs could not be ordered.
  4. One commenter on the Ingeniøren web site noted that the password seems to be case-insensitive. Thus “pASsWoRd” and “password” would be the same, quite reducing the number of possibilities a brute-force attacker needs to go through. The password can be as short as 6 characters and cannot contain special characters.
  5. CPR numbers are not supposed to be disclosed in public. If you use the CPR number as user-id in the NemID login interface, the number is displayed unhidden.
  6. You can write on the paper card with the one-time codes. If a forgetful user writes the password on the card and the wallet is stolen, then the thief has direct access: the CPR number is available on the driver’s license, and with the two other credentials he should be able to log in from anywhere in the world.
  7. It is easy to copy the one-time code card, e.g., with an ordinary copy machine. I did that, but it took me quite a while to destroy the copy. Copy machines prevent color copies of banknotes, but not of NemID cards. An attacker may copy or photograph your card without you knowing it if he has brief access to it. Then he mostly only needs to work on your self-selected password, and passwords may be easy to guess; e.g., the password of Obama’s Twitter account was guessed from public information. A hardware security token would have been more difficult to copy. A high-resolution surveillance camera together with a keylogger at an internet cafe should also allow a hacker full entry.
  8. I ordered my NemID right away, but it took some days before the letters with NemID activation information came. For security reasons, two separate letters are sent on separate days. However, I was away at meetings, WikiSym and Wikimania (a report on that is available here), and the two letters lay in my mailbox for several days. A mailbox thief would have gotten my CPR number (on my bank statement), the one-time code card and the initial code number to start the activation process.
  9. Several commenters have pointed to the strange use of paper instead of a hardware security token. When the 148 one-time codes have been used, a new card needs to be sent to the user. The postal service will be happy. I should think that the paper card is only a temporary solution. And yes, it is. Apparently, the paper card is there because it is a reasonably easy way to get people to accept the system. It is planned that users can later buy systems such as a token. So then you need to pay to log efficiently into required government systems. DanID is owned by PBS, a company owned by the Danish banks.
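Point 4 above, the case-insensitive password, is worth quantifying. A back-of-envelope sketch in Python, assuming a 6-character password over letters and digits (special characters are not allowed):

```python
def keyspace(alphabet_size, length):
    """Number of possible passwords of the given length."""
    return alphabet_size ** length


# 6 characters, letters and digits only:
case_sensitive = keyspace(26 + 26 + 10, 6)  # a-z, A-Z, 0-9
case_folded = keyspace(26 + 10, 6)          # case-insensitive: a-z, 0-9
print(case_sensitive // case_folded)        # -> 26
```

So folding case hands a brute-force attacker roughly a 26-fold reduction in the search space for this alphabet, on top of an already short minimum length.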

The introduction of NemID has been supported by some propaganda/advertisements. See this video where NemID is likened to the Opera House in Sydney. Like the Opera House NemID has also been delayed.

So far I have obtained my NemID. I have been able to log in to the e-box, but I have failed to log in to the netbanking so far. Hallelujah.


Posted on Updated on


I am back from the wiki and Wikipedia meetings WikiSym 2010 and Wikimania 2010 in Gdansk (Danzig). Pre-meeting issues with registration and accommodation gave some indication of problems in the organization: unsure of the state of the accommodation, a dorm, I took a sleeping bag and sleeping pad with me and considered bringing a tent. Once I arrived, it mostly went great and smoothly. Especially Wikimania was one of the best meetings I have participated in. Other Wikimania participants were also generally happy, although there were a few complaints, e.g., about missed double-sided badges. :-)

The academic symposium WikiSym was scheduled before the community meeting Wikimania. One notable part of WikiSym that I like is the Open Space meetings, a sort of “structured coffee break”; citing Wikipedia, it is where “individuals participate without prior groupings or agendas, and that they accept the agendas and groupings that arise from the meeting process, with only minimal restrictions on scope.” The Open Spaces I participated in were about “wiki and programming” as well as two meetings on data and structured data in wikis, the latter topic being a continuing challenge in wiki research. There are a few notes from the wiki and programming session on a page on the WikiSym wiki. The issues were larger than I initially thought. Wiki engines other than MediaWiki already have programming capabilities, but “programming and wikis” is a topic that could still use some more research and development.

Among other WikiSym presentations, I heard Andrew West‘s demonstration of his multilingual anti-vandalism system STiki, which acts as a supplement to bots such as ClueBot. The system is a client-server model where machine learning is used to find patterns in vandalism edits, with automated labels derived from rollbacks and predictions based on revision metadata. Wikipedians can download an associated Java client and go on to fight vandalism.

Another presentation, from Daniel Kinzler, was about his WikiPics, a crosslingual image search engine. The Woogle4MediaWiki presentation showed an extension to MediaWiki that combines search results with wiki pages. Another demo showed GravPad, a realtime collaborative editor.

I had a poster about a fielded wiki, a sort of online spreadsheet with revision control that combines wiki-style easy entry with numerical computation and visualization, and which I apply in personality genetics. The PDF of the poster is available.

The third day of WikiSym overlapped with Wikimania, which I think diluted WikiSym somewhat. I skipped WikiSym class and joined Wikimania for most of the day, like quite a number of other WikiSym participants. The only thing I heard that day at WikiSym was Andrew Lih‘s keynote on Wikipedia. One thing I remember from the talk was the statistics on how many people press save after they have pressed the edit button on a wiki page. The ratio is quite low, and the explanation is that potential editors get scared away by the nerdish wiki markup. Wikia, with a WYSIWYG editor, gets a better ratio (if I remember Andrew’s statistics correctly).

WikiSym is supposed to be about wikis in general (the WikiSym Web site runs on the Tiki Wiki engine), but WikiSym 2010 had a quite strong focus on the MediaWiki engine and Wikipedia, for better and for worse.