Month: April 2011

Michael Nielsen talks about Open Science

Via two independent mailing lists I have now received two references to a TEDxWaterloo talk by Michael Nielsen about Open Science, so I guess it should be interesting?

The talk is discussed in The Guardian and available on YouTube.

As a successful example of Open Science, Michael Nielsen points to Timothy Gowers’s blog and the Polymath project that grew out of his question “Is massively collaborative mathematics possible?”.

Michael Nielsen points to Qwiki as a “failure” (@4:01) of Open Science. It is a wiki for quantum computing started in 2005 and now “essentially dead” according to Michael Nielsen (@5:36). Qwiki has 994 content pages, while my Brede Wiki, started in 2009, has 1,813 content pages. So am I approximately a double failure?

While I am not biologically related to Michael Nielsen (not that I know of), I may be related in terms of visions for Open Science. I do see some problems though.

  1. Michael Nielsen states “It is my belief that any publicly funded science should be Open Science.” (@13:15). In publicly funded strategic research, researchers may be asked to consider the possibility of commercial applications and patents in their grant applications. How can such considerations be reconciled with Open Science? A nation funding research would like the fruits of the research to benefit the nation rather than some company based in a foreign country.
  2. While programming and mathematics may be entirely done collaboratively – because the “proof” is ready at hand, sciences that rely on experiments and observations need to have a researcher that vouches that the experimental setup and measurement were done correctly.
  3. In science involving human subjects there is the issue of privacy. With modern scientific instruments, such as brain scanners and genome testing devices, you get personally identifiable data. I believe such data should have restricted circulation. (Although this hasn’t stopped the fMRI Data Center from distributing brain scans after removing the face from the brain scan).
  4. It is still not clear (to me) why we choose to collaborate or not, e.g., why is Wikipedia successful and Qwiki not?

Wikipedia review of papers, tools and datasets

Figure: Global content evolution in the Brede Wiki.

In April 2008 I wrote a small Danish overview of Wikipedia research called Wikipedia – nørdernes sejr over vandalerne? (Wikipedia – victory of the nerds over the vandals?). I translated parts of it to English and extended it, but soon found out that if it was going to be exhaustive I would be into a megaproject. In March 2011 I put the first version of a working draft on the web, and if you look into the present appendix at page 50 you will see that there are loads of references that I have only listed and noted but not yet examined.

A few others are attempting a Wikipedia review. Chitu Okoli, Mohamad Mehdi, Arto Lanamäki and Mostafa Mesgari are now making a major effort and in March 2011 called for additional scholarly works to add to the list of over 2,100 peer-reviewed studies they have found so far.

Lists of scholarly works have been collaboratively built on the Wikipedia/Wikimedia sites. The primary list seems to be Wikipedia:Academic studies of Wikipedia.

The four researchers also began to collect pointers to Wikipedia tools and datasets, with several others adding to the list, e.g., Torsten Zesch, Andrew Krizhanovsky and Paolo Massa.

I have now added many of the references to the Wikipedia page on the Brede Wiki. There is a fair number of data sets, tools and papers. Although I consider myself to be reasonably up-to-date regarding Wikipedia research I was surprised by the many tools and data sets available, for example:

  • WikiTeam is a project that archives wikis (not just Wikipedias). And there are many wikis, see WikiIndex.
  • Trending Topics is a Web site that apparently tracks trends based on Wikipedia page views. Strangely, the Reactive oxygen species article has been at the top of the “Rising (Last 24 hours)” list while I have been observing it. If you search Google News with the article title you don’t get very much, just a Nature article with the title TLR signalling augments macrophage bactericidal activity through mitochondrial ROS. Looking up Henrik’s Wikipedia article traffic statistics for Reactive oxygen species in April 2011 you will see little activity beyond “the usual” approximately 800 page views, so I don’t understand the Trending Topics statistics.
  • StatMediaWiki can give an overview in plots of the editing activity. After an email to the developer and a quick response I got the program running with the commands:
    svn checkout https://forja.rediris.es/svn/statmediawiki
    python statmediawiki/trunk/smw.py --outputdir="/home/fnielsen/tmp" --sitename="Brede Wiki" --siteurl="http://neuro.imm.dtu.dk/wiki" --dbname=wikidb

    Unfortunately, StatMediaWiki seems to be somewhat slow. Working on the Brede Wiki with 3,300 total pages, I am past 6 hours of CPU time for the generation of a statistics report and it is not yet finished. One of the plots the program generates is the ‘global content evolution in Brede Wiki’, which shows the number of bytes in the Brede Wiki as a function of time since it was begun in January 2009. This is the plot you see here.

Danish crime drama – good and bad

Danish TV crime drama is receiving a great number of accolades these years. Ingrid Dahl, Sarah Lund (in the US remake The Killing called Sarah Linden), Hallgrim Ørn Hallgrimsson, Rasmus, Jasmina and Jonas have been nominated for or received an Emmy award.

The Icelandic sweater from the Faroe Islands, Sarah Lund, subtitled on British BBC4 has been called brilliant, amazing, most intensely thrilling television drama experience in British broadcasting of the moment, a diamond of a series – complex, dramatic, thoroughly gripping and the best thing on TV right now (in February 2011).

Before the modern Danish crime drama we had “En by i Provinsen” from 1977, which was quite good. But then we had “Rejseholdet”, i.e., the first “Rejseholdet” from 1983. The series with Ingrid Dahl from 2000 was in Danish also called “Rejseholdet” (English: “Unit 1”), but the difference between these two series is big.

The 1983 “Rejseholdet” can be found in the Danish Broadcasting Corporation archive and is amazingly bad. Unbrilliant. On IMDb it gets a formidable 4.5 grade. Almost everything is wrong here: script, direction, acting, sound track, the set, the lighting, the sound recording and the editing. According to IMDb it was supposed to have 18 episodes but 12 episodes were cancelled “due to overwhelming negative response from critics and viewers”. It was presented as a crime comedy and yes, indeed it is funny at times, but only unintentionally, when the lines and acting become hilariously bad.

The beginning of modern Danish crime drama might be the 1987 “Een gang Strømer…” (Once a cop) with effective editing, fine acting and a memorable title song Sjæl i flammer by Kasper Winding and Lars Muhl.

Of the newer crime drama sound tracks we have the Forgiveness title song from The Eagle. The title song of the 2000 Rejseholdet is also a nice one. Outside crime drama we have Tim Christensen’s beautiful song Right Next To The Right One from the Emmy-awarded Nikolaj og Julie. Musician Frans Bak, who did the theme for the Danish Forbrydelsen, also did the sound track for the US remake The Killing.

Poul Thorsen: a new Milena Penkowa? II

Yesterday (i.e., 13 April 2011 Danish time, 12 April 2011 American time) I blogged a bit about Poul Thorsen. Now today (i.e., 14 April 2011 Danish time, 13 April 2011 American time) the newest development is a Reuters message sent from Atlanta stating that Thorsen “has been indicted by a federal grand jury in Atlanta” and US authorities are seeking to extradite him from Denmark.

There has still been strangely little written about this case in Denmark, and nothing about the new development.

In December 2010 science journalist Jens Ramskov wrote a bit about Poul Thorsen being co-author on the scientific article Parental infertility and cerebral palsy in children even though the University of Aarhus had declared:

“Aarhus University will not be able to collaborate with Poul Thorsen in the future. To the extent that other parties collaborating with Aarhus University wish to draw on Poul Thorsen’s expertise, Aarhus University will only accept such collaboration if it has the purpose of securing data or protecting the interests of participating researchers and funding agencies”

Most of the information in Denmark is from a single article in Information by freelancer Sanne Maja Funch published in March 2010. Journalist Ulla Danielsen has also written about the lack of Danish media attention to the case, which moneywise seems to be bigger than the Milena Penkowa case. Danielsen has written a few English articles for Age of Autism, seemingly an anti-vaccine blog as far as I can determine.

 

(Typo correction: 14 April 2011)

Poul Thorsen: a new Milena Penkowa?

In Denmark we have the hilarious case of the neuroscientist Milena Penkowa from the University of Copenhagen, which involves embezzlement, forgery, the head of the university, allegations of scientific misconduct, personal ties to the former Minister of Science and to an employee of the Ministry of Science, inappropriate use of research funds, change of a Danish law because of anonymous questions, a documentary movie, personal ties to the documentary movie maker, and so on. I have a previous blog post on the case from February 2011.

Now we have a new case. That of Poul Thorsen. Like Penkowa he is/was an industrious researcher and he headed a large research group: North Atlantic Neuro-epidemiology Alliances (NANEA) at the University of Aarhus. I am trying to make heads or tails of this story and have begun the Danish version of the Wikipedia article on Poul Thorsen. If you compare it with the corresponding article on Milena Penkowa you will see that it is much smaller.

  1. Poul Thorsen gained large grants from the US Centers for Disease Control and Prevention (CDC) to NANEA: almost 8 million American dollars in 2000 and a renewal of over 8 million in 2007. These are very large sums. Apparently, he is suspected of falsifying documents from the CDC so that the University of Aarhus paid him 2 million US dollars (believing that money to cover it would come). The university discovered this in spring 2009. See Information.
  2. It is unclear who had the responsibility for administering the money. According to Ulla Danielsen the Danish Agency for Science, Technology and Innovation was the administrator until the University of Aarhus took over the administration in November 2009. If the university has paid Poul Thorsen before that date it seems that the agency has not done its job well.
  3. At his center in Aarhus Thorsen apparently employed a person who received a special salary support from the state. However, Poul Thorsen added some extra money on top, and that is not legal, according to Information.
  4. NANEA is under the University of Aarhus and Thorsen was employed there. However, he has also been employed at Emory University and this double employment was not approved by the University of Aarhus. See Information.
  5. In the Danish media there has been much less written about Poul Thorsen compared to Milena Penkowa. Penkowa has been keen on interviews before and after her fall from the throne, so the press has lots of interesting quotes, photos and videos of her, see one here. For Thorsen there seem to be very few images (one?). I guess the difference in good picture material might be the reason for the discrepancy in press coverage. The case of Thorsen also has little juicy sex. In the case of Penkowa there were speculations around the Minister of Science and the head of the University. Though these speculations are very likely unfounded, they nevertheless fueled the story that ran almost every day in February and part of March.
  6. Poul Thorsen has done research on autism and vaccines, see Thimerosal and the occurrence of autism: negative ecological evidence from Danish population-based data. Since Andrew Wakefield this area has been a minefield, where some contend that (mercury-containing) vaccination is harmful and causes autism. There are groups that are anxious about the issue. Age of Autism seems to be one. One blogger refers to this as “the anti-vaccine propaganda blog of Generation Rescue”. Age of Autism has not failed to write critical articles about Poul Thorsen. Even one of the Kennedys has written critically about the issue in Huffington Post.
  7. I think the money issues around Poul Thorsen call his scientific integrity into question. I haven’t heard much about this issue.
  8. Poul Thorsen has fairly few first-author articles from recent years. The first author is usually the one with the hands in the data, and the one who has the possibility to falsify data. Thorsen is not first author on the Thimerosal/autism article. His colleague, the medical doctor Kresten Meldgaard Madsen, says that Thorsen could not change or compromise the data.
  9. According to Ulla Danielsen, Poul Thorsen is charged with tax evasion and the case is planned to begin 13 April 2011 (which is today as I write this). There is a list of court cases in Aarhus. Among them is a case beginning 9:30 with the lawyer Jan Schneider. This might be the one. As far as I understand, the court room was packed when Penkowa went to trial. I am not sure Poul Thorsen will attract as many, but we might hear more in the afternoon if any journalist attends.

Making sense of microposts

Example tweet scoring. -5 has been subtracted from the AMT and ANEW score. AMT is Alan Mislove’s data scored from the Amazon Mechanical Turk.

Word          AFINN   ANEW   GI   OF   SS   AMT
ear               0      0    0    0
infection         0  -3.34   -1    0
making            0      0    0    0
it                0      0    0    0
impossible        0      0    0   -1
2                 0      0    0    0
sleep             0    2.2    0    0
headed            0      0    0    0
2                 0      0    0    0
the               0      0    0    0
doctors           0      0    0    0
2                 0      0    0    0
get               0      0    0    0
new               0      0    0    0
prescription      0      0    0    0
so                0      0    0    0
fucking          -4      0    0    0
early             0      0    0    0
Tweet score      -4  -1.14   -1   -1   -2  -3.4

Reviews R back frm the Making Sense of Microposts wrkshp where they apparently accepted my 6p position papr A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. The workshop is part of the ESWC 2011 conference. For an introduction to my paper read my previous blog post.

The reviews for my article were not surprising. They stated something like: a bit low on originality and ok methodology. The editors requested an example that illustrated the difference between my approach and other approaches. In the table above I have an example with one of the very early tweets, apparently among the first 10,000. The example I looked at turned out to illustrate the issues nicely. The tweeter has an ear infection and uses a couple of words with valence: “infection”, “impossible” and “fucking”. It turns out that my word list (AFINN) has only “fucking”, while General Inquirer (GI) only has “infection” and OpinionFinder (OF) only “impossible”. Interestingly, OF has “infected” and “infectious”. ANEW has “infection” and also “sleep”, which it sets as mostly positive. I wonder if the best way is to assemble all the word lists into one big one. The example is now in the newest version of the paper. (If you examine the paper I hope you appreciate the great amount of illegal tweaking in the LaTeX document preparation system I did to get the paper under the 6-page limit while adding the table).
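The word-by-word scoring in the example can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the dictionaries hold just the entries mentioned in this post (with 5 already subtracted from the ANEW values), the tokenizer is a simple guess, and the names WORD_LISTS, tokenize, score and covered_words are made up for illustration rather than taken from the code behind the paper.

```python
import re

# Toy excerpts holding only the entries mentioned in the post.
# The real word lists are far larger.
WORD_LISTS = {
    "AFINN": {"fucking": -4},
    "ANEW": {"infection": -3.34, "sleep": 2.2},  # 5 already subtracted
    "GI": {"infection": -1},
    "OF": {"impossible": -1},
}

TWEET = ("ear infection making it impossible 2 sleep headed 2 the doctors "
         "2 get new prescription so fucking early")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score(text, valences):
    """Sum the valences of the words in the text; unknown words count as 0."""
    return sum(valences.get(word, 0) for word in tokenize(text))


def covered_words(text, word_lists):
    """Return the words of the text found in at least one of the word lists."""
    return {word for word in tokenize(text)
            if any(word in valences for valences in word_lists.values())}


scores = {name: score(TWEET, valences) for name, valences in WORD_LISTS.items()}
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
print(sorted(covered_words(TWEET, WORD_LISTS)))
```

With these toy lists each word list picks up a different part of the tweet, and only their union covers all four valenced words. Simply summing across lists would mix different scales, so a real merge would need some rescaling first.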

One reviewer noted an imbalance between “normal” and obscene words, e.g., between “hell” (my present valence -4) and “tits” (my present valence -2). The first word could be quite ambiguous and the other is used mostly positively or neutrally, e.g., “It feel good as hell outside” and “yoo her tits look like candy corns”. I somewhat agree. A better mean valence for “tits” could be +1. With “hell” I am not sure. Though ambiguously used, it is most frequently used fairly negatively. Maybe -3 or -2 would be better.

One emailing researcher asked about the accuracy. It didn’t quite fit in the six pages I had available for the paper, but the confusion matrix is listed below. From that it is possible to compute the total accuracy: (277+5+299)/1000 = 58.1%. If the calculation of the accuracy is restricted to positive and negative I get: (277+299)/(277+61+123+299) = 75.8%. The one-off accuracy, where only confusions between positive and negative count as errors, is (1000-123-61)/1000 = 81.6%.

                   AMT
                   Positive  Neutral  Negative
AFINN  Positive         277        5        61
       Neutral          157        5        67
       Negative         123        6       299
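The three accuracy figures can be recomputed from the confusion matrix with a short Python sketch. It assumes that rows are AFINN labels and columns are AMT labels, both in the order positive, neutral, negative; the variable names are only illustrative.

```python
# Rows: AFINN label, columns: AMT label (order: positive, neutral, negative).
confusion = [
    [277, 5, 61],
    [157, 5, 67],
    [123, 6, 299],
]

total = sum(sum(row) for row in confusion)

# Total accuracy: the diagonal over all tweets.
accuracy = sum(confusion[i][i] for i in range(3)) / total

# Accuracy restricted to tweets labeled positive or negative by both.
polar = (0, 2)
polar_total = sum(confusion[i][j] for i in polar for j in polar)
polar_accuracy = (confusion[0][0] + confusion[2][2]) / polar_total

# One-off accuracy: only positive/negative confusions count as errors.
one_off_accuracy = (total - confusion[0][2] - confusion[2][0]) / total

print(f"total: {accuracy:.1%}")
print(f"positive/negative only: {polar_accuracy:.1%}")
print(f"one-off: {one_off_accuracy:.1%}")
```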

The titles of the accepted papers for the workshop are available. Nine out of 19 submissions were accepted. They all seem quite interesting. I see there are two papers on political tweets. I look forward to hearing about them, as Danish media tell us that Danish politicians are at war on Twitter. Denmark is going to have an election for the “Folketing” parliament later this year, and it will likely be the first election where we see politicians use social media seriously.

There is a Facebook group for the workshop and in the conference registration you could key in LinkedIn and Twitter accounts as well as blog address. Last time I went to the ESWC conference (in 2009) some among the participants/organizers distributed RFID tags we could wear. With data mining at the end of the conference they could determine who among the participants were spending most time together. If I remember correctly Sebastian Schaffert and his teddy bear won.

April fool fools: Gmail Motion and ultrasonography on mobile phones

I remember that several years ago the Danish television channel DR showed a news spot by journalist Morten Hartkorn on 1 April where the story was suspiciously strange. The clever April Fools’ joke here was that the story was actually real, and (if I remember correctly) a TV critic fell for the joke and thought it was an April Fools’ joke (to the schadenfreude of other TV critics).

For the April Fools’ joke of the year 2011 we have heard of Gmail Motion, a system that lets a user interact with an email program by gestures and postures in front of the computer. The cleverness in Google’s joke may be that they knew it was likely that people (nerds) could implement the joke. It didn’t take long. Already on 1 April postdoctoral researcher Evan Suma had uploaded a video demonstrating such a system.

Suma and his folks used the Microsoft Kinect. It would probably be harder to implement such a system with a standard PC/smartphone camera. Emailing may not be the obvious choice for body motion control. Typically you need to type in longer text, which is still best handled with a keyboard. I think the control of a television might be a more appropriate application. Typically you only need to change channel and turn the volume up and down. You would need a camera of the Kinect type on the television set and body motion recognition software. I am not sure I want such a system in my house. If the television set has an Internet connection and gets hacked, the perpetrator can broadcast your living room or bedroom to the curiosity of the world via something like Bambuser.

In Denmark one of the newspapers carried a story on ultrasonography with a mobile phone. Given that modern smartphones have so many transducers and receivers (microphone, loudspeaker, wifi, mobile antenna, gyroscope, accelerometer, GPS, vibration, camera, screen, did I forget any?) it is interesting to ask how far we are from implementing this April Fools’ joke. AFAIK you will need a piezoelectric crystal as the ultrasonic transceiver. I am not too familiar with the issues around the design of such things, but my guess is that it would not be impossible to fit a transceiver inside the mobile phone, though there might be problems with the vibrations it emits to the entire mobile phone.

Back in 2009 American researchers reported that they could do ultrasonography with a mobile phone. But this was with an external USB transceiver. There are commercial USB systems, see InNovaSound. FDA-approved, they are quite expensive, with a price of well over 1000 American dollars as far as I can determine, but it seems that the price is falling. With the non-FDA-approved mass production we see in the mobile phone industry, the inclusion of ultrasonography in modern smartphones seems to be a possibility.