Month: January 2011

Pew survey: 42% of adult Americans use Wikipedia


Pew Research Center’s Internet & American Life Project conducted an interesting telephone-based survey about Internet and Wikipedia use in spring 2010. The report with the results was published around the 10-year anniversary of Wikipedia in January 2011. They have a previous report from 2007. They report that 42% of adult Americans used Wikipedia in May 2010, up from 25% in 2007. If we extrapolate linearly, then 110% of adult Americans will be using Wikipedia in 13 years.
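The tongue-in-cheek extrapolation can be sketched in a few lines of Python. The function name and the gap between the two surveys are assumptions: the post only says "2007" and "May 2010", so both a 3-year and a 3.25-year gap are tried.

```python
def years_until(target, earlier, later, years_between):
    """Years after the later survey until a straight line through
    both survey results reaches the target percentage."""
    return (target - later) * years_between / (later - earlier)


# 25% in 2007 and 42% in May 2010; the exact gap is an assumption.
print(years_until(110, 25, 42, 3.0))   # gap of exactly three years
print(years_until(110, 25, 42, 3.25))  # gap if the 2007 survey ran early in the year
```

With a gap of three years the line hits 110% after 12 years; a gap of three and a quarter years gives 13.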

If you are a young white man with a high education level, a broadband connection and a good income (but not the highest), it is likely that you use Wikipedia. On the other hand, if you are an old Hispanic woman with no high school education and a low income, sitting with a dial-up connection, then you are less likely to use Wikipedia.

One thing that surprised me was how little difference there was between male and female users of Wikipedia. Among Internet users 56% of males use Wikipedia while the corresponding figure for females is 50%. These percentages are for readers. I suppose the males are more active as writers – from my personal experience. It is also what the Wikipedia Survey finds (page 7): Only 13% of contributors to Wikipedia (that took the survey) are female.

There are a few things I don’t understand. They report Internet use among adults to be 79%. In the methodology section they report a sample size of 2’252, of which 1’756 were Internet users. If you divide the two numbers you get 1756/2252 = 0.77975, which is nearer to 78% than 79%. Another strange issue is that there are 1’756 Internet users (according to the methodology), while for the characterization of the demographics of Wikipedia users there are only 852 Internet users. They report that they called 33’594 phone numbers and got a “response rate” of around 20%. 20% of 33’594 gives around 6’700, which is not 2’252. So where did the rest go? Perhaps somewhere around the “completion rate”, “eligibility rate” and “cooperation rate”? Could the 78/79% issue be related to the weighting used to debias telephone interviews?
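The arithmetic above is easy to recheck. This is only a sketch using the figures quoted in the post; the variable names are made up, and the "around 20%" response rate is treated as exactly 0.20.

```python
sample_size = 2252       # respondents reported in the methodology section
internet_users = 1756    # Internet users among the respondents
numbers_called = 33594   # phone numbers dialled
response_rate = 0.20     # "around 20%", taken as exact here

# Unweighted share of Internet users in the sample
share = internet_users / sample_size
print(f"{share:.5f}")          # 0.77975
print(round(100 * share))      # 78

# Respondents one would naively expect from the response rate
expected = response_rate * numbers_called
print(round(expected))         # 6719

# Actual completed interviews as a fraction of numbers called
print(f"{sample_size / numbers_called:.1%}")  # 6.7%
```

So the completed interviews amount to under 7% of the numbers called, which suggests the "response rate" is computed on a smaller base than all dialled numbers.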

The data is available on their homepage.

Putting "Putting Wikipedia to the test: a case study" to the test


A fairly small study on the quality of Wikipedia called Putting Wikipedia to the test: a case study seems to have been electronically published from Brisbane, Australia in 2008.

The study looked at just three medical topics (otitis media and multiple sclerosis among them) and did a blinded comparison with three other online services: UpToDate, eMedicine and AccessMedicine.

For UpToDate the researchers aggregated separate articles into one. For Wikipedia, non-scientific sections were omitted, e.g., history and movie characters with multiple sclerosis.

Three medical academics each evaluated four of the 12 articles. They were blinded and ranked the content for accuracy, coverage, concision, currency and overall suitability for medical students. Medical librarians (presumably unblinded) assessed accessibility and usability (cost and login requirement), “ease of finding and navigating the information” and “the quality of presentation”.

Results are summarized in their Table 3, with AccessMedicine generally scoring the best (39 points), followed at some distance by UpToDate (32 points), Wikipedia (30 points) and eMedicine (29 points).

My critique is:

  1. Overall a well-conducted study with blinding.
  2. Very small study with only three topics evaluated. This makes the study one of the smallest I have ever seen, and raises the question of how much we can generalize from it.
  3. The researchers summarize their results on Wikipedia with “relatively concise and current, but failed to cover key aspects of two topics and contained some factual errors. Each reviewer deemed it unsuitable as a learning resource for medical students”. It should be noted that for ‘Otitis media’ both the eMedicine and the AccessMedicine articles were deemed unsuitable. eMedicine’s ‘Multiple sclerosis’ article was also deemed unsuitable. UpToDate’s ‘Otitis media’ was only “generally suitable but with some limitations.” This leaves the medical student with no really good online resource for ‘Otitis media’.
  4. Factual errors were reported for 5 of the 12 articles (these numbers include 2 of the 3 Wikipedia articles). It is not reported what precisely these factual errors were.
  5. I find it interesting that they find “navigating around the information was simpler in UpToDate and eMedicine than in Wikipedia”. I wonder what that is due to. This is not explained.

The study points to two other recent Wikipedia quality studies: Scope, completeness, and accuracy of drug information in Wikipedia and Wiki-Surgery? Internal validity of Wikipedia as a medical and surgical reference.

(Correction 16:48, 2011-01-17, 18 & 19 other minor corrections)

Navigating the Natalie Portman graph: Finding a co-author path to a NeuroImage author



I first noticed Hollywood actress Natalie Portman in Mike Nichols’s 2004 film Closer. According to rumor on the Internet, a few years before Closer she co-authored a functional neuroimaging scientific article called Frontal lobe activation during object permanence: data from near-infrared spectroscopy. She was credited as Natalie Hershlag.

I have written before about data mining a co-author graph for the Erdős number and “Hayashi” number, and I wondered if it would be possible to find a co-author path from Portman to me. And indeed it is.

Abigail A. Baird first-authored Portman’s article, and the article Functional magnetic resonance imaging of facial affect recognition in children and adolescents has Abigail Baird and psychiatry professor Bruce M. Cohen among the authors. Bruce M. Cohen and Nicholas Lange are among the co-authors of Structural brain magnetic resonance imaging of limbic and thalamic volumes in pediatric bipolar disorder, and Lange and I are linked through our Plurality and resemblance in fMRI data analysis, an article that contrasted different fMRI analysis methods.

So the co-author path between Portman and me is: Portman – Baird – Cohen – Lange – me, which brings my “Portman number” to 4.

Navigating a graph is a general problem if you only know the local connections. Scientific articles have even been written about it, e.g., Jon Kleinberg’s Navigating in a small world. When a human (such as I) navigates a social graph such as the co-author graph of scientific articles, one can utilize auxiliary information, here the information about where a researcher has worked, what his/her interests are and how prominent the researcher is (how many co-authors s/he has). As Portman worked at Harvard, a good guess would be to start looking among my co-authors who are near Harvard. Nicholas Lange is from Harvard and we collaborated in the American-funded Human Brain Project. I knew that radiology professor Bruce R. Rosen was/is a central figure in Boston MRI research, so I thought that there might be a productive connection from him, both to Lange and to Portman. Portman’s co-author Baird is a professor and has written some neuroimaging papers, so among Portman’s co-authors Baird was probably the one who could lead to a path. While searching among Lange’s and Baird’s co-authors I confused Bruce Rosen and Bruce Cohen (their Hamming distance is not great). This error proved fertile.

If I hadn’t run into Cohen and really wanted to find a path between Portman and me, then I think a more automated, brute-force method would have been required. One way would be to query PubMed and put the co-author graph into NetworkX, which is a Python package. It has a shortest path algorithm. Joe Celko, in his book SQL for Smarties: Advanced SQL programming, shows a shortest path algorithm in SQL. That might be an alternative to NetworkX.
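A minimal sketch of the brute-force idea, in plain Python rather than NetworkX: a breadth-first search, which finds a shortest path in an unweighted graph. No PubMed query is made here. The author lists are an assumption, deliberately cut down to the names mentioned in this post, and the function names are made up for the illustration.

```python
from collections import deque
from itertools import combinations

# Partial author lists: only the names mentioned in the post (an assumption,
# the real papers have more authors).
papers = {
    "Frontal lobe activation during object permanence": ["Baird", "Portman"],
    "Functional magnetic resonance imaging of facial affect recognition "
    "in children and adolescents": ["Baird", "Cohen"],
    "Structural brain magnetic resonance imaging of limbic and thalamic "
    "volumes in pediatric bipolar disorder": ["Cohen", "Lange"],
    "Plurality and resemblance in fMRI data analysis": ["Lange", "me"],
}


def build_coauthor_graph(papers):
    """Link every pair of authors who share at least one paper."""
    graph = {}
    for authors in papers.values():
        for a, b in combinations(authors, 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns a list of authors or None if unconnected."""
    if source not in graph or target not in graph:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_coauthor_graph(papers)
path = shortest_path(graph, "Portman", "me")
print(" - ".join(path))   # Portman - Baird - Cohen - Lange - me
print(len(path) - 1)      # 4
```

With a real PubMed-derived graph the same search would consider every co-author, not just the hand-picked ones, which is the point of the brute-force approach.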

(Photo: gdcgraphics, CC-by, taken from Wikimedia Commons)

5-HTTLPR episode 17: The revenge of the neurocriticcritic


I am sort of a neuropessimist, believing that a large part of neuroscience results is more variable than we would like to think. I don’t think that I am as extreme as Why Most Published Research Findings Are False. I still need to understand its mathematical details and its critique.

The oldtimer 5-HTTLPR genetic polymorphism has long been hailed and then dethroned as associated with anxiety-related personality traits. Quite a number of meta-analyses have examined its effect on a range of variables and I recently listed some of these in tables for 5-HTTLPR meta-meta-analysis. The results are somewhat – hmmm – well – perhaps there is an effect on depression, perhaps only little effect or perhaps no effect. For the interaction between 5-HTTLPR and “stressful life events” on depression two 2009 meta-analyses (Munafo and some others) found no effect.

Anonymous neuroimaging blogger The Neurocritic had a piece in 2009 called Myth of the Depression Gene where he (probably not a she) with a certain amount of schadenfreude dethroned the optimistic original 2003 study of Caspi, Sugden, Moffitt and all the others. Now yesterday neurocriticcritic nooffensebut pointed to a new meta-analysis published a few days ago, The serotonin transporter promoter variant (5-HTTLPR), stress, and depression meta-analysis revisited: evidence of genetic moderation, that claims a fair amount of effect from the 5-HTTLPR-stress interaction on depression.

Now I would say that you can’t trust the papers that say you can’t trust papers. But in the true spirit of neuropessimism I would say that you also shouldn’t trust that.

For you PubMed junkies: The next episode of 5-HTTLPR will come to a web page near you.

Does Yandex honor robots.txt?


I have set up a robots.txt with “User-agent: *” and appropriate Disallow, but I discovered in my log that the Apache2 server was under heavy load from the bots of Russian search engine Yandex. Have I set up the robots.txt wrongly? As far as I can see, no other bots get to the place I do not want to be crawled.
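One way to rule out a mistake in the robots.txt itself is to run it through a parser. This is a sketch with Python's standard `urllib.robotparser`; the robots.txt content and the paths are hypothetical, since the real Disallow lines are not given here, and `allowed` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the actual Disallow paths are not in the post.
robots_txt = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """True if a well-behaved bot with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Yandex", "Googlebot"):
    for url in ("/private/page.html", "/index.html"):
        print(agent, url, allowed(robots_txt, agent, url))
```

If the parser says a URL is disallowed for every agent and a bot fetches it anyway, the file is probably fine and the bot is simply not honoring it (or is working from an old cached copy).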

People on the internet suggest “User-agent: Yandex” and a Disallow right after, but others claim that Yandex does not look at robots.txt and suggest putting the following in the .htaccess file in the document root (usually /var/www/):

SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
Order Deny,Allow
Deny from env=bad_bot

This seems to work for me, although I also needed something like “AllowOverride All” in the configuration file usually found in the directory /etc/apache2/sites-available/
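It can be worth checking what the pattern `^Yandex*` actually matches. The sketch below uses Python's `re` module as a stand-in for Apache's regular expressions, which behave the same way for a pattern this simple. The user-agent strings are illustrative examples, not taken from my log, and `is_bad_bot` is a made-up name.

```python
import re

# Same pattern as in the .htaccess line; IGNORECASE mimics SetEnvIfNoCase.
pattern = re.compile(r"^Yandex*", re.IGNORECASE)


def is_bad_bot(user_agent):
    """True if the user agent would be tagged bad_bot by the pattern."""
    return pattern.search(user_agent) is not None


examples = [
    "Yandex/1.01.001 (compatible; Win16; I)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "yande",
]
for user_agent in examples:
    print(is_bad_bot(user_agent), user_agent)
```

Two things show up: the `^` anchor means only user agents that begin with "Yandex" are caught, and the trailing `*` is not a wildcard but "zero or more x", so even "yande" matches. If the bots in the log announce themselves with a string starting with "Mozilla/5.0", an unanchored pattern such as "Yandex" may be the safer choice.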

So this is one of the silly things you can spend your life on.

A strange characterization of fMRI


Google Scholar has alerted me to a recent paper, Automatic Detection, Estimation, and Validation of Harmonic Components in Measured Power Spectra: All-in-One Approach, that is to be published in IEEE Transactions on Instrumentation and Measurement. Looking for the context in which we were cited, I read (page 1, first column):

In functional magnetic resonance imaging measurements, one is interested in detecting tumor tissue based on a harmonic analysis of the data [3]-[5]

Two things are strange here. First of all, functional magnetic resonance imaging (fMRI) is not (primarily) interested in detecting tumor tissue. MRI without the ‘f’ might be interested in tumor detection. Second, harmonic analysis is not necessarily (and in fact rarely) used for fMRI analysis, and it will be difficult (impossible?) with event-related fMRI. However, harmonic analysis by Fourier transform for image reconstruction is embedded in the MR scanner.

The citation to our work (the “[5]”) goes to the “Hansen Harmonic Detector” (HHD) that Lars Kai Hansen came up with, a funny detector that can find harmonics on both sides of the Nyquist frequency. Coming from a classic signal processing background you might think that this is magic, but the approach “just” uses a linear model and Bayesian estimation with conjugate priors, leading to a normal-inverse-gamma distribution. Keine Hexerei (no witchcraft).
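For readers who want to see why there is no magic, here is the textbook conjugate update for a Bayesian linear model. It is a generic sketch, not the exact formulation of the HHD paper: the notation is assumed, and for a harmonic at frequency f the columns of X would be cos(2πf t_i) and sin(2πf t_i) evaluated at the sample times t_i.

```latex
% Generic normal-inverse-gamma conjugate update; notation assumed, not from the paper.
\begin{align}
y &= X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n) \\
\beta \mid \sigma^2 &\sim \mathcal{N}(\mu_0, \sigma^2 \Lambda_0^{-1}), \qquad
\sigma^2 \sim \text{Inv-Gamma}(a_0, b_0) \\
\Lambda_n &= X^\top X + \Lambda_0 \\
\mu_n &= \Lambda_n^{-1} \left( \Lambda_0 \mu_0 + X^\top y \right) \\
a_n &= a_0 + \frac{n}{2} \\
b_n &= b_0 + \frac{1}{2} \left( y^\top y + \mu_0^\top \Lambda_0 \mu_0 - \mu_n^\top \Lambda_n \mu_n \right)
\end{align}
```

The posterior over (β, σ²) is again normal-inverse-gamma with parameters (μ_n, Λ_n, a_n, b_n), so everything stays in closed form for each candidate frequency.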