Hot or not or what: Data mining attractiveness


From the media we hear that women are most attractive at 31. That claim is based on a “poll of 2,000 men and women, commissioned by the shopping channel QVC to celebrate its Beauty Month.” So this is a kind of science that is part of a company’s media effort. We also see such use of science in neuromarketing research. However, in this case the results are likely to be reasonably OK.

The web site Hot or Not has, according to Wikipedia, been an inspiration for both YouTube and Facebook. The site allows you to rate men and women based on their uploaded photos.

Back in 2009 I became aware of Hot or Not in a nerdish way: The computer programming book Programming Collective Intelligence uses the site as a real-life example for prediction based on annotation in the social web. Hot or Not has an API, so you can get some data from the site. You need an API key, and last time I checked you couldn’t obtain new keys, but I could use the one given in the book.

So I started to download data. You don’t get the individual ratings, only the average rating for each person as well as a bit of demographics, e.g., the age. So there is really not so much you can do. The programming book tries to predict the rating based on gender, age and location (US state).

I tried to see how the rating varied with age. I managed to make a plot of a sample of men and women from Hot or Not, and the result somewhat surprised me. I was expecting a decay in rating for women and men as a function of age, with around 31 years as a good candidate for maximum rating. However, when I look at the ratings for women there is very little decay. In fact, if you fit a second-order polynomial you actually see a slight rise for older women. With unscrupulous extrapolation you would say that 100-year-old women are maximally attractive. Men have the ‘correct’ decay, with the highest rating somewhere around 30 or before. But there is considerable variance within each year compared to the difference in average between years.
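A second-order polynomial fit of rating against age can be sketched as below. This is only a minimal illustration using ordinary least squares in plain Python. The function name fit_quadratic and the ages and ratings are made up for the example and are not Hot or Not data.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c, returning (a, b, c)."""
    mean_x = sum(xs) / len(xs)
    us = [x - mean_x for x in xs]  # centre x for numerical stability
    # Power sums for the 3x3 normal equations in the centred variable u
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    # Augmented matrix; unknowns are ordered (constant, linear, quadratic)
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    c0, c1, a = (m[i][3] / m[i][i] for i in range(3))
    # Convert back from u = x - mean_x to the original x
    b = c1 - 2 * a * mean_x
    c = c0 - c1 * mean_x + a * mean_x ** 2
    return a, b, c


# Made-up illustrative data, NOT Hot or Not data: a slightly convex trend
ages = list(range(18, 61, 3))
ratings = [8.5 - 0.02 * x + 0.0004 * x ** 2 for x in ages]
a, b, c = fit_quadratic(ages, ratings)


def predict(age):
    """Evaluate the fitted polynomial at a given age."""
    return a * age ** 2 + b * age + c


# A positive a means the fitted curve bends upward, so extrapolating
# far beyond the sampled ages keeps increasing the predicted rating
print(a > 0, predict(100) > predict(30))
```

A positive quadratic coefficient is what makes the extrapolation to age 100 look so flattering; the fit says nothing reliable outside the range of ages actually sampled.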

One explanation for the effect seen among women is that only beautiful older ladies would “dare” to upload their image, while ugly young women are not afraid. There is also the possibility that we really cannot trust the average ratings reported to us by Hot or Not. I have an account myself and have uploaded an image. Presently I have a rating of 7.7 based on 206 people (the scale goes from 1 to 10). Hot or Not reports that I am “hotter than 74% of men on this site!”. When I compare 7.7 with the data I can download, the percentage does not fit: Around 90% of males score higher than my 7.7. Yet another possibility is that the way I call the Hot or Not API does not give a fair sample of the people actually in the Hot or Not database.
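The consistency check between a reported “hotter than” percentage and a downloaded sample can be sketched as follows. The function name percent_rated_below and the sample values are hypothetical, chosen only to show the computation.

```python
def percent_rated_below(ratings, my_rating):
    """Percentage of ratings in the sample strictly lower than my_rating."""
    below = sum(1 for r in ratings if r < my_rating)
    return 100.0 * below / len(ratings)


# Made-up sample of average ratings, NOT downloaded data
sample = [6.9, 7.2, 7.9, 8.1, 8.4, 8.6, 8.8, 9.0, 9.1, 9.3]
share_below = percent_rated_below(sample, 7.7)
print(share_below)
```

If the sample were representative, this number should be close to the percentage the site reports; a large gap points to either a biased sample or ratings that are computed differently on the site.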

Hot or Not data has been used in a few scientific reports, see, e.g., Economic principles motivating social attention in humans, whose authors made their own ratings, and If I’m Not Hot, Are You Hot or Not?, which has employees on the author list and thereby gained access to its unique data.

From the folklore of network analysis: The Erdos-Bacon number

I have just discovered that I have an entry on IMDb through director Dola Bonfils’ documentary film Tankens Anatomi (The Anatomy of Thought). It is from 1997, but I do not recall seeing an entry for the film or for me on IMDb before. It almost makes it easy to compute my Bacon number. The Web service The Oracle of Bacon allows you to type in the names of two IMDb-listed people, and it will then find the shortest path between them. However, I don’t seem to be present in The Oracle of Bacon database. Danish entertainer, scientist and author Peter Lund Madsen also appears in Tankens Anatomi, and he is present in the Oracle. Depending on the options set in The Oracle of Bacon it is possible to get to Kevin Bacon, although we need to go via, e.g., Mr Nice Guy, which is just a recorded comedy show released on video. Mr Nice Guy features Trine Dyrholm, who is a “proper” actress, and from her it gets easy, e.g., by P.O.V. to Gareth Williams and Digging to China with Kevin Bacon. So it seems that I have a Bacon number of 4.

My combined Erdos-Bacon number then drops to 7.

In our research group we have relatively low Erdos numbers since our hub, Professor Lars Kai Hansen, wrote the concisely titled paper Neural Network Ensembles with Peter Salamon, a researcher with an Erdos number of 1. The Hansen-Salamon paper from 1990 has become the most cited from our department (as far as I can determine). With Lars Kai I have written a large number of articles, e.g., Modeling of activation data in the BrainMap™ database: Detection of outliers.
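The two chains described above can be checked with a small breadth-first search. This is only a sketch: the helper names build_graph and shortest_path are made up, and the graphs contain just the links mentioned in this post, not the full IMDb or co-authorship data that a service like The Oracle of Bacon would search.

```python
from collections import deque


def build_graph(links):
    """Turn a list of (person, person) pairs into an undirected graph."""
    graph = {}
    for first, second in links:
        graph.setdefault(first, set()).add(second)
        graph.setdefault(second, set()).add(first)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of names or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            path = []
            while person is not None:
                path.append(person)
                person = previous[person]
            return path[::-1]
        for neighbour in graph.get(person, ()):
            if neighbour not in previous:
                previous[neighbour] = person
                queue.append(neighbour)
    return None


me = "Finn Årup Nielsen"
film_links = [
    (me, "Peter Lund Madsen"),                # Tankens Anatomi
    ("Peter Lund Madsen", "Trine Dyrholm"),   # Mr Nice Guy
    ("Trine Dyrholm", "Gareth Williams"),     # P.O.V.
    ("Gareth Williams", "Kevin Bacon"),       # Digging to China
]
paper_links = [
    (me, "Lars Kai Hansen"),                  # co-authored articles
    ("Lars Kai Hansen", "Peter Salamon"),     # Neural Network Ensembles
    ("Peter Salamon", "Paul Erdos"),          # Salamon has Erdos number 1
]

# The number of links on the shortest path is the Bacon or Erdos number
bacon = len(shortest_path(build_graph(film_links), me, "Kevin Bacon")) - 1
erdos = len(shortest_path(build_graph(paper_links), me, "Paul Erdos")) - 1
print(bacon, erdos, bacon + erdos)
```

Breadth-first search visits people in order of distance from the start, so the first time it reaches the goal the path is guaranteed to be a shortest one in the graph it was given.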

Seven is still far from the five of Kiralee Hayashi, a former gymnastics champion, former scientist and present actress. According to her LinkedIn profile she has worked at the Laboratory of Neuro Imaging (LONI), a well-known neuroimaging research group. With noted neuroimaging researcher Paul Thompson she is on the author list together with big-shot mathematician Shing-Tung Yau, who has an Erdos number of 2, according to Paul Thompson’s Wikipedia-cited Erdos number page. Their paper is Brain Surface Parameterization Using Riemann Surface Structure.

Now I have been trying to compute my Hayashi-Hayashi number. This must be 12 or less. Paul Thompson has a Hayashi-science number of one, and through In vivo evidence for post-adolescent brain maturation in frontal and striatal regions Californian Terry Jernigan gets a Hayashi-science number of 2. (See also the entry for the paper in the Brede Wiki). Terry is also in our Danish CIMBI brain project, and with Jan Kalbitzer’s interesting neuroimaging seasonality paper Seasonal Changes in Brain Serotonin Transporter Binding in Short Serotonin Transporter Linked Polymorphic Region-Allele Carriers but Not in Long-Allele Homozygotes, where both Terry and I are on the author list, I get a Hayashi-science number of just 3!

Allowing for the documentary/video trick and with Kiralee Hayashi and Trine Dyrholm in The Oracle of Bacon I get a Hayashi-film number of 5, and my Hayashi-Hayashi number then becomes 8.

What a small world.

(minor edit: 2012-10-16)


Brede Wiki and Brede Database 2009


I have just drafted a section for the CIMBI 2009 annual report:

We have argued for a wiki approach to database information from published neuroimaging articles [1], and we now have implemented the Brede Wiki available from the Web site

The wiki is based on MediaWiki – the software that runs Wikipedia. With extensive use of so-called MediaWiki templates, information can be structured and easily extracted [2]. The content in the wiki is focused on neuroscience information: Text and data about neuroimaging studies, brain regions, topics, software, researchers, organizations, journals and events. The wiki makes extensive use of deep links to other neuroscience databases, enabling federation of content with other neuroinformatics databases. The Brede Wiki has almost 1,500 pages, e.g., describing 206 brain regions and 175 scientific papers. With the data extracted from the structured part of the Brede Wiki, a small search interface has been constructed that allows searching for coordinates near a given query coordinate. The Brede Wiki also allows for upload of volume files in a standardized format. Thus it provides an uncomplicated means for sharing result volumes from neuroimaging statistical analyses.

Another project of the group – Brede Database – has now been included in the large-scale American database federation effort ‘Neuroscience Information Framework’. Furthermore, some of the visualization efforts for the Brede Database were described in a recent article [3].

  1. Lost in localization: A solution with neuroinformatics 2.0? Finn Årup Nielsen, NeuroImage, 48:11-13, 2009
  2. Brede Wiki: Neuroscience data structured in a wiki, Finn Årup Nielsen, Proceedings of the Fourth Workshop on Semantic Wikis – The Semantic Wiki Web : 6th European Semantic Web Conference, Hersonissos, Crete, Greece, June 2009: Lange, Christoph
  3. Visualizing data mining results with the Brede tools, Finn Årup Nielsen, Frontiers in Neuroinformatics, 3:26, 2009.

The image is a figure from the CC-by Frontiers in Neuroinformatics article
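A search for coordinates near a query coordinate, as mentioned in the report text, could look roughly like the sketch below. It assumes plain Euclidean distance between three-dimensional coordinates in millimetres. The function name nearby, the labels and the coordinates are hypothetical, and this is not a description of how the Brede Wiki search interface is actually implemented.

```python
import math


def nearby(coordinates, query, radius):
    """Return (distance, label) pairs within radius of query, nearest first."""
    hits = []
    for label, xyz in coordinates.items():
        distance = math.dist(xyz, query)  # Euclidean distance (Python 3.8+)
        if distance <= radius:
            hits.append((distance, label))
    return sorted(hits)


# Hypothetical labels and coordinates (in mm), for illustration only
coordinates = {
    "paper A, focus 1": (-40.0, 20.0, 10.0),
    "paper A, focus 2": (38.0, 22.0, 8.0),
    "paper B, focus 1": (-44.0, 18.0, 12.0),
}
hits = nearby(coordinates, (-41.0, 19.0, 11.0), radius=5.0)
for distance, label in hits:
    print(f"{label}: {distance:.1f} mm")
```

A linear scan like this is fine for a few thousand coordinates; a spatial index would only matter for much larger collections.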

23 years in coma and then in headlines.

Rom Houben was thought to have been in a coma for around 20 years, but then Steven Laureys brain scanned him with positron emission tomography and found that he was minimally conscious. From a German news article the story got into English news and further on even to the front page of a Danish tabloid. And the news media had quotes from Rom himself: So he can communicate in complete sentences! That is something of a story.

I heard of this story and found that it was already on Wikipedia, where Rom Houben’s page had a section called “controversy”. A video linked from the Wikipedia page clearly showed Rom communicating via Facilitated Communication (FC) (in Danish: staveplade). Now FC has exceptionally low standing in the scientific community, and that immediately calls the whole story into question. I heard Steven Laureys at a scientific conference, and he seemed to me to be an OK guy, not one who would start using FC. But this story could undermine his credibility. Anibal from Spain, whom I follow on Twitter, pointed me to an entry in Neurologica Blog where commenters were also very sceptical. But one commenter, presumably Flemish-speaking, pointed to a recent Belgian news article in which Steven Laureys had spoken. The commenter translated it into English, and according to this translation Steven Laureys says:

That (FC) is a debate that troubles me much more. I myself am sceptical, and that kind of facilitated communication still has a bad reputation, and rightly so. I’m not part of that, and have never suggested using it.

So it seems the news media made this story big by not being critical about the FC. And Wikipedia is more credible?

Using Google Web-service to keep track of scientific citations to me


Google Scholar allows me to see which scientific papers cite my scientific papers. However, it does not order them by date, so I cannot easily identify the most recent papers that cite me.

One way to identify recent citations is to use the “as_ylo” parameter available in the advanced search. With as_ylo=2009 only papers published in 2009 or later are shown for the given query. Combining that with a negative ‘author:’ query gets you some of the way, e.g., with “Nielsen FA” -author:“FA Nielsen” (and as_ylo=2009 included) I find papers from 2009 mentioning ‘Nielsen FA’ that are not authored by me.

To get a higher retrieval rate I list some of the different variations of my name in the query. The real query is then (abbreviated) “Nielsen FA” OR … -author:“FA Nielsen” …!
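Such a query can be turned into a Google Scholar URL with a few lines of Python. This is a sketch under assumptions: the function name scholar_url is made up, only the single name variant quoted above is used (the further variants are not filled in here), and “q” is taken to be the query parameter alongside “as_ylo”.

```python
from urllib.parse import urlencode


def scholar_url(name_variants, author_to_exclude, year_low):
    """Build a Google Scholar search URL for mentions not authored by me."""
    # Quote each name variant and join the variants with OR
    names = " OR ".join(f'"{name}"' for name in name_variants)
    query = f'{names} -author:"{author_to_exclude}"'
    # urlencode takes care of escaping quotes, spaces and colons
    parameters = urlencode({"q": query, "as_ylo": year_low})
    return "https://scholar.google.com/scholar?" + parameters


# Only the variant given in the text; more variants can be added to the list
url = scholar_url(["Nielsen FA"], "FA Nielsen", 2009)
print(url)
```

Keeping the name variants in a list makes it easy to extend the query without retyping it in the search box each time.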

As the year progresses one gets more and more citations, and it becomes difficult to identify the new ones. An alternative is the real-time search in the standard Google Web search. Restricting that search to PDF files and to data from the past month may return newer results, but it probably also misses papers from publishers that let Google Scholar in but keep Google Web out: “Nielsen FA” OR … filetype:pdf

It is possible that Google Alerts also can help.

2010-11-25: Typo correction