Month: June 2011
Lone "The Adjectived" Aburas, The Second
I suppose that the highest aspiration of an author is to get a phone call early one morning from an English-speaking person with a heavy Swedish accent. The second highest may be to become an adjective, such as “Shakespearian”. Now the young observer of suburbia Lone Aburas (the final s in her name does not indicate a genitive) has managed to become an adjective just after her second novel. Congratulations!
The novel is called Den svære toer – the difficult second, i.e., the second book.
A collective novel with modern social realism detailing the depressing everyday life of suburbanites. Not a winner? Well, the book is loaded with enough Danish humor and irony to carry us through. One blogger writes that he has difficulty seeing the humor in the novel. Too bad for him. Lone Aburas clearly states that she uses irony and that the meta-commentary is meant humorously. Even the title is humorous: “[…] I think it was funny […] it is mostly an ironic title […]”. The ironic meta-commentary at the beginning and the end points most clearly in this direction. The end sets up tasks for the reader. The reader may, e.g., “analyze the beginning of the novel” and find examples of where the author breaks the rules that she herself sets up. This is meant ironically, so the reader should not necessarily do that. However, the reader may already have analyzed the beginning while reading it and found out that it (the beginning) was ironic. As the rules were set up ironically, they were not really set up at all, and we expect them to be broken, meaning that the meta-rule is that the rules are to be broken. (The obvious next step for me here would be to come up with some meta-humoristic irony in a comment on the meta-humoristic irony of Aburas. I will not do that, though.)
Humor with irony sits centrally in Danish popular art: Hans Christian Andersen’s fairy tales, Aqua’s double-entendre “Barbie Girl”, the humorous lyrics of the long-time popular Danish band Shubidua (which has sold more records than Denmark has inhabitants). Most of the best-selling Danish films in Denmark over the last 40 years are comedies: the Olsen Gang films, Den eneste ene, Italiensk for begyndere and Klovn – The Movie. Even Albert Speer-admirer Lars Trier’s most popular work in Denmark is humorous.
But apart from the irony, what does the novel want? It is not clear. Lone Aburas leaves her poor characters to their own destiny with a divorce and a dog-training course. In the Danish hit comedy Italian for Beginners we also follow Copenhagen suburbanites through a course. But that course in the Italian language ends successfully with a romantic trip to Italy, while Lone Aburas’s dog-training course ends with the participants cheated out of the course fee they paid up front. Not nice.
On the negative side I also find that the novel lacks an index. The punctuation I find ok though.
Advice for Lone Aburas for her third novel? Well, more structure, I would say. And action! Most modern literature involves one murder, or possibly a connected series of murders – a case to solve. A revised second edition could, e.g., consider replacing the police stop on page 126 with a dramatic car chase. Also, the car crash on page 134 could be described in detail. Another issue is what she herself acknowledges on page 137 with the words: “Actually I do not like to describe two humans having sex”, which is a problem, as she further writes “[…] you are not a real writer if you are not capable of writing about erotics”. She needs to work on that bit. Include murder and sex. Possibly also international crime and the revolution in Egypt.
Online topic mining with sentiment analysis
I have now updated the Brede topic mining web-service with sentiment analysis using the AFINN word list.
In the example seen in the images I have a few posts from a recent query on Pfizer on Twitter. The sentiment analysis has a problem with the tweet “Pfizer’s Remoxy Fails to Win FDA Approval”: both the words “win” and “approval” are positive, but in this context the word “fails” negates them, which the simple sentiment analyzer fails to detect. (Correction: 20:20)
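To make the negation failure concrete, here is a toy scorer – a minimal sketch with a hand-made three-word lexicon (not the actual Brede web-service code, and the word values are made up, not taken from AFINN) – where a crude rule flips the sign of positive words that appear after a negating word:

```python
# Toy word-list sentiment scorer with a one-step negation rule.
# The lexicon and its values are invented for this illustration only.
lexicon = {'win': 4, 'approval': 2, 'fails': -2}
negators = {'fails', 'not', 'no', 'never'}

def score(text):
    words = text.lower().replace("'s", "").split()
    total = 0
    negate = False
    for word in words:
        value = lexicon.get(word, 0)
        if negate and value > 0:
            value = -value   # flip positive words appearing after a negator
        total += value
        if word in negators:
            negate = True    # negation stays on for the rest of the text
    return total

# A plain word-list sum would give -2 + 4 + 2 = +4 (wrongly positive);
# with the rule, "win" and "approval" are flipped and the total is -8.
print(score("Pfizer's Remoxy Fails to Win FDA Approval"))
```

Real negation scope is of course harder than this one-flag rule (it should end at clause boundaries, handle boosters, etc.), but even this crude version gets the Remoxy tweet right.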
Self-citation and the Milena Penkowa and Peter Riisager case
I have previously blogged about the Milena Penkowa case that has entertained the Danish research community in the first half of 2011. If you want an English update there is an overview in the April article Penkowa for dummies.
One of the latest to jump on the Penkowa-bashing bandwagon is geologist Peter Riisager. Back in March he looked at Penkowa’s self-citations and reported on them on his blog, finding that 54% of Penkowa’s citations were her own. The story was picked up a couple of weeks ago by the university newspaper (in Danish and English) as well as by a Danish science website. Having found that Penkowa has over 50% self-citations, Riisager links to a Nature blogger who claims that “Bad guys have > 50% self-citations” and “good guys have self-citations as < 50% of total cites (I [Brian Derby] am at 25%)”. QED: Penkowa is a bad guy.
But are Riisager (and blogger Brian Derby) right? I cannot find out which method he used, and 50% self-citations sounds like rather a lot.
How can we investigate this further? Well, here is my methodology: I use ISI Web of Science, search on an author, and press “Create Citation Report” to get the number of articles the author has written (“Results found”) and the number of citations (“Sum of the Times Cited”). For the number of non-self citations I press “View without self-citations” and read off “Result:” in the upper left corner of the web page. Is that an OK procedure? Nah. I think the problem is that “Sum of the Times Cited” refers to the number of citations, while “View without self-citations” refers to the number of papers with citations excluding self-citations. What we should (also) do is get the number of papers with citations (“View Citing Articles”). The problem is that each citing paper may contain multiple citations. What we would also like to have is the number of citations excluding self-citations, but I do not know how to get that number from ISI Web of Science.
Below I have attempted a count on Milena Penkowa, Peter Riisager, myself and big-shot neuroimaging analyzer Karl J. Friston. The “self-citation rate (A)” is computed in what I believe is the wrong way: (citations − papers with non-self citations)/citations, while the “self-citation rate (B)” is computed from the numbers of citing papers: (papers with citations − papers with non-self citations)/papers with citations.
Author | Papers | Citations | Papers with citations | Papers with non-self citations | Self-citation rate (A) | Self-citation rate (B) |
---|---|---|---|---|---|---|
Penkowa M | 108 | 2482 | 1261 | 1179 | 52% | 6.5% |
Riisager P | 32 | 372 | 273 | 254 | 31% | 7.0% |
Nielsen FA | 34 | 649 | 549 | 533 | 18% | 2.9% |
Friston KJ | 459 | 47381 | 26663 | 26285 | 46% | 1.4% |
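Spelled out as a small calculation (just a sketch of the arithmetic described above, using the Penkowa row of the table as input):

```python
def rate_a(citations, papers_nonself):
    # (A): (citations - papers with non-self citations) / citations.
    # This mixes units (citations vs. papers), which is why I believe
    # it is the wrong way to compute the rate.
    return (citations - papers_nonself) / float(citations)

def rate_b(papers_citing, papers_nonself):
    # (B): (papers with citations - papers with non-self citations)
    #      / papers with citations -- consistent units (papers).
    return (papers_citing - papers_nonself) / float(papers_citing)

# Penkowa row: 2482 citations, 1261 citing papers, 1179 of them without
# self-citations.
print("%.1f%%" % (100 * rate_a(2482, 1179)))  # 52.5%
print("%.1f%%" % (100 * rate_b(1261, 1179)))  # 6.5%
```

The same two functions reproduce the other rows of the table when fed the corresponding columns.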
In his blog post from 8 March 2011 Riisager writes that Penkowa has a total of 2,401 citations of which 1,296 are self-citations. With my “wrong” methodology I get 2482 − 1179 = 1303 self-citations, – pretty close to Riisager’s numbers. So is Riisager mixing up the units, papers and citations? Or how did he arrive at his numbers?
The “wrong” (A) way of computing the self-citation rate seems way off. If you take the (A) self-citation rate of Friston you get 46%. This seems to be an outrageous rate: surely 46% of Friston’s many citations are not generated by himself. That would put him near Brian Derby’s “bad guy”… As long as we do not have the number of citations without self-citations – only the number of papers with citations without self-citations – we can only use the latter. And if we now look at Penkowa’s self-citation rate it is not over 50% but rather 6.5%. That value is actually lower than the self-citation rate I compute for Peter Riisager! So who is laughing now?
I must admit I am not completely sure about my methodology. To investigate the issue fully one may need to download all the papers and count the citations, so we can understand the ISI Web of Science values. My (B) method gives me a self-citation rate of 2.9%. I think I have a higher number of self-citations on Google Scholar, as Google Scholar indexes all my slides. As I tend to reference myself on the slides, my number of citations gets boosted, and that may partially explain why my Google Scholar h-index is higher than my ISI Web of Science h-index.
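As an aside, the h-index compared here is easy to recompute yourself from a list of per-paper citation counts; a minimal sketch (the example counts are made up):

```python
def h_index(citation_counts):
    # The h-index is the largest h such that h papers have
    # at least h citations each.
    counts = sorted(citation_counts, reverse=True)
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)

# Made-up example: five papers cited 10, 8, 5, 4 and 3 times.
print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have >= 4 citations each
```

Since Google Scholar and ISI Web of Science index different sources (Scholar picks up slides and preprints), the same author naturally gets different citation-count lists and hence different h-indices from this formula.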
(2012-03-07: language correction)
Simplest sentiment analysis in Python with AFINN
I have previously blogged about sentiment analysis. Code for simple sentiment analysis with my AFINN sentiment word list is also available from the appendix in the paper A new ANEW: Evaluation of a word list for sentiment analysis in microblogs as well as ready for download. It might be a little difficult to navigate the code, so here I have made the simplest example in Python of sentiment analysis with AFINN that I could think of.
```python
#!/usr/bin/python
#
# (originally entered at https://gist.github.com/1035399)
#
# License: GPLv3
#
# To download the AFINN word list do:
# wget http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/6010/zip/imm6010.zip
# unzip imm6010.zip
#
# Note that for pedagogic reasons there is a UNICODE/UTF-8 error in the code.

import math
import re
import sys

reload(sys)
sys.setdefaultencoding('utf-8')

# AFINN-111 is as of June 2011 the most recent version of AFINN
filenameAFINN = 'AFINN/AFINN-111.txt'
afinn = dict(map(lambda (w, s): (w, int(s)), [
            ws.strip().split('\t') for ws in open(filenameAFINN) ]))

# Word splitter pattern
pattern_split = re.compile(r"\W+")

def sentiment(text):
    """
    Returns a float for sentiment strength based on the input text.
    Positive values are positive valence, negative value are negative valence.
    """
    words = pattern_split.split(text.lower())
    sentiments = map(lambda word: afinn.get(word, 0), words)
    if sentiments:
        # How should you weight the individual word sentiments?
        # You could do N, sqrt(N) or 1 for example. Here I use sqrt(N)
        sentiment = float(sum(sentiments))/math.sqrt(len(sentiments))
    else:
        sentiment = 0
    return sentiment

if __name__ == '__main__':
    # Single sentence example:
    text = "Finn is stupid and idiotic"
    print("%6.2f %s" % (sentiment(text), text))

    # No negation and booster words handled in this approach
    text = "Finn is only a tiny bit stupid and not idiotic"
    print("%6.2f %s" % (sentiment(text), text))
```
(2012-12-01: Updated link to new gist at github)
Who knows Who Knows? You can now play a Facebook application while doing research
At the 8th Extended Semantic Web Conference, researchers from Potsdam showed a Facebook application. It is a quiz game called Who Knows?
The special thing about it is that the questions are automatically generated from Wikipedia via DBpedia. As users’ interactions with the game are recorded, the results may be used to improve the ranking of triple data in Semantic Web applications as well as to find errors in Wikipedia/DBpedia. The background scientific paper is WhoKnows? – Evaluating Linked Data Heuristics with a Quiz that Cleans Up DBpedia. Last author Harald Sack is presently at the top of the high-score list. Another of their Facebook quiz applications is Risq.