I have previously blogged about the Milena Penkowa case that entertained the Danish research community during the first half of 2011. If you want an update in English, there is an overview in the April article Penkowa for dummies.

One of the latest to jump on the Penkowa-bashing bandwagon is geologist Peter Riisager. Back in March he looked at Penkowa’s self-citations and reported on them on his blog. He found that 54% of Penkowa’s citations were her own. The story was picked up a couple of weeks ago by the university newspaper, in Danish and English, as well as by a Danish science web-site. Having found that Penkowa has over 50% self-citations, Riisager links to a Nature blogger who claims that “Bad guys have > 50% self-citations” and “good guys have self-citations as < 50% of total cites (I [Brian Derby] am at 25%)”. QED: Penkowa is a bad guy.

But is Riisager (and blogger Brian Derby) right? I cannot find out which method he used. 50% self-citations sounds like a lot.

How can we investigate this further? Well, here is my methodology: I use ISI Web of Science, search for an author, and press “Create Citation Report” to get the number of articles the author has written (“Results found”) and the number of citations (“Sum of the Times Cited”). For the number of non-self citations I press “View without self-citations” and read off “Result: ” in the upper left corner of the web-page. Is that an OK procedure? Nah. I think the problem is that “Sum of the Times Cited” counts citations, while “View without self-citations” counts citing papers, with the self-citing papers left out. The two numbers are in different units, because a single paper can cite several of the author’s papers. What we should (also) do is get the number of citing papers (“View Citing Articles”). What we would really like to have is the number of citations without self-citations, but I don’t know how to get that number from ISI Web of Science.
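To make the unit problem concrete, here is a minimal Python sketch with made-up toy data. The paper labels (P1 to P3, X1, X2) are hypothetical, and this is not how Web of Science computes anything internally. It assumes that a self-citation means the citing paper is itself one of the author’s papers.

```python
# Hypothetical toy data: each pair is (citing paper, cited paper).
# P1..P3 are the author's own papers; X1 and X2 are papers by other people.
author_papers = {"P1", "P2", "P3"}
citations = [
    ("X1", "P1"), ("X1", "P2"), ("X1", "P3"),  # one outside paper citing three papers
    ("X2", "P1"),                              # another outside paper citing one
    ("P3", "P1"), ("P3", "P2"),                # the author's P3 citing P1 and P2
]

# Unit: citations (what "Sum of the Times Cited" counts).
sum_times_cited = len(citations)

# Unit: papers (what "View Citing Articles" lists, each citing paper once).
citing_papers = {citing for citing, _ in citations}

# Unit: papers (what "View without self-citations" lists).
non_self_citing_papers = citing_papers - author_papers

# Unit: citations (the number we would really like, found by counting by hand).
non_self_citations = sum(1 for citing, _ in citations if citing not in author_papers)

print("citations:", sum_times_cited)
print("citing papers:", len(citing_papers))
print("citing papers without self-citations:", len(non_self_citing_papers))
print("citations without self-citations:", non_self_citations)
```

In this toy set the six citations come from only three citing papers, so subtracting a paper count from a citation count mixes two different things.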

Below I have attempted a count for Milena Penkowa, Peter Riisager, myself and big shot neuroimaging analyst Karl J. Friston. The “self-citation rate (A)” is computed in what I believe is the wrong way, (Citations – Papers with non-self citations)/Citations, while “self-citation rate (B)” is computed from the number of citing papers, (Papers with citations – Papers with non-self citations)/Papers with citations.

Author | Papers | Citations | Papers with citations | Papers with non-self citations | Self-citation rate (A) | Self-citation rate (B)
Penkowa M | 108 | 2482 | 1261 | 1179 | 52% | 6.5%
Riisager P | 32 | 372 | 273 | 254 | 31% | 7.0%
Nielsen FA | 34 | 649 | 549 | 533 | 18% | 2.9%
Friston KJ | 459 | 47381 | 26663 | 26285 | 46% | 1.4%
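The two rates can be recomputed from the counts in the table with a short Python sketch. The function names `rate_a` and `rate_b` are just illustrative labels for the (A) and (B) formulas described above. The recomputed (A) values for Riisager and Friston land within a point or two of the printed ones, which may be down to rounding or to counts read off at slightly different times.

```python
# Counts copied from the table:
# (papers, citations, papers with citations, papers with non-self citations)
counts = {
    "Penkowa M": (108, 2482, 1261, 1179),
    "Riisager P": (32, 372, 273, 254),
    "Nielsen FA": (34, 649, 549, 533),
    "Friston KJ": (459, 47381, 26663, 26285),
}


def rate_a(citations, non_self_citing_papers):
    """Rate (A): mixes units by subtracting a paper count from a citation count."""
    return (citations - non_self_citing_papers) / citations


def rate_b(citing_papers, non_self_citing_papers):
    """Rate (B): per-paper rate, the share of citing papers that are self-citing."""
    return (citing_papers - non_self_citing_papers) / citing_papers


for author, (papers, citations, citing, non_self) in counts.items():
    a = rate_a(citations, non_self)
    b = rate_b(citing, non_self)
    print(f"{author:12s} A = {a:5.1%}  B = {b:5.1%}")
```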

In his blog post from 8 March 2011 Riisager writes that Penkowa has a total of 2,401 citations, of which 1,296 are self-citations. With my “wrong” methodology I get 2482-1179 = 1303 self-citations, pretty close to Riisager’s numbers. So is Riisager mixing up the units, papers and citations? Or how did he get his numbers?

The “wrong” (A)-way of computing the self-citation rate seems way off. If you take the (A) self-citation rate of Friston you get 46%. This seems to be an outrageous rate. Surely 46% of Friston’s many citations are not generated by himself. That would put him near Brian Derby’s “bad guy”… As long as we do not have the number of citations without self-citations, only the number of citing papers without self-citations, the latter is all we can use. And if we now look at Penkowa’s self-citation rate it is not over 50% but rather 6.5%. That value is actually lower than the self-citation rate I compute for Peter Riisager! So who is laughing now?

I must admit I am not completely sure about my methodology. To investigate the issue fully one may need to download all the papers and count the citations, so we can understand the ISI Web of Science values. My (B)-method gives me a self-citation rate of 2.9%. I think I have a higher number of self-citations on Google Scholar, as Google Scholar is indexing all my slides. As I tend to reference myself on the slides, my number of citations gets boosted, and that may partially explain why my Google Scholar h-index is higher than my ISI Web of Science h-index.


(2012-03-07: language correction)


7 thoughts on “Self-citation and the Milena Penkowa and Peter Riisager case”

    Jackson_MM said:
    July 11, 2011 at 4:35 pm

    you are so wrong and for such an obvious reason that I find it stupid to lose the time to explain to you why you are wrong. from the help page of web of science:

    "Sum of the Times Cited: The total number of citations to any of the items in the set of search results. This is the sum of the Total column. In the example below, the Sum of the Times Cited is 39, which is the sum of the Total column.

    Click the View Citing Article link to display the citing articles. The number of citing articles retrieved may be smaller than the sum of the Times Cited because an article may cite more than one item in the set of search results. For example, you have six articles in the Citation Report. An author may have cited three of the articles in his or her paper. In this instance, the system retrieves four citing articles – not six. We only display a citing article once.

    Click the View without self-citations link to display a list of citing articles minus any article that appears in the set of search results on the Citation Report."

    I know this last sentence is of difficult comprehension. I totally understand you.

    Anonymous said:
    July 11, 2011 at 8:09 pm

    @Jackson_MM, I don’t understand your criticism. As I understand the Web of Science documentation (which I have just read now), it is precisely aligned with the understanding I expressed in the blogpost. So why am I "so wrong"? Have you read my entire blogpost? Note that I regard neither the (A) nor the (B) self-citation rate as "right". The (A) approach mixes up the units (i.e., I believe Riisager is wrong when he criticizes Penkowa, which was the reason I blogged about it). The (B) approach is not really optimal either: it does give a "sound" self-citation rate, but it is per-paper-based, not per-citation-based, and the former is not what you typically want. You need to manually count the self-citations to get a "proper" value (unless one can find some hidden functionality in Web of Science?). I have done a sample count this way. Penkowa’s self-citations were not that bad, judging from the few papers I browsed.

    Nelsen said:
    October 27, 2011 at 8:34 pm

    You can evaluate self-citation easily in Scopus. Under citation overview, you can exclude self-citations from the authors. Another example of self-citation was recently published about a young American professor, Nelson Tansu, who has 60% self-citation (google it!).

    Anonymous said:
    November 17, 2011 at 5:40 am

    I cannot see how Riisager obtained the number. In the latest ISI Web of Knowledge, one can view the number of non-self citations:
    Results found: 124
    Sum of the Times Cited: 2879
    Sum of Times Cited without self-citations: 2167
    So Penkowa’s number of self-citations is 712, or 25%.

    Anonymous said:
    November 17, 2011 at 10:31 am

    I agree overall with @garfield1. I do however get slightly different results. Searching Web of Science on "Penkowa M" (1980-) I get 113 publications with 2685 times cited and 2007 non-self citations, yielding 25%, the same percentage as @garfield1. I wonder about the drop from @garfield1’s 124 to my 113. A more liberal search on "Penkowa" and all years yields for me (17th November 2011) only 115 publications. I wonder if this could be retractions? For comparison I found 5% self-citations for me with "Nielsen FA" (1980-) from 34 publications, 11% self-citations for "Riisager P" (1980-) from 34 publications, and 4% self-citations for big shot "Friston KJ" (1980-) from 468 publications. In conclusion, I would say that Penkowa is citing herself a lot, but far from the 50% that Riisager claims.

    Anonymous said:
    November 17, 2011 at 8:32 pm

    Web of Knowledge has a larger database than Web of Science, although both are owned by Thomson Reuters. Web of Knowledge includes proceedings, books, etc., so it has a slightly larger number of papers and citations.

    Anonymous said:
    November 18, 2011 at 5:02 pm

    Thanks for clarifying! I actually didn’t think that the ‘All Databases’ option in "Web of Knowledge" had that feature, but I see that it is ‘Analyze Results’ that the "All Databases" results lack compared to "Web of Science".

