Google Scholar

My h-index as of June 2017: Coverage of researcher profile sites


The coverage of the different researcher profile sites and their citation statistics varies. Google Scholar seems to be the site with the largest coverage: it even crawls and indexes my slides. The open Wikidata is far from there, but it may be the only one with machine-readable free access and advanced search.

Below are the citation statistics, in the form of the h-index, from six different services.

h Service
28 Google Scholar
27 ResearchGate
22 Scopus
22(?) Semantic Scholar
18 Web of Science
8 Wikidata

Semantic Scholar does not give an overview of the citation statistics, and the citation count is somewhat hidden on the individual article pages. I determined the value as best I could, but it might be incorrect.
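For reference, the h-index is straightforward to compute from a list of per-paper citation counts: it is the largest h such that at least h papers have at least h citations each. A minimal sketch in Python (the citation counts below are made-up illustration values, not my actual numbers):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # this paper still supports an h of `rank`
        else:
            break
    return h

# Made-up example: five papers with these citation counts.
print(h_index([10, 8, 5, 4, 3]))  # 4
```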

I compiled similar statistics on 8 May 2017 and reported them on the slides Wikicite (page 42). During the one and a half months since that count, the statistics for Scopus have changed from 20 to 22.

Semantic Scholar is run by the Allen Institute for Artificial Intelligence, a non-profit research institute, so they may be interested in opening up their data for search. To my knowledge, an API does not (yet?) exist, but they have a gentle robots.txt. It is also possible to download the full Semantic Scholar corpus (thanks to Vladimir Alexiev for drawing my attention to this corpus).


The Wikidata scholarly profile page



Recently Lambert Heller wrote an overview piece on websites for scholarly profile pages: “What will the scholarly profile page of the future look like? Provision of metadata is enabling experimentation“. There he tabulated the features of the various online sites with scholarly profile pages. These sites include (with links to my entries): ORCID, ResearchGate, Mendeley, Pure and VIVO (I don’t know these two), Google Scholar and Impactstory. One site missing from the equation is Wikidata. It can produce scholarly profile pages too. The default Wikidata editing interface may not present the data in a nice way (Magnus Manske’s Reasonator does better), but very much of the functionality for a scholarly profile page is there.

In terms of the features listed by Heller, I will here list the possible utilization of Wikidata:

  1. Portrait picture: The P18 property can record a Wikimedia Commons image of a researcher. For instance, you can see a nice photo of neuroimaging professor Russ Poldrack.
  2. Researchers’ alternative names: This is possible with the alias functionality in Wikidata. Poldrack is presently recorded with the canonical label “Russell A. Poldrack” and the alternative names “Russell A Poldrack”, “R. A. Poldrack”, “Russ Poldrack” and “R A Poldrack”. It is straightforward to add more variations.
  3. IDs/profiles in other systems: There are absolutely loads of these links in Wikidata. To name a few deep linking possibilities: Twitter, Google Scholar, VIAF, ISNI, ORCID, ResearchGate, GitHub and Scopus. Wikidata is very strong in interlinking databases.
  4. Papers and similar: Papers are represented as items in Wikidata, and these items can link to the author via P50. The reverse link is possible with a SPARQL query. Furthermore, on the researcher’s item it is possible to list main works with the appropriate property. Full texts can be linked with the P953 property. PDFs of papers with a compatible license can be uploaded to Wikimedia Commons and/or included in Wikisource.
  5. Uncommon research products: I am not sure what this is, but the developer of a software service can be recorded in Wikidata. For instance, for the neuroinformatics database OpenfMRI it is specified that Poldrack is the creator. Backlinks are possible with SPARQL queries.
  6. Grants, third party funding: Well, there is a sponsor property, but how it should be utilized for researchers is not clear. With the property, you can specify that a paper or research project was funded by an entity. For the paper The Center for Integrated Molecular Brain Imaging (Cimbi) database you can see that it is funded by the Lundbeck Foundation and Rigshospitalet.
  7. Current institution: Yes. The employer and affiliation properties are there for you. You can see an example of an incomplete list of people affiliated with research sections at my department, DTU Compute, here, automagically generated by Magnus Manske’s Listeria tool.
  8. Former employers, education etc.: Yes. There are properties for employer, affiliation and education. With qualifiers you can specify the dates of employment.
  9. Self-assigned keywords: Well, as a Wikidata contributor you can create new items and use them to specify a field of work or to label your paper with a main theme.
  10. Concepts from a controlled vocabulary: Whether Wikidata is a controlled vocabulary is up for discussion. Wikidata items can be linked to controlled vocabularies, e.g., Dewey’s, so there you can get some measure of control. For instance, the concept “engineer” in Wikidata is linked to BNCF, NDL, GND, ROME, LCNAF, BNF and FAST.
  11. Social graph of followers/friends: No, that is really not possible on Wikidata.
  12. Social graph of coauthors: Yes, that is possible. With Jonas Kress’ work on D3-enabled graph rendering, you get on-the-fly graph rendering in the Wikidata Query Service. You can see my coauthor graph here (it is wobbly at the moment; some D3 parameter needs a tweak).
  13. Citation/attention metadata from the platform itself: No, I don’t think so, although you can get page view data from somewhere on the Wikimedia sites, and you can also count the number of citations on-the-fly: to an author, to a paper, etc.
  14. Citation/attention metadata from other sources: No, not really.
  15. Comprehensive search to match/include own papers: Well, perhaps. Magnus Manske’s sourcemd and quickstatements tools allow you to copy-paste a PMID or DOI into a form field and press two buttons to grab bibliographic information from PubMed or a DOI source. One-click full paper upload is not well-supported, to my knowledge. Perhaps Daniel Mietchen knows something about this.
  16. Forums, Q&A, etc.: Well, yes and no. You can use the discussion pages on Wikidata, but these pages are perhaps mostly for discussion of editing, rather than the content of the described item. Perhaps Wikiversity could be used.
  17. Deposit own papers: You can upload appropriately licensed papers to Wikimedia Commons or perhaps Wikisource. Then you can link them from Wikidata.
  18. Research administration tools: No.
  19. Reuse of data from outside the service: You better believe it! Although Wikidata is there to be used, a mass download from the Wikidata Query Service can run into timeout problems. To navigate the structure of an individual Wikidata item, you need programming skills, at least for the moment. If you are really desperate, you can download the Wikidata dump and Blazegraph and try to set up your own SPARQL server.
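Several of the items above (the reverse P50 author link, coauthor graphs, on-the-fly citation counts) rely on SPARQL queries against the Wikidata Query Service. As a sketch, here is how the reverse author lookup could be done from Python; Q12345 is a placeholder QID (substitute the researcher’s actual item), and the network call is kept out of the top level:

```python
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"

def works_by_author_query(author_qid):
    """SPARQL listing works that link to the author item via P50."""
    return """
    SELECT ?work ?workLabel WHERE {
      ?work wdt:P50 wd:%s .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }""" % author_qid

def run_query(query):
    """Send a query to the Wikidata Query Service and parse the JSON reply."""
    url = WDQS + "?" + urllib.parse.urlencode({"query": query,
                                               "format": "json"})
    request = urllib.request.Request(url,
                                     headers={"User-Agent": "profile-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Example usage (requires network access); Q12345 is a placeholder:
# results = run_query(works_by_author_query("Q12345"))
# for row in results["results"]["bindings"]:
#     print(row["workLabel"]["value"])
```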


Altmetrics for a department


Suppose you want to measure the performance of the individual researchers of a university department. Which variables can you get hold of, and how relevant would they be for measuring academic performance?

Here is my take on it:

  1. Google Scholar citation numbers. Google Scholar records the total number of citations, the h-index and the i10-index, as well as the same numbers for a fixed period.
  2. Scopus citation numbers.
  3. Twitter. The number of tweets and the number of followers would be relevant.
    One issue here is that the number of tweets may not be relevant to academic performance, and it is also susceptible to manipulation. Interestingly, there has been a comparison between Twitter numbers and standard citation counts, with a coefficient between the two numbers named the Kardashian index.
  4. Wikidata and Wikipedia presence. Whether Wikidata has an item for the researcher, the number of Wikipedia articles about the researcher, the number of bytes those articles span, and the number of the researcher’s articles recorded in Wikidata. There is an API to get these numbers, and, interestingly, Wikidata can record a range of other identifiers for Google Scholar, Scopus, Twitter, etc., which makes it a convenient open database for keeping track of researcher identifiers across sites of scientometric relevance.
    The number of citations in Wikipedia to the work of a researcher would be interesting to have, but is somewhat more difficult to automatically obtain.
    The Wikipedia and Wikidata numbers are somewhat manipulable.
  5. Stackoverflow/Stackexchange points in relevant areas. The question-and-answer sites under the Stackexchange umbrella include a range of sites of academic interest; in my area, e.g., Stackoverflow and Cross Validated.
  6. GitHub repositories and stars.
  7. Publication download counts. For instance, my department has a repository with papers, and the backend keeps track of statistics. The most downloaded papers tend to be introductory material and overviews.
  8. ResearchGate numbers: Publications, reads, citations and impact points.
  9. ResearcherID (Thomson Reuters) numbers: total articles in publication list, articles with citation data, sum of times cited, average citations per article, h-index.
  10. Microsoft Academic Search numbers.
  11. Count in the dblp computer science bibliography (the Trier database).
  12. Count of listings in ArXiv.
  13. Counts in Semantic Scholar.
  14. ACM digital library counts.
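For item 4, the Wikidata API can, for instance, report how many Wikimedia pages link to a researcher’s item (its sitelinks). A sketch of such a lookup; the QID is again a placeholder, and the network call is kept inside a function so nothing is fetched at import time:

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def sitelinks_url(qid):
    """Build a wbgetentities request asking only for the item's sitelinks."""
    params = {"action": "wbgetentities", "ids": qid,
              "props": "sitelinks", "format": "json"}
    return API + "?" + urllib.parse.urlencode(params)

def article_count(qid):
    """Number of Wikimedia pages linked from the item (needs network access)."""
    with urllib.request.urlopen(sitelinks_url(qid)) as response:
        data = json.load(response)
    return len(data["entities"][qid].get("sitelinks", {}))

# Hypothetical usage with a placeholder QID:
# print(article_count("Q12345"))
```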


Google Scholar used for spam?


I have just received a citation alert from the Google Scholar system as I was cited in

Interestingly, the alert did not come from the First Monday journal directly but from a paper on (see the excerpt below). To me it seems that First Monday material is being abused on their site. Their URL redirects to . This must be spam.

[HTML] Font Size Current Issue Atom logo
http : / />

D Geifman, DR Raban, R Sheizaf
Abstract Prediction Markets are a family of Internet–based social
computing applications,
which use market price to aggregate and reveal information and opinion
from dispersed
audiences. The considerable complexity of these markets inhibited the
full realization of *…*

When I last checked, Google Scholar redirected to the spam site. However, I cannot find the insurancetribe version among the indexed versions now:

Google Scholar citations for Responsible Business in the Blogosphere project


GS Year First author Title
35 2011 Finn Årup Nielsen A new ANEW: evaluation of a word list for sentiment analysis in microblogs
24 2011 Gerardo Patriotta Maintaining legitimacy: controversies, orders of worth and public justifications
19 2011 Annemette Leonhardt Kjærgaard Mediating identity: a study of media influence on organizational identity construction in a celebrity firm
16 2011 Lars Kai Hansen Good friends, bad news – affect and virality in Twitter
10 2012 Adam Arvidsson Value in informational capitalism and on the Internet
8 2010 Adam Arvidsson The ethical economy: new forms of value in the information society
7 2011 Adam Arvidsson Ethics and value in customer co-production
5 2012 Chitu Okoli The people’s encyclopedia under the gaze of the sages: a systematic review of scholarly research on Wikipedia
5 2013 Finn Årup Nielsen Wikipedia research and tools: review and comments
4 2011 Toke Jansen Hansen Non-parametric co-clustering of large scale sparse bipartite networks on the GPU
4 2011 Friederike Schultz Strategic framing in the BP crisis: a semantic network analysis of associative frames
1 2012 Michael Kai Petersen Cognitive semantic networks: emotional verbs throw a tantrum but don’t bite
1 2010 Michael Etter On relational capital in social media
1 2011 Mette Morsing State-owned enterprises: a corporatization of governments
0 2013 Elanor Colleoni CSR communication for organizational legitimacy in social media
0 2013 Anne Vestergaard Humanitarian appeal and the paradox of power
0 2011 Bjarne Ørum Wahlgreen Large scale topic modeling made practical
0 2012 Michael Kai Petersen On an emotional node: modeling sentiment in graphs of action verbs
0 ? Friederike Schultz The construction of corporate social responsibility in network society: a communication
0 2013 Adam Arvidsson The potential of consumer publics

I typed many of the publications from our Responsible Business in the Blogosphere project into the Brede Wiki, together with a Google Scholar identifier for each publication. With a script (do not run code you obtained from a wiki!) I am able to collect the citation information from Google Scholar using the MediaWiki API, the categories and the templates. The table above is sorted according to the number of citations.
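As a sketch of the wiki half of such a script, the MediaWiki API can list the pages in a category; the API URL and category name below are made-up examples, and extracting the Google Scholar identifiers from the page templates would need further parsing on top:

```python
import json
import urllib.parse
import urllib.request

def category_members_url(api_url, category):
    """Build a MediaWiki API request listing the pages in a category."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": "Category:" + category,
              "cmlimit": "500", "format": "json"}
    return api_url + "?" + urllib.parse.urlencode(params)

def list_papers(api_url, category):
    """Fetch the page titles in the category (needs network access)."""
    with urllib.request.urlopen(category_members_url(api_url, category)) as f:
        data = json.load(f)
    return [page["title"] for page in data["query"]["categorymembers"]]

# Hypothetical usage against a MediaWiki installation:
# print(list_papers("http://example.org/w/api.php", "Papers"))
```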

Are you on Google Scholar?



Google introduced (was it a few weeks ago?) a new version of Google Scholar where you as a scientist can claim your name and the scientific papers that you have authored. Previously you could just search, e.g., to get your papers listed, see my previous blog post. However, if you have a common name, e.g., “J. Larsen”, you would run into the problem that your publications would be entangled with those of other people called “J. Larsen” or “RJ Larsen” or “JC Larsen”, etc. With the new system it almost seems that Google does co-author mining, so they are better able to distinguish similarly named authors. Furthermore, and most importantly, with a Google Scholar account you can claim your papers, which solves the ambiguity problem, and you can add and merge papers. Editing functionality was already present in CiteSeer long ago (if I remember correctly), and in Microsoft Academic Search you can also edit the publication list.

You can see my Google Scholar account here. By a strange coincidence I have found that my number of citations is presently exactly the same as one of my co-authors, Cyril Goutte: 1668.

The new Google Scholar functionality does not seem that good at discovering new relevant papers, e.g., those papers that cite you. There the old-fashioned Google Scholar email alert seems better. What it does provide is a nice overview for h-index junkies. The number is automatically computed, which makes Google Scholar a serious competitor to the pay-walled ISI Web of Science.