The coverage of the different researcher profile sites, and of their citation statistics, varies. Google Scholar seems to be the site with the largest coverage; it even crawls and indexes my slides. The open Wikidata is far from that level, but may be the only one with free machine-readable access and advanced search.
Below are the citation statistics, in the form of the h-index, from five different services.
| h-index | Service |
| 18 | Web of Science |
Semantic Scholar does not give an overview of the citation statistics, and the count is somewhat hidden on the individual article pages. I tried as best I could to determine the value, but it might be incorrect.
I made a similar count on 8 May 2017 and reported it in the Wikicite slides (page 42). During the one and a half months since that count, the statistic for Scopus has changed from 20 to 22.
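For readers who have not met the h-index before, here is a minimal sketch of how the number can be computed from a list of per-paper citation counts: it is the largest h such that h papers have at least h citations each. The function name `h_index` and the example counts are made up for illustration; they are not taken from any of the services above.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank the papers from most cited to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Made-up citation counts, for illustration only.
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))
```

Since each service indexes a different set of citing documents, the same publication list fed through this computation will give different values from service to service.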
Semantic Scholar is run by the Allen Institute for Artificial Intelligence, a non-profit research institute, so they may be interested in opening up their data for search. An API does not, to my knowledge, exist (yet?), but they have a gentle robots.txt. It is also possible to download the full Semantic Scholar corpus from http://labs.semanticscholar.org/corpus/. (Thanks to Vladimir Alexiev for drawing my attention to this corpus.)
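Whether a robots.txt is "gentle" towards a particular crawler can be checked programmatically. Below is a small sketch with Python's standard `urllib.robotparser` module. The robots.txt content, the crawler name `my-crawler` and the example.org URLs are invented for the illustration; this is not Semantic Scholar's actual file.

```python
from urllib import robotparser

# Invented robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without network access."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
# Is the (hypothetical) crawler allowed to fetch these URLs?
print(parser.can_fetch("my-crawler", "https://example.org/paper/123"))
print(parser.can_fetch("my-crawler", "https://example.org/private/data"))
# Requested delay between requests, in seconds (None if not specified).
print(parser.crawl_delay("my-crawler"))
```

For a live site you would instead call `set_url()` with the address of the site's robots.txt followed by `read()`, and then respect the answers it gives.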
Recently Lambert Heller wrote an overview piece on websites for scholarly profile pages: “What will the scholarly profile page of the future look like? Provision of metadata is enabling experimentation”. There he tabulated the features of the various online sites that have scholarly profile pages. These sites include (with links to my entries): ORCID, ResearchGate, Mendeley, Pure and VIVO (I don’t know these two), Google Scholar and Impactstory. One site missing from the equation is Wikidata. It can produce scholarly profile pages too. The default Wikidata editing interface may not present the data in a nice way (Magnus Manske’s Reasonator does it better), but much of the functionality needed to make a scholarly profile page is there.
In terms of the features listed by Heller, I will here list the possible utilization of Wikidata:
- Portrait picture: The P18 property can record a Wikimedia Commons image related to a researcher. For instance, you can see a nice photo of neuroimaging professor Russ Poldrack.
- Researcher’s alternative names: This is possible with the alias functionality in Wikidata. Poldrack is presently recorded with the canonical label “Russell A. Poldrack” and the alternative names “Russell A Poldrack”, “R. A. Poldrack”, “Russ Poldrack” and “R A Poldrack”. It is straightforward to add more variations.
- IDs/profiles in other systems: There are absolutely loads of these links in Wikidata. To name a few deep linking possibilities: Twitter, Google Scholar, VIAF, ISNI, ORCID, ResearchGate, GitHub and Scopus. Wikidata is very strong in interlinking databases.
- Papers and similar: Papers are represented as items in Wikidata and these items can link to the author via P50. The reverse link is possible with a SPARQL query. Furthermore, on the researcher’s item it is possible to list main works with the appropriate property. Full texts can be linked with the P953 property. PDFs of papers with an appropriate compatible license can be uploaded to Wikimedia Commons and/or included in Wikisource.
- Uncommon research product: I am not sure what this is, but the developer of software services is recorded in Wikidata. For instance, for the neuroinformatics database OpenfMRI it is specified that Poldrack is the creator. Backlinks are possible with SPARQL queries.
- Grants, third party funding: Well, there is a sponsor property, but how it should be used for researchers is not clear. With the property, you can specify that a paper or research project was funded by an entity. For the paper The Center for Integrated Molecular Brain Imaging (Cimbi) database you can see that it is funded by the Lundbeck Foundation and Rigshospitalet.
- Current institution: Yes. The employer and affiliation properties are there for you. You can see an example of an incomplete list of people affiliated with research sections at my department, DTU Compute, here, automagically generated by Magnus Manske’s Listeria tool.
- Former employers, education etc.: Yes. There is a property for employer and for affiliation and for education. With qualifiers you can specify the dates of employment.
- Self assigned keywords: Well, as a Wikidata contributor you can create new items, and you can use these items for specifying your field of work or to label your papers with a main theme.
- Concept from controlled vocabulary: Whether Wikidata is a controlled vocabulary is up for discussion. Wikidata items can be linked to controlled vocabularies, e.g., Dewey’s, so there you can get some degree of control. For instance, the concept “engineer” in Wikidata is linked to BNCF, NDL, GND, ROME, LCNAF, BNF and FAST.
- Social graph of followers/friends: No, that is really not possible on Wikidata.
- Social graph of coauthors: Yes, that is possible. With Jonas Kress’ work on D3, which enables graph rendering, you get on-the-fly graph rendering in the Wikidata Query Service. You can see my coauthor graph here (it is wobbly at the moment; there is some D3 parameter that needs a tweak).
- Citation/attention metadata from platform itself: No, I don’t think so. You can get page view data from somewhere on the Wikimedia sites. You can also count the number of citations on-the-fly: to an author, to a paper, etc.
- Citation/attention metadata from other sources: No, not really.
- Comprehensive search to match/include own papers: Well, perhaps not. Or perhaps. Magnus Manske’s sourcemd and quickstatement tools allow you to copy-paste a PMID or DOI into a form field and press two buttons to grab bibliographic information from PubMed and a DOI source. One-click full paper upload is not well supported, to my knowledge. Perhaps Daniel Mietchen knows something about this.
- Forums, Q&A, etc.: Well, yes and no. You can use the discussion pages on Wikidata, but these pages are perhaps mostly for discussion of editing, rather than the content of the described item. Perhaps Wikiversity could be used.
- Deposit own papers: You can upload appropriately licensed papers to Wikimedia Commons or perhaps Wikisource. Then you can link them from Wikidata.
- Research administration tools: No.
- Reuse of data from outside the service: You better believe it! Although Wikidata is there to be used, a mass download from the Wikidata Query Service can run into timeout problems. To navigate the structure of individual Wikidata items, you need programming skills, at least for the moment. If you are really desperate you can download the Wikidata dump and Blazegraph and try to set up your own SPARQL server.
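Several of the items above rest on SPARQL queries against the Wikidata Query Service: the reverse link from a researcher to the papers (P50, author) and the on-the-fly citation count (via P2860, "cites work"). Here is a minimal Python sketch of how that could look. The function names, the User-Agent string and the QID passed in are placeholders I chose for the example, and only the public endpoint `https://query.wikidata.org/sparql` is assumed; treat it as a starting point rather than a finished tool.

```python
import json
import re
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def check_qid(qid):
    """Accept only strings that look like a Wikidata item identifier."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    return qid


def works_by_author_query(author_qid, limit=100):
    """SPARQL for items whose author (P50) is the given researcher."""
    qid = check_qid(author_qid)
    return f"""
SELECT ?work ?workLabel WHERE {{
  ?work wdt:P50 wd:{qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
"""


def citation_count_query(author_qid):
    """SPARQL counting 'cites work' (P2860) links to the researcher's works."""
    qid = check_qid(author_qid)
    return f"""
SELECT (COUNT(?citing) AS ?citations) WHERE {{
  ?work wdt:P50 wd:{qid} .
  ?citing wdt:P2860 ?work .
}}
"""


def run_query(sparql):
    """Send a query to the Wikidata Query Service (needs network access)."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        # Placeholder User-Agent; use one that identifies your own script.
        headers={"User-Agent": "profile-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def bindings_to_rows(result):
    """Flatten the SPARQL JSON result format into plain dictionaries."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]
```

A call such as `bindings_to_rows(run_query(works_by_author_query(qid)))`, with `qid` set to the researcher's item identifier, would then list the papers. The count only reflects citations that somebody has actually entered in Wikidata, so it will usually be lower than what the commercial services report.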
Suppose you want to measure the performance of individual researchers of a university department. Which variables can you get hold of and how relevant would they be to measure academic performance?
Here is my take on it:
- Google Scholar citations number. Google Scholar records total number of citations, h-index and i10-index as well as the numbers for a fixed period.
- Scopus citation numbers.
- Twitter. The number of tweets and the number of followers would be relevant.
One issue here is that the number of tweets may not be relevant to the academic performance and it is also susceptible to manipulation. Interestingly there has been a comparison between Twitter numbers and standard citation counts with a coefficient between the two numbers named the Kardashian index.
- Wikidata and Wikipedia presence. Whether Wikidata has an item for the researcher, the number of Wikipedia articles about the researcher, the number of bytes they span, and the number of the researcher’s articles recorded in Wikidata. There is an API to get these numbers, and, interestingly, Wikidata can record a range of other identifiers for Google Scholar, Scopus, Twitter, etc., which would make it a convenient open database for keeping track of researcher identifiers across sites of scientometric relevance.
The number of citations in Wikipedia to the work of a researcher would be interesting to have, but is somewhat more difficult to automatically obtain.
The numbers of Wikipedia and Wikidata are a bit manipulable.
- Stackoverflow/Stackexchange points in relevant areas. The question-and-answer sites under the Stackexchange umbrella include a range of sites that are of academic interest. In my area, e.g., Stackoverflow and Cross Validated.
- GitHub repositories and stars.
- Publication download counts. For instance, my department has a repository with papers and the backend keeps track of statistics. The most downloaded papers tend to be introductory material and overviews.
- ResearchGate numbers: Publications, reads, citations and impact points.
- ResearcherID (Thomson Reuters) numbers: total articles in publication list, articles with citation data, sum of the times cited, average citations per article, h-index.
- Microsoft Academic Search numbers.
- Count in the dblp computer science bibliography (the Trier database).
- Count of listings in ArXiv.
- Counts in Semantic Scholar.
- ACM digital library counts.
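The Kardashian index mentioned under the Twitter item can be written down in a few lines. As far as I recall from Neil Hall's 2014 paper, the number of followers "expected" for a researcher with C citations is F = 43.3 · C^0.32, and the index is the actual follower count divided by that expectation. The sketch below assumes exactly those constants; the function names are my own.

```python
def expected_followers(citations):
    """Followers 'expected' for a citation count: F = 43.3 * C**0.32.

    The constants are assumed from Hall's 2014 Kardashian index paper.
    """
    if citations < 0:
        raise ValueError("citations must be non-negative")
    return 43.3 * citations ** 0.32


def kardashian_index(followers, citations):
    """Ratio of actual Twitter followers to the citation-based expectation."""
    expected = expected_followers(citations)
    if expected == 0:
        raise ValueError("index is undefined for zero citations")
    return followers / expected
```

A value well above 1 means more followers than the citation count would "justify". Given how easily follower counts can be manipulated, I would read it as a curiosity rather than as a performance measure.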
I have just received a citation alert from the Google Scholar system as I was cited in http://firstmonday.org/ojs/index.php/fm/article/view/3203/3019
Interestingly, the alert did not come from the First Monday journal directly but from a paper on firstmonday.insurancetribe.com (see the excerpt below). To me it seems that insurancetribe.com is abusing First Monday material on their site. Their URL redirects to homesecurityfix.com. This must be spam.
[HTML] Font Size Current Issue Atom logo
<http://scholar.google.com/scholar_url?url=http://firstmonday.insurancetribe.com/ojs/index.php/fm/article/view/3203/3019>
D Geifman, DR Raban, R Sheizaf
Abstract Prediction Markets are a family of Internet–based social
which use market price to aggregate and reveal information and opinion
audiences. The considerable complexity of these markets inhibited the
full realization of *…*
When I last checked, Google Scholar redirected to the spam site. However, I cannot find the insurancetribe version among the indexed versions now.
I typed in many of the publications from our Responsible Business in the Blogosphere project into the Brede Wiki together with an identifier for Google Scholar for each publication. With a script (do not run code you obtained from a wiki!) I am able to collect the citation information from Google Scholar using the MediaWiki API, the categories and the templates. Here it is sorted according to number of citations.
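The MediaWiki part of such a script can be sketched as follows. This is not the script from the wiki, just an illustration of the approach: ask the MediaWiki API for the pages in a category, then sort whatever citation counts have been collected for them. The API address, the category name and the counts in the usage note are hypothetical, and the step that looks up each count at Google Scholar is deliberately left out.

```python
import urllib.parse


def category_members_url(api_url, category, limit=500):
    """Build a MediaWiki API URL that lists the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def titles_from_response(response):
    """Extract page titles from a decoded 'categorymembers' JSON response."""
    return [member["title"] for member in response["query"]["categorymembers"]]


def sort_by_citations(citations):
    """Sort a {title: citation count} mapping, most cited first."""
    return sorted(citations.items(), key=lambda item: item[1], reverse=True)
```

With made-up numbers, `sort_by_citations({"Paper A": 3, "Paper B": 12})` puts "Paper B" first. The URL from `category_members_url` can be fetched with any HTTP client and decoded with `json.loads` before it is handed to `titles_from_response`.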
Google introduced (was it a few weeks ago?) a new version of Google Scholar where you as a scientist can claim your name and the scientific papers that you have authored. Previously you could just search, e.g., to get your papers listed; see my previous blog post. However, if you have a common name, e.g., “J. Larsen”, you would run into the problem that your publications would be entangled with the publications of other people called “J. Larsen” or “RJ Larsen” or “JC Larsen”, etc. With the new system it almost seems that Google does co-author mining, so it is better able to distinguish the different similarly named authors. Furthermore, and most importantly, with a Google Scholar account you can claim your papers, which solves the ambiguity problem, and you can add and merge papers. Editing functionality was already present in CiteSeer long ago (if I remember correctly), and in Microsoft Academic Search you can also edit the publication list.
The new Google Scholar functionality does not seem to be that good at discovering new relevant papers, e.g., those papers that cite you. There the old-fashioned Google Scholar email alert seems better. What it does provide is a nice overview for h-index junkies. The number is automatically computed and makes Google Scholar a serious competitor to the paywalled ISI Web of Science.