science

My h-index as of June 2017: Coverage of researcher profile sites

The coverage of different researcher profile sites and their citation statistics varies. Google Scholar seems to be the site with the largest coverage; it even crawls and indexes my slides. The open Wikidata is far from that level, but it may be the only one with free machine-readable access and advanced search.

Below are the citation statistics, in the form of the h-index, from six different services.

h Service
28 Google Scholar
27 ResearchGate
22 Scopus
22(?) Semantic Scholar
18 Web of Science
8 Wikidata

Semantic Scholar does not give an overview of the citation statistics, and the citation count is somewhat hidden on the individual article pages. I determined the value as best I could, but it might be incorrect.
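For reference, the h-index is the largest number h such that h papers have at least h citations each. When a service only shows per-article counts, the value can be worked out by hand along the lines of the following minimal Python sketch. The citation counts in the example are made up and are not taken from any of the services in the table.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Made-up citation counts, for illustration only.
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))
```

Sorting first means a single pass is enough: as soon as a paper has fewer citations than its rank, no later paper can raise the index.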

I made a similar count on 8 May 2017 and reported it in the Wikicite slides (page 42). During the month and a half since that count, the h-index for Scopus has changed from 20 to 22.

Semantic Scholar is run by the Allen Institute for Artificial Intelligence, a non-profit research institute, so they may be interested in opening up their data for search. An API does not, to my knowledge, (yet?) exist, but they have a gentle robots.txt. It is also possible to download the full Semantic Scholar corpus from http://labs.semanticscholar.org/corpus/. (Thanks to Vladimir Alexiev for bringing this corpus to my attention.)

When does an article cite you?

Google Scholar alerted me to a recent citation to my work from Teacher-Student Relationships, Satisfaction, and Achievement among Art and Design College Students in Macau, a paper published in the Journal of Education and Practice, a journal whose repute is unknown to me.

In the references, I see a listing of Persistence of Web References in Scientific Research, where I was among the coauthors. So in which context is this paper cited? It seems strange that an article about link rot is cited by an article about teacher-student relationships… Indeed, I cannot find the reference in the body text when I search on the first author’s last name (“lawrence”).

Indeed, there are several other items in the reference list that I cannot find in the body text: Joe Smith’s “One of Volvo’s core values”, Strunk et al.’s “The element of style” and Van der Geer’s “The art of writing a scientific article”. Notably, the first four references are out of order in the otherwise alphabetically sorted list of references, so there must be an error. Perhaps it is a copy-and-paste error?

In this case, I would say that, even though I am listed, I am not actually cited by the article. The “fact” of whether it is a citation or not is important to discuss if we want to record the citation in Wikidata, where “Persistence of Web References in Scientific Research” is recorded with the item Q21012586, see also the Scholia entry. Possibly we could record the erroneous citation and then use the Wikidata deprecated rank facility: “Value is known to be wrong but (used to be) commonly believed”.
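If such a citation were recorded with deprecated rank, it would not show up in ordinary queries that use the wdt: prefix, since that prefix only returns best-rank, non-deprecated statements. To see the rank, one has to go through the full statement node. The following Python sketch only builds such a query as a string; the function name is made up for this illustration, and whether any deprecated citation statement actually exists on the item is not something the sketch assumes.

```python
import re


def citation_rank_query(cited_qid):
    """Build a SPARQL query listing works that cite `cited_qid`, with statement rank."""
    if not re.fullmatch(r"Q[1-9][0-9]*", cited_qid):
        raise ValueError(f"not a Wikidata item identifier: {cited_qid!r}")
    # p:/ps: walk through the statement node, so deprecated statements are included.
    return f"""
SELECT ?citing_work ?citing_workLabel ?rank
WHERE {{
  ?citing_work p:P2860 ?statement .
  ?statement ps:P2860 wd:{cited_qid} .
  ?statement wikibase:rank ?rank .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


# Q21012586 is the item for the paper mentioned in the text.
print(citation_rank_query("Q21012586"))
```

The printed query can be pasted into the Wikidata Query Service; the ?rank column then distinguishes normal, preferred and deprecated statements.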

Some statistics on scholarly data in Wikidata

The Wikicite initiative has spawned a lot of work on bibliographic/source information in Wikidata. Scholarly bibliographic information in particular has been added. Recently James Hare announced that we have over 3 million citations recorded in Wikidata, mostly due to automated additions made by Hare himself.

With the tools of Magnus Manske and James Hare that are presently central to the growth of scholarly bibliographic data on Wikidata, we do not get a direct link to the authors’ Wikidata items. Such information presently needs to be added manually or in a semi-automated fashion. Sponsor/funding information is not added automatically either, except for a US organization where James Hare added this information.

So how much data do we have in Wikidata when we ask if the data is linked to other Wikidata items? Below are a few queries to the Wikidata Query Service that attempt to answer some aspects of this question.

Scientific articles

How many items do we have in Wikidata that describe a scientific article and that are linked to an author item?

SELECT (COUNT(DISTINCT ?work) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
}

The query returns 45’253.
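The count queries in this post can also be run from a script instead of the web interface. Below is a minimal Python sketch using only the standard library. It assumes the public endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON result format; the function names and the User-Agent string are made up for this illustration. The network call is defined but not executed here, and a live run will return a different number from the one reported above, since Wikidata keeps growing.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?work) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
}
"""


def extract_count(response):
    """Pull the single ?count value out of a SPARQL JSON result."""
    bindings = response["results"]["bindings"]
    return int(bindings[0]["count"]["value"])


def run_count_query(query, user_agent="example-count-script/0.1"):
    """Send a query to the endpoint and return the count (needs network access)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params, headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as reply:
        return extract_count(json.load(reply))


# Offline illustration with a hand-written response in the standard format.
sample = {
    "head": {"vars": ["count"]},
    "results": {"bindings": [{"count": {"type": "literal", "value": "45253"}}]},
}
print(extract_count(sample))
```

Calling run_count_query(COUNT_QUERY) would perform the actual request.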

How many scientific articles have one or more author items and no author name string (indicating that the author linking may be complete)?

SELECT (COUNT(DISTINCT ?work) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  FILTER NOT EXISTS { ?work wdt:P2093 ?authorname }
}

This query gives 3’567.

How many items do we have in Wikidata that are claimed to be a scientific article?

SELECT (COUNT(DISTINCT ?work) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
}

This query gives 677’630.
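To put the three article counts in proportion, the shares can be computed directly from the numbers reported above. A small Python sketch (the variable names are only for illustration):

```python
articles_total = 677_630        # items claimed to be scientific articles
articles_with_author = 45_253   # at least one author item
articles_fully_linked = 3_567   # author item(s) and no author name string


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


print(f"with author item:       {share(articles_with_author, articles_total):.1f}%")
print(f"no author name strings: {share(articles_fully_linked, articles_total):.2f}%")
```

In other words, fewer than 7 in 100 article items link to any author item, and only about 1 in 200 looks completely linked.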

Scientific authors

How many authors are in Wikidata that have written a scientific article?

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
}

The query returns 10’193.

How many authors are in Wikidata that have written a scientific article and where the gender is indicated?

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?author wdt:P21 ?gender .
}

This query gives 8’853.

How many authors are there in Wikidata that have written a scientific article where the article is recorded as having made one or more citations?

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?work wdt:P2860 ?cited_work .
}

This query returns 6’586.

How many authors are there in Wikidata that have written a scientific article where the article is recorded as having made one or more citations and the cited work is recorded with one or more author items?

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?work wdt:P2860 ?cited_work .
  ?cited_work wdt:P50 ?cited_author .
}

This query returns 5’614.

How many authors are there in Wikidata that have written a scientific article where the article is recorded as having made one or more citations, the cited work is recorded with one or more author items, and the genders of both the citing and the cited author are known?

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?work wdt:P2860 ?cited_work .
  ?cited_work wdt:P50 ?cited_author .
  ?author wdt:P21 ?gender .
  ?cited_author wdt:P21 ?cited_gender .
}

This query gives 4’730.

How many authors are there in Wikidata that have written a scientific article where the article is recorded as having made one or more citations, the cited work is recorded with one or more author items, the genders of both the citing and the cited author are known, and there is no author name string in either the work or the cited work (indicating that the work and the cited work may be completely linked with respect to authors)?

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?work wdt:P2860 ?cited_work .
  ?cited_work wdt:P50 ?cited_author .
  ?author wdt:P21 ?gender .
  ?cited_author wdt:P21 ?cited_gender .
  FILTER NOT EXISTS { ?work wdt:P2093 ?authorname }
  FILTER NOT EXISTS { ?cited_work wdt:P2093 ?cited_authorname }
}

This query gives only 551.
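The effect of each added constraint is easier to see as a share of all authors of scientific articles. The following Python sketch uses only the counts reported above; the short labels are my paraphrases of the queries.

```python
total_authors = 10_193  # authors of at least one scientific article

subsets = {
    "gender recorded": 8_853,
    "article cites something": 6_586,
    "cited work has an author item": 5_614,
    "both genders known": 4_730,
    "no author name strings left": 551,
}

# Share of all authors that survive each query, in percent.
shares = {
    label: round(100.0 * count / total_authors, 1)
    for label, count in subsets.items()
}

for label, percentage in shares.items():
    print(f"{label:32s} {percentage:5.1f}%")
```

The big drop comes with the last constraint: requiring that no author name strings remain removes most authors, so complete author linking is the main bottleneck.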

Sponsor/funders

Sponsors of scientific articles ordered by number of citations.

SELECT ?number_of_citations ?sponsorLabel
WITH {
  SELECT (COUNT(?citing_work) AS ?number_of_citations) ?sponsor
  WHERE {
    ?work wdt:P859 ?sponsor .
    ?work wdt:P31 wd:Q13442814 .
    ?citing_work wdt:P2860 ?work .
  }
  GROUP BY ?sponsor
} AS %result
WHERE {
  INCLUDE %result
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?number_of_citations)
LIMIT 5

This query gives National Institute for Occupational Safety and Health, Lundbeck Foundation, The Danish Council for Strategic Research, National Institute of Allergy and Infectious Diseases, University of Wisconsin–Madison.

How to quickly generate word analogy datasets with Wikidata

One popular task in computational linguistics/natural language processing is the word analogy task: Copenhagen is to Denmark as Berlin is to …?

With queries to Wikidata Query Service (WDQS) it is reasonably easy to generate word analogy datasets in whatever (Wikidata-supported) language you like. For instance, for capitals and countries, a WDQS SPARQL query that returns results in Danish could go like this:

select
  ?country1Label ?capital1Label
  ?country2Label ?capital2Label
where { 
  ?country1 wdt:P36 ?capital1 .
  ?country1 wdt:P463 wd:Q1065 .
  ?country1 wdt:P1082 ?population1 .
  filter (?population1 > 5000000)
  ?country2 wdt:P36 ?capital2 .
  ?country2 wdt:P463 wd:Q1065 .
  ?country2 wdt:P1082 ?population2 .
  filter (?population2 > 5000000)
  filter (?country1 != ?country2)
  service wikibase:label
    { bd:serviceParam wikibase:language "da". }  
} 
limit 1000

Follow this link to get to the query and press “Run” to get the results. The table can be downloaded in CSV format (see under “Download”). One issue to note is that you get multiple entries for countries with multiple capital cities, e.g., Sydafrika (South Africa) is listed with Pretoria, Kapstaden (Cape Town) and Bloemfontein.
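A downloaded CSV file can then be turned into analogy questions with a few lines of Python. The sketch below assumes that the CSV header uses the variable names from the query (country1Label, capital1Label, and so on), and it handles the multiple-capital issue in the simplest possible way, by skipping any country that appears with more than one capital. Both the function name and the three sample rows are made up for this illustration.

```python
import csv
import io
from collections import defaultdict


def analogy_questions(csv_text):
    """Turn query result rows into (question, answer) pairs.

    Countries listed with more than one capital are skipped as ambiguous.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Collect every capital seen for each country, on either side of the pair.
    capitals = defaultdict(set)
    for row in rows:
        capitals[row["country1Label"]].add(row["capital1Label"])
        capitals[row["country2Label"]].add(row["capital2Label"])

    questions = []
    for row in rows:
        country1, country2 = row["country1Label"], row["country2Label"]
        if len(capitals[country1]) > 1 or len(capitals[country2]) > 1:
            continue
        question = f'{row["capital1Label"]} : {country1} :: {row["capital2Label"]} : ?'
        questions.append((question, country2))
    return questions


# Hand-written sample rows in the assumed CSV layout.
sample_csv = """country1Label,capital1Label,country2Label,capital2Label
Danmark,København,Tyskland,Berlin
Sydafrika,Pretoria,Danmark,København
Sydafrika,Kapstaden,Danmark,København
"""

for question, answer in analogy_questions(sample_csv):
    print(question, "->", answer)
```

Skipping ambiguous countries loses a little data; an alternative would be to accept any of the listed capitals as correct.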

How much does it cost to buy all my scientific articles?

How much does it cost to buy all my scientific articles?

Disregarding the slight difference in exchange rate between the Euro and the USD, the answer is around 1’200 USD/Euros. That is the amount of money I would have to pay to download all the scientific articles I have been involved in if I did not have access to a university library with subscriptions. I have signed over the copyright of many articles to a long string of publishers (Elsevier, Wiley, IEEE, Springer, etc.), and I no longer control the publication.

I have added a good number of my articles to Wikidata including the price for each article. The SPARQL-based Wikidata Query Service is able to generate a table with the price information, see here. The total sum is also available after a slight modification of the SPARQL query.

The Wikidata Query Service can also generate plots, for instance, of the price per page as a function of the publication date (choose “Graph builder” under “Display”). In the plot below, the unit (currency) is mixed USD and Euro. (There seems to be an issue with the shapes in the legend.)

[Figure: article-prices, price per page as a function of publication date]

Eyeballing the plot, the average seems to come to something like 3 to 4 USD/Euros per page.

Among the most expensive articles are the ones from the journal Neuroinformatics published by Springer: 43.69 Euros for each article. Wiley articles cost 38 USD and the Elsevier articles around 36 USD. The Association for Computing Machinery sells their articles for only 15 USD. A bargain.

It may be difficult to find the price of the articles. Science claims that “Science research is available free with registration one year after initial publication.” However, I was not able to get to the full text for The Real Power of Artificial Markets on the Science website. On one page you can stumble onto this: “Purchase Access to this Article for 1 day for US$30.00” and that is what I put into Wikidata. The article is fairly short, so this price makes it the priciest article per page.
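The price-per-page comparison is simple arithmetic, sketched here in Python. The prices are the ones quoted above, with USD and Euro treated as interchangeable as in the rest of this post. The page counts are hypothetical and only there to make the example run; they are not the real page counts of any article.

```python
# (source, price in USD or Euro, hypothetical page count)
articles = [
    ("Springer (Neuroinformatics)", 43.69, 12),
    ("Wiley", 38.00, 10),
    ("Elsevier", 36.00, 11),
    ("ACM", 15.00, 8),
    ("Science", 30.00, 2),
]


def price_per_page(article):
    """Return the price divided by the number of pages."""
    _source, price, pages = article
    return price / pages


priciest = max(articles, key=price_per_page)
total = sum(price for _source, price, _pages in articles)

print(f"priciest per page: {priciest[0]} at {price_per_page(priciest):.2f}")
print(f"total for these five articles: {total:.2f}")
```

A short article with a mid-range price easily ends up at the top of such a ranking, which is what happens with the Science article.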

[Figure: science-price]

I ought to write something discerning about the state of scientific publishing. However, I will instead redirect you to a recent blog post by Tal Yarkoni.

“Overzealous business types”?

The University of Copenhagen and its problematic dismissal of notable scientist Hans Thybo have now landed in an editorial of Nature: “Corporate culture spreads to Scandinavia”. Their concluding claim is that “the threat is the colonization of universities by overzealous business types” (against academic freedom).

Interestingly, though the majority of the university board members are required by law to be from outside the university (not necessarily business), the university management usually has an academic background. And this is also the case for the management around Hans Thybo:

  1. The head of department for Hans Thybo is Claus Beier, see “Hans Thybos institutleder om fyringssagen”. Beier has a PhD and is a professor with a long series of publications on climate change, as can be seen on Google Scholar.
  2. The dean is John Renner Hansen, see “KU spildte ½ million på konsulentundersøgelse af Thybo for misbrug af forskningsmidler”. He is also a researcher and claims to have “Approximately 600 publications in international refereed journals”.
  3. The head of the university is Ralf Hemmingsen, whom I know as a notable researcher in psychiatry.

I am not convinced by the arguments in the Nature editorial, which sets up “business types” against academics. I think that the case should rather be seen against the background of the case with Milena Penkowa and another story around the possible abuse of research funds at the Copenhagen University Hospital, see “Ny sag om fusk med penge til forskning”.

The Wikidata scholarly profile page

[Figure: my_coauthors, coauthor graph]

Recently Lambert Heller wrote an overview piece on websites for scholarly profile pages: “What will the scholarly profile page of the future look like? Provision of metadata is enabling experimentation”. There he tabulated the features of the various online sites that have scholarly profile pages. These sites include (with links to my entries): ORCID, ResearchGate, Mendeley, Pure and VIVO (don’t know these two), Google Scholar and Impactstory. One site missing from the equation is Wikidata. It can produce scholarly profile pages too. The default Wikidata editing interface may not present the data in a nice way (Magnus Manske’s Reasonator does better), but much of the functionality needed to make a scholarly profile page is there.

In terms of the features listed by Heller, I will here list the possible utilization of Wikidata:

  1. Portrait picture: The P18 property can record a Wikimedia Commons image related to a researcher. For instance, you can see a nice photo of neuroimaging professor Russ Poldrack.
  2. Researcher’s alternative names: This is possible with the alias functionality in Wikidata. Poldrack is presently recorded with the canonical label “Russell A. Poldrack” and the alternative names “Russell A Poldrack”, “R. A. Poldrack”, “Russ Poldrack” and “R A Poldrack”. It is straightforward to add more variations.
  3. IDs/profiles in other systems: There are absolutely loads of these links in Wikidata. To name a few deep linking possibilities: Twitter, Google Scholar, VIAF, ISNI, ORCID, ResearchGate, GitHub and Scopus. Wikidata is very strong in interlinking databases.
  4. Papers and similar: Papers are presented as items in Wikidata and these items can link to the author via P50. The reverse link is possible with a SPARQL query. Furthermore, on the researcher’s item it is possible to list main works with the appropriate property. Full texts can be linked with the P953 property. PDFs of papers with an appropriate compatible license can be uploaded to Wikimedia Commons and/or included in Wikisource.
  5. Uncommon research product: I am not sure what this is, but the developer of software services is recorded in Wikidata. For instance, for the neuroinformatics database OpenfMRI it is specified that Poldrack is the creator. Backlinks are possible with SPARQL queries.
  6. Grants, third party funding: Well, there is a sponsor property, but how it should be utilized for researchers is not clear. With the property, you can specify that a paper or research project was funded by an entity. For the paper The Center for Integrated Molecular Brain Imaging (Cimbi) database you can see that it is funded by the Lundbeck Foundation and Rigshospitalet.
  7. Current institution: Yes. The employer and affiliation properties are there for you. You can see an example of an incomplete list of people affiliated with research sections at my department, DTU Compute, here, automagically generated by Magnus Manske’s Listeria tool.
  8. Former employers, education etc.: Yes. There is a property for employer and for affiliation and for education. With qualifiers you can specify the dates of employment.
  9. Self assigned keywords: Well, as a Wikidata contributor you can create new items, and you can use these items for specifying a field of work or to label your paper with a main theme.
  10. Concept from controlled vocabulary: Whether Wikidata is a controlled vocabulary is up for discussion. Wikidata items can be linked to controlled vocabularies, e.g., Dewey’s, so there you can get some control. For instance, the concept “engineer” in Wikidata is linked to the BNCF, NDL, GND, ROME, LCNAF, BNF and FAST.
  11. Social graph of followers/friends: No, that is really not possible on Wikidata.
  12. Social graph of coauthors: Yes, that is possible. With Jonas Kress’ work on D3 enabling graph rendering, you get on-the-fly graph rendering in the Wikidata Query Service. You can see my coauthor graph here (it is wobbly at the moment, there is some D3 parameter that needs a tweak).
  13. Citation/attention metadata from platform itself: No, I don’t think so. You can get page view data from somewhere on the Wikimedia sites. You can also count the number of citations on-the-fly, to an author, to a paper, etc.
  14. Citation/attention metadata from other sources: No, not really.
  15. Comprehensive search to match/include own papers: Well, perhaps not. Or perhaps. Magnus Manske’s sourcemd and quickstatement tools allow you to copy-paste a PMID or DOI into a form field and press two buttons to grab bibliographic information from PubMed and a DOI source. One-click full paper upload is not well-supported, to my knowledge. Perhaps Daniel Mietchen knows something about this.
  16. Forums, Q&A, etc.: Well, yes and no. You can use the discussion pages on Wikidata, but these pages are perhaps mostly for discussion of editing, rather than the content of the described item. Perhaps Wikiversity could be used.
  17. Deposit own papers: You can upload appropriately licensed papers to Wikimedia Commons or perhaps Wikisource. Then you can link them from Wikidata.
  18. Research administration tools: No.
  19. Reuse of data from outside the service: You better believe it! Although Wikidata is there to be used, a mass download from the Wikidata Query Service can run into timeout problems. To navigate the structure of an individual Wikidata item, you need programming skills, at least for the moment. If you are really desperate you can download the Wikidata dump and Blazegraph and try to set up your own SPARQL server.
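Two of the points in the list, the reverse link from a researcher to papers (item 4) and the on-the-fly citation count (item 13), come down to short SPARQL queries. The Python sketch below only builds the query strings. The function names are made up for this illustration, and Q42 is used purely as a placeholder identifier; substitute the item of the researcher in question.

```python
import re

LABEL_SERVICE = 'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }'


def check_qid(qid):
    """Return `qid` unchanged if it looks like a Wikidata item identifier."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return qid


def works_by_author_query(author_qid):
    """Reverse link: list works that point to the author via P50."""
    qid = check_qid(author_qid)
    return (
        "SELECT ?work ?workLabel\n"
        "WHERE {\n"
        f"  ?work wdt:P50 wd:{qid} .\n"
        f"  {LABEL_SERVICE}\n"
        "}"
    )


def citation_count_query(author_qid):
    """Count recorded citations (P2860) to any work by the author."""
    qid = check_qid(author_qid)
    return (
        "SELECT (COUNT(?citing_work) AS ?number_of_citations)\n"
        "WHERE {\n"
        f"  ?work wdt:P50 wd:{qid} .\n"
        "  ?citing_work wdt:P2860 ?work .\n"
        "}"
    )


# Q42 is only a placeholder here.
print(works_by_author_query("Q42"))
print(citation_count_query("Q42"))
```

Both queries only see what is recorded in Wikidata, so works without an author item link and citations that have not been entered are missed.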