SPARQL

Find titles of all works published by DTU Cognitive Systems in 2017

Posted on Updated on

Find titles of all works published by DTU Cognitive Systems in 2017! How difficult can that be? To identify all titles of works from a research organization? With Wikidata and the Wikidata Query Service (WDQS) at hand it shouldn’t be that difficult to do? Nevertheless, I ran into a few hatches:

  1. There is what we can call the “Nathan Churchill Problem”: Nathan Churchill was at one point affiliated with our research section Cognitive Systems and wrote papers, e.g., together with our Morten Mørup. One paper clearly identifies him as affiliated with our section. Searching the DTU website yields no homepage for him though. He is now at St. Michael’s Hospital, Toronto according to a newer paper. So is he no longer affiliated with the Cognitive Systems section? That’s somewhat difficult to establish with credible and citable sources. If he is not, then any simple SPARQL query on the WDQS for Cognitive Systems papers will yield his new papers which shouldn’t be counted as Cognitive Systems section papers. If we could point to a source that indicates whether his affiliation at our section is stopped we could add a qualifier to the P1416 property in his Wikidata entry and extend the SPARQL query. What I ended up doing, was to explicitly filter out two of Churchill’s publications with the ugly line “FILTER(?work != wd:Q42595201 && ?work != wd:Q36384548)“. The problem is of course not just confined to Churchill. For instance, Scholia currently lists new publications by our Søren Hauberg at the Scholia page for DIKU, – a department where he has previously been affiliated. We discussed the affiliation problem a bit in the Scholia paper, see page 253 (page 17).
  2. Datetime datatype conversion with xsd:dateTime. The filter on date is with this line: “FILTER(?publication_datetime >= "2017-01-01"^^xsd:dateTime)“. Something like “FILTER(?publication_datetime >= xsd:dateTime(2017))” does not work.
  3. Missing data. It is difficult to establish how complete the Wikidata listing is for our section with respect to publications. Scraping Google Scholar, PubMed and our local university database of publications could be a possibility, but this is far from streamlined with the tools I have developed.

The full query is listed below and the result is available from this link. Currently, 48 results are returned.

#defaultView:Table
SELECT ?workLabel 
WITH {
  SELECT 
    ?work (MIN(?publication_datetime) AS ?datetime)
  WHERE {
    # Find CogSys work
    ?researcher wdt:P108 | wdt:P463 | wdt:P1416/wdt:P361* wd:Q24283660 .
    ?work wdt:P50 ?researcher .
    ?work wdt:P31 wd:Q13442814 .
    
    # Nathan Churchill seems not longer to be affiliated!?
    FILTER(?work != wd:Q42595201 && ?work != wd:Q36384548)
    
    # Filter to year 2017
    ?work wdt:P577 ?publication_datetime .
    FILTER(?publication_datetime >= "2017-01-01"^^xsd:dateTime)
  }
  GROUP BY ?work 
} AS %results
WHERE {
  INCLUDE %results
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,jp,nl,nl,ru,zh". }
}

 

Advertisements

Some statistics on scholarly data in Wikidata

Posted on Updated on

The Wikicite initiative have spawned a lot of work on bibliographic/source information in Wikidata. Particularly scholarly bibliographic information has been added to Wikidata. Recently James Hare announced that we have over 3 million citations recorded in Wikidata, – mostly due to automated additions made by Hare himself.

With the tools of Magnus Manske and James Hare that are presently central to the growth of scholarly bibliographic data on Wikidata, we do not get a direct link to the authors items of Wikidata. Such information presently needs to be added manually or in a semi-automated fashion. Sponsor/funding information is neither added automatically, – except for a US organization where James Hare added this information.

So how much data do we have in Wikidata when we ask if the data is linked to other Wikidata items? Below are a few queries to the Wikidata Query Service that attempt to answer some aspects of this question.

Scientific articles

How many items do we have in Wikidata that describe a scientific article and that is linked to an author item?

SELECT (COUNT(DISTINCT ?work) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
}

The query returns 45’253.

How many scientific articles with one or more author items and no author name string (indicating that the author linking may be complete).

SELECT (COUNT(DISTINCT ?work) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  FILTER NOT EXISTS { ?work wdt:P2093 ?authorname }
}

This query gives 3’567.

How many items do we have in Wikidata that is claimed to be a scientific article?

SELECT (COUNT(DISTINCT ?work) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
}

This query gives 677’630.

Scientific authors

How many authors are in Wikidata that have written a scientific article?

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
}

The query returns 10’193.

How many authors are in Wikidata that have written a scientific article and where the gender is indicated?

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?author wdt:P21 ?gender .
}

This query gives 8’853.

How many authors are there in Wikidata that have written a scientific article and where the scientific article is recorded having made one or more citations.

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?work wdt:P2860 ?cited_work .
}

This query returns 6’586.

How many authors are there in Wikidata that have written a scientific article and where the scientific article is recorded having made one or more citations and the cited work is recorded with one or more author items.

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?work wdt:P2860 ?cited_work .
  ?cited_work wdt:P50 ?cited_author .
}

This query returns 5’614.

How many authors are there in Wikidata that have written a scientific article and where the scientific article is recorded having made one or more citations and the cited work is recorded with one or more author items and where the genders of both the citing and the cited author are known.

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?work wdt:P2860 ?cited_work .
  ?cited_work wdt:P50 ?cited_author .
  ?author wdt:P21 ?gender .
  ?cited_author wdt:P21 ?cited_gender .
}

This query gives 4,730.

How many authors are there in Wikidata that have written a scientific article and where the scientific article is recorded having made one or more citations and the cited work is recorded with one or more author items and where the genders of both the citing and the cited author are known and where there is no author name string in neither the work nor the cited work (indicating that the work and the cited work may be completely linked with respect to author name.

SELECT (COUNT(DISTINCT ?author) AS ?count)
WHERE {
  ?work wdt:P31 wd:Q13442814 .
  ?work wdt:P50 ?author .
  ?work wdt:P2860 ?cited_work .
  ?cited_work wdt:P50 ?cited_author .
  ?author wdt:P21 ?gender .
  ?cited_author wdt:P21 ?cited_gender .
  FILTER NOT EXISTS { ?work wdt:P2093 ?authorname }
  FILTER NOT EXISTS { ?cited_work wdt:P2093 ?cited_authorname }
}

This query gives only 551.

Sponsor/funders

Sponsors of scientific articles ordered by number of citations.

SELECT ?number_of_citations ?sponsorLabel
WITH {
  SELECT (COUNT(?citing_work) AS ?number_of_citations) ?sponsor
  WHERE {
    ?work wdt:P859 ?sponsor .
    ?work wdt:P31 wd:Q13442814 .
    ?citing_work wdt:P2860 ?work .
  }
  GROUP BY ?sponsor
} AS %result
WHERE {
  INCLUDE %result
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?number_of_citations)
LIMIT 5

This query gives National Institute for Occupational Safety and Health, Lundbeck Foundation, The Danish Council for Strategic Research, National Institute of Allergy and Infectious Diseases, University of Wisconsin–Madison.

How to quickly generate word analogy datasets with Wikidata

Posted on Updated on

One popular task in computational linguistics/natural language processing is the word analogy task: Copenhagen is to Denmark as Berlin is to …?

With queries to Wikidata Query Service (WDQS) it is reasonably easy to generate word analogy datasets in whatever (Wikidata-supported) language you like. For instance, for capitals and countries, a WDQS SPARQL query that returns results in Danish could go like this:

select
  ?country1Label ?capital1Label
  ?country2Label ?capital2Label
where { 
  ?country1 wdt:P36 ?capital1 .
  ?country1 wdt:P463 wd:Q1065 .
  ?country1 wdt:P1082 ?population1 .
  filter (?population1 > 5000000)
  ?country2 wdt:P36 ?capital2 .
  ?country2 wdt:P463 wd:Q1065 .
  ?country2 wdt:P1082 ?population2 .
  filter (?population2 > 5000000)
  filter (?country1 != ?country2)
  service wikibase:label
    { bd:serviceParam wikibase:language "da". }  
} 
limit 1000

Follow this link to get to the query and press “Run” to get the results. It is possible to download the table as CSV-formatted (see under “Download”). One issue to note that you have multiple entries for countries with multiple capital cities, e.g., Sydafrika (South Africa) is listed with Pretoria, Kapstaden (Cape Town) and Bloemfontein.