Wikidata and ChatGPT


ChatGPT has a tendency to hallucinate, i.e., to write text that has no basis in reality, inventing facts of various kinds, e.g., publications that do not exist and dates that are incorrect. Knowledge graphs, such as Wikidata, tend to record true statements, so perhaps Wikidata can support ChatGPT with specific facts.

A simple way to combine ChatGPT and Wikidata is to prompt ChatGPT to write SPARQL queries for the Wikidata Query Service at https://query.wikidata.org.
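Besides copy-pasting into the web interface, a query can also be sent to the service over HTTP. The following is a minimal Python sketch, assuming the public SPARQL endpoint https://query.wikidata.org/sparql and its `format=json` parameter. It only builds the request (standard library only), so nothing is sent until you pass it to `urllib.request.urlopen`. The User-Agent string is a hypothetical placeholder that you should replace with something identifying yourself.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public SPARQL endpoint behind the Wikidata Query Service web interface.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_request(query: str, user_agent: str) -> Request:
    """Build (but do not send) a GET request for a SPARQL query.

    format=json asks the service for SPARQL 1.1 JSON results.
    """
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    q = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 3"
    # Hypothetical User-Agent; replace it with your own contact details.
    req = build_wdqs_request(q, "wikidata-chatgpt-example/0.1")
    print(req.full_url)
    # To actually run it: urllib.request.urlopen(req), then json.load() the body.
```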

Here is my first prompt:

“Write a SPARQL query for the Wikidata Query Service returning publications in the Danish language ordering by publication date with the most recent publication presented first.”

The response is reasonable:

“Sure, here’s a SPARQL query that should do the trick:

SELECT ?publication ?publicationLabel ?date
WHERE {
  ?publication wdt:P31 wd:Q13442814 ; # find items that are instances of "publication"
               wdt:P407 wd:Q9035 ; # find items with a Danish language statement
               wdt:P577 ?date . # find items with a publication date statement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?date)

This query finds all publications that have a language statement of Danish (wdt:P407 wd:Q9035) and a publication date (wdt:P577). It also includes a SERVICE wikibase:label block to retrieve the English label for each publication.

The results are ordered by the publication date (ORDER BY DESC(?date)) with the most recent publication presented first.”

The line ‘wdt:P31 wd:Q13442814 ; # find items that are instances of “publication”’ is somewhat incorrect, though. Q13442814 is not “publication” but “scientific article”, and, in my opinion, it should have used a path query with “subclass of” (P279) in that line. Otherwise the query is fine: it is nicely formatted and commented, and the variable names are sensible. ChatGPT knows the identifiers of some of the central properties and items, e.g., that Q9035 is the Danish language, and the generated SPARQL is syntactically correct. When the query is copy-pasted into the Wikidata Query Service, the service yields 28,735 rows, with a screenshot of the top ones here:
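For comparison, here is a sketch of the query with the subclass path. It assumes that Q732577 is the Wikidata item for “publication” and follows “instance of” (P31) and then any number of “subclass of” (P279) steps up to it. It also asks for Danish labels before English ones. The transitive path can be slow on a class this broad and may hit the service's timeout. The LIMIT only keeps the output short, and the query is wrapped in a Python function so it can be checked offline.

```python
def danish_publications_query(limit: int = 100) -> str:
    """Return a SPARQL query for Danish-language publications, newest first.

    Assumption: Q732577 is the "publication" class; wdt:P31/wdt:P279* matches
    instances of it or of any of its (transitive) subclasses.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Doubled braces are literal { } in the f-string.
    return f"""SELECT ?publication ?publicationLabel ?date
WHERE {{
  ?publication wdt:P407 wd:Q9035 ;              # language of work: Danish
               wdt:P31/wdt:P279* wd:Q732577 ;   # instance of (a subclass of) publication
               wdt:P577 ?date .                 # publication date
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "da,en". }}
}}
ORDER BY DESC(?date)
LIMIT {limit}"""


if __name__ == "__main__":
    print(danish_publications_query(20))
```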

Let's try a more complicated query:

“Write a SPARQL query to the Wikidata Query Service listing the employment history of Finn Årup Nielsen with employer and dates.”

The response is again syntactically correct, but the query is wrong:

“Here’s a SPARQL query that retrieves the employment history of Finn Årup Nielsen with employer and dates from Wikidata:

SELECT ?employerLabel ?startDate ?endDate
WHERE {
  wd:Q18712960 wdt:P108 ?employer .
  ?employer wdt:P571 ?startDate .
  OPTIONAL { ?employer wdt:P576 ?endDate . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?startDate)

Explanation:

  • SELECT ?employerLabel ?startDate ?endDate: We want to retrieve the label of the employer, the start date and end date (if it exists) of the employment.
  • WHERE { wd:Q18712960 wdt:P108 ?employer . ?employer wdt:P571 ?startDate . OPTIONAL { ?employer wdt:P576 ?endDate . } }: We specify the conditions for the query. We use the Wikidata ID for Finn Årup Nielsen (wd:Q18712960) and the property P108 to retrieve the employer. We also use the property P571 to retrieve the start date of the employment. We use the OPTIONAL keyword to retrieve the end date (if it exists) using the property P576.
  • SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }: We use the SERVICE keyword to retrieve the label of the employer. We also specify that we want the label to be in English (en).
  • ORDER BY DESC(?startDate): We order the results by the start date in descending order (most recent employment first).

Note: Please keep in mind that the availability and completeness of data on Wikidata may vary, so it is possible that this query may not return any results.”

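The central Q-identifier for me is wrong: Q18712960 is a Finnish horse race. The dates are also taken from the wrong place. The query uses the employer's inception (P571) and dissolution (P576) dates, whereas the dates of an employment are recorded as start time (P580) and end time (P582) qualifiers on the employer (P108) statement, so the query would need to use those qualifiers.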
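For reference, the following sketch shows what a qualifier-based query could look like once the correct Q-identifier is known. The p: prefix leads from the person to the full employer statement, ps: leads to the statement's main value (the employer), and pq: leads to its qualifiers. Both dates are OPTIONAL, so employments without an end time are kept. The identifier must come from Wikidata itself, e.g., via the lookup query in this post. Q42 in the usage below is only a hypothetical placeholder, not my identifier.

```python
import re

# Wikidata item identifiers: "Q" followed by a number without leading zeros.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def employment_history_query(qid: str) -> str:
    """Return a SPARQL query listing the employers of `qid` with dates.

    Dates come from qualifiers on the employer (P108) statement:
    start time (P580) and end time (P582), both optional.
    """
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return f"""SELECT ?employer ?employerLabel ?start ?end
WHERE {{
  wd:{qid} p:P108 ?statement .               # full employer statement
  ?statement ps:P108 ?employer .             # its main value: the employer
  OPTIONAL {{ ?statement pq:P580 ?start . }} # qualifier: start time
  OPTIONAL {{ ?statement pq:P582 ?end . }}   # qualifier: end time
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
ORDER BY DESC(?start)"""


if __name__ == "__main__":
    print(employment_history_query("Q42"))  # placeholder identifier
```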

We can help it along by first finding the URI for Finn Årup Nielsen:

“Write a SPARQL query to the Wikidata Query Service to find the Q identifier for Finn Årup Nielsen”

The resulting SPARQL is correct:

“The following SPARQL query can be used to retrieve the Q identifier for Finn Årup Nielsen from Wikidata:

SELECT ?person ?personLabel
WHERE
{
  ?person wdt:P31 wd:Q5.
  ?person rdfs:label "Finn Årup Nielsen"@en.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

This query selects the ?person variable, which represents the Wikidata entity corresponding to Finn Årup Nielsen. The wdt:P31 wd:Q5 statement specifies that we are looking for instances of humans (Q5 represents the human class in Wikidata). The rdfs:label statement filters the results to only include entities with the English label “Finn Årup Nielsen”. Finally, the SERVICE wikibase:label statement is used to retrieve the English label for the ?person entity.

When executed in the Wikidata Query Service, this query should return a single row with two columns: person and personLabel. The person column will contain the Q identifier for Finn Årup Nielsen, while the personLabel column will contain the English label “Finn Årup Nielsen”.”
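ChatGPT's promise of “a single row” deserves a caveat. An exact match on the English label can return several items (namesakes) or none at all (if the label is spelled differently). The following is a minimal sketch, assuming the query is sent with format=json so that the service answers in the SPARQL 1.1 JSON results format. It pulls the Q-identifiers out of the `person` column and refuses to continue unless exactly one item matched. The sample response is made up, and Q1 is only a placeholder.

```python
from typing import Any

# WDQS returns Wikidata entities as IRIs under this namespace.
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def qids_from_results(results: dict[str, Any], var: str = "person") -> list[str]:
    """Collect Q identifiers bound to `var` in SPARQL 1.1 JSON results."""
    qids = []
    for row in results["results"]["bindings"]:
        cell = row.get(var)
        # Skip rows where the variable is unbound or not a Wikidata entity IRI.
        if cell and cell.get("type") == "uri" and cell["value"].startswith(ENTITY_PREFIX):
            qids.append(cell["value"][len(ENTITY_PREFIX):])
    return qids


def unique_qid(results: dict[str, Any], var: str = "person") -> str:
    """Return the one matching identifier; raise if there are none or several."""
    qids = sorted(set(qids_from_results(results, var)))
    if len(qids) != 1:
        raise LookupError(f"expected exactly one match, got {qids}")
    return qids[0]


if __name__ == "__main__":
    # Made-up response with the shape WDQS returns; Q1 is a placeholder.
    sample = {
        "head": {"vars": ["person", "personLabel"]},
        "results": {"bindings": [
            {"person": {"type": "uri", "value": ENTITY_PREFIX + "Q1"},
             "personLabel": {"type": "literal", "xml:lang": "en", "value": "Example"}},
        ]},
    }
    print(unique_qid(sample))
```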

Perhaps it is possible to be more explicit about what it should do:

“Write a SPARQL query to the Wikidata Query Service listing the employment history of Finn Årup Nielsen with employer and dates. The query should first identify the correct Q identifier for Finn Årup Nielsen and then use that identifier to find employers and with qualifiers determine the dates of employment.”

The result is incorrect:

“Certainly! Here is a SPARQL query that should accomplish what you are looking for:

SELECT ?employerLabel ?start_date ?end_date
WHERE {
  BIND(wd:Q616086 as ?person)
  ?person wdt:P108 ?employer .
  ?employer p:P569 ?start_claim .
  OPTIONAL { ?start_claim ps:P569 ?start_date . }
  ?start_claim pq:P580 ?start_date .
  OPTIONAL {
    ?employer p:P570 ?end_claim .
    ?end_claim pq:P582 ?end_date .
    OPTIONAL { ?end_claim ps:P570 ?end_date . }
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?employer rdfs:label ?employerLabel .
  }
  FILTER (LANG(?employerLabel) = "en")
}

This query uses the Wikidata Query Service to retrieve the employment history of Finn Årup Nielsen, identified by the Wikidata entity Q616086.

The query first selects the employer (?employer) of Finn Årup Nielsen using the property wdt:P108, which represents the employer. It then uses the p:P569 and p:P570 properties to retrieve the start and end dates of each employment, respectively. The ps and pq qualifiers are used to specify that the start and end dates are values of the claims.

Finally, the SERVICE wikibase:label block is used to retrieve the English labels of the employers.

Please note that some of the employment dates may be missing, in which case only the start date will be displayed.”

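The Q-identifier is wrong, and the query does not search for it. It also uses inappropriate properties. It takes the date of birth (P569) and date of death (P570) of the employer and looks for start time (P580) and end time (P582) qualifiers on those claims, instead of using qualifiers on the employment statement itself.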
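Several of the failures come down to wrong identifiers. One cheap safeguard is therefore to pull the identifiers out of a generated query and look up their labels before running it. The following sketch assumes the `wbgetentities` module of the Wikidata API at https://www.wikidata.org/w/api.php. It builds the request URL but sends nothing. The regular expression only covers the prefixes used in this post (wd:, wdt:, p:, ps:, pq:).

```python
import re
from urllib.parse import urlencode

# Prefixed identifiers as they appear in WDQS queries, e.g. wd:Q9035 or pq:P580.
ID_PATTERN = re.compile(r"\b(?:wd|wdt|p|ps|pq):([PQ][1-9][0-9]*)\b")


def identifiers_in_query(query: str) -> list[str]:
    """Return the distinct item/property identifiers used in a SPARQL query."""
    return sorted(set(ID_PATTERN.findall(query)))


def label_lookup_url(ids: list[str], language: str = "en") -> str:
    """Build a wbgetentities URL that returns labels for the given identifiers."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),  # the API takes several ids separated by "|"
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


if __name__ == "__main__":
    generated = "BIND(wd:Q616086 as ?person) ?person wdt:P108 ?e . ?e p:P569 ?c ."
    ids = identifiers_in_query(generated)
    print(ids)
    print(label_lookup_url(ids))
```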

You can downvote the answer in ChatGPT and it will generate a new answer:

SELECT ?employerLabel ?start_date ?end_date
WHERE {
  # Find the Q identifier for Finn Årup Nielsen
  ?person wikibase:statements ?statements .
  ?person rdfs:label "Finn Årup Nielsen"@en . 
  FILTER (LANG(?person) = "en")

  # Find employment history of Finn Årup Nielsen
  ?person p:P108 ?employment .
  ?employment ps:P108 ?employer .
  ?employer rdfs:label ?employerLabel .
  FILTER (LANG(?employerLabel) = "en")

  # Find the dates of employment with qualifiers
  ?employment pq:P580 ?start_time .
  ?employment pq:P582 ?end_time .

  # Format the dates
  BIND(year(?start_time) AS ?start_date) 
  BIND(year(?end_time) AS ?end_date)
}

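Line 6, FILTER (LANG(?person) = "en"), is wrong. ?person is an IRI, not a language-tagged literal, so LANG() raises an error and the filter removes every row. I am unsure whether “?person wikibase:statements ?statements” is useful. Otherwise the query is better, though an OPTIONAL around ?end_time would have been better still. Disabling line 6 gives: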

Adding an OPTIONAL around line 16 (?employment pq:P582 ?end_time .) gives 3 result rows, in accordance with what is presently available in Wikidata.
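Putting the two fixes described above together gives the query below. The FILTER on LANG(?person) is dropped, and the end time is wrapped in OPTIONAL. Everything else is kept as in ChatGPT's second answer. When ?end_time is unbound, year(?end_time) fails and BIND leaves ?end_date unbound, so a row without an end qualifier simply lacks that column. The small formatter is a hypothetical helper for printing such rows. It writes an open-ended range rather than guessing whether the employment is ongoing or the end date is just not recorded.

```python
# ChatGPT's second answer with line 6 removed and line 16 made OPTIONAL.
FIXED_QUERY = """SELECT ?employerLabel ?start_date ?end_date
WHERE {
  # Find the Q identifier for Finn Årup Nielsen
  ?person wikibase:statements ?statements .   # kept as in ChatGPT's answer
  ?person rdfs:label "Finn Årup Nielsen"@en .

  # Find employment history of Finn Årup Nielsen
  ?person p:P108 ?employment .
  ?employment ps:P108 ?employer .
  ?employer rdfs:label ?employerLabel .
  FILTER (LANG(?employerLabel) = "en")

  # Find the dates of employment with qualifiers
  ?employment pq:P580 ?start_time .
  OPTIONAL { ?employment pq:P582 ?end_time . }  # end time may be missing

  # Format the dates; year() of an unbound ?end_time leaves ?end_date unbound
  BIND(year(?start_time) AS ?start_date)
  BIND(year(?end_time) AS ?end_date)
}"""


def format_history(bindings: list[dict]) -> list[str]:
    """Turn SPARQL JSON result rows into lines like 'Employer (2001-2005)'."""
    lines = []
    for row in bindings:
        employer = row["employerLabel"]["value"]
        start = row["start_date"]["value"]
        # No end qualifier: ongoing or unknown, so leave the range open.
        end = row.get("end_date", {}).get("value", "")
        lines.append(f"{employer} ({start}-{end})")
    return lines


if __name__ == "__main__":
    print(FIXED_QUERY)
```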

One thought on “Wikidata and ChatGPT”

    […] This approach cannot “go backwards” in the Wikidata API, e.g., if I ask “Can you with the help of Wikidata find the 3 latest articles published by Finn Årup Nielsen?” there is no relevant result as the articles are not listed on the Finn Årup Nielsen Wikidata page, but rather on the article pages (and the article pages link to the Finn Årup Nielsen Wikidata page). A SPARQL approach might be helpful here. I attempted this back in March 2023, see Wikidata and ChatGPT. […]
