Month: February 2022
What SPARQL keywords do we use in Scholia?
Scaling the Wikidata Query Service seems to be a continuing concern for those who run the service. There is a general fear that we will run into hard limits with the Blazegraph software, which is set up as the SPARQL endpoint for the Wikidata Query Service. In February 2022, there were two video sessions where the community had a chance to give input on a possible alternative or migration and to tell, for instance, which SPARQL query features are important.
In Scholia, we use a range of SPARQL features. We had no overview of which features we use, but of the specialized Wikidata Query Service/Blazegraph features, I remember that we use the labeling service, the GAS service and the mwapi service. As most of our SPARQL code uses capital letters for keywords and functions and lowercase for variables, we can get a quick and dirty overview of the keywords and functions we are using with
git clone git@github.com:WDscholia/scholia.git
cd scholia/scholia/app/templates/
cat *.sparql | python -c 'import re; print("\n".join(re.findall("[A-Z_]{2,}", open(0).read())))' | sort | uniq -c | sort -nr
These three command lines give
1179 AS
 688 SELECT
 650 WHERE
 556 BY
 407 BIND
 313 INCLUDE
 296 WITH
 293 ORDER
 264 DESC
 263 GROUP
 255 SERVICE
 226 PREFIX
 226 OPTIONAL
 214 UNION
 214 COUNT
 195 DISTINCT
 190 FILTER
 147 LIMIT
 145 AUTO_LANGUAGE
 100 SAMPLE
  82 CONCAT
  79 STR
  73 VALUES
  71 GROUP_CONCAT
  65 LANG
  41 SUBSTR
  40 YEAR
  32 COALESCE
  30 MIN
  30 ID
  28 EXISTS
  27 IF
  26 ENCODE_FOR_URI
  26 CHEMICALS
  24 REPLACE
  23 NOT
  22 MAX
  17 SUM
  14 RESULTS
  14 ORCID
  13 IRI
  13 IK
  13 ASC
  12 CAS
  10 URL
  10 JOURNAL
  10 _CID
   8 INTENTION
   7 LEGOLAS
   6 NOW
   6 HAVING
   6 CITEDARTICLE
   6 BFS
   5 PCID
   5 MINUS
   5 CASID
   5 BD
   4 URI
   4 ROUND
   4 OR
   4 DOI
   3 STRSTARTS
   3 ISSN
   3 _ID
   3 FFFFFF
   3 BC
   2 UNRESOLVED
   2 TO
   2 PC
   2 MONTH
   2 MOLS
   2 LCASE
   2 DAY
   2 _CIDU
   2 CASU
   2 BB
   2 ASK
   2 AP
   2 ALLOTROPES
   1 TODO
   1 STRLEN
   1 ISBN
   1 ID_T
   1 GRID
   1 GEPRIS
   1 FF
   1 END
   1 EFFBD
   1 EEEEEE
   1 DDDDDD
   1 CORDIS
   1 BLANK
   1 BFI
   1 ABS
Here CHEMICALS and CITEDARTICLE must be variables, while, e.g., DDDDDD is a color specification. We use the Blazegraph-specific WITH keyword a lot, usually for efficiency. Currently, we have few ASK queries and no CONSTRUCT queries.
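To separate real keywords from uppercase variable names and color codes, the same regular expression can be combined with a small lookup. The following is only a rough sketch: the word lists STANDARD and BLAZEGRAPH are hand-picked and far from complete, and the query in EXAMPLE is made up for illustration rather than taken from the Scholia templates.

```python
import re
from collections import Counter

# Hand-picked, non-exhaustive word lists (assumptions for this sketch).
STANDARD = {
    "SELECT", "WHERE", "AS", "BY", "GROUP", "ORDER", "LIMIT",
    "COUNT", "BIND", "FILTER", "OPTIONAL", "UNION", "SERVICE",
}
BLAZEGRAPH = {"WITH", "INCLUDE"}  # used for named subqueries


def count_tokens(text):
    """Count runs of two or more capital letters or underscores."""
    return Counter(re.findall(r"[A-Z_]{2,}", text))


def classify(counts):
    """Split token counts into standard, Blazegraph and other tokens."""
    buckets = {
        "standard": Counter(),
        "blazegraph": Counter(),
        "other": Counter(),
    }
    for token, n in counts.items():
        if token in BLAZEGRAPH:
            buckets["blazegraph"][token] = n
        elif token in STANDARD:
            buckets["standard"][token] = n
        else:
            # Uppercase variables, color codes, comments and so on.
            buckets["other"][token] = n
    return buckets


# Made-up query, only used to exercise the functions above.
EXAMPLE = """
SELECT ?work ?rgb (COUNT(?CITEDARTICLE) AS ?count)
WITH {
  SELECT ?work WHERE { ?work wdt:P50 ?author . } LIMIT 10
} AS %works
WHERE {
  INCLUDE %works
  ?work wdt:P2860 ?CITEDARTICLE .
  BIND("DDDDDD" AS ?rgb)
}
GROUP BY ?work ?rgb
"""

if __name__ == "__main__":
    for name, bucket in classify(count_tokens(EXAMPLE)).items():
        print(name, bucket.most_common())
```

With this split, tokens such as CITEDARTICLE and DDDDDD end up in the "other" bucket, which still has to be inspected by hand.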
University course emails per year
I have previously written about university course emails. The above figure shows the development of the number of received emails saved in my ‘teaching’ folder together with the number of received emails saved in my ‘teaching assistants’ folder. The projected number of emails for the year 2022 may be too large because of an unusually large number of emails from the students in January 2022. As previously noted, the counts do not include the automated emails I receive from our question-answering site. I usually delete such emails.
There might be around 220 working days per year in Denmark, making the average number of emails per working day around 10 or fewer. One would think that handling 10 emails should not amount to more than an hour of work, based on the guesstimate in my previous blog post, though, as noted, the long tail of the handling-time distribution may make the estimate quite uncertain.
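As a back-of-the-envelope check, the rough figures above can be plugged into a few lines of Python. The inputs are the guesses from the text (220 working days, about 10 emails per day, at most an hour per day), not measured values, and the function names are only for illustration.

```python
WORKING_DAYS = 220     # rough number of working days per year (from the text)
EMAILS_PER_DAY = 10    # upper end of the daily average (from the text)
MINUTES_PER_DAY = 60   # "not more than an hour" of handling per day


def emails_per_year(days=WORKING_DAYS, per_day=EMAILS_PER_DAY):
    """Yearly email count implied by the daily average."""
    return days * per_day


def minutes_per_email(minutes=MINUTES_PER_DAY, per_day=EMAILS_PER_DAY):
    """Average handling time per email implied by the daily budget."""
    return minutes / per_day


if __name__ == "__main__":
    print(emails_per_year())    # 2200
    print(minutes_per_email())  # 6.0
```

So the guess corresponds to at most roughly 2200 emails per year and an average of about six minutes per email, which the long tail of handling times could easily exceed.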