Month: February 2022

What SPARQL keywords do we use in Scholia?

Scaling the Wikidata Query Service seems to be a continuing concern for those who run the service. There is a general fear that we will run into hard limits with the Blazegraph software, which is set up as the SPARQL endpoint for the Wikidata Query Service. In February 2022, there were two video sessions where the community had a chance to give input on a possible alternative or migration and to say, for instance, which SPARQL query features are important.

In Scholia, we use a range of SPARQL features. We had no overview of which features we use, but of the specialized Wikidata Query Service/Blazegraph features, I remember that we use the labeling service, the GAS service and the mwapi service. As most of our SPARQL code uses capital letters for keywords and functions and lowercase for variables, we can get a quick-and-dirty overview of the keywords and functions we use with

git clone git@github.com:WDscholia/scholia.git
cd scholia/scholia/app/templates/
cat *.sparql | python -c 'import re; print("\n".join(re.findall("[A-Z_]{2,}", open(0).read())))' | sort | uniq -c | sort -nr

These three command lines give

   1179 AS
    688 SELECT
    650 WHERE
    556 BY
    407 BIND
    313 INCLUDE
    296 WITH
    293 ORDER
    264 DESC
    263 GROUP
    255 SERVICE
    226 PREFIX
    226 OPTIONAL
    214 UNION
    214 COUNT
    195 DISTINCT
    190 FILTER
    147 LIMIT
    145 AUTO_LANGUAGE
    100 SAMPLE
     82 CONCAT
     79 STR
     73 VALUES
     71 GROUP_CONCAT
     65 LANG
     41 SUBSTR
     40 YEAR
     32 COALESCE
     30 MIN
     30 ID
     28 EXISTS
     27 IF
     26 ENCODE_FOR_URI
     26 CHEMICALS
     24 REPLACE
     23 NOT
     22 MAX
     17 SUM
     14 RESULTS
     14 ORCID
     13 IRI
     13 IK
     13 ASC
     12 CAS
     10 URL
     10 JOURNAL
     10 _CID
      8 INTENTION
      7 LEGOLAS
      6 NOW
      6 HAVING
      6 CITEDARTICLE
      6 BFS
      5 PCID
      5 MINUS
      5 CASID
      5 BD
      4 URI
      4 ROUND
      4 OR
      4 DOI
      3 STRSTARTS
      3 ISSN
      3 _ID
      3 FFFFFF
      3 BC
      2 UNRESOLVED
      2 TO
      2 PC
      2 MONTH
      2 MOLS
      2 LCASE
      2 DAY
      2 _CIDU
      2 CASU
      2 BB
      2 ASK
      2 AP
      2 ALLOTROPES
      1 TODO
      1 STRLEN
      1 ISBN
      1 ID_T
      1 GRID
      1 GEPRIS
      1 FF
      1 END
      1 EFFBD
      1 EEEEEE
      1 DDDDDD
      1 CORDIS
      1 BLANK
      1 BFI
      1 ABS

Here CHEMICALS and CITEDARTICLE must be variables, while, e.g., DDDDDD is a color specification. We use the Blazegraph-specific WITH keyword a lot (named subqueries, which are then pulled in with INCLUDE). This is usually for efficiency. Currently, we have few ASK queries and no CONSTRUCT queries.
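
The tally can also be sketched as a small stand-alone Python function, which may be easier to adapt than the shell pipeline. It uses the same regular expression as the one-liner; the function name count_tokens and the sample query are made up for illustration and are not part of Scholia.

```python
import re
from collections import Counter

# Same pattern as the one-liner: runs of two or more capitals or underscores.
TOKEN = re.compile(r"[A-Z_]{2,}")


def count_tokens(text):
    """Return (token, count) pairs for uppercase tokens, most frequent first."""
    return Counter(TOKEN.findall(text)).most_common()


# Hypothetical sample query, only there to exercise the function.
sample = """
SELECT ?work (COUNT(?cite) AS ?count)
WHERE { ?cite wdt:P2860 ?work . }
GROUP BY ?work
ORDER BY DESC(?count)
"""

# Print in the same layout as `uniq -c`.
for token, n in count_tokens(sample):
    print(f"{n:7d} {token}")
```

Like the pipeline, this counts any uppercase run, so uppercase variable names and hex colors still show up next to the real keywords.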

University course emails per year

I have previously written about university course emails. The above figure shows the development of the number of received emails saved in my ‘teaching’ folder together with the number of received emails saved in my ‘teaching assistants’ folder. The projected number of emails for the year 2022 may be too large because of an unusually large number of emails from the students in January 2022. As previously noted, the counts do not include the automated emails I receive from our question-answering site. I usually delete such emails.

There might be around 220 working days per year in Denmark, making the average number of emails per working day around 10 or fewer. One would think that handling 10 emails would not amount to more than an hour of work, based on the guesstimate in my previous blog post, though, as noted, the long tail of the distribution of handling times may make the estimate quite uncertain.