- 3 October 2018 at DGI-byen, Copenhagen, Denmark, as part of the Visuals and Analytics that Matter conference, the concluding conference for the DEFF-sponsored project Research Output & Impact Analyzed and Visualized (ROIAV).
- 7 November 2018 in Mannheim as part of the Linked Open Citation Database (LOC-DB) 2018 workshop.
- 13 December 2018 at the library of the Technical University of Denmark as part of Wikipedia – a media for sharing knowledge and research, an event for researchers and students (still in the planning phase).
I recently looked into what we have of Wikipedia research from Denmark and discovered several papers that I did not know about. I have now added some to Wikidata, so that Scholia can show a list of them.
Among the papers was one by Jens-Erik Mai titled Wikipedian’s knowledge and moral duties. Starting from the English Wikipedia’s Neutral Point of View (NPOV) policy, he stresses a dichotomy between the subjective and the objective and argues for a rewrite of the policy. Mai claims the policy has an absolutistic center and a relativistic edge, corresponding to an absolutistic majority view and relativistic minority views.
As a long-time Wikipedia editor, I find Mai’s exposition too theoretical. I miss good examples, that is, cases where the NPOV fails, and I cannot see in what concrete way the NPOV policy should be changed to accommodate Mai’s critique. I am not sure that Wikipedians distinguish so much between the objective and the subjective; the key dichotomy is verifiability vs. non-verifiability, i.e., whether the statements in Wikipedia are supported by reliable sources. In terms of center and edge, I came to think of events associated with conspiracy theories. Here the “center” view could be the conventional view and the conspiracy views the edge. It is difficult for me to accept a standpoint where conspiracy theories should be treated as equal to the conventional view. Nor is it clear to me that the center is uncontested and uncontroversial. Wikipedia, like a newspaper, has the ability to represent opposing viewpoints. This is done by attributing each viewpoint to the reliable sources that express it. For instance, quotations from reviews in major newspapers and by notable reviewers are central to the description of how films have been evaluated.
I don’t see the support for the claim that the NPOV policy assumes a “politically dangerous ethical position”. On the contrary, after the rise of fake news, Wikipedia has been called the “last bastion”. The example given in The Atlantic post is the recent social media fuss over Sarah Jeong, where Wikipedians arrived at a text with “shared facts about reality.”
“Og så er der fra 2018 og frem øremærket 0,5 mio. kr. til Dansk Sprognævn til at frikøbe Retskrivningsordbogen.”
From Peter Brodersen I hear that the budget of the Danish government for next year allocates funds to Dansk Sprognævn for the release of the Retskrivningsordbogen – the Danish official dictionary for word spelling.
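It is mentioned briefly in an announcement from the Ministry of Culture: “Og så er der fra 2018 og frem øremærket 0,5 mio. kr. til Dansk Sprognævn til at frikøbe Retskrivningsordbogen.” (“And from 2018 onward, 0.5 million DKK is earmarked for Dansk Sprognævn to buy Retskrivningsordbogen free.”) That is, 500,000 DKK is allocated for the release of the dataset.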
It is not clear under which conditions it will be released. An announcement from Dansk Sprognævn writes “til sprogteknologiske formål” (for natural language processing purposes). I trust it is not just for natural language processing purposes, but for every purpose!?
If it is to be used in free software/databases, then a CC0 or better license is a good idea. We are still waiting for Wikidata for Wiktionary, the still-vaporware project for a multilingual, collaborative and structured dictionary. This resource is CC0-based. The “old” Wiktionary has, surprisingly, not been used much by natural language processing researchers, perhaps because of the anarchistic structure of Wiktionary. Wikidata for Wiktionary could hopefully help us with structuring lexical data and improve the size and the utility of lexical information. With Retskrivningsordbogen as CC0, it could be imported into Wikidata for Wiktionary and extended with multilingual links and semantic markup.
With the WikiCite project, the bibliographic information on Wikidata is increasing rapidly, with Wikidata now describing 9.3 million scientific articles and 36.6 million citations. As far as I can determine, most of the work is currently done by James Hare and Daniel Mietchen. Mietchen’s Research Bot has over 11 million edits on Wikidata, while Hare has 15 million edits. For entering data into Wikidata from PubMed, you can basically walk your way through the PMIDs, starting with “1”, with the Fatameh tool. Hare’s reference work can take advantage of a web service provided by the National Institutes of Health. For instance, a URL such as https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pmc&linkname=pmc_refs_pubmed&retmode=json&id=5585223 will return a JSON-formatted result with citation information. This specific URL is apparently what Hare used to set up the P2860 citation information in Wikidata, see, e.g., https://www.wikidata.org/wiki/Q41620192#P2860. CrossRef may be another resource.
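As a rough illustration of the web service call mentioned above, here is a minimal Python sketch. The query parameters are taken from the URL in the text. The function names are made up for this example, and the JSON layout that the parser expects (a `linksets` list containing `linksetdbs` entries with a `links` list) is my assumption about the response format, so it should be checked against an actual response. This is not necessarily how Hare’s tools work.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the NCBI E-utilities elink service, as in the URL quoted above.
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(pmc_id):
    """Build the elink URL that asks for the PubMed references of a PMC article."""
    params = {
        "dbfrom": "pmc",
        "linkname": "pmc_refs_pubmed",
        "retmode": "json",
        "id": str(pmc_id),
    }
    return ELINK_URL + "?" + urllib.parse.urlencode(params)


def extract_cited_pmids(response):
    """Collect cited PMIDs from a decoded elink JSON response.

    Assumes the layout {"linksets": [{"linksetdbs": [{"linkname": ...,
    "links": [...]}]}]}; missing keys simply yield an empty list.
    """
    pmids = []
    for linkset in response.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("linkname") == "pmc_refs_pubmed":
                pmids.extend(str(link) for link in linksetdb.get("links", []))
    return pmids


def fetch_cited_pmids(pmc_id):
    """Download and parse the citation list for one PMC identifier."""
    with urllib.request.urlopen(build_elink_url(pmc_id)) as handle:
        return extract_cited_pmids(json.load(handle))
```

Calling `fetch_cited_pmids(5585223)` would request the URL given in the text. Each returned PMID would still need to be matched to a Wikidata item before a P2860 statement could be added.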
Beyond these resources, we could potentially use Google Scholar. A former terms of service/EULA of Google Scholar stated that: “You shall not, and shall not allow any third party to: […] (j) modify, adapt, translate, prepare derivative works from, decompile, reverse engineer, disassemble or otherwise attempt to derive source code from any Service or any other Google technology, content, data, routines, algorithms, methods, ideas design, user interface techniques, software, materials, and documentation; […] “crawl”, “spider”, index or in any non-transitory manner store or cache information obtained from the Service (including, but not limited to, Results, or any part, copy or derivative thereof); (m) create or attempt to create a substitute or similar service or product through use of or access to any of the Service or proprietary information related thereto”. Here, “create or attempt to create a substitute or similar service” is a stumbling block.
The Google Scholar terms document now seems to have been superseded by the all-embracing Google Terms of Service. This document seems less restrictive: “Don’t misuse our Services” and “You may not use content from our Services unless you obtain permission from its owner or are otherwise permitted by law.” So it may or may not be OK to crawl and/or use/republish the data from Google Scholar. See also a StackExchange question and another StackExchange question.
The Google robots.txt limits automated access with the following relevant lines:
Disallow: /scholar
Disallow: /citations?
Allow: /citations?user=
Disallow: /citations?*cstart=
Allow: /citations?view_op=new_profile
Allow: /citations?view_op=top_venues
Allow: /scholar_share
“/citations?user=” means that bots are allowed to access the user profiles. Google Scholar user identifiers may be recorded in Wikidata with a dedicated property, so you could automatically access Google Scholar user profiles from the information in Wikidata.
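To make the reading of these robots.txt lines concrete, here is a small Python sketch that evaluates a path against them. It assumes the longest-match convention (the most specific matching rule wins, Allow wins a tie, `*` matches any run of characters), which is how I understand Google to interpret robots.txt. It is a simplified illustration with made-up function names, not a complete robots.txt parser.

```python
import re

# The robots.txt rules quoted above, in their original order.
RULES = [
    ("Disallow", "/scholar"),
    ("Disallow", "/citations?"),
    ("Allow", "/citations?user="),
    ("Disallow", "/citations?*cstart="),
    ("Allow", "/citations?view_op=new_profile"),
    ("Allow", "/citations?view_op=top_venues"),
    ("Allow", "/scholar_share"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional '$' anchor)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if the path (including query string) may be fetched.

    The longest matching pattern decides; Allow wins when lengths are equal.
    A path that matches no rule is allowed.
    """
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
                best_length = len(pattern)
                allowed = is_allow
    return allowed


print(is_allowed("/citations?user=gQVuJh8AAAAJ"))            # profile page
print(is_allowed("/citations?user=gQVuJh8AAAAJ&cstart=20"))  # paginated profile
print(is_allowed("/scholar?q=wikidata"))                     # search results
```

Under these assumptions the first profile page is allowed, while the paginated profile and the search results are not.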
So if there is some information you can get from Google Scholar, is it worth it?
python -m scholia.googlescholar get-user-data gQVuJh8AAAAJ
It is worth remembering that Wikidata has the P4028 property to link to Google Scholar articles. There are not many items using it yet, though: 31. It was suggested by Vladimir Alexiev back in May 2017, but it seems that I am the only one using the property. Bot access to the link target provided by P4028 is, as far as I can see from the robots.txt, not allowed.
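A count like the one above can be obtained from the Wikidata Query Service. The following Python sketch only builds the request URL and reads the count out of a result in the standard SPARQL JSON results format; the function names are made up for this example, and the number returned today will differ from the 31 reported here.

```python
import urllib.parse

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Count the items that have a P4028 statement.
COUNT_QUERY = "SELECT (COUNT(?item) AS ?count) WHERE { ?item wdt:P4028 ?value }"


def build_query_url(query=COUNT_QUERY):
    """Return a GET URL that asks the endpoint for a JSON result."""
    params = {"query": query, "format": "json"}
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_count(result):
    """Read the single ?count binding from a decoded SPARQL JSON result."""
    return int(result["results"]["bindings"][0]["count"]["value"])
```

The URL from `build_query_url()` can be fetched with `urllib.request` (preferably with a descriptive User-Agent header) and the decoded JSON passed to `extract_count`.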
According to Laura Martin, Franz Boas may have been the first to point to the relative richness of Eskimo words for snow (“Eskimo Words for Snow”: A Case Study in the Genesis and Decay of an Anthropological Example, American Anthropologist, 88(2):418). Boas listed aput, qana, piqsirpoz, and qimuqsuq. English may have snow, hail, sleet, ice, icicle, slush, and snowflake, as listed on the English Wikipedia on Eskimo words for snow. There seem to be more than that, e.g., firn. Danish is not as polysynthetic as Eskimo, but it has lots of compounds, which make it possible to create a good number of words for snow. Most of these words derive from sne and is.
Update 2017-09-13: Added skosse.
| Danish | English | Explanation |
|---|---|---|
| bræ | | large mass of ice |
| bundis | | ice at the bottom of the ocean/sea |
| drivis | | ice floating on the water, either “havis” or “søis” |
| firn | firn | snow older than a year |
| flodis | | ice from a river |
| frostsne | | snow below freezing, as opposed to “tøsne” |
| gletscheris | | ice in/from a glacier |
| grå is | | first stage of “ungis”, according to DMI |
| gråhvid is | | second stage of “ungis”, according to DMI |
| hagl | hail | precipitation in the form of small pellets of ice |
| haglkorn | hailstone | small pellet of ice |
| havis | sea ice | ice in the ocean/sea |
| indlandsis | | Indlandsisen is the big “iskappe” in Greenland |
| is | ice | frozen water that is (usually) transparent |
| isbarriere | | the edge of an “isshelf”, according to DMI |
| isblok | | block of ice |
| isflade | | sheet of ice |
| isfront | | the edge of an “isshelf” |
| isfod | | ice frozen to the coast or (second meaning) the ice below the water |
| iskalot | | ice-covered area near the poles |
| iskant | | the edge of a floe |
| iskappe | ice cap | very large connected mass of snow, e.g., the one in Greenland |
| iskorn | | see also “kornsne” |
| isbræ | | large mass of ice, the same as “bræ” |
| islag | | layer of ice, not the same as “isslag” |
| isrand | | the edge of a floe |
| isskorpe | | layer of ice on top of water or snow |
| isslag | glaze, black ice, freezing rain | raindrops below freezing that become ice when hitting the ground or a structure |
| isstykke | | a piece of ice |
| isvand | ice water | water with ice in it, usually for drinking |
| julesne | Christmas snow | snow falling or lying during Christmas |
| kunstsne | artificial snow | snow artificially made |
| nysne | | recently fallen snow, as opposed to firn |
| pakis | | “drivis” with a high concentration, according to DMI |
| polaris | | sea ice that has survived at least one summer melting |
| puddersne | powder snow | light snow |
| rim | hard rime | “white ice that forms when the water droplets in fog freeze to the outer surfaces of objects.” according to English Wikipedia |
| slud | sleet | a mixture of rain and falling snow |
| sne | snow | used about falling snow and snow on the ground |
| snebold | snowball | snow formed as a ball, often used to throw in a snowball fight |
| snebunke | | pile of snow |
| snedrys | | small amount of precipitation of snow |
| snedække | | layer/cover of snow |
| snefygning | | snow in strong wind |
| snehule | | snow formed as a cave for fun or survival, see also “igloo” |
| snehytte | | more or less the same as an “iglo” |
| snelag | | layer of snow |
| snemand | snowman | snow formed as a sculpture of a human |
| snemark | | field of snow |
| snemasse | | mass of snow |
| sneskred | avalanche | snow falling down a slope |
| snevejr | snow | weather with falling snow |
| tøris | | (“tøris” is usually “dry ice”) |
| tøsne | melting snow | snow that is melting |
| ungis | | sea ice between “tyndis” and “vinteris”, according to DMI |