
Coming Scholia, WikiCite, Wikidata and Wikipedia sessions


In the coming months I will give three different talks on Scholia, WikiCite, Wikidata, Wikipedia et al.:

  • 3 October 2018 at DGI-byen, Copenhagen, Denmark, as part of the Visuals and Analytics that Matter conference, the concluding conference for the DEFF-sponsored project Research Output & Impact Analyzed and Visualized (ROIAV).
  • 7 November 2018 in Mannheim as part of the Linked Open Citation Database (LOC-DB) 2018 workshop.
  • 13 December 2018 at the library of the Technical University of Denmark as part of Wikipedia – a media for sharing knowledge and research, an event for researchers and students (still in the planning phase).

In September I presented Scholia as part of the Workshop on Open Citations. The slides, titled Scholia as of September 2018, are available here.


Fru Astrid Grib by Thit Jensen


Little sister Thit goes all out with murder and death in a psychological portrait of a love-stricken 28-year-old woman, where the attempts at language in the style of her big brother never quite take off. A great deal of art and melodrama, where 40 pages let a woman go from the madness of infatuation to madness. The love is violent, unrequited, unruly, exaggerated but also unexpressed; quite a contrast to the brother’s schoolmasterly relationship to love.

From LibraryThing.

A viewpoint on a viewpoint on Wikipedia’s neutral point of view


I recently looked into what Wikipedia research we have from Denmark and discovered several papers that I did not know about. I have now added some of them to Wikidata, so that Scholia can show a list of them.

Among the papers was one from Jens-Erik Mai titled Wikipedian’s knowledge and moral duties. Starting from the English Wikipedia’s Neutral Point of View (NPOV) policy, he stresses a dichotomy between the subjective and the objective and argues for a rewrite of the policy. Mai claims the policy has an absolutist center and a relativist edge, corresponding to an absolutist majority view and relativist minority views.

As a long-time Wikipedia editor, I find Mai’s exposition too theoretical. I miss good examples, that is, cases where the NPOV fails, and I cannot see in what concrete way the NPOV policy should be changed to accommodate Mai’s critique. I am not sure that Wikipedians distinguish so much between the objective and the subjective; the key dichotomy is verifiable vs. not verifiable, i.e., whether the statements in Wikipedia are supported by reliable sources. In terms of center and edge, I came to think of events associated with conspiracy theories. Here the “center” view could be the conventional view, while the conspiracy views are the edge. It is difficult for me to accommodate a standpoint where conspiracy theories should be accepted as equal to the conventional view. Nor is it clear to me that the center is uncontested and uncontroversial. Wikipedia, like a newspaper, has the ability to represent opposing viewpoints. This is done by attributing each viewpoint to the reliable sources that express it. For instance, quotations from reviews in major newspapers and by notable reviewers are central to the description of how films have been evaluated.

I don’t see the support for the claim that the NPOV policy assumes a “politically dangerous ethical position”. On the contrary, Wikipedia has now, after the rise of fake news, been called the “last bastion”. The example given in The Atlantic post is the recent social media fuss about Sarah Jeong, where Wikipedians worked toward “shared facts about reality.”

Scholia is more than scholarly profiles


Scholia, a website originally started as a service to show scholarly profiles from data in Wikidata, is actually not just for scholarly data.

Scholia can also show bibliographic information for “literary” authors and journalists.

An example that I have begun on Wikidata is the Danish writer Johannes V. Jensen, whose works pose a very interesting test case for Wikidata, because the interrelation between works and editions can be quite complicated, e.g., newspaper articles being merged into a poem that is then published in an edition that is then expanded and reprinted… The scholarly and journalistic work about Johannes V. Jensen can also be recorded in Wikidata. Scholia currently records 30 entries about Johannes V. Jensen, and that does not necessarily include works about works written by Johannes V. Jensen.

An example of a bibliography of a journalist is that of Kim Wall. Her works almost always address unique topics, which makes them fairly relevant as sources in Wikipedia articles. Examples include an article on a special modern Chinese wedding tradition, Fairy Tale Romances, Real and Staged, and an article on furries, It’s not about sex, it’s about identity: why furries are unique among fan cultures.

An interesting feature of most of Wall’s articles is that she lets the interviewee have the final word by adding a quotation as the very final paragraph. That is also the case with the two examples linked above. I suppose that says something about Wall’s generous journalistic approach.



Addressing “addressing age-related bias in sentiment analysis”


Algorithmic bias is one of the hot topics of research at the moment. There are observations of trained machine learning models that display sexism. For instance, the paper “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings” (Scholia entry) neatly shows one example in its title, with bias in word embeddings, i.e., shallow machine learning models trained on a large corpus of text.

A recent report investigated ageism bias in a range of sentiment analysis methods, including my AFINN word list: “Addressing age-related bias in sentiment analysis” (Scholia entry). The researchers scraped sentences from blog posts, extracted those sentences containing the word “old”, and excluded the sentences where the word did not refer to the age of a person. They then replaced “old” with the word “young” (apparently “older” and “oldest” were also considered somehow). The example sentences they ended up with were, e.g., “It also upsets me when I realize that society expects this from old people” and “It also upsets me when I realize that society expects this from young people”. These sentences (242 in total) were submitted to 15 sentiment analysis tools, and statistics were computed “using multinomial log-linear regressions (via the R package nnet […])”.
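The substitution step can be sketched in a few lines of Python. This is only my guess at how such sentence pairs could be built: the mapping of “older” and “oldest” and the name `swap_age_words` are assumptions of mine, not details taken from the paper.

```python
import re

# Assumed mapping of age words; how the paper handled "older" and "oldest" is not known here.
SWAPS = {"old": "young", "older": "younger", "oldest": "youngest"}
PATTERN = re.compile(r"\b(old|older|oldest)\b", flags=re.IGNORECASE)


def swap_age_words(sentence):
    """Return the sentence with each 'old'-type word replaced by its 'young' counterpart."""
    def replace(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        # Keep an initial capital letter if the original word had one.
        return swapped.capitalize() if word[0].isupper() else swapped

    return PATTERN.sub(replace, sentence)


original = "It also upsets me when I realize that society expects this from old people"
pair = (original, swap_age_words(original))
```

The word boundaries in the pattern make sure that words such as “bold” or “golden” are left untouched.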

I was happy to see that my AFINN was the only one in Table 4 surviving the test for all regression coefficients being non-significant. However, Table 5 with implicit age analysis showed some bias in my word list.

But after a bit of thought I wondered how there could be any kind of bias in my word list. The paper lists an exponentiated intercept coefficient of 0.733 with a 95% confidence interval from 0.468 to 1.149 for AFINN. But if I examine what my afinn Python package reports about the words “old”, “older”, “oldest”, “young”, “younger” and “youngest”, I get all zeros, i.e., these words are not scored as either positive or negative:


>>> from afinn import Afinn
>>> afinn = Afinn()
>>> afinn.score('old')
0.0
>>> afinn.score('older')
0.0
>>> afinn.score('oldest')
0.0
>>> afinn.score('young')
0.0
>>> afinn.score('younger')
0.0
>>> afinn.score('youngest')
0.0

It is thus strange that there can be any form of bias, even a non-significant one. For instance, for the two example sentences “It also upsets me when I realize that society expects this from old people” and “It also upsets me when I realize that society expects this from young people” my afinn Python package scores them both with the sentiment -2. This value comes solely from the word “upsets”. There can be no difference between any of the sentences when you exchange the word “old” with “young”.
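The argument can be illustrated with a toy word-list scorer. This is a simplified stand-in and not the afinn package itself: the lexicon below contains only the single entry “upsets” with the value -2 mentioned above, and the function name `score` is just for illustration.

```python
import re

# Toy stand-in for a sentiment word list; only "upsets" (-2) is taken from the text above.
# Words absent from the lexicon count as 0, just like the six age words.
TOY_LEXICON = {"upsets": -2}


def score(sentence, lexicon=TOY_LEXICON):
    """Sum the valences of the words in the sentence; unknown words contribute 0."""
    words = re.findall(r"\w+", sentence.lower())
    return sum(lexicon.get(word, 0) for word in words)


old_sentence = "It also upsets me when I realize that society expects this from old people"
young_sentence = old_sentence.replace("old people", "young people")

# Swapping one zero-valued word for another cannot change the sum.
difference = score(old_sentence) - score(young_sentence)
```

As long as the score is a plain sum over words and both swapped words have the value zero, the difference is zero for every sentence pair, whatever the rest of the word list contains.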

In their implicit analysis of bias, where they use a word embedding, some bias could possibly creep in somewhere with my word list, although it is not clear to me how this happens.

The question is then what happens in the analysis. Does the multinomial log-linear regression give a questionable result? Could it be that I misunderstand a fundamental aspect of the paper? While some data seem to be available here, I cannot identify the specific sentences they used in the analysis.
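As a small sanity check, the interval quoted for AFINN (0.733 with a 95% confidence interval from 0.468 to 1.149) can be moved back to the log scale. The sketch below assumes that the interval is an ordinary normal-approximation interval, i.e., the estimate plus or minus 1.96 standard errors on the log scale; I do not know whether that is how the paper computed it.

```python
import math

# Values quoted above for AFINN: exponentiated intercept and its 95% confidence interval.
estimate, lower, upper = 0.733, 0.468, 1.149

# On the log scale the interval should be roughly symmetric around the coefficient.
log_estimate = math.log(estimate)
log_lower, log_upper = math.log(lower), math.log(upper)
midpoint = (log_lower + log_upper) / 2

# Assumption: a normal-approximation interval, estimate +/- 1.96 standard errors.
standard_error = (log_upper - log_lower) / (2 * 1.96)
z_value = log_estimate / standard_error

# An exponentiated coefficient of 1 corresponds to no effect.
contains_one = lower < 1.0 < upper

print(f"log coefficient {log_estimate:.3f}, interval midpoint {midpoint:.3f}")
print(f"implied standard error {standard_error:.3f}, z = {z_value:.2f}")
print("interval contains 1:", contains_one)
```

The interval contains 1, which agrees with the coefficient being reported as non-significant, but the point estimate itself is still different from 1, and that is the part I cannot explain from the word list alone.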

Frequent elements among the best Danish films


Bo Green Jensen has written the book De 25 bedste danske film (The 25 best Danish films), in which one finds, among others, Vredens Dag, Kundskabens træ, Babettes gæstebud and Den eneste ene. I have just entered this short list of 25 films, published in 2002, into Wikidata via the “catalog” property. Once that is done, one can use the Wikidata Query Service with a SPARQL database query to find elements that recur among the films. Such a SPARQL query could look like this:

SELECT (COUNT(?item) AS ?count) ?value ?valueLabel WHERE {
  # Films entered with the "catalog" property (P972)
  ?item wdt:P972 wd:Q12307844 .
  # Any property and value attached to each film
  ?item ?property ?value .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],da,en". }
}
GROUP BY ?value ?valueLabel
HAVING (COUNT(?item) > 1)
ORDER BY DESC(?count)

This version counts films and orders the elements by how many films each element appears in. The information in Wikidata is probably not entirely complete. With Magnus Manske’s Listeria tool, however, one can have a table constructed that shows that each individual film is reasonably well covered.

The SPARQL query is found here and the result can be seen here.
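What the query does (group by value, keep values that occur more than once, order by count) can be mimicked on a small in-memory example in Python. The statements below are made-up placeholders and not real Wikidata content, and `recurring_values` is a name chosen for this sketch.

```python
from collections import Counter

# Hypothetical (item, property, value) statements; placeholders, not real Wikidata content.
statements = [
    ("film_1", "catalog", "book"),
    ("film_2", "catalog", "book"),
    ("film_3", "catalog", "book"),
    ("film_1", "cast member", "person_a"),
    ("film_2", "cast member", "person_a"),
    ("film_3", "cast member", "person_b"),
    ("film_1", "director", "person_c"),
]


def recurring_values(statements, minimum=2):
    """Count statements per value, keep values seen at least `minimum` times, most frequent first."""
    # Like COUNT(?item) in the query, this counts statements, so a person who is both
    # director and screenwriter of the same film would be counted twice.
    counts = Counter(value for _item, _prop, value in statements)
    frequent = [(value, n) for value, n in counts.items() if n >= minimum]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


result = recurring_values(statements)
```

In this toy data the catalog itself comes out on top, since every film is by construction listed in it.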

It is not surprising that one of the elements found for all 25 films is that they are listed in De 25 bedste danske film. That is sort of a tautology… Going further down in frequency, we find that Bodil Kjer and Anne Marie Helger are the highest-ranked persons.

Bodil Kjer is probably mostly associated with grey-toned films from the 1940s and 1950s (in the list one finds her as an actress in Otte akkorder, John og Irene and Mød mig på Cassiopeia), but in her later career she also made her mark, partly as a frail lady in Strømer, partly in the first Danish Oscar-winning feature film. She is no surprise.

What I find surprising is that Anne Marie Helger has 5 elements, making her the second-highest person on the list. She is an actress in Strømer, Johnny Larsen, of course Koks i kulissen, and Erik Clausen’s De frigjorte. She also appears as screenwriter on Christian Braad Thomsen’s film.

A notch further down come Erik Balling, Ebbe Rode, Ib Schønberg and Anders Refn. Balling is producer on two films on the list and was responsible for both direction and screenplay on Poeten og Lillemor. Anders Refn is film editor on two and also had a double role with direction and screenplay for Strømer.

My namesake Finn Nielsen is on the list in connection with three films: Strømer, Johnny Larsen and Babettes gæstebud. Incidentally, he also gave a fin(n)e performance in Kærlighedens smerte, which did not make the list since the director is already represented with Kundskabens træ.

Sweden is listed as co-production country on four films. This is mostly the case for films from later years, but the first such film is actually Sult, which is from the 1960s.

And by the way, Bodil Kjer should be counted one extra time: as an extra 26th entry, Bo Green Jensen lists the Far til fire series. In this series there is a toy elephant named Bodil Kjer…

Journalist af karsken bælg: A book about Lise Nørgaard’s journalism by John Chr. Jørgensen


Good level and lively language from an academic gentleman who, as an expert on women journalists, has his background in order. The book is about the national treasure, Lise in the golden sandals, and the lady’s less highlighted side, from her time as an apprentice up to the star writer’s Pilestræde period with an extraordinary right to a non-computerized typewriter. Jørgensen places her as a bourgeois individualist with backbone, an ironic distance and as a renewer of journalistic genres.

On pages 38-39 we get a taste of the wordsmith’s abilities: her very first editorial, now 81 years and a couple of weeks old, from 4 January 1937 in Roskilde Dagblad. The occasion was foreign-policy complications around a royal wedding between a Dutch princess and a German prince, and here it says of the Nazi government that:

“It has felt that something had to be done, and since it was not possible to get a finger in the game in Holland itself, it was decided to deprive three German princesses of the noblest blood, who were to be bridesmaids at the marriage of the presumptuous prince and the princess in the country where a football match with Germany could take place under a flag other than the one with the swastika, of their passports. Well, when the hot temper had settled and a courier from the trapped prince had been to see Hitler, it must have become clear on the Nazi side that this was not the way to go about it after all.” In the Danish original: an inserted double subordinate clause with alliterations, and then “deres pas” (their passports) and “Naa” (well)!