Latest Event Updates

Addressing “addressing age-related bias in sentiment analysis”

Posted on Updated on

Algorithmic bias is one of the hot topics of research at the moment. There have been observations of trained machine learning models that display sexism. For instance, the paper “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings” (Scholia entry) neatly shows one example in its title: bias in word embeddings, shallow machine learning models trained on a large corpus of text.

A recent report investigated ageism bias in a range of sentiment analysis methods, including my AFINN word list: “Addressing age-related bias in sentiment analysis” (Scholia entry). The researchers scraped sentences from blog posts, extracted the sentences containing the word “old”, and excluded those where the word did not refer to the age of a person. They then replaced “old” with the word “young” (apparently “older” and “oldest” were also considered somehow). The example sentences they ended up with were, e.g., “It also upsets me when I realize that society expects this from old people” and “It also upsets me when I realize that society expects this from young people”. These sentences (242 in total) were submitted to 15 sentiment analysis tools, and statistics were computed “using multinomial log-linear regressions (via the R package nnet […])”.

I was happy to see that my AFINN was the only one in Table 4 that survived the test, with all regression coefficients being non-significant. However, Table 5, with the implicit age analysis, showed some bias in my word list.

But after a bit of thought I wondered how there could be any kind of bias in my word list. The paper lists an exponentiated intercept coefficient of 0.733, with a 95% confidence interval from 0.468 to 1.149, for AFINN. But if I examine what my afinn Python package reports about the words “old”, “older”, “oldest”, “young”, “younger” and “youngest”, I get all zeros, i.e., these words are scored as neither positive nor negative:


>>> from afinn import Afinn
>>> afinn = Afinn()
>>> afinn.score('old')
0.0
>>> afinn.score('older')
0.0
>>> afinn.score('oldest')
0.0
>>> afinn.score('young')
0.0
>>> afinn.score('younger')
0.0
>>> afinn.score('youngest')
0.0

It is thus strange how there can be any form of bias, even a non-significant one. For instance, for the two example sentences “It also upsets me when I realize that society expects this from old people” and “It also upsets me when I realize that society expects this from young people”, my afinn Python package scores them both with a sentiment of -2. This value comes solely from the word “upsets”. There can be no difference between any of the sentences when you exchange the word “old” with “young”.
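To see why a purely term-based method cannot distinguish the two sentences, here is a minimal sketch of an AFINN-style scorer. The one-entry score dictionary is a toy excerpt for illustration, not the real word list; the point is that words absent from the list, such as “old” and “young”, contribute nothing to the score:

```python
import re

# Toy excerpt of an AFINN-style word list; "old" and "young" are unlisted.
word_scores = {"upsets": -2}

def score(text):
    """Sum the scores of known words; unknown words count as zero."""
    return sum(word_scores.get(word, 0)
               for word in re.findall(r"\w+", text.lower()))

old_sentence = ("It also upsets me when I realize that society expects "
                "this from old people")
young_sentence = old_sentence.replace("old", "young")

# Swapping "old" for "young" cannot change a purely term-based score.
print(score(old_sentence), score(young_sentence))  # -2 -2
```

Any difference between the two sentence sets must therefore arise in the analysis, not in the word list itself.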

In their implicit analysis of bias, where they use a word embedding, some bias could possibly creep in somewhere with my word list, although it is not clear to me how this happens.

The question is then what happens in the analysis. Does the multinomial log-linear regression give a questionable result? Could it be that I misunderstand a fundamental aspect of the paper? While some data seem to be available here, I cannot identify the specific sentences they used in the analysis.


Frequent items among the best Danish films


Bo Green Jensen has written the book De 25 bedste danske film, which includes, among others, Vredens Dag, Kundskabens træ, Babettes gæstebud and Den eneste ene. I have just entered this short list of 25 films, published in 2002, into Wikidata via the “catalog” property. Once that is done, the Wikidata Query Service can be used, with a SPARQL database query, to find items that recur among the films. Such a SPARQL query could look like this:

SELECT (COUNT(?item) AS ?count) ?value ?valueLabel WHERE {
  ?item wdt:P972 wd:Q12307844 .
  ?item ?property ?value .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],da,en". }
}
GROUP BY ?value ?valueLabel
HAVING (COUNT(?item) > 1)
ORDER BY DESC(?count)

This version counts films and orders the items by how many films each item appears in. The information in Wikidata is probably not entirely complete. With Magnus Manske’s Listeria tool, however, a table can be constructed showing that each individual film is reasonably well covered.

The SPARQL query can be found here and the result can be seen here.
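For running the query programmatically rather than through the web interface, a small sketch using only the Python standard library could look as follows. The endpoint URL is the public Wikidata Query Service; the User-Agent string is a placeholder:

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT (COUNT(?item) AS ?count) ?value ?valueLabel WHERE {
  ?item wdt:P972 wd:Q12307844 .
  ?item ?property ?value .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],da,en". }
}
GROUP BY ?value ?valueLabel
HAVING (COUNT(?item) > 1)
ORDER BY DESC(?count)
"""

def build_request(query):
    """Build an HTTP request asking the endpoint for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    return urllib.request.Request(
        url, headers={"User-Agent": "example-script/0.1"})

def run_query(query):
    """Execute the query and return the result bindings."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)["results"]["bindings"]

# Example (requires network access):
# for row in run_query(QUERY):
#     print(row["count"]["value"], row.get("valueLabel", {}).get("value"))
```

Note that the [AUTO_LANGUAGE] substitution is a feature of the web interface; outside it, the label service simply falls back to the listed da,en languages.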

It is not surprising that one of the items shared by all 25 films is that they are listed in De 25 bedste danske film. That is something of a tautology… Going further down in frequency, we find that Bodil Kjer and Anne Marie Helger are the highest-placed persons.

Bodil Kjer is probably most associated with the black-and-white films of the 1940s and 1950s (in the list she appears as an actress in Otte akkorder, John og Irene and Mød mig på Cassiopeia), but in her later career she also made her mark, partly as the frail lady in Strømer, partly in the first Danish Oscar-winning feature film. She is not a surprise.

What I find surprising is that Anne Marie Helger appears with 5 items, making her the second-highest-placed person on the list. She is an actress in Strømer, Johnny Larsen, of course Koks i kulissen, and Erik Clausen’s De frigjorte. She also figures as a screenwriter on Christian Braad Thomsen’s film.

A notch further down come Erik Balling, Ebbe Rode, Ib Schønberg and Anders Refn. Balling is the producer of two films on the list and was responsible for both direction and screenplay on Poeten og Lillemor. Anders Refn is the film editor on two and also had a double role with direction and screenplay for Strømer.

My namesake Finn Nielsen appears on the list in connection with three films: Strømer, Johnny Larsen and Babettes gæstebud. Incidentally, he also delivered a fin(n)e performance in Kærlighedens smerte, which did not make the list, as its director is already represented with Kundskabens træ.

Sweden is listed as a co-production country on four films. That is particularly the case for films from recent years, but the first of them is actually Sult, which dates from the 1960s.

And Bodil Kjer should, by the way, be counted one extra time: as an extra, 26th entry, Bo Green Jensen lists the Far til fire series. That series features a toy elephant named Bodil Kjer…

Journalist af karsken bælg: A book about Lise Nørgaard’s journalism by John Chr. Jørgensen


A good level and lively language from an academic gentleman with the right background as an expert on female journalists, writing about the national treasure, Lise in the golden sandals, and the lady’s less celebrated side, from her time as an apprentice up to the star writer’s years in Pilestræde with an extraordinary right to a non-computerized typewriter. Jørgensen places her as a bourgeois individualist with a mind of her own, an ironic distance and a renewer of journalistic genres.

On pages 38-39 we get a taste of the language artist’s skills: her very first leader article, now 81 years and a couple of weeks old, from 4 January 1937 in Roskilde Dagblad. The occasion was foreign policy entanglements around a royal wedding between a Dutch princess and a German prince, and about the Nazi government it reads (quoted here in the original Danish):

“Den har følt, at noget maatte der gøres, og da det ikke var muligt at faa en Finger med i Spillet i selve Holland, vedtoges det at fratage tre tyske Prinsesser af ædleste Blod, der skulde være Brudepiger ved Formælingen af den formastelige Prins og Prinsessen i det Land, hvor en Fodboldkamp med Tyskland kunde foregaa under andet Flag end det med Hagekorset, deres pas. Naa, da Ilterheden havde lagt sig og en Kurér fra den indeklemte Prins havde været hos Hitler, maatte man fra Naziside være blevet klar over, at saadan skulde det alligevel ikke gribes an.” An interpolated double subordinate clause with alliterations, and then the long-delayed “deres pas” (their passports) and the “Naa” (well)!

“En Frygtelig Kvinde” and gender


“En frygtelig kvinde” is a recently premiered Danish film. On this blog I have previously considered how males and females view a film differently: in the case of the Klown movie, there seemed to be a slight tendency for female reviewers to be less enthusiastic.

“En frygtelig kvinde” portrays a man and a woman as they fall in love and move in together. Keeping in mind the title, “A Terrible Woman”, might male film reviewers grade it higher than female reviewers? Below is a small sample – by no means complete – of published film reviews from assorted venues. Danish grades typically go from 1 to 6.

Grade  Gender       Venue                            Reviewer
4      Female/Male  Berlingske                       Sarah-Iben Almbjerg, Kristian Lindberg
4      Male         BT                               Michael Lind
5      Male         Ekko                             Claus Christensen
4      Male         Ekstra Bladet                    Henrik Queitsch
4      Male         Filmland P1                      Per Juul Carlsen
5      Male         Politiken (via Kino.dk)          Kim Skotte
5      Male         Soundvenue                       Rasmus Friis
4      Male         Moovy                            Elliot Peter Torres
5      Female       Den korte Avis                   Lone Nørgaard
2      Male         CinemaZone                       Daniel Skov Madsen
1      Male         Filmz                            Morten Vejlgaard Just
4      Female       Jyllands-Posten (via Kino.dk)    Katrine Sommer Boysen
?      Female       Information                      Lone Nikolajsen (fairly negative, perhaps translatable to a “3”)

There are too few reviews to draw any firm conclusions. A notable issue is the two very negative reviews, both by males.
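For what it is worth, the grades above can be averaged by gender with a few lines of Python; the mixed-gender Berlingske review and the ungraded Information review are left out, and with so few reviews the averages carry little weight:

```python
# (grade, gender) pairs for the single-gender, graded reviews in the table.
reviews = [
    (4, "Male"), (5, "Male"), (4, "Male"), (4, "Male"), (5, "Male"),
    (5, "Male"), (4, "Male"), (2, "Male"), (1, "Male"),
    (5, "Female"), (4, "Female"),
]

def mean_grade(gender):
    """Average grade over the reviews by reviewers of the given gender."""
    grades = [grade for grade, sex in reviews if sex == gender]
    return sum(grades) / len(grades)

print(round(mean_grade("Male"), 2))    # 3.78, pulled down by the two low grades
print(round(mean_grade("Female"), 2))  # 4.5, but only two graded reviews
```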

A few samples: while Anne-Grethe Bjarup Riis finds it very funny (“skidesjov” and “pissesjov”), the male Filmz reviewer views it as “a misogynist crappy movie” (“en kvindefjendsk lortefilm”). Two fourth-wave female feminists have opposite views: “a fantastic movie” vs. “really disappointed”.

Even the woman in the movie generates opposite views. The actress, Amanda Collin, is generally praised, but for POV International the character is “faceted”, while Louise Kjølsen finds it stereotypical. Lone Nikolajsen characterizes the two main characters as “two well-known sex role clichés”.

According to Ekko, director Christian Tafdrup’s previous film sold only 1,603 tickets(!) but was generally praised and received several awards. “En frygtelig kvinde” was produced for just 4 million Danish kroner, and the theater was packed when I viewed it.

Code for love: algorithmic dating


One of the innovative Danish TV channels, DR3, has a history of dating programs, with Gift ved første blik as, I believe, the initial one: a program with – literally – an arranged marriage between two participants matched by what were supposed to be relationship experts. Exported internationally as Married at First Sight, the stability of the marriages has been low, as very few of the couples have stayed together, if one is to trust the information on the English Wikipedia.

Now my colleagues at DTU Compute have been involved in a new program called Koden til kærlighed (The Code for Love). Contrary to Gift ved første blik, the participants are not going to get married during the program, but will live together for a month, and, as perhaps the most interesting part, the matches are determined by a learning algorithm: if you view the streamed version of the first episode, you will have the delight of seeing glimpses of data mining Python code with Numpy (note the intermixed camel case and underscores :).

The program seems to have been filmed with smartphone cameras for the most part. The participants are four couples of white heterosexual millennials. So far we have seen their expectations and initial first encounters, so we are far from knowing whether my colleagues have done a good job with the algorithmic matching.

According to the program, the producers and the Technical University of Denmark have collected information from 1,400 persons in “well-functioning” relationships. There must have been pairs among the 1,400, so the data scientists can train the algorithm using pairs as positive examples and persons that are not pairs as negative examples. The 350 new singles signed up for the program can then be matched with the trained algorithm. And four couples from – I suppose – the top-ranking matches were selected for the program.
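A sketch of how such training data could be assembled, under the assumptions just described: real couples give positive examples, random non-couples give negative ones. The person names, the number of persons, and the feature choice (per-question absolute answer differences) are all illustrative, not from the program:

```python
import random

random.seed(0)
n_questions = 104  # the number of questions mentioned in the program

# Hypothetical questionnaire answers on a 1-5 scale, one vector per person.
answers = {f"person{i}": [random.randint(1, 5) for _ in range(n_questions)]
           for i in range(20)}
couples = [(f"person{2 * i}", f"person{2 * i + 1}") for i in range(10)]

def pair_features(a, b):
    """Per-question absolute answer differences for a candidate pair."""
    return [abs(x - y) for x, y in zip(answers[a], answers[b])]

# Positive examples: the actual couples.
positives = [(pair_features(a, b), 1) for a, b in couples]

# Negative examples: randomly sampled pairs that are not couples.
couple_set = set(couples)
negatives = []
while len(negatives) < len(positives):
    a, b = random.sample(list(answers), 2)
    if (a, b) not in couple_set and (b, a) not in couple_set:
        negatives.append((pair_features(a, b), 0))

training_data = positives + negatives  # ready for any binary classifier
```

A trained classifier could then score each single-single pair and the top-ranking pairs would become the proposed matches.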

Our professor Jan Larsen was involved in the program and explained a bit more about the setup on the radio. The collected input was based on responses to 104 questions from 667 couples (apparently not quite 1,400 persons). Important questions may have been related to sleep and education.

It will be interesting to follow the development of the couples. There are 8 episodes in this season. It would have been nice with more technical background: What are the questions? How exactly is a match determined? How is the importance of the questions determined? Have the producers done any “editing” of the relationships? (For instance, why are all participants in the age range 20-25 years?) And when people match, how do their answers match: are they homophilic or heterophilic? During the program there are glimpses of questions that might have been used. Some examples are “Do you have a TV set?”, “Which supermarket do you use?” and “How many relationships have you ended?”. It is a question whether a question such as “Do you have a TV set?” is of any use. 667 couples compared to 104 questions is not that much to train a model on, and one would think that less relevant questions could confuse the algorithm more than they would help.

“Og så er der fra 2018 og frem øremærket 0,5 mio. kr. til Dansk Sprognævn til at frikøbe Retskrivningsordbogen.”


From Peter Brodersen I hear that the budget of the Danish government for next year allocates funds to Dansk Sprognævn for the release of Retskrivningsordbogen, the official Danish spelling dictionary.

It is mentioned briefly in an announcement from the Ministry of Culture: “Og så er der fra 2018 og frem øremærket 0,5 mio. kr. til Dansk Sprognævn til at frikøbe Retskrivningsordbogen.” (“And from 2018 onwards, 0.5 million DKK is earmarked for Dansk Sprognævn to free Retskrivningsordbogen.”): 500,000 DKK allocated for the release of the dataset.

It is not clear under which conditions it will be released. An announcement from Dansk Sprognævn writes “til sprogteknologiske formål” (for language technology purposes). I trust it is not just for natural language processing purposes, but for every purpose!?

If it is to be used in free software/databases, then CC0 or a more permissive license is a good idea. We are still waiting for Wikidata for Wiktionary, the as yet vaporware multilingual, collaborative and structured dictionary. That resource will be CC0-based. The “old” Wiktionary has, surprisingly, not been used that much by natural language processing researchers, perhaps because of the anarchistic structure of Wiktionary. Wikidata for Wiktionary could hopefully help us with structuring lexical data and improve the size and the utility of lexical information. With Retskrivningsordbogen as CC0, it could be imported into Wikidata for Wiktionary and extended with multilingual links and semantic markup.

The problem with Andreas Krause


I first seem to have run into the name “Andreas Krause” in connection with NIPS 2017. Statistics with the Wikidata Query Service shows “Andreas Krause” to be one of the most prolific authors for that particular conference.

But who is “Andreas Krause”?

Google Scholar lists five “Andreas Krause”: an ETH Zürich machine learning researcher, a pharmacometrics researcher, a wood researcher, an economics/networks researcher working from Bath, and a Dresden-based battery/nano researcher. All the NIPS Krause works should likely be attributed to the machine learning researcher, and a read of the works reveals the affiliation to be ETH Zürich.

An ORCID search reveals six “Andreas Krause”. Three of them have no or almost no further information beyond the name and the ORCID identifier.

There is an Immanuel Krankenhaus Berlin rheumatologist who does not seem to be in Google Scholar.

There may even be more than these six “Andreas Krause”. For instance, the article Emotional Exhaustion and Job Satisfaction in Airport Security Officers – Work–Family Conflict as Mediator in the Job Demands–Resources Model has the affiliation “School of Applied Psychology, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland”; thus topic and affiliation do not quite fit any of the previously mentioned “Andreas Krause”.

One interesting ambiguity is Classification of rheumatoid joint inflammation based on laser imaging, which is obviously a rheumatology work but also has some machine learning aspects. The computer scientist/machine learner Volker Tresp is a co-author, and the work is published in an IEEE venue. There is no affiliation given for the “Andreas Krause” on the paper. It is likely the work of the rheumatologist, but you could also guess the machine learner.

Yet another ambiguity is Biomarker-guided clinical development of the first-in-class anti-inflammatory FPR2/ALX agonist ACT-389949. The topic overlaps somewhat between pharmacometrics and the domain of the Berlin researcher. The affiliation is “Clinical Pharmacology, Actelion”, but interestingly, Google Scholar does not associate this paper with the pharmacometrics researcher.

In conclusion, author disambiguation may be very difficult.

Scholia can show the six Andreas Krause, but I am not sure that helps us very much.