Roberta’s +5-fine workshop on natural language processing

The Interacting Minds Centre (Scholia) at Aarhus University (Scholia) held a finely organized workshop, NLP workshop @IMC, Fall 2019, in November 2019, where I gave a talk titled Detecting the odd-one-out among Danish words and phrases with word embeddings.

Fritz Günther (Scholia) gave a keynote based on his publication Vector-Space Models of Semantic Representation From a Cognitive Perspective: A Discussion of Common Misconceptions (Scholia). One question is whether the distributional semantic models (DSMs), the vector models we derive by computing on large corpora, make sense with respect to cognitive theories: “Although DSMs might be valuable for engineering word meanings, this does not automatically qualify them as plausible psychological models”.

Two Årups presented their work on SENTIDA, “a new tool for sentiment analysis in Danish”. In its current form it is an R package, described in the article SENTIDA: A New Tool for Sentiment Analysis in Danish (Scholia). According to their evaluation, SENTIDA beats my AFINN tool for Danish sentiment analysis.

From our paper Combining embedding methods for a word intrusion task.

My own talk on Detecting the odd-one-out among Danish words and phrases with word embeddings was based on the evaluation work on distributional semantic representations that I have done together with Lars Kai Hansen (Scholia): our 2017 paper Open semantic analysis: The case of word level semantics in Danish (Scholia) and our newer 2019 paper Combining embedding methods for a word intrusion task (Scholia). The idea is to look at a Danish textual odd-one-out task (a word intrusion task) and see what models trained on various corpora can do. Our current state-of-the-art is a combination of embedding models, with fastText as the primary one and Wembedder used for proper nouns.
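
To make the odd-one-out idea concrete, here is a minimal sketch of one common baseline: pick the word whose embedding has the lowest mean cosine similarity to the other words in the set. This is an illustration only, not necessarily the method from the papers above. The function names and the tiny three-dimensional vectors are made up for the example; in practice the vectors would come from a trained model such as fastText.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def odd_one_out(words, vectors):
    """Return the word with the lowest mean similarity to the other words."""
    def mean_similarity(word):
        others = [w for w in words if w != word]
        total = sum(cosine(vectors[word], vectors[w]) for w in others)
        return total / len(others)

    return min(words, key=mean_similarity)


# Hypothetical toy vectors, invented for illustration (not from a trained model).
toy_vectors = {
    "æble": [0.9, 0.1, 0.0],     # apple
    "pære": [0.8, 0.2, 0.1],     # pear
    "banan": [0.85, 0.15, 0.05],  # banana
    "bil": [0.1, 0.9, 0.3],      # car
}

print(odd_one_out(list(toy_vectors), toy_vectors))  # prints "bil"
```

With real embeddings the same function works unchanged as long as every word in the set has a vector; handling out-of-vocabulary words and proper nouns is where combining several models becomes relevant.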

Two Aarhus students, Jan Kostkan and Malte Lau Petersen (Scholia), are downloading European Parliament text data and analyzing it. A text corpus from Folketinget, the Danish Parliament, may be available with tens of millions of sentences.

Ulf Berthelsen, with whom I share the research project Teaching platform for developing and automatically tracking early stage literacy skill (Scholia), spoke on late-stage literacy skills.

Natalie Schluter (Scholia) spoke on glass ceiling effects in the natural language processing field. She has an associated paper The glass ceiling in NLP (Scholia) from EMNLP 2018.

Matthew Wilkens (Scholia) spoke on “Geography and gender in 20.000 British novels”, a large-scale analysis of how geography was used in British novels. This part aligned well with some work I did a few years ago on geographically mapping narrative locations of Danish literature with the Littar website and the paper Literature, Geolocation and Wikidata (Scholia) from the Wiki Workshop 2016.

Screenshot from Littar: Narrative locations of Danish literature.

There were a number of other contributions at the workshop.

The second day of the workshop featured hands-on text analysis, with Rebekah Baglini and Matthew Wilkens, among others, getting participants to work on prepared Google Colab notebooks.
