Month: November 2021

Sprogteknologisk Konference 2021

Sprogteknologisk Konference 2021 (language technology conference 2021) ran on 16 November 2021 at the University of Copenhagen. The program is available at the conference homepage, and I have also set it up in Scholia as an event. Most talks were in Danish, with a few English talks from speakers abroad, and the posters were in English.

The CEO of Certainly, formerly BotXO, spoke. The company became known in Denmark after publishing a well-trained Danish BERT model that has since been used in several contexts. The company has expanded considerably from supporting only Danish to supporting many other languages with its conversational technology. They have also published Nordic-language BERT models.

Professor Barbara Plank (Scholia) gave an overview of natural language processing, and Nora Hollenstein (Scholia) gave a talk about research that combines the study of eye movements with language processing.

Klaus Bjørn Larsen from Roskilde Kommune presented the language technology work in the municipality. On the municipality homepage there is a text-based Danish chatbot, "Kommune-Kiri" (lower right corner). If you ask the bot "where can I borrow a book?" ("hvor kan jeg låne en bog?"), it cannot identify a suitable answer. It suggests using fewer words, but "book" ("bog") alone does not work either. If I ask "When is the library open?" ("hvornår har biblioteket åbnet?"), then it correctly points to a webpage that answers the question.

Larsen mentioned that an advantage of bots is that users are not afraid to ask "stupid" questions.

An audio version of Kommune-Kiri is available at the phone number +45 89 87 17 52. It only handles three categories of questions. I phoned and asked in Danish whether food and organic waste should be separated from the "residual waste" (German: "Restmüll"), and after a bit of back and forth I got the impression that it should. Judging from the municipality homepage, that appears to be correct.

Christian Plaschke, one of the organizers, spoke about sprogteknologi.dk and the efforts around it. Among the projects under that umbrella is Det Centrale Ord Register (COR, the central word register), a lexicographic effort to give each Danish word an identifier so that, for instance, Retstavningsordbogen and Den Danske Ordbog (Scholia) can be linked. As I understand it, they promise to make the index sufficiently open that it can be used in Wikidata. For the Wikidata lexemes, we already have word identifiers for Danish words, but the coverage is not yet as comprehensive as that of the big standard reference works: we have only around 12,000 Danish lexemes. Around 7,000 Danish Wikidata lexemes are now linked to DanNet, the Danish wordnet resource. The central person in the COR project is Peter Juel Henrichsen. Some natural language processing work is taking place around the semantic part of the COR project.

Claus Thornby Larsen from the EU Commission spoke about machine translation at the European Union institutions. eTranslation is the European Union's neural machine translation system. As far as I understood, it is possible to get access to the web application.

Thierry Declerck (Scholia) spoke about the European Language Resource Coordination, and Professor Dr. Phil. Georg Rehm (Scholia) spoke about the European Language Grid. I maintain the awesome-danish list for Danish language technology; the European Language Grid catalogue is a kind of "awesome-europe" list with more metadata.

The Python package DaNLP for Danish natural language processing has been extended with coreference resolution and named entity disambiguation (NED). When I tried the NED part yesterday, I could not get it to work, but the developer Ophélie Lacroix (Scholia) was quick to fix it today, so it may work now.

There were several other interesting contributions. I do not think there are (or will be) any proceedings.