
Election of affiliate-selected members to the Wikimedia Foundation board


The so-called affiliates, that is, the Wikimedia chapters, user groups and thematic organizations, can elect two seats on the Wikimedia Foundation (WMF) board (the Board of Trustees). Previously only the chapters could elect members, but from January 2019 the sizable number of user groups also gets a say. As I understand it, the aim is a broader foundation, perhaps especially for what are termed "emerging communities".

The two current affiliate-selected trustees are former chair Christophe Henner from France and the Ukrainian Nataliia Tymkiv. The communities elect three board members: James Heilman from Canada, Dariusz Jemielniak from Poland, and the Spanish María Sefidari, who is currently chair. Compared with the affiliate-selected trustees, there seems to be a sense that the community-selected ones come from large communities: English Wikipedia, Spanish Wikipedia. That does not quite hold for the Polish-elected Jemielniak, who has, however, made his mark with an English-language book.

The affiliate election will move quickly through the spring of 2019, with first a nomination period and then the actual vote. A handful of Wikimedians act as facilitators for the election. These facilitators cannot themselves be nominated, but if they step down from the facilitator role they may run. My impression is that the two current members are standing for re-election.

Wikimedia Danmark will take part in the vote, and the question is then whom we should vote for and which criteria we should use. Henner and Tymkiv seem fine and do have experience. To what extent they have the ability to pound the table and come up with original, viable visions is less clear to me. Among others who may be nominated is Shani Evenstein. She also seems fine.

Beyond the formal requirement of board eligibility, a candidate should have weighty board experience, an understanding of the Wikimedia movement, and be a reasonably accessible face in the international Wikimedia community. They should also be prepared to put in a good number of unpaid working hours at odd times of the day, and be aware that they work for the WMF, not for the affiliates, the community or Wikipedia. Looking at the composition of the WMF board, Europe and North America are well represented, though no one is from Northern Europe. There is a physician (James Heilman), academics, founder Jimmy Wales, one person with financial experience (Tanya Capuano) and various other backgrounds. Henner seems to be the only one with technical experience (an element I would value), and one could add that Latin America (though Sefidari does speak Spanish), Africa and East Asia lack representation (Esra'a Al Shafei has roots in Bahrain).

The vote is coordinated on Meta at Affiliate-selected Board seats. Guidance for voters is available at Primer for user groups. The Dutch chair Frans Grijzenhout has uploaded a handy score matrix for the candidates. The nomination also has its own page. Nominations are open until 30 April 2019. Once the nominations are in, there is a short period in April and a bit of May to question the nominees.


Open-ended questions for Wikimedia Strategy 2030


Wikimedia is trying to think long-term and lay out a strategy aiming at the year 2030. A draft is available at https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction

Here are some open-ended questions that might get people to think things over:

  1. Why should we have a strategy at all? Shouldn't Wikimedia simply evolve organically? Can we really predict much about 2030? If we do not already know our strategy, aren't we already stuck?
  2. Are we stuck in the wiki interface?
  3. Should we continue with the PHP-based MediaWiki as the primary software?
  4. Why hasn't Wikiversity grown larger, and why does it not exist at all in Danish? Is it because people do not care to work on Wikiversity? Is it because we do not know what Wikiversity is or should be? Is it because the wiki machinery does not work in a teaching context? What should we change to make it work?
  5. Why don't people make more video? Is it because it is technically too cumbersome? Is the production side too cumbersome? How could Wikimedia help?
  6. Why is Stack Overflow the primary site for technical questions and answers? Shouldn't it have been Wikimedia?
  7. Should the Wikimedia Foundation accept money from companies such as Google? Could that create a relationship of dependence? In Peter Gøtzsche's opinion, patient organizations are influenced in an unfortunate direction because of their dependence on pharmaceutical companies. Could the Wikimedia movement run into the same problem? Do monetary donations create problems, for instance in connection with lobbying on the EU copyright directive?
  8. Why can OpenStreetMap run on a smaller budget? Is it due to a much smaller server load? Should Wikimedia scale down and choose a kind of OpenStreetMap model, where the server work is better distributed to others?
  9. "Knowledge equity" is one of the two central concepts in the Wikimedia Foundation's strategy and rather difficult to translate. Financial equity is what in Danish is called "egenkapital". A Latinate word that comes close can be found in Den Store Danske; otherwise my nearest thought is the obsolete term "billighed", as in "ret og billighed" ("right and equity") from a Danish song. We can hardly use such a word. What should we in Danish understand by "knowledge equity"?
  10. Could Wikimedia end up in a situation like the one seen in the Cochrane Collaboration, where the professionalized part of the organization outmaneuvers the grassroots? What do we do to keep that from happening?
  11. Should we be proud that the Danish Wikipedia has been built essentially for free? It was mentioned the last time I asked about Wikimedia strategy at the Danish Wikipedia's village pump, Landsbybrønden.
  12. Knowledge as a service follows an as-a-service pattern seen in computing, where it goes by names such as platform as a service or software as a service. What should we actually read into it? I myself have created Scholia, a website that shows scientific data from Wikidata via SPARQL queries to the Wikidata Query Service, and Ordia, which does the same for lexicographic data. As such, the idea of knowledge as a service suits me well, and I have in vain tried to recall whether I was among those who helped propose the concept at an international Wikimedia meeting in 2017.
  13. Should Wikimedia engage in activism, as seen around the vote on the EU's new copyright directive? Do we have any success stories showing that it helps?
  14. Wikimedia Danmark has received money from the Wikimedia Foundation for, among other things, a roll-up banner. It has been used on a few occasions and has apparently appeared on TV. Is this how the Wikimedia Foundation should spend its money?
  15. The visual editor seems to help many new users, but isn't editing Wikipedia on a smartphone still very cumbersome? Can anything be done about that at all?
  16. Should the Wikimedia Foundation support researchers who build tools for, or study phenomena on, the Wikimedia wikis?
  17. Normally Wikipedia is fast, but on a slow network it can be frustrating to work with, for example, Wikidata. Isn't it frustrating to work with the wikis from countries that do not have fast Internet? What can be done about it?
  18. Linux is developed with a distributed model, as are many other software systems. Why are Wikipedia and the other Wikimedia wikis not distributed, with easy forking and pull requests?
  19. How much of the Wikimedia Foundation's raised funds should be spent on events such as Wikimania?
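To make question 12 a little more concrete: the "knowledge as a service" pattern behind Scholia is essentially a SPARQL query sent to the Wikidata Query Service, with the JSON result rendered as a profile page. Below is a minimal sketch in that spirit; the specific query (works linked to an author via the P50 property) and the helper names are my own illustration, not Scholia's actual code.

```python
import requests

ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(author_qid, limit=5):
    """SPARQL for works whose author (P50) is the Wikidata item author_qid."""
    return """
    SELECT ?work ?workLabel WHERE {
      ?work wdt:P50 wd:%s .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT %d
    """ % (author_qid, limit)


def works_by_author(author_qid, limit=5):
    """Return (work IRI, label) pairs from the Wikidata Query Service."""
    response = requests.get(
        ENDPOINT,
        params={"query": build_query(author_qid, limit), "format": "json"})
    bindings = response.json()["results"]["bindings"]
    return [(b["work"]["value"], b["workLabel"]["value"]) for b in bindings]
```

Calling works_by_author with an author's Q-identifier would return a small bibliography; the "service" part is simply that any consumer, not just a wiki page, can ask for the knowledge in this structured form.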

Open questions for the EU copyright directive


I am wondering if there are any good sources on the scope and effect of the directive. I was interviewed by a Danish radio channel and I must admit that it was difficult for me to say much in that respect.

The proposal for the directive mentions the "not-for-profit online encyclopedia" and makes an exception for it. To me it is clear that the lawmakers have had Wikipedia in mind, and thanks for that. But there are several issues:

  1. Would Wikipedia be characterized as not-for-profit when the typical license is Creative Commons without a non-commercial clause?
  2. Would Wikimedia Commons fall under the "not-for-profit online encyclopedia" exception? Some of my photos are used on commercial online news sites, which makes at least Wikimedia Commons commercial in some sense. I would not characterize Wikimedia Commons as an encyclopedia, but rather as a media archive.
  3. What is the scope with respect to the other Wikimedia sites: Wikiquote, Wikibooks, Wikiversity, Wikisource, Wikivoyage and possibly others? It seems to me that here again there is an issue, as I would not characterize them as encyclopedias.
  4. What other sites are likely to be hit by either Article 15 or Article 17? For instance, Wikia, Referata, Soundcloud, Reddit, Bandcamp, WordPress, 500px.com? Referata, I imagine, is under 10 million Euro in turnover but over 3 years old, and thus hit? Reddit would be hit by both articles? Soundcloud by Article 17? (Back in June 2018, WordPress noted their concern: https://transparency.automattic.com/2018/06/12/were-against-bots-filtering-and-the-eus-new-copyright-directive/ and this Wednesday Reddit published an article: https://redditblog.com/2019/03/20/error-copyright-not-detected-what-eu-redditors-can-expect-to-see-today-and-why-it-matters/)

Are we able to say something about the possible outcomes we would see if the directive proposal is approved, for instance:

  1. Large companies (10+ million Euro turnover), particularly Google through its ownership of YouTube, regularly paying rights organizations to address Article 17?
  2. Large parts of YouTube not being available to Europeans?
  3. Twitter and Facebook no longer showing snippets from linked sites?
  4. European newspapers paying Twitter and Facebook to display snippets in Twitter and Facebook?
  5. Twitter and Facebook paying European newspapers to allow the display of snippets?
  6. Websites such as Soundcloud needing to implement advanced copyright detection systems for audio?
  7. Some American Web 2.0 companies blacklisting access from Europe?
  8. Widespread implementation of identity verifications in Web 2.0 systems?
  9. Widespread implementation of plagiarism-like detection on Web 2.0 platforms where users may not be able to upload content, even if it is legal?
  10. Google using Article 17 against Facebook wrt. freebooting? See https://www.youtube.com/watch?v=t7tA3NNKF0Q (via YourWeirdEx@reddit)
  11. Small Internet forum owners needing to subscribe to the services of upload filter service providers?
  12. Google News shutting down in Europe? See https://www.theguardian.com/technology/2018/nov/18/google-news-may-shut-over-eu-plans-to-charge-tax-for-links

Scholia is more than scholarly profiles

Posted on Updated on

Scholia, a website originally started as a service to show scholarly profiles from data in Wikidata, is actually not just for scholarly data.

Scholia can also show bibliographic information for “literary” authors and journalists.

An example that I have begun on Wikidata is the Danish writer Johannes V. Jensen, whose works pose a very interesting test case for Wikidata, because the interrelations between works and editions can be quite complicated, e.g., newspaper articles being merged into a poem that is then published in an edition that is then expanded and reprinted. The scholarly and journalistic work about Johannes V. Jensen can also be recorded in Wikidata. Scholia currently lists 30 entries about Johannes V. Jensen, and that does not necessarily include works about works written by him.

An example of a bibliography of a journalist is that of Kim Wall. Her works almost always address very distinctive topics, making them fairly relevant as sources in Wikipedia articles. Examples include an article on a special modern Chinese wedding tradition, Fairy Tale Romances, Real and Staged, and an article on furries, It's not about sex, it's about identity: why furries are unique among fan cultures.

An interesting feature of most of Wall's articles is that she lets the interviewee have the final word by ending with a quotation as the very last paragraph. That is also the case in the two examples linked above. I suppose that says something about Wall's generous journalistic approach.


Code for love: algorithmic dating


One of the innovative Danish TV channels, DR3, has a history of dating programs, with Gift ved første blik as, I believe, the initial program: a program with, literally, an arranged marriage between two participants matched by what were supposed to be relationship experts. Exported internationally as Married at First Sight, the stability of the marriages has been low, as very few of the couples have stayed together, if one is to trust the information on the English Wikipedia.

Now my colleagues at DTU Compute have been involved in a new program called Koden til kærlighed (the code for love). Unlike Gift ved første blik, the participants are not going to get married during the program, but will live together for a month, and, as perhaps the most interesting part, the matches are determined by a learning algorithm: if you view the streamed version of the first episode you will have the delight of seeing glimpses of data-mining Python code with NumPy (note the intermixed camel case and underscores :).

The program seems to have been filmed with smartphone cameras for the most part. The participants are four couples of white heterosexual millennials. So far we have seen their expectations and their initial encounters, so we are far from knowing whether my colleagues have done a good job with the algorithmic matching.

According to the program, the producers and the Technical University of Denmark have collected information from 1,400 persons in "well-functioning" relationships. There must have been pairs among the 1,400, so the data scientists could train the algorithm using the pairs as positive examples and persons who are not pairs as negative examples. The 350 new singles signed up for the program could then be matched with the trained algorithm, and four couples from, I suppose, the top-ranking matches were selected for the program.

Our professor Jan Larsen was involved in the program and explained a bit more about the setup on the radio. The collected input consisted of responses to 104 questions from 667 couples (apparently not quite the 1,400 persons). Important questions may have been those related to sleep and education.

It will be interesting to follow the development of the couples. There are 8 episodes in this season. It would have been nice with more technical background: What are the questions? How exactly is a match determined? How is the importance of the questions determined? Have the producers done any "editing" of the relationships? (For instance, why are all participants in the age range 20-25 years?) When people match, how do their answers match: are the answers homophilic or heterophilic? During the program there are glimpses of questions that might have been used. Some examples are "Do you have a TV set?", "Which supermarket do you use?" and "How many relationships have you ended?" It is a question whether a question such as "Do you have a TV set?" is of any use. 667 couples against 104 questions is not much data for training a model, and one should think that less relevant questions could confuse the algorithm more than they would help.
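The setup described above, couples as positive examples and non-couples as negative examples, can be sketched with a plain logistic regression over pairwise answer differences. Everything below except the 667-couples and 104-questions figures is my own assumption (the answer scale, the homophily assumption, and the model itself), not the actual DTU Compute setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Figures mentioned in the radio interview: 667 couples, 104 questions.
n_couples, n_questions = 667, 104

# Synthetic questionnaire answers on a 0-4 scale; assume (homophily)
# that partners answer similarly: partner B equals A plus noise.
answers_a = rng.integers(0, 5, size=(n_couples, n_questions)).astype(float)
answers_b = np.clip(answers_a + rng.normal(0, 1, (n_couples, n_questions)), 0, 4)

# Pair features: absolute answer differences.
pos = np.abs(answers_a - answers_b)                   # real couples, label 1
neg = np.abs(answers_a - rng.permutation(answers_b))  # shuffled pairs, label 0
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(n_couples), np.zeros(n_couples)])

# Plain logistic regression fitted with gradient descent.
w, bias = np.zeros(n_questions), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + bias)))
    w -= 0.01 * X.T @ (p - y) / len(y)
    bias -= 0.01 * np.mean(p - y)

def match_score(ans1, ans2):
    """Probability-like score that two answer vectors form a good match."""
    z = np.abs(ans1 - ans2) @ w + bias
    return 1.0 / (1.0 + np.exp(-z))
```

New singles could then be scored pairwise with match_score and the top-ranking pairs selected, which at least matches the description of how the four couples were chosen.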

“Og så er der fra 2018 og frem øremærket 0,5 mio. kr. til Dansk Sprognævn til at frikøbe Retskrivningsordbogen.”


From Peter Brodersen I hear that the Danish government's budget for next year allocates funds to Dansk Sprognævn for the release of Retskrivningsordbogen, the official Danish spelling dictionary.

It is mentioned briefly in an announcement from the Ministry of Culture: "Og så er der fra 2018 og frem øremærket 0,5 mio. kr. til Dansk Sprognævn til at frikøbe Retskrivningsordbogen." ("And from 2018 onward, 0.5 million DKK is earmarked for Dansk Sprognævn to buy Retskrivningsordbogen free."): 500,000 DKK allocated for the release of the dataset.

It is not clear under which conditions it will be released. An announcement from Dansk Sprognævn says "til sprogteknologiske formål" (for language technology purposes). I trust it is not just for language technology purposes, but for every purpose!?

If it is to be used in free software and databases, then CC0 or a similarly permissive license is a good idea. We are still waiting for Wikidata for Wiktionary, the as-yet vaporware multilingual, collaborative and structured dictionary. That resource will be CC0-based. The "old" Wiktionary has, surprisingly, not been used much by natural language processing researchers, perhaps because of the anarchistic structure of Wiktionary. Wikidata for Wiktionary could hopefully help us structure lexical data and improve the size and utility of lexical information. With Retskrivningsordbogen under CC0, it could be imported into Wikidata for Wiktionary and extended with multilingual links and semantic markup.

Female GitHubbers


In Wikidata, we can record a GitHub user name with the P2037 property. As we typically also have the gender of the person, we can write a SPARQL query that yields all female GitHub users recorded in Wikidata. There aren't that many: currently just 27.

The Python code below gets the SPARQL results into a Python Pandas DataFrame, queries the GitHub API for the follower count of each user, and adds the information as a new column. Then we can rank the female GitHub users by follower count and format the result as an HTML table.

Code

import re
from time import sleep

import requests
import pandas as pd

# SPARQL query for all persons in Wikidata with female gender (Q6581072)
# and a GitHub user name (P2037).
query = """
SELECT ?researcher ?researcherLabel ?github ?github_url WHERE {
  ?researcher wdt:P21 wd:Q6581072 .
  ?researcher wdt:P2037 ?github .
  BIND(URI(CONCAT("https://github.com/", ?github)) AS ?github_url)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
"""
response = requests.get("https://query.wikidata.org/sparql",
                        params={'query': query, 'format': 'json'})
researchers = pd.json_normalize(response.json()['results']['bindings'])

URL = "https://api.github.com/users/"
followers = []
for github in researchers['github.value']:
    # Skip user names with unexpected characters.
    if not re.match('^[a-zA-Z0-9-]+$', github):
        followers.append(0)
        continue
    url = URL + github
    try:
        response = requests.get(url,
                                headers={'Accept': 'application/vnd.github.v3+json'})
        user_followers = response.json()['followers']
    except Exception:
        user_followers = 0
    followers.append(user_followers)
    print("{} {}".format(github, user_followers))
    sleep(5)    # Be nice to the GitHub API.

researchers['followers'] = followers

columns = ['followers', 'github.value', 'researcherLabel.value',
           'researcher.value']
print(researchers.sort_values(by='followers', ascending=False)[columns]
      .to_html(index=False))

Results

The top one is Jennifer Bryan, a Vancouver statistician whom I do not know much about, but she seems to be involved in RStudio.

Number two, Jessica McKellar, is a well-known figure in the Python community. Numbers four and five, Olga Botvinnik and Vanessa Sochat, are a bioinformatician and a neuroinformatician, respectively (or were: Sochat apparently left the Poldrack lab in 2016, according to her CV). Further down the list we have people from the wiki world: Sumana Harihareswara, Lydia Pintscher and Lucie-Aimée Kaffee.

I was surprised to see that Isis Agora Lovecruft is not there, but there is no Wikidata item representing her. She would have been number three.

Jennifer Bryan and Vanessa Sochat are almost “all-greeners”. Sochat has just a single non-green day.

I suppose the Wikidata GitHub information is far from complete, so this analysis is quite limited.

followers github.value researcherLabel.value researcher.value
1675 jennybc Jennifer Bryan http://www.wikidata.org/entity/Q40579104
1299 jesstess Jessica McKellar http://www.wikidata.org/entity/Q19667922
475 triketora Tracy Chou http://www.wikidata.org/entity/Q24238925
347 olgabot Olga B. Botvinnik http://www.wikidata.org/entity/Q44163048
124 vsoch Vanessa V. Sochat http://www.wikidata.org/entity/Q30133235
84 brainwane Sumana Harihareswara http://www.wikidata.org/entity/Q18912181
75 lydiapintscher Lydia Pintscher http://www.wikidata.org/entity/Q18016466
56 agbeltran Alejandra González-Beltrán http://www.wikidata.org/entity/Q27824575
22 frimelle Lucie-Aimée Kaffee http://www.wikidata.org/entity/Q37860261
21 isabelleaugenstein Isabelle Augenstein http://www.wikidata.org/entity/Q30338957
20 cnap Courtney Napoles http://www.wikidata.org/entity/Q42797251
15 tudorache Tania Tudorache http://www.wikidata.org/entity/Q29053249
13 vedina Nina Jeliazkova http://www.wikidata.org/entity/Q27061849
11 mkutmon Martina Summer-Kutmon http://www.wikidata.org/entity/Q27987764
7 caoyler Catalina Wilmers http://www.wikidata.org/entity/Q38915853
7 esterpantaleo Ester Pantaleo http://www.wikidata.org/entity/Q28949490
6 NuriaQueralt Núria Queralt Rosinach http://www.wikidata.org/entity/Q29644228
2 rongwangnu Rong Wang http://www.wikidata.org/entity/Q35178434
2 lschiff Lisa Schiff http://www.wikidata.org/entity/Q38916007
1 SigridK Sigrid Klerke http://www.wikidata.org/entity/Q28152723
1 amrapalijz Amrapali Zaveri http://www.wikidata.org/entity/Q34315853
1 mesbahs Sepideh Mesbah http://www.wikidata.org/entity/Q30098458
1 ChristineChichester Christine Chichester http://www.wikidata.org/entity/Q19845665
1 BinaryStars Shima Dastgheib http://www.wikidata.org/entity/Q42091042
1 mollymking Molly M. King http://www.wikidata.org/entity/Q40705344
0 jannahastings Janna Hastings http://www.wikidata.org/entity/Q27902110
0 nmjakobsen Nina Munkholt Jakobsen http://www.wikidata.org/entity/Q38674430