programming

How to quickly generate word analogy datasets with Wikidata

One popular task in computational linguistics/natural language processing is the word analogy task: Copenhagen is to Denmark as Berlin is to …?

With queries to Wikidata Query Service (WDQS) it is reasonably easy to generate word analogy datasets in whatever (Wikidata-supported) language you like. For instance, for capitals and countries, a WDQS SPARQL query that returns results in Danish could go like this:

select
  ?country1Label ?capital1Label
  ?country2Label ?capital2Label
where { 
  ?country1 wdt:P36 ?capital1 .
  ?country1 wdt:P463 wd:Q1065 .
  ?country1 wdt:P1082 ?population1 .
  filter (?population1 > 5000000)
  ?country2 wdt:P36 ?capital2 .
  ?country2 wdt:P463 wd:Q1065 .
  ?country2 wdt:P1082 ?population2 .
  filter (?population2 > 5000000)
  filter (?country1 != ?country2)
  service wikibase:label
    { bd:serviceParam wikibase:language "da". }  
} 
limit 1000

Follow this link to get to the query and press “Run” to get the results. The table can be downloaded as a CSV file (see under “Download”). One issue to note is that you get multiple entries for countries with multiple capital cities, e.g., Sydafrika (South Africa) is listed with Pretoria, Kapstaden (Cape Town) and Bloemfontein.
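As a minimal sketch of how the downloaded table could be turned into analogy items, the Python function below reads the CSV text and keeps only the first capital seen for each country. That handling of the multiple-capital issue is an arbitrary choice for illustration. It assumes the CSV header uses the query's variable names (country1Label, capital1Label, country2Label, capital2Label); the function name analogies_from_csv is hypothetical.

```python
import csv
import io


def analogies_from_csv(csv_text):
    """Return (capital1, country1, capital2, country2) tuples from CSV text.

    Assumes the header row holds the SPARQL variable names of the query.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Keep only the first capital seen for each country (arbitrary choice).
    capital_of = {}
    for row in rows:
        capital_of.setdefault(row["country1Label"], row["capital1Label"])
        capital_of.setdefault(row["country2Label"], row["capital2Label"])

    analogies = []
    for row in rows:
        # Skip rows that mention a capital other than the one that was kept.
        if (capital_of[row["country1Label"]] == row["capital1Label"]
                and capital_of[row["country2Label"]] == row["capital2Label"]):
            analogies.append((row["capital1Label"], row["country1Label"],
                              row["capital2Label"], row["country2Label"]))
    return analogies
```

Each tuple reads as "capital1 is to country1 as capital2 is to country2", so the last element can be held out as the answer to predict.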

Mixed indexing with integer index in Pandas DataFrame

Indexing in Python’s Pandas can at times be tricky. Here is an example with mixed indexing (.ix) with integer index:

I ran into the issue when I wanted to index with integers in a DataFrame representing EEG data, inside one of its methods.
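A minimal sketch of the ambiguity, with hypothetical toy data rather than EEG data: when the index itself holds integers, an integer can mean either a label or a position. The mixed .ix indexer treated integers as labels in that case, and it has since been removed from Pandas (in version 1.0), so the sketch uses the explicit .loc (label) and .iloc (position) indexers instead.

```python
import pandas as pd

# Hypothetical toy data: integer labels that do not match the row positions.
df = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=[2, 1, 0])

# Label-based lookup: the row whose index label is 0 (the last row here).
by_label = df.loc[0, "value"]

# Position-based lookup: the first row, whatever its label is.
by_position = df["value"].iloc[0]

print(by_label, by_position)
```

Spelling out .loc or .iloc removes the guesswork about which of the two meanings an integer has.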

Hull level coloring of a cortical surface representation

Hull level coloring of a cortical surface representation constructed by Heather Drury and David Van Essen.

I have just rediscovered my old surface coloring function from the 2003 version of the Brede Toolbox. It can color a surface according to hull level. Here it is with a modified cortical surface representation provided by Heather Drury and David Van Essen.

Matlab code with the Brede Toolbox:

S = brede_sur_drury;
color = brede_sur_color(S, 'style', 'rgb');
figure, brede_ta3_frame, brede_ta3_sur(S, 'color', color);

and then followed by

print -dpng hulllevelcoloring.png

Zipf plot for word counts in Brown corpus

Image

There are various ways of plotting the distribution of highly skewed (heavy-tailed) data, e.g., with a histogram with logarithmically-spaced bins on a log-log plot, or by generating a Zipf-like plot (rank-frequency plot) like the above. This figure uses token count data from the Brown corpus as made available in the NLTK package.

For fitting the Zipf curve, a simple Scipy-based approach is suggested on Stack Overflow by “Evert”. More elaborate power-law fitting is implemented in the Python package powerlaw, described in Powerlaw: a Python package for analysis of heavy-tailed distributions, which is based on the Clauset paper.
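As a rough sketch of what goes into such a plot, the Python code below computes rank-frequency pairs from a list of tokens and fits a straight line to them in log-log space with ordinary least squares. This is a deliberately simple stand-in, not the Scipy approach from Stack Overflow and not the maximum-likelihood fitting of the powerlaw package. The function names are hypothetical, and any token list (for instance the Brown corpus words from NLTK) could be passed in.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, count) pairs, with rank 1 for the most frequent token."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def fit_loglog_line(pairs):
    """Least-squares slope and intercept of log(count) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A slope near -1 corresponds to the classic Zipf relation where the count is inversely proportional to the rank. A least-squares fit on log-log data is known to be a biased estimator for power laws, which is one reason the more careful methods exist.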

Git: multiuser and multiple accounts

We are still struggling somewhat with Git for multi-developer development of the Smartphone Brainscanner code. It may well be an RTFM problem. We presumably have the GitHub Smartphone Brain Scanner code set up.

However, we also have private department Git accounts managed with gitolite, which brings some problems.

If you have multiple computers, each with a different public key, then you need extra tricks to be able to clone, push and pull from all computers. Here are the steps that got me working:

  1. I sent one of my public keys to our department system administrators, who then set up an account with their specially developed script.
  2. With the account set up I can clone my gitolite-admin repository: git clone <git username>@<department git server>:gitolite-admin.
  3. The keydir at gitolite-admin/keydir/<git username>.pub is supposed to contain my public key from one of the computers. In a subdirectory I can put my public key from another computer, e.g., cp id_rsa.pub gitolite-admin/keydir/<name of other computer>/<git username>.pub
  4. This is followed by the git commands git add, git commit and git push.
  5. Specify the username for the git server in ~/.ssh/config: under Host <department git server> put User <git username>.
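The file layout from steps 3 and 5 can be sketched with two small Python helpers. They only build strings and touch neither the repository nor the SSH configuration. The function names and the example values in the usage lines are hypothetical.

```python
def keydir_path(username, computer=None):
    """Path of a public key inside the gitolite-admin clone (step 3).

    With no computer name the key sits directly in keydir; otherwise it
    goes in a subdirectory named after the computer.
    """
    if computer is None:
        return f"gitolite-admin/keydir/{username}.pub"
    return f"gitolite-admin/keydir/{computer}/{username}.pub"


def ssh_config_stanza(server, username):
    """Text for ~/.ssh/config telling ssh which user to use (step 5)."""
    return f"Host {server}\n    User {username}\n"


# Hypothetical example values, for illustration only.
print(keydir_path("alice", computer="laptop"))
print(ssh_config_stanza("git.example.org", "alice"))
```

The point of the subdirectory is that every key file keeps the same name, <git username>.pub, so all of them map to the same gitolite user.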

To give other users access to a repository I created, I have tried:

  1. In gitolite-admin/conf/gitolite.conf I added a line such as @sbs2 = <my git username> <another user's git username> <a third user> and then under repo sbs2-Brain3D I added RW+ = @sbs2.
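For reference, here is a small Python sketch that renders the group and repository lines described above as they would appear in gitolite.conf. The helper name and the user names in the usage line are hypothetical; only the group name sbs2, the repository name sbs2-Brain3D and the RW+ rule come from the setup described here.

```python
def gitolite_group_rule(group, users, repo, permission="RW+"):
    """Render a group definition and a repo rule for gitolite.conf."""
    return (
        f"@{group} = {' '.join(users)}\n"
        "\n"
        f"repo {repo}\n"
        f"    {permission} = @{group}\n"
    )


# Hypothetical user names, for illustration only.
print(gitolite_group_rule("sbs2", ["alice", "bob", "carol"], "sbs2-Brain3D"))
```

The change only takes effect once the edited gitolite.conf has been committed and pushed to the gitolite-admin repository.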

One user has reported that it now allows him to read and write in the repository, while cloning still is a problem for another user…

(originally published on Tumblr two months ago: Git: multiuser and multiple accounts)

Hack4dk contributions

Ten projects were shown at the final showdown:

  • Kræn's (with help from Emma) Natmus Mosaic with autocropping and a search facility. Code available from GitHub.
  • Kim Bach and … Game with Daell’s Varehus catalogue: try to guess the decade.
  • Henrik, Andreas and …(?), mail art mail box.
  • Mobile app with public art. List of nearby art shown with photos and on maps.
  • Public art search engine.
  • Search engine and viewer for Copenhagen Police “Mandtaller”. Machine vision in JavaScript.
  • Rasmus Erik: Wikipedia link visualization, image extraction with classification of the police images, quiz with decade. http://www.solsort.com
  • Heat map movie through time of Copenhageners.
  • SMK image visualization.
  • Steen Thomassen: a join of the Danish Wikipedia and the Danish film database, e.g., for Tommy Kenter. This useful tool runs on a Wikimedia Labs server.

The winner was the heat map movie, with Kræn’s Natmus Mosaic viewer as runner-up.

Reading data from the smartphone brainscanner

With the NeurofeedbackWindow smartphone brainscanner application, hook up the Emotiv to the smartphone, start the application and press “Rec. Baseline”. After 5 minutes the phone has recorded some data from the electrodes, and it is now available in a directory under “/sdcard”.

Connect the phone to a development computer with Necessitas installed and look for the file with:

$ /opt/NecessitasQtSDK/android-sdk/platform-tools/adb shell

And copy the file with:

$ /opt/NecessitasQtSDK/android-sdk/platform-tools/adb pull "/sdcard/smartphonebrainscanner2_readings/" .

I then got two files: sbs2data_2013_09_11_13_47_59_.raw and sbs2data_2013_09_11_13_47_59_.meta. I do not know the format of these files.
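Even without knowing the file format, the file names look as if they carry a recording timestamp. The Python sketch below extracts it, assuming (this is only a guess from the names above) that the fields after sbs2data_ are year, month, day, hour, minute and second. The function name recording_time is hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath

PREFIX = "sbs2data_"


def recording_time(filename):
    """Parse the timestamp that the file name appears to encode.

    Assumes the pattern sbs2data_YYYY_MM_DD_HH_MM_SS_.<extension>.
    """
    stem = PurePosixPath(filename).stem  # drops ".raw" or ".meta"
    if not stem.startswith(PREFIX):
        raise ValueError(f"unexpected file name: {filename}")
    stamp = stem[len(PREFIX):].rstrip("_")
    return datetime.strptime(stamp, "%Y_%m_%d_%H_%M_%S")


print(recording_time("sbs2data_2013_09_11_13_47_59_.raw"))
```

Since the .raw and .meta files share the same stem, the parsed timestamp could also serve as a key for pairing them.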