Month: October 2013

Zipf plot for word counts in Brown corpus

Image

There are various ways of plotting the distribution of highly skewed (heavy-tailed) data, e.g., with a histogram with logarithmically-spaced bins on a log-log plot, or by generating a Zipf-like plot (rank-frequency plot) like the above. This figure uses token count data from the Brown corpus as made available in the NLTK package.
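
The first alternative, a histogram with logarithmically spaced bins, could be sketched roughly as follows. This is only a minimal illustration: the sample is synthetic (drawn from NumPy's Zipf generator with an arbitrarily chosen exponent of 2.0), not the Brown corpus counts, and the number of bins is an assumption.

```python
import numpy as np

# Synthetic heavy-tailed sample; a stand-in for real count data (assumption).
rng = np.random.default_rng(0)
sample = rng.zipf(2.0, size=10000)

# Bin edges evenly spaced on a logarithmic axis, from the smallest
# to the largest observed value.
edges = np.geomspace(sample.min(), sample.max(), num=21)

# density=True divides each bin count by the bin width, which compensates
# for the bins getting wider towards the tail.
density, _ = np.histogram(sample, bins=edges, density=True)

# Geometric midpoints of the bins, suitable as x-values on a log-log plot.
centers = np.sqrt(edges[:-1] * edges[1:])
```

Plotting `centers` against `density` with `loglog` would then give the histogram view of the same kind of data.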

For fitting the Zipf curve, a simple SciPy-based approach is suggested on Stack Overflow by “Evert”. More elaborate power-law fitting is implemented in the Python package powerlaw, described in “Powerlaw: a Python package for analysis of heavy-tailed distributions”, which is based on the paper by Clauset et al.
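
As a rough illustration of what a SciPy-based fit can look like, here is a minimal sketch that fits the Zipf law f(r) = C * r**(-s) by least squares in log-log space. It is not necessarily the approach from the Stack Overflow answer, the function names are made up for this example, and the data are synthetic. Least-squares fits on log-log data are known to be biased for real samples, which is why the powerlaw package uses maximum likelihood instead.

```python
import numpy as np
from scipy.optimize import curve_fit


def zipf_log_model(log_rank, log_c, s):
    """Logarithm of f(r) = C * r**(-s) as a function of log(rank)."""
    return log_c - s * log_rank


def fit_zipf(frequencies):
    """Fit C and s to frequencies sorted in descending order (hypothetical helper)."""
    frequencies = np.asarray(frequencies, dtype=float)
    ranks = np.arange(1, len(frequencies) + 1)
    params, _ = curve_fit(zipf_log_model, np.log(ranks), np.log(frequencies))
    log_c, s = params
    return np.exp(log_c), s


# Synthetic rank-frequency data following an exact Zipf law with s = 1.
example_frequencies = 1000.0 / np.arange(1, 501)
scale, exponent = fit_zipf(example_frequencies)
```

With the Brown corpus data, the sorted `frequencies` array from the script below would take the place of `example_frequencies`.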


from collections import Counter

from nltk.corpus import brown  # needs nltk.download("brown") once
from pylab import *

# The data: token counts from the Brown corpus (tokens lowercased)
tokens_with_count = Counter(word.lower() for word in brown.words())
tokens = list(tokens_with_count.keys())
counts = array(list(tokens_with_count.values()))

# A Zipf plot: rank 1 is the most frequent token
ranks = arange(1, len(counts) + 1)
indices = argsort(-counts)
frequencies = counts[indices]
loglog(ranks, frequencies, marker=".")
title("Zipf plot for Brown corpus tokens")
xlabel("Frequency rank of token")
ylabel("Absolute frequency of token")
grid(True)

# Label tokens at logarithmically spaced ranks; unique() drops repeated
# positions and the upper limit keeps n a valid index
for n in unique(logspace(-0.5, log10(len(counts) - 1), 20).astype(int)):
    text(ranks[n], frequencies[n], " " + tokens[indices[n]],
         verticalalignment="bottom",
         horizontalalignment="left")
show()

The mystery of the rotating building

For hack4dk I found old images on Wikimedia Commons of scenes in central Copenhagen and took photos of the same places. One of the old images was called Bibliotekspladsen 1913. The image is supposedly from 1913 and was printed in “Forskønnelsen” magazine in 1923. It was uploaded to Wikimedia Commons by user Saddhiyama, who writes that the photo is from the library garden. I went to the garden, took a photo and blended the two images, with the result shown on Bibliotekspladsen remixed. It was slightly difficult to match my photo with the old image, as trees covered the buildings in the new photo.

What is strange about the old photo is the small building to the left:

  1. In old maps you see buildings parallel to the long building, see, e.g., a reconstruction(?) of a 1725 map
  2. In a photo from 1880 a building is now orthogonal to the long building (if this is the same spot!?)
  3. In the photo from 1913 a building (the “provianthusforvalterbolig”) is now parallel to the long building.
  4. Now you can go (also with Google Maps) and see that a building is orthogonal to the long building.

So which is more likely: that the images do not show the same location, or that the building was demolished and rebuilt, demolished and rebuilt again, and then demolished and rebuilt once more?

To make it stranger, you can go to the red house hiding in the corner (with Google Maps). Above the door you can read that the building is from 1818. Looking at the 1913 image, it is difficult to imagine how there could be room for the red house between the other houses.

Git: multiuser and multiple accounts

We are still struggling somewhat with Git for multi-developer development of the Smartphone Brainscanner code. It may well be an RTFM problem. We presumably have the GitHub Smartphone Brain Scanner code set up.

However, we also have private department Git accounts managed with gitolite, which brings some problems.

If you have multiple computers, each with a different public key, then you need extra tricks to be able to clone, push and pull from all computers. Here are the steps that got me working:

  1. I sent one of my public keys to our department system administrators, who then set up an account with their specially developed script.
  2. With the account set up I can clone my gitolite-admin repository: git clone <git username>@<department git server>:gitolite-admin
  3. The keydir at gitolite-admin/keydir/<git username>.pub is supposed to contain my public key from one of the computers. In a subdirectory I can put my public key from another computer, e.g., cp id_rsa.pub gitolite-admin/keydir/<name of other computer>/<git username>.pub
  4. Commit and publish the new key with git add, git commit and git push.
  5. Specify in the ~/.ssh/config the username of the git server. Under Host <department git server> put User <git username>.
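
For reference, the entry from step 5 would look roughly like this in ~/.ssh/config. The placeholders are the same as in the steps above and have to be replaced with the real server and user name:

```
Host <department git server>
    User <git username>
```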

To give other users access to the repository I created, I have tried:

  1. In gitolite-admin/conf/gitolite.conf I added a line such as @sbs2 = <my git username> <another user's git username> <a third user> and then under repo sbs2-Brain3D I added RW+ = @sbs2.

One user has reported that it now allows him to read and write in the repository, while cloning is still a problem for another user…
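
Laid out as it would appear in the file, the relevant part of gitolite.conf is sketched below. The user names are placeholders, and any other rules already present under the repo are left out:

```
@sbs2 = <my git username> <another user's git username> <a third user>

repo sbs2-Brain3D
    RW+ = @sbs2
```

As with the keys, the change only takes effect after it has been committed and pushed to the gitolite-admin repository.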

(originally published on Tumblr two months ago: Git: multiuser and multiple accounts)