Page rank of scientific papers with citation in Wikidata – so far


A citation property was created just a few hours ago and, as of writing, has still not been deleted. It means we can describe citation networks, e.g., among scientific papers.

So far we have added a few citations, mostly from papers about Zika. Now we can plot the citation network or compute network measures such as PageRank.

Below is a Python program that does this with SPARQL, Pandas and NetworkX:

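from pandas import DataFrame
import networkx as nx
import sparql

# Query all citation pairs (P2860, "cites work") with English labels
statement = """
select ?source ?sourceLabel ?target ?targetLabel where {
  ?source wdt:P2860 ?target .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
"""

# Address of the SPARQL endpoint (not given here)
service = sparql.Service('')
response = service.query(statement)
df = DataFrame(response.fetchall(),
    columns=response.variables)

# Convert the SPARQL literals to plain strings (use unicode on Python 2)
df.sourceLabel = df.sourceLabel.astype(str)
df.targetLabel = df.targetLabel.astype(str)

# Directed graph with an edge from the citing paper to the cited paper
g = nx.DiGraph()
g.add_edges_from((row.sourceLabel, row.targetLabel)
    for n, row in df.iterrows())

# Rank the papers and list the ten highest
pr = nx.pagerank(g)
sorted_pageranks = sorted(((rank, title)
    for title, rank in pr.items()), reverse=True)

for rank, title in sorted_pageranks[:10]:
    print("{:.4} {}".format(rank, title[:40]))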

The result:

0.02647 Genetic and serologic properties of Zika
0.02479 READemption-a tool for the computational
0.02479 Intrauterine West Nile virus: ocular and
0.02479 Internet encyclopaedias go head to head
0.02479 A juvenile early hominin skeleton from D
0.01798 Quantitative real-time PCR detection of 
0.01755 Zika virus. I. Isolations and serologica
0.01755 Genetic characterization of Zika virus s
0.0175 Potential sexual transmission of Zika vi
0.01745 Zika virus in Gabon (Central Africa)--20
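
To make the PageRank step less of a black box, here is a minimal pure-Python sketch of the power iteration on a tiny citation graph. It is not the NetworkX implementation: the function name simple_pagerank, the damping factor of 0.85, the fixed number of iterations and the paper names A to D are all assumptions made for illustration.

```python
def simple_pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank for a list of (citing, cited) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: set() for node in nodes}
    for source, target in edges:
        out_links[source].add(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Papers that cite nothing (dangling nodes) spread their rank evenly
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Every citing paper passes an equal share of its rank to each cited paper
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical papers: B, C and D cite A, and D also cites B
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
ranks = simple_pagerank(edges)
for title, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print("{:.4} {}".format(value, title))
```

Paper A collects citations from all the others and ends up on top, which is the same effect that puts the much-cited Zika papers at the head of the list above.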

Mixed indexing with integer index in Pandas DataFrame


Indexing in Python’s Pandas can at times be tricky. Here is an example with mixed indexing (.ix) with integer index:

I ran into the issue when I wanted to index with an integer in one of the methods of a DataFrame representing EEG data.
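
A small sketch of the pitfall, under some assumptions: .ix has since been removed from Pandas (in version 1.0), so this uses .loc and .iloc instead to show the two readings an integer can have. With an integer index, .ix treated integers as labels rather than positions. The column name and the values are made up for illustration.

```python
import pandas as pd

# Integer index labels that do not coincide with the row positions 0, 1, 2
df = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=[1, 2, 3])

by_label = df.loc[1, "value"]      # label 1 is the first row
by_position = df.iloc[1, 0]        # position 1 is the second row

# Slices differ too: label slices include the end point, positional ones do not
label_slice = df.loc[1:2, "value"].tolist()
position_slice = df.iloc[1:2, 0].tolist()

print(by_label, by_position)        # 10.0 20.0
print(label_slice, position_slice)  # [10.0, 20.0] [20.0]
```

Choosing .loc or .iloc explicitly removes the ambiguity that mixed indexing had to guess its way through.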

Virtual machine for Python with Vagrant


I may have managed to set up a virtual machine with Vagrant, partially following the instructions on the Vagrant homepage.

$ sudo aptitude purge vagrant virtualbox virtualbox-dkms virtualbox-qt
$ locate Vagrantfile
$ rm -r ~/.vagrant.d/
$ rm -r ~/virtual
$ rm ~/Vagrantfile 

$ sudo aptitude install vagrant
$ vagrant init hashicorp/precise32
$ vagrant up
There was a problem with the configuration of Vagrant. The error message(s)
are printed below:

* The box 'hashicorp/precise32' could not be found.

$ vagrant box add precise32
$ ls -l .vagrant.d/boxes/precise32
total 288340
-rw------- 1 fnielsen fnielsen 295237632 Oct  3 15:58 box-disk1.vmdk
-rw------- 1 fnielsen fnielsen     14103 Oct  3 15:58 box.ovf
-rw-r--r-- 1 fnielsen fnielsen       505 Oct  3 15:58 Vagrantfile

$ vagrant up
There was a problem with the configuration of Vagrant. The error message(s)
are printed below:

* The box 'hashicorp/precise32' could not be found.

$ vagrant box remove precise32
$ vagrant box add precise
$ rm Vagrantfile
$ vagrant init precise
$ vagrant up
$ vagrant ssh 
$ uname -a
Linux vagrant-ubuntu-precise-32 3.2.0-69-virtual #103-Ubuntu SMP Tue Sep 2 05:28:41 UTC 2014 i686 i686 i386 GNU/Linux
$ whoami

$ sudo aptitude install python-pip
$ sudo pip install numpy
$ sudo aptitude install python-dev
$ sudo pip install numpy
$ python
>>> import numpy
>>> f = open('Hello, virtual world.txt', 'w')
>>> f.write('Hello, virtual world')
>>> f.close()
>>> exit()
$ strings ~/VirtualBox\ VMs/fnielsen_1412345235/box-disk1.vmdk | grep 'Hello, virtual world.txt'
Hello, virtual world.txt
Hello, virtual world.txt

Somewhere in between I erased old VirtualBox files in the “VirtualBox VMs” directory: “rm -r test_1406195091/” and “rm -r pythoner/”.

Musing over Muse


This account details the process of getting a Muse talking:

After initiating the pairing by pressing the Muse device button for 6 seconds, I see the following in Ubuntu’s ‘Bluetooth New Device Setup’:

Device: 00-06-66-68-9f-ae
Type: Unknown.

I.e., no name and it continues to show ‘Searching for devices…’ with the ‘continue’ button disabled.

With sudo hcidump I get (among the results):

> HCI Event: Extended Inquiry Result (0x2f) plen 255
 bdaddr 00:06:66:68:9F:AE mode 2 clkoffset 0x5b13 class 0x240704 rssi -40
 Unknown type 0x4d with 8 bytes data
 Unknown type 0x00 with 6 bytes data
 Unknown type 0x04 with 9 bytes data

Ubuntu Forum has “11.04 Bluetooth Scanning Endlessly and Not Finding my Phone” where an answer suggests “modprobe btusb sco rfcomm bnep l2cap”. I have btusb, rfcomm and bnep, but not sco and l2cap. Inspired by another web page we can do:

$ hcitool scan
00:06:66:68:9F:AE Muse
$ sdptool records 00:06:66:68:9F:AE 
Service Name: RN-iAP
Service RecHandle: 0x10000
Service Class ID List:
 "Serial Port" (0x1101)
Protocol Descriptor List:
 "L2CAP" (0x0100)
 "RFCOMM" (0x0003)
 Channel: 1
 "" (0x1200)
Service Name: Wireless iAP
Service RecHandle: 0x10001
Service Class ID List:
 UUID 128: 00000000-deca-fade-deca-deafdecacaff
Protocol Descriptor List:
 "L2CAP" (0x0100)
 "RFCOMM" (0x0003)
 Channel: 2
Language Base Attr List:
 code_ISO639: 0x656e
 encoding: 0x6a
 base_offset: 0x100
$ sudo hcitool info 00:06:66:68:9F:AE
Requesting information ...
 BD Address: 00:06:66:68:9F:AE
 Device Name: Muse
 LMP Version: 3.0 (0x5) LMP Subversion: 0x1a31
 Manufacturer: Cambridge Silicon Radio (10)
 Features page 0: 0xff 0xff 0x8f 0xfe 0x9b 0xff 0x59 0x83
 <3-slot packets> <5-slot packets> <encryption> <slot offset> 
 <timing accuracy> <role switch> <hold mode> <sniff mode> 
 <park state> <RSSI> <channel quality> <SCO link> <HV2 packets> 
 <HV3 packets> <u-law log> <A-law log> <CVSD> <paging scheme> 
 <power control> <transparent SCO> <broadcast encrypt> 
 <EDR ACL 2 Mbps> <EDR ACL 3 Mbps> <enhanced iscan> 
 <interlaced iscan> <interlaced pscan> <inquiry with RSSI> 
 <extended SCO> <EV4 packets> <EV5 packets> <AFH cap. slave> 
 <AFH class. slave> <3-slot EDR ACL> <5-slot EDR ACL> 
 <sniff subrating> <pause encryption> <AFH cap. master> 
 <AFH class. master> <EDR eSCO 2 Mbps> <EDR eSCO 3 Mbps> 
 <3-slot EDR eSCO> <extended inquiry> <simple pairing> 
 <encapsulated PDU> <non-flush flag> <LSTO> <inquiry TX power> 
 <extended features> 
 Features page 1: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$ hcitool name 00:06:66:68:9F:AE
$ sudo hcitool cc 00:06:66:68:9F:AE

The ‘Bluetooth New Device Setup’ now manages to get through: It claims that “Paired” is “Yes”, but “Type” is still “Unknown”. The address is set correctly.

With the 32-bit Bluetooth library installed, muse-io from the SDK can now start:

$ sudo aptitude install libbluetooth3:i386 
$ ./muse-io --preset 14 --device 00:06:66:68:9F:AE --osc osc.udp://localhost:5000

oscdump does not directly work due to Matt hard-coding a directory name.

But there is apparently still no data in MuseLab. muse-player does not work out of the box.

$ ./muse-player
ImportError: /home/fnielsen/projects/Muse/ wrong ELF class: ELFCLASS32

Moving the libraries provided by the SDK and trying again.

$ mkdir attick
$ mv libl* attick/
$ ./muse-player 
 from google.protobuf.internal import enum_type_wrapper
ImportError: cannot import name enum_type_wrapper

The Ubuntu 12.04 version of protobuf apparently does not work. google.__version__ is not set and there is no version number in the code! “dpkg -l python-protobuf” reports 2.4.1-1ubuntu2. “sudo aptitude remove python-protobuf” seems shaky because there is a range of dependencies that look important, though they only seem to be related to Ubuntu One. “pip install protobuf” gets into trouble because of version dependencies, so within a virtualenv environment we can do:

$ pip install protobuf
$ pip install pyliblo

This may require:

$ sudo aptitude install liblo-dev

Executing muse-player directly in the virtualenv produces an error because the Python path is hardcoded (/usr/bin/env python should have been used). Then there is a dependency on Scipy, so Numpy and Scipy should be installed in the virtualenv:

$ pip install numpy scipy

The bad news is that muse-io requires the 32-bit version of liblo while muse-player, run through my Python, requires the 64-bit version. The solution seems to be to move muse-io to a directory independent of the Python files and to put the SDK-provided liblo library files in that directory as well.

$ ./muse-io --preset 14 --device 00:06:66:68:9F:AE --osc osc.udp://localhost:5000
$ python ~/projects/Muse/muse-player -l udp://localhost:5000

These commands produce a continuous output like:

Playback Time: 12.3s : Sending Data 1410548303.53 /muse/acc fff 222.66 976.56 50.78
Playback Time: 12.4s : Sending Data 1410548303.55 /muse/acc fff 222.66 976.56 54.69
Playback Time: 12.4s : Sending Data 1410548303.57 /muse/acc fff 222.66 980.47 54.69
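
If the printed lines are to be processed further, they can be picked apart with a regular expression. This is only a sketch based on the three lines shown above: the function name parse_line, the field names and the handling of OSC type tags other than f are my assumptions and are not part of the Muse SDK.

```python
import re

# Pattern derived from the muse-player output lines shown above
LINE_PATTERN = re.compile(
    r"Playback Time:\s*(?P<playback>[\d.]+)s\s*:\s*Sending Data\s+"
    r"(?P<timestamp>[\d.]+)\s+(?P<path>/\S+)\s+(?P<types>\S+)\s*(?P<values>.*)$"
)

# Assumed converters for OSC type tags; anything else is kept as a string
CONVERTERS = {"f": float, "d": float, "i": int}


def parse_line(line):
    """Return a dict for one output line, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    values = [CONVERTERS.get(tag, str)(text)
              for tag, text in zip(match.group("types"),
                                   match.group("values").split())]
    return {
        "playback": float(match.group("playback")),
        "timestamp": float(match.group("timestamp")),
        "path": match.group("path"),
        "values": values,
    }


example = ("Playback Time: 12.3s : Sending Data 1410548303.53 "
           "/muse/acc fff 222.66 976.56 50.78")
record = parse_line(example)
print(record["path"], record["values"])
```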

Zipf plot for word counts in Brown corpus



There are various ways of plotting the distribution of highly skewed (heavy-tailed) data, e.g., with a histogram with logarithmically-spaced bins on a log-log plot, or by generating a Zipf-like plot (rank-frequency plot) like the above. This figure uses token count data from the Brown corpus as made available in the NLTK package.

For fitting the Zipf curve, a simple Scipy-based approach is suggested on Stack Overflow by “Evert”. More complicated power-law fitting is implemented in the Python package powerlaw, described in “Powerlaw: a Python package for analysis of heavy-tailed distributions”, which is based on the Clauset paper.
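
The rank-frequency data behind such a plot can be sketched with the standard library alone. The example below uses a tiny made-up token list instead of the Brown corpus, and the slope is estimated with an ordinary least-squares line in log-log space. That is neither the Stack Overflow approach nor the maximum-likelihood fit of the powerlaw package; the function names are my own.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, count) pairs with the most frequent token at rank 1."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Made-up tokens whose counts follow 120 / rank exactly
tokens = []
for rank, word in enumerate(["the", "of", "and", "to", "a"], start=1):
    tokens.extend([word] * (120 // rank))

pairs = rank_frequency(tokens)
slope = loglog_slope(pairs)
print(pairs)
print(round(slope, 6))
```

For counts that fall off as one over the rank, the slope on the log-log plot is -1. Real word counts only follow this approximately.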

You can’t fool Python


There is this funny thing with Python that allows you to have static variables in functions by putting a mutable object as the default argument.

In Ruby, default arguments are evaluated each time the function is called (I am told), so you can make recursive calls with two Ruby functions calling each other with the default input arguments:

Ruby complains that the stack level becomes too deep.

In Python the default argument is evaluated once, when the function is defined, so the result of calling one of the Python functions will be different from calling one of the Ruby functions.
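
The Python side of this can be sketched as follows. The function names are made up for illustration: the first function uses a mutable default as a static variable, and the second pair shows that the default expression runs only once, at definition time.

```python
def counter(state=[0]):
    """Count calls; the list default persists between calls."""
    state[0] += 1
    return state[0]


evaluations = []


def make_default():
    """Record every evaluation of the default-argument expression."""
    evaluations.append("evaluated")
    return []


def collect(item, bucket=make_default()):
    """Append to the bucket; the default bucket is shared between calls."""
    bucket.append(item)
    return bucket


first, second, third = counter(), counter(), counter()
print(first, second, third)      # 1 2 3

collect("a")
result = collect("b")
print(result, len(evaluations))  # ['a', 'b'] 1
```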

Graph spectra with NetworkX



I was looking for a measure of how clustered a network is. I thought that the graph spectrum was a good place to start and that the Python package NetworkX would have some useful methods. However, I couldn’t immediately see any good methods in NetworkX. Then Morten Mørup mentioned something about community detection and modularity and I got diverted, but now I am back at the graph spectrum.

The second smallest eigenvalue of the Laplacian matrix of the graph seems to represent reasonably well what I was looking for. Apparently that eigenvalue is called the algebraic connectivity.

NetworkX has a number of graph generators, and for small test cases the algebraic connectivity seems to give an OK value for how clustered the network is, or rather how non-clustered it is.
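
As a minimal sketch of the computation, assuming plain NumPy and hand-built adjacency matrices rather than the NetworkX generators, the algebraic connectivity can be read off the sorted eigenvalues of the Laplacian (degree matrix minus adjacency matrix). The function name and the three test graphs are my own choices.

```python
import numpy as np


def algebraic_connectivity(adjacency):
    """Second smallest eigenvalue of the Laplacian of an undirected graph."""
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    return eigenvalues[1]


n = 6

# Complete graph: every node is connected to every other node
complete = np.ones((n, n)) - np.eye(n)

# Path graph: the nodes form a chain
upper = np.diag(np.ones(n - 1), k=1)
path = upper + upper.T

# Two triangles with no edge between them
triangle = np.ones((3, 3)) - np.eye(3)
two_triangles = np.kron(np.eye(2), triangle)

for name, matrix in [("complete", complete), ("path", path),
                     ("two triangles", two_triangles)]:
    print("{:14} {:.4f}".format(name, algebraic_connectivity(matrix)))
```

The complete graph gets the highest value, the chain a low one, and the two separate triangles zero, in line with the value saying how non-clustered the network is. More recent NetworkX versions also provide an algebraic_connectivity function.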