Month: October 2012

Small solutions for big data and Python shelve concurrency


I am still on the lookout for a good database system: Movable, big, concurrent, fast, flexible and not necessarily requiring root access.

MySQL, good in many aspects, lacks flexibility: An ALTER TABLE can take hours.

MongoDB has a 2GB size limit on 32-bit systems.

For some reason I thought that SQLite was limited to 2GB on 32-bit (where on earth did I get that idea from?). But SQLite can potentially store 140 terabytes, although it may be limited by the OS or filesystem. So what is that limit? According to Wikipedia, the ext3 file size limit on 32-bit ranges from 16GiB to 2TiB, depending on the block size. Apparently my block size is 4KiB (reported with $ sudo /sbin/dumpe2fs /dev/sda7 | grep "Block size"), so if we can trust this online encyclopedia that anyone can edit, it may be that I can have 2TiB SQLite databases.

SQLite still has the ALTER TABLE problem, but my first attempt sidestepped it by using SQLite as a key-value store with the values as JSON. News on Wikipedia also reports that Mr. Hipp is working on the document-oriented UnQLite.
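To make the key-value idea concrete, here is a minimal sketch of SQLite used as a store with JSON values. It is not the code from my first attempt: the class name JsonStore, the table name kv and the example keys are made up for illustration, and it uses an in-memory database so that nothing is written to disk.

```python
import json
import sqlite3


class JsonStore:
    """Tiny key-value store: string keys, JSON-serializable values.

    Hypothetical helper for illustration only.
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def __setitem__(self, key, value):
        # "with self.conn" commits on success and rolls back on error.
        with self.conn:
            # INSERT OR REPLACE overwrites an existing row with the same key.
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


store = JsonStore()
store["page:1"] = {"title": "Denmark", "links": 12}
# A new field is just another JSON key; no ALTER TABLE is involved.
store["page:1"] = {"title": "Denmark", "links": 13, "lang": "en"}
print(store["page:1"]["links"], len(store))
```

The price of this flexibility is that SQLite sees each value as an opaque string, so in this sketch filtering on a field inside the JSON means reading and decoding the values in Python.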

I was also considering the Python key-value store ‘shelve’ and its underlying databases (e.g., bsddb). However, somewhere in the documentation you can read that “The shelve module does not support concurrent read/write access”. I was slightly surprised by how wrong it went when I executed the code below.
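The listing itself is not included in this excerpt. As a stand-in, here is a minimal sketch of that kind of experiment, assuming several processes that each write their own keys to the same shelf. The function names, the number of workers and the key format are my own choices for illustration, and what actually happens depends on which dbm backend shelve picks on your system: it may lose keys, raise locking errors, leave an unreadable file, or happen to work.

```python
import multiprocessing
import os
import shelve
import tempfile


def write_keys(path, worker_id, n_keys):
    """Write n_keys entries, reopening the shelf for every write."""
    for i in range(n_keys):
        # Reopening each time gives the workers many chances to interleave.
        with shelve.open(path) as db:
            db[f"{worker_id}-{i}"] = i


def safe_write_keys(path, worker_id, n_keys):
    """Process target: report a backend error instead of a raw traceback."""
    try:
        write_keys(path, worker_id, n_keys)
    except Exception as exc:  # the exception type depends on the dbm backend
        print(f"worker {worker_id} failed: {exc!r}")


def count_keys(path):
    """Return how many keys are present in the shelf."""
    with shelve.open(path) as db:
        return len(db)


def run_experiment(n_workers=4, n_keys=200):
    """Let several processes write to one shelf; return (expected, found)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_shelf")
        workers = [
            multiprocessing.Process(
                target=safe_write_keys, args=(path, w, n_keys)
            )
            for w in range(n_workers)
        ]
        for proc in workers:
            proc.start()
        for proc in workers:
            proc.join()
        try:
            found = count_keys(path)
        except Exception as exc:
            print(f"could not read the shelf back: {exc!r}")
            found = None
        return n_workers * n_keys, found
```

To try it, call run_experiment() from a script under an if __name__ == "__main__": guard (needed by multiprocessing on platforms that spawn new interpreters) and compare the two numbers it returns. If the second is smaller than the first, or is None, the concurrent writes did not all survive.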

CherryPy vs Tornado benchmarking


CherryPy is a Python-based web framework enabling you to make a dynamic web service without much setup and configuration. It comes with its own web server and a “Hello, World” can be constructed in six lines. The default setup might not be that fast, but it may be possible to speed it up; see Running CherryPy behind Apache using Mod_WSGI. I haven’t tried that.

Another Python-based web framework is Tornado. Its “Hello, World” is around 17 lines.

Below I have listed the results for the default Tornado and CherryPy “Hello, World” applications, based on ab, the Apache HTTP server benchmarking tool.

It seems that Tornado handles concurrent connections well, being considerably faster than CherryPy, and on non-concurrent requests Tornado is around twice as fast.
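For readers without ab at hand, the sequential versus concurrent comparison can be roughly approximated with the Python standard library. This is only a sketch and not a replacement for ab: the names fetch and benchmark and the URL in the usage note are illustrative assumptions, and a thread-based Python client adds overhead of its own, so the absolute numbers will not match what ab reports.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Fetch url once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
        return response.status


def benchmark(request_fn, n_requests=100, concurrency=1):
    """Call request_fn n_requests times using `concurrency` threads.

    Returns (requests_per_second, list_of_results).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(request_fn) for _ in range(n_requests)]
        # result() re-raises any exception from the worker thread.
        results = [future.result() for future in futures]
    elapsed = time.perf_counter() - start
    return n_requests / elapsed, results
```

For example, benchmark(lambda: fetch("http://localhost:8080/"), n_requests=1000, concurrency=1) followed by the same call with concurrency=10 gives a non-concurrent and a concurrent figure for whatever server is listening on that placeholder URL.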