Small solutions for big data and Python shelve concurrency


I am still on the lookout for a good database system: movable, big, concurrent, fast, flexible, and not necessarily requiring root access.

MySQL, good in many respects, lacks flexibility: an ALTER TABLE on a large table can take hours.

MongoDB has a 2 GB size limit on 32-bit systems.

For some reason I thought that SQLite was limited to 2 GB on 32-bit systems (where on earth did I get that idea from?). In fact, SQLite can potentially store 140 terabytes; the practical limit is set by the OS and filesystem. So what is that limit? Wikipedia says the 32-bit ext3 file size limit ranges from 16 GiB to 2 TiB depending on block size. My block size is apparently 4 KiB (reported with $ sudo /sbin/dumpe2fs /dev/sda7 | grep "Block size"), so if we can trust the online encyclopedia that anyone can edit, I may be able to have 2 TiB SQLite databases. SQLite still has the ALTER TABLE problem, but my first attempt used SQLite as a key-value store with the values stored as JSON. Wikipedia also reports that Mr. Hipp is working on the document-oriented UnQLite.
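The key-value-with-JSON approach sidesteps ALTER TABLE entirely, since new fields just go into the JSON blob. A minimal sketch of that pattern with the standard-library sqlite3 and json modules (table and key names here are illustrative, not from the original post):

```python
import json
import sqlite3

# SQLite as a key-value store: one table, keys as TEXT, values as JSON.
# Adding a new field to a value never requires an ALTER TABLE.
conn = sqlite3.connect(":memory:")  # use a file path for a persistent store
conn.execute("CREATE TABLE IF NOT EXISTS store (key TEXT PRIMARY KEY, value TEXT)")

def put(key, obj):
    # REPLACE inserts or overwrites the row for this key
    conn.execute("REPLACE INTO store (key, value) VALUES (?, ?)",
                 (key, json.dumps(obj)))
    conn.commit()

def get(key):
    row = conn.execute("SELECT value FROM store WHERE key = ?",
                       (key,)).fetchone()
    return json.loads(row[0]) if row else None

put("user:1", {"name": "Ada", "tags": ["math", "computing"]})
print(get("user:1")["name"])  # prints "Ada"
```

The obvious trade-off is that fields inside the JSON blob are not indexed; this works best when lookups are always by key.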

I was also considering the Python key-value store shelve and its underlying databases (e.g., bsddb). However, somewhere in the documentation you can read that "The shelve module does not support concurrent read/write access". I was still slightly surprised by how wrong it went when I executed the code below.
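A minimal sketch of such a concurrent-writer test, with several processes hammering one shelf (file names and key layout here are illustrative, not necessarily the original code):

```python
import multiprocessing
import os
import shelve
import tempfile

def writer(path, n):
    # Each process opens the same shelf and writes its own keys.
    # shelve does no locking, so concurrent writers can lose data,
    # corrupt the file, or fail to open it at all.
    db = shelve.open(path)
    for i in range(n):
        db["%d-%d" % (os.getpid(), i)] = {"value": i}
    db.close()

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "test_shelf")
    procs = [multiprocessing.Process(target=writer, args=(path, 100))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    try:
        db = shelve.open(path)
        # may well be fewer than the 400 keys that were written
        print("keys in shelf:", len(db))
        db.close()
    except Exception as exc:
        print("shelf unreadable:", exc)
```

Exactly how it fails depends on the dbm backend: some raise immediately when a second process opens the file for writing, others silently clobber each other's pages.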

