Aug 26, 2009

Mobile (not so) open standards

Yesterday I promised I'd talk about why I hate mobile phones. Of course I didn't mean all of them, just the ones I have to deal with. Why? Well, my mobile phone kind of died a few days ago. I have a Nokia N73 and it's really quite a good phone, even if it's a bit old by today's standards. You control the phone with a "joystick" of sorts in the upper part of the keyboard. I decided to include an image so you don't have to look for it :-)

So this joystick stopped working (even the slightest touch registered as a push, so it was unusable). I didn't have my backup phone with me, but a friend gave me her battered Siemens S55. So what was the problem? Well, I have had the same sim card for almost 10 years now. Back then only 100 contacts would fit on it, and I have almost 300 contacts in my N73. So how do I get all the contacts from one phone to the other? Normally I could just send them over bluetooth, but since I couldn't really control my N73 that was out of the question. I was barely able to turn bluetooth on. I thought I'd use the SyncML interface to get vCards from the N73 to my computer and then sync them again to the S55. In the end I kind of did, but boy was that an unpleasant experience!

So what exactly happened? I installed the OpenSync libraries and tools, and using multisyncgui I created a sync group with the file-sync plugin on one side and the syncml-obex-client plugin on the other. Configuration of the file-sync plugin mostly amounted to changing the path to the directory I wanted to sync into. The final version looked like this:

<?xml version="1.0"?>
<!-- file-sync plugin config (abbreviated); the path is the only thing I changed -->
<config><path>/tmp/sync</path></config>

Configuration of syncml-obex-client proved much more challenging. It turns out the Nokia N73 has two quirks:

  • It only talks to a SyncML client that identifies itself as "PC Suite"
  • It contains a bug that causes it to freeze after a certain amount of data if the configuration is not correct

The first of these quirks is mentioned in almost every tutorial on data synchronization in Linux. The second one, however, cost me quite some time. My Nokia N73 would freeze after synchronizing approximately 220-240 contacts, and to continue working I had to restart the whole phone. In the end I found out that I needed to set the recvLimit parameter to 10000 in order to synchronize everything. The final setting for syncml-obex-client looks like this:

<?xml version="1.0"?>
<config>
  <!-- Nokia Obex server channel -->
  <identifier>PC Suite</identifier>
  <recvLimit>10000</recvLimit>
</config>

So after all that, I was able to get the vCards from my N73 to my notebook. For every vCard, OpenSync created a file in the directory /tmp/sync. Now came the interesting part: how to get these vCards into the Siemens S55?

A simple Google search on the Siemens S55 and synchronization in Linux suggested that the tool best suited for the job was scmxx, a little app that specializes in certain Siemens phones. According to some manuals it was supposed to be able to upload the vCards themselves, but I couldn't get that to work; scmxx kept complaining about invalid command line arguments. After some testing I found out that it could access and change sim card phone numbers.

Unfortunately for me, my sim card has a limit of 100 phone numbers, each with a 14 character identifier (name). This meant I needed to convert the vCards from the N73 into the special format scmxx uses. That format looks something like this:

1,"09116532168","Jones Rob"
2,"09223344567","Moore John"

The first column is the number of the slot that will be overwritten with the new information, the second column is the phone number, and the third is the contact name (less than 15 characters).

So I fired up vim and started coding a conversion script. It didn't take long and I had my contacts in the old-new phone. There are a lot of hard-coded things in the script since I don't plan to ever use it again, but you can download it from my dropbox. Consider it public domain, and if anyone asks, I didn't have anything to do with it :-)

import os
import re


class PbEntry(object):
    def __init__(self, name, tel, year, month, day): = name = tel
        self.year = year
        self.month = month = day


# Sort by vCard revision date (REV), newest entries first
def cmp_pb(e1, e2):
    if e1.year > e2.year:
        return -1
    elif e1.year < e2.year:
        return 1
    if e1.month > e2.month:
        return -1
    elif e1.month < e2.month:
        return 1
    return 0


telRe = re.compile(r'TEL(;TYPE=\w+)*:([*#+0-9]+)', re.M)
revRe = re.compile(r'REV:(\d{4})(\d{2})(\d{2}).*', re.M)
nameRe = re.compile(r'^N:(.*);(.*);;;', re.M)


def get_entry_from_text(text):
    surname = None
    name = None
    tel = None

    ret =
    if ret:
        surname =
        name =

    ret =
    if ret:
        tel =

    if surname and name:
        fn = "%s %s" % (surname, name)
    elif surname:
        fn = surname
        fn = name

    if fn:
        # the sim card only fits 14 characters per name
        ret ='(.{0,14}).*', fn)
        fn =

    ret =
    year =
    month =
    day =

    return PbEntry(fn, tel, year, month, day)


entries = []
for file in os.listdir('/tmp/sync'):
    fh = open('/tmp/sync/%s' % file, 'r')
    content =

entries = sorted(entries, cmp=cmp_pb)

i = 1
for entry in entries:
    print '%d,"%s","%s"' % (i,,
    i = i + 1

I have had my share of incompatibilities between mobile phones, computers and other devices. Fortunately, most devices sold today use open communication protocols for sharing data (and other stuff). Too bad people had to put so much energy into reverse engineering proprietary solutions in the past. Just ranting about this vendor lock-in could fill quite a few pages. Imagine having 300+ contacts and calendar information in your phone of brand X. When buying a new phone, you would be able to synchronize your data only if the new phone was also from brand X. Would that affect your decision? It sure would affect mine.

Now I have a choice. After fixing my old N73 I will start looking into new phone. So far HTC Hero looks pretty cool and reviews are not half bad.

Aug 25, 2009
So this year's Google Summer of Code is officially over. Today at 19:00 UTC was the deadline for sending in evaluations, for both mentors and students. Therefore I think some kind of summary of what was happening and what I was doing is in order.

I was working on implementing a neat idea that would allow previously impossible things for Gentoo users. The original name for the idea was "Tree-wide collision checking and provided files database"; you can still find it on the Gentoo wiki. I later named the project collagen (as in collision generator). Of course the implemented system is quite a bit different from the original wiki idea: some things were added, some were removed. If you want to relive how I worked on my project, you can read my weekly reports on the gentoo-soc mailing list (I will not repeat them here). Some information was aggregated also on As the final "pencils down" date approached I created final bug reports for features not present in the delivered release (and for bugs that were present, for that matter). Neither the missing features nor the present bugs are real show-stoppers; they mostly affect performance. More importantly, I plan to continue my work on this project and perhaps other Gentoo projects. I guess some research into what those projects are is in order :-)

Before GSoC I kind of had an idea of how open-source projects work, since I've participated in some to a degree. However I underestimated a lot of things, and now I would do them differently. But that's a good thing. I kind of like the idea that no project is a failed one as long as you learn something from it. It reminds me of a recent post by Jeff Atwood about Microsoft Bob and other disasters of software engineering. To quote him:
The only truly failed project is the one where you didn't learn anything along the way.
I believe I have learned a lot. I believe that if I started collagen now, it would end up much better. And the best thing is that I can still do that: I get to continue my project and learn some more. If I learned anything during my work on collagen, it's this:
If you develop something in a language without strong type checking, CREATE THE DAMN UNIT TESTS! It will make your life much easier later on.
In the next episode: why I think Gmail is corrupting people's minds, and why I hate mobile phones.

Jul 26, 2009

Technical decorating that makes sense

I have been using Python for several small-ish projects in the past year or two. In that time I never found a reason to really use decorators. You might have seen them in Java source code in the form of

void getVal()

Python has the same syntax, but because it's a scripting language, certain things are different (read: more flexible :-) ).
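To illustrate, here is a tiny made-up decorator; the @shout line is just shorthand for greet = shout(greet):

```python
def shout(f):
    # The wrapper upper-cases whatever the wrapped function returns
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs).upper()
    return wrapper

@shout  # equivalent to: greet = shout(greet)
def greet(name):
    return "hello %s" % name

print(greet("world"))  # HELLO WORLD
```

Since decorators are plain functions, Python lets you build them at runtime, parameterize them, stack them and so on, which is where the extra flexibility comes from.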

Now to the main thing. Where did I use them? As I was deciding what ORM library/tool to use in my GSoC project, I came to the conclusion that I could probably use the Django DB backend to work with the database. This way I would avoid doing the same thing (ORM) twice: once for the backend and once again for the web interface later on. As always, things are not as straightforward as they seem in the beginning. Django is a web framework and its database backend is tied to it; in other words, the backend was not created with standalone usage in mind. Some things get a bit hairy because of that, mostly connection management and exception handling.

Stackoverflow to the rescue (once again). There was already a question regarding using only the db part of the django framework. Normally a function like this in django:

def add_package(self, name):
    p = Package.objects.filter(name=name)
    if len(p) > 0:
        return p[0].id
    p = Package(name=name)
    return p.id

would have to become something like this:

from django.db import connection, transaction

def add_package(self, name):
        p = Package.objects.filter(name=name)
        if len(p) > 0:
            return p[0].id
        p = Package(name=name)
        return p.id
    except Exception:

Imagine that this exception handling, rollbacks and connection closing would have to be in every function. A bit ugly, isn't it? We cannot really use inheritance to our advantage, but we could use metaclass(es). I like the look of decorators a bit better, so that's what I used. The final code looks like this:

def dbquery(f):
    def newfunc(*args, **kwargs):
            return f(*args, **kwargs)
        except Exception, e:
            raise e
    return newfunc

def add_package(self, name):
    p = Package.objects.filter(name=name)
    if len(p) > 0:
        return p[0].id
    p = Package(name=name)
    return p.id

For other functions we can just add a simple @dbquery before the definition and voila, problem solved. Maybe there are even cleaner and/or better ways to do the same thing, but at least I finally found a non-trivial use for decorators.

Jun 30, 2009
First a bold note: I already have a repository on Gentoo infrastructure for working on my GSoC project. Check it out if you want.

Last time I mentioned I won't go into technical details of my GSoC project on this blog any more. For that you can keep an eye on my project on gentooexperimental and/or the gentoo mailing lists, namely gentoo-qa and gentoo-soc. But there is one interesting thing I found out while working on Collagen.

One part of my project was automating the creation of the chroot environment for compiling packages. For this I created a simple shell script that you can see in my repository. I will pick out one line from a previous version of this script:
mount -o bind,ro "$DIR1" "$DIR2"
What does this line do? Or more specifically, what should it do? It should create a virtual copy of the contents of directory DIR1 inside directory DIR2. The copy in DIR2 should be read-only, meaning no creating new files, no changing of files and so on. The command succeeds, so as far as we know everything should work OK, right? Wrong!

The command mentioned above actually fails silently. There is a bug in current linux kernels (2.6.30 as of this day): when you execute mount with "-o bind,ro" as arguments, the "ro" part is silently ignored. Unfortunately it is added to /etc/mtab even though it was ignored, so you would not see that DIR2 is writable unless you tried writing to it yourself. The current proper way to create read-only bind mounts is therefore this:
mount -o bind "$DIR1" "$DIR2"
mount -o remount,ro "$DIR2"
There is an issue of race conditions with this approach, but in most situations that should not be a problem. You can find more information about read-only bind mounts in the LWN article on the topic.

Jun 11, 2009

Reinventing the wheel

It's been a week and change since my last post and again quite a lot has happened. I had several tasks from last week:
  • get in touch with gentoo-infra(structure) team
  • improve way build logs were handled
  • look at possible ways to create tinderbox chroot environments
  • make it possible to test all versions of dependencies
I managed to get acquainted with the gentoo-infra team a bit, and got a few answers too. Remember last time when I was talking about the security issues of the pickle module when used over an untrusted connection? Apparently that's not an issue anymore, since we'll be using an encrypted connection in the final version. We can consider the route between Matchbox and the Tinderboxen to be a friendly environment.

A few people suggested that I look into a few similar Gentoo projects, namely catalyst and AutotuA. The main feature of catalyst is release engineering, not testing per se, but it can also create tinderboxes (effectively chroot environments). Perhaps some of its ideas could be used for my project, but so far it seems that catalyst is not the one and only for me. AutotuA is much more similar to collagen (did I mention that's the name of my project yet?). There is a master server (a web application accepting jobs) and slaves (processing jobs). There were quite a few interesting design decisions (such as keeping jobs in a git repository) and some of them I will at least reuse. Integration would be possible, but for now I have a feeling that such integration would be just as complicated as writing my own master/slaves. That is because AutotuA is a generic system for jobs and their processing, not specific to package compilation and testing. I'll keep both projects in mind during my future endeavours.

As far as build log handling goes, my last POC (proof of concept) code simply grabbed stdout/stderr of the whole install process. It also used higher-level interfaces for installing packages in gentoo; I switched to lower-level APIs because I need to do a few things the higher-level APIs did not offer, mostly to do with dependency handling. The best way to explain what I have to do is with an example. But first, a little introduction to Gentoo package installation. Package "recipes" called ebuilds reside in the so-called portage tree. Most packages have more than one ebuild because older and newer versions are always supported simultaneously. Each of these package versions has its own set of dependencies, that is, other packages that need to be installed for the package to compile/run. These dependencies look something like this:
=dev-libs/glib-2*
samba? ( >=net-fs/samba-3.0.0 )
This means that the package needs any version of the glib-2 library, and if the samba feature (USE flag) is enabled, then samba version 3 or higher is also required. My task is to verify that a package can be compiled with ALL allowed versions of ALL its dependencies. Now the promised example.

Let's assume that we want to install the package mc (midnight commander). There are currently 2 versions of app-misc/mc in portage: 4.6.2_pre1 and 4.6.1-r4. The list of their dependencies is quite long, but to show the principle I'll use just one dependency, namely sys-libs/ncurses. Version 4.6.2_pre1 of mc depends on sys-libs/ncurses and version 4.6.1-r4 depends on >=sys-libs/ncurses-5.2-r5. There are currently 2 versions of sys-libs/ncurses in portage: 5.7 and 5.6-r2. Based on these dependencies it should be possible to install package mc (both versions) with either ncurses-5.7 or ncurses-5.6-r2. From this point on there is a ping-pong of installing ncurses-5.6-r2, then mc-4.6.1-r4/4.6.2_pre1, followed by uninstalling them all and installing ncurses-5.7 and again mc-4.6.1-r4/4.6.2_pre1. If mc-4.6.2_pre1 fails to compile with ncurses-5.6-r2, we will know that the ebuild needs to be modified with the dependency >=sys-libs/ncurses-5.7. All this has to be repeated for every dependency of every version of every package in the portage tree. Currently there are 26623 ebuilds in the portage tree. Now imagine that some of them will have to be compiled 30-50 times to test all dependency versions. Good thing we will have dedicated tinderboxes for compiling all those ebuilds.
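The combinations to test are really just a cartesian product of the allowed version sets. A toy sketch with the mc/ncurses numbers from above (the version lists are hard-coded here for illustration, not queried from portage):

```python
import itertools

# Hypothetical version lists, hard-coded from the example above
mc_versions = ['4.6.1-r4', '4.6.2_pre1']
ncurses_versions = ['5.6-r2', '5.7']

# Every (package version, dependency version) pair gets one compile run
combos = list(itertools.product(mc_versions, ncurses_versions))
for mc, nc in combos:
    print('emerge ncurses-%s, then emerge mc-%s' % (nc, mc))
```

With real packages the product runs over every dependency's version set at once, which is exactly why the compile counts explode into the 30-50 range.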

One more thing for now. Gentoo has a project management website based on redmine for all GSoC students. From now on I will aggregate all documentation for my project there. This blog will go on with less technical detail, and I will link to documentation where needed.

Jun 2, 2009

First commit for GSoC

Recently I finished all of my duties as a student for this term, so I could spend the weekend catching up on GSoC (since I am one week behind schedule). In the end it turned out to be a pretty productive weekend.

I'll summarize the basic architecture without any images (I'll probably create them later this week when everything settles down). There are two core packages:
  • Matchbox
  • Tinderbox
Matchbox is the master server that knows what still needs to be compiled and collects all the information. There is always only one Matchbox. There can however be more Tinderboxes. These machines connect to Matchbox and ask for the next package to emerge (compile). After emerging a package they collect information about the files in the package, USE flags, the emerge environment and error logs from the compile phase. This information is then sent back to Matchbox. The Tinderbox then asks for another package to emerge. Repeat while true.

The first thing I did was create a basic data model for storing data about compiled packages: what USE flags were used, error logs and stuff like that. A lot of things are not in the model, for example information about tinderboxes, but for now this will do. A UML diagram is in the following picture:

This model should allow efficient storage of the data and a lot of flexibility to boot. There can be more versions of the same package (of course) and packages can also change package category (happens quite often). We can also collect different data sets based on USE flags.

With the basic data model in place it was time for some serious prototyping :-) Naturally I decided to split the implementation into two parts, one for each core module (more to come later). Matchbox is a simple listening server waiting for incoming connections. I wanted to simplify network communication for myself, so I used the python module pickle. This module is able to create string representations of classes/functions and basic data types. Because of this I was able to use objects as network messages. The objects representing the Matchbox command set:

class MatchboxCommand(object): pass

class GetNextPackage(MatchboxCommand):

class AddPackageInfo(MatchboxCommand):
    def __init__(self, package_info):
        self.package_info = package_info

On the other side, the Tinderbox understands these replies (for now):
class MatchboxReply(object): pass

class GetNextPackageReply(MatchboxReply):
    def __init__(self, package_name, version, use_flags):
        self.package_name = package_name
        self.version = version
        self.use_flags = use_flags

Communication (simplified) goes something like this:
# Tinderbox side
msg = GetNextPackage()
msg_pickled = pickle.dumps(msg)

# Matchbox side
data = sock.recv(4096)
command = pickle.loads(data)
if type(command) is GetNextPackage:
    package = get_next_package_to_emerge()
    msg = GetNextPackageReply(, package.version, package.use_flags)
    msg_pickled = pickle.dumps(msg)

There is one BIG caveat to this kind of communication: it is very easily tampered with. This is straight from the pickle documentation:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
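For what it's worth, if we did reimplement this part, one option would be to pickle nothing and pass only plain data, e.g. JSON with a whitelisted command name. A rough sketch (the names are made up, not the actual collagen protocol):

```python
import json

# Only whitelisted command names are accepted, and the payload is plain
# data, so loading a message can never execute attacker-supplied code.
COMMANDS = set(['get_next_package', 'add_package_info'])

def encode(command, **payload):
    return json.dumps({'command': command, 'payload': payload})

def decode(data):
    msg = json.loads(data)
    if msg['command'] not in COMMANDS:
        raise ValueError('unknown command: %r' % (msg['command'],))
    return msg['command'], msg['payload']

cmd, payload = decode(encode('add_package_info', package='app-misc/mc'))
print(cmd, payload)
```

The price is writing explicit to/from-dict code for each message class instead of getting serialization for free from pickle.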

We will have to decide whether to reimplement this part, or trust Gentoo infrastructure. So what do we have for now?
  • Basic communication between Matchbox/Tinderbox
  • Compiling works with file list/emerge environment/stdout/stderr/etc being send back to Matchbox
There is still much more ahead of us:
  • package selection on Matchbox side
  • block resolution on Tinderbox
  • rest of services (web interface, client, etc)
Since GSoC students didn't get git repositories on gentoo servers just yet, you can see the code at gentoo-collagen@github. So long and thanks for all the fish (for now)

Apr 27, 2009

Accepted for GSoC 2009

A few weeks ago I mentioned that I applied for GSoC 2009 as a student. Things have cleared up a bit and I can now say that I've been accepted (YAY!). Soon I'll start working to improve the quality of my beloved Linux distribution. And how better to do that than to scratch my own itch? I am now going to quote Eric S. Raymond:
Every good work of software starts by scratching a developer's personal itch.
This quote is from one of the most interesting books written on software engineering, specifically with open source in mind. C'mon! I know you know the book! Yes, you guessed right, it's "The Cathedral and the Bazaar".

Now the obvious question is... what's my itch? I've been using Gentoo happily for over 4 years now and it's getting better and better. One thing is still missing though. When emerging (that is, installing) a new application I never know how much space it will occupy once it's on my hard drive. I only know the download size. I say that's not enough! I want at least a ballpark figure on size before I try to install some work of the devil. However, this is not exactly the focus of my project, "Tree-wide collision checking and files database", but it could easily become a byproduct of the solution to my GSoC task. I will most probably keep blogging about my work on the GSoC project. This will make it easier to sort out various thoughts, and easier to create progress reports in the future. Oh... just so that I don't forget: my mentor is Andrey Kislyuk, apparently a bioinformatics PhD student interested in privacy and security. I'd better get to know him better, he seems like an interesting person :-).

Apr 11, 2009

Applying for Gentoo project in GSoC

I don't know if you've heard of Google Summer of Code, but most probably you have. The basic description is:
Google Summer of Code (GSoC) is a global program that offers student developers stipends to write code for various open source software projects...Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all.
It sure is a great idea and I was always intrigued to participate. This year I finally decided to try it out. Gentoo, my distribution of choice, was chosen as one of the projects to mentor students. I read the ideas page and some of them seemed pretty interesting. One of them was "Tree-wide collision checking and files database". What's the idea? Basically, Gentoo is a source based distribution, so no one knows what files will get installed with a certain package. This can be a problem for the quality assurance (QA) team. Emerge checks for file collisions before installing so you will never screw up your system, but installing some less known packages can give you a headache. Implementing this idea should help fix this situation, and it could most probably be used for other purposes as well. One of them could be approximating the size of a package to be installed. This is common in binary distributions, because they know the size of the package, but Gentoo has a lot of small "gotchas" in this department. One of them is USE flags, a great way to customize your distribution, but a nightmare for this sort of thing. Good thing emerge is such a great package manager :-). I will not repeat things I've said elsewhere, so if you want to read more, you can read my application.

If I get accepted (I should know on April 20th), there is a long way from where I am to a great project for Gentoo. Hopefully I'll be able to keep up and help out a bit.

Mar 2, 2009

Power to the masses

A few months back I had a rant about participation in open source. Things have moved a bit since then. I made a few more commits to the mob branch of the gstfs repository on More importantly, Bob Copeland got in touch with me and one more developer who had voiced interest in the project, and offered to hand gstfs over to us. Understandably, he doesn't have as much free time to spend on such a project as an average university student :-).

During the weekend I created a project page, a new code repository with my patches included, developer and user mailing lists, and a few issue tickets. I just hope I will be able to keep up the work on the project at least a bit better than this blog... Fingers crossed.


HDD failure imminent

I suppose people who have worked with computers for a few years have seen a similar message at least once. Unfortunately it's quite common for hard drives to fail. There is an early warning system that can predict a lot of these misfortunes. It's called S.M.A.R.T. and it is in fact quite smart :-) A lot of HDDs come with this monitoring disabled for reasons unknown (to me). Maybe it's performance, maybe manufacturers don't want users to know their HDDs fail. Aaah... conspiracy theories :)

Enough of being smart though (pun intended). Recently, my parents' over-five-year-old computer refused to boot when one of its HDDs (a 320GB WD Caviar) was connected. No matter what I did, Windows wouldn't boot with that HDD connected. The HDD was (still is, actually) under warranty, but I really wanted to save the data. The most important files were backed up elsewhere, but my music collection and some movies waiting to be seen were not. I'll skip the boring stuff. Since the computer had other problems, my parents decided to buy a new one. With the 320GB WD Caviar connected, even Vista would not boot (the old computer ran XP).

I made one final attempt to save the data. I booted an Ubuntu live cd. To my big surprise, Ubuntu did not just "see" the hard drive; it was able to mount it without problems. It didn't even complain. I just backed up the hard drive, did a low level format (e.g. dd if=/dev/zero of=/dev/sda bs=1M) and suddenly Windows was able to boot without problems. I had one other problematic 80GB Seagate HDD that I remembered, and the outcome was the same: Windows was able to see it after a low level format. These HDDs were not system HDDs, so even if the MBR was corrupted it shouldn't have mattered. I couldn't find anything conclusive on the Internet about this type of HDD "failure", so any info is welcome. S.M.A.R.T. is not complaining, so it seems that I now have 2 good HDDs in my hands. Linux saves the day! :-)

Jan 16, 2009

2B Free || ! 2B Free

Recently the ext4 filesystem was marked as stable with the release of Linux 2.6.28. Since I like bleeding edge from time to time and back up my files regularly anyway, I decided to give it a spin. As far as performance is concerned, I have nothing to report yet, since I haven't been using it that long. But as usual, I found a certain annoyance :-)

I was going through my filesystems and converting them one-by-one (after doing one more backup). When it came to /var I hit a wall though. df showed that there was free space (more than 400MB), but tar was telling me there was not enough space on the filesystem to create a directory (ENOSPC). So what was it? I looked around and finally found the problem. Since the size of /var is only 1GB on my computer, mkfs.ext4 decided I would never use more than ~65000 inodes. The problem is that I have a lot of small files on that filesystem: ebuilds, git and svn repositories and standard /var stuff. Together this meant I hit the 65000 mark quite easily without filling up the filesystem.
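Incidentally, you can watch inode usage from a script via os.statvfs, the same numbers df -i reports. A tiny sketch (run against '/' here since /var may be part of the root filesystem on other machines):

```python
import os

def inode_usage(path):
    # f_files = total inodes on the filesystem, f_ffree = still free
    st = os.statvfs(path)
    return st.f_files - st.f_ffree, st.f_files

used, total = inode_usage('/')
print('%d of %d inodes used' % (used, total))
```

A cron job doing this check would have warned me long before tar started failing with ENOSPC.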

The solution to my problem was obvious from this point on: recreate the /var filesystem while manually overriding mkfs.ext4's choice of maximum inode count (its -N option). Voila, ext4 has been working well since.

Jan 13, 2009
First of all... all hail our new overlord. And by overlord I mean the year 2009. I hope you all have a great time. I know I will :-). I didn't write for some time because I was travelling, and then I was celebrating the holidays with my family and friends. All in all I didn't have much time to keep my information up to date, not to mention doing anything resembling work. That's changing NOW!

I recently bought a new camera (a lovely Nikon D90) and also decided I need to back up my photos to more than 2 places. I realized you can never have enough backups after a few failed HDDs. So what were the options I was considering?
  • Google's Picasa: 20$/year for 10GB  storage space
  • Flickr: 25$/year for unlimited storage and better sharing/privacy settings, presentation options etc.
I didn't consider other services because...well because I didn't.

Now the issue was... how to upload all of my photos (several gigabytes)? Flickr has a client for Windows/MacOS, but not for Linux (the original client appears to work through wine though). Kflickr to the rescue! I started uploading photos in no time. But I wouldn't be writing this blog entry if everything had gone according to plan, now would I?

Everything seemed to work; the photos were on the web. I could see them, organize them, tag them... you name it. Then I wanted to download the original file of a certain photo (for a reason I don't remember). How great was my surprise when the file was <1MB in size. The originals I had were ~3 MB. Something rotten in here. The files were obviously recompressed with lower jpeg quality settings before being uploaded. Not all of them were this way though; it seemed to have something to do with the license I used for the files. The power is in the source, Luke, so there I was. I wanted to investigate the problem and maybe fix it.

Unfortunately, opening the Kflickr project files with Kdevelop and trying to debug didn't work. For some reason gdb was ignoring my breakpoints as if the application had been compiled without debugging information. It was however compiled with -g3 (all debugging info). So far I have been unable to properly diagnose the original bug, but I wrote to the author of Kflickr asking for information. Now let's wait.