Jun 30, 2009
First a bold note. I already have a repository on Gentoo infrastructure for working on my GSoC project. Check it out if you want.

Last time I mentioned that I won't go into the technical details of my GSoC project on this blog any more. For that you can keep an eye on my project on gentooexperimental and/or the Gentoo mailing lists, namely gentoo-qa and gentoo-soc. But there is one interesting thing I found out while working on Collagen.

One part of my project was automating the creation of a chroot environment for compiling packages. For this I created a simple shell script that you can see in my repository. I will pick one line out of a previous version of this script:
mount -o bind,ro "$DIR1" "$DIR2"
What does this line do? Or, more specifically, what should it do? It should create a virtual copy of the contents of directory DIR1 inside directory DIR2. The copy in DIR2 should be read-only: no creating new files, no changing existing files and so on. This command succeeds, and as far as we know everything should work OK, right? Wrong!

The command mentioned above actually fails silently. There is a bug in current Linux kernels (2.6.30 as of this day). When you execute mount with "-o bind,ro" as arguments, the "ro" part is silently ignored. Unfortunately, it is still added to /etc/mtab even though it was ignored. Therefore you would not see that DIR2 is writable unless you tried writing to it yourself. The current proper way to create read-only bind mounts is therefore this:
mount -o bind "$DIR1" "$DIR2"
mount -o remount,ro "$DIR2"
There is an issue of race conditions with this approach (DIR2 is briefly writable between the two commands), but in most situations that should not be a problem. You can find more information about read-only bind mounts in the LWN article about the topic.
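Since /etc/mtab can claim "ro" while the mount stays writable, it may be worth asking the kernel directly after the two-step mount. Here is a minimal sketch in Python 3, assuming a Linux system. The function names and the example paths are made up for illustration and are not part of my chroot script, and whether statvfs reports per-mount flags depends on the kernel version.

```python
import os
import subprocess


def readonly_bind_commands(src, dst):
    """Return the two mount invocations for a read-only bind mount."""
    return [
        ["mount", "-o", "bind", src, dst],
        ["mount", "-o", "remount,ro", dst],
    ]


def is_really_readonly(path):
    """Ask the kernel (not /etc/mtab) whether the mount at path is read-only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def mount_readonly_bind(src, dst):
    """Run both steps (needs root) and complain if dst is still writable."""
    for cmd in readonly_bind_commands(src, dst):
        subprocess.check_call(cmd)
    if not is_really_readonly(dst):
        raise RuntimeError("%s is still writable" % dst)


if __name__ == "__main__":
    # Hypothetical paths, only printed here so nothing gets mounted.
    for cmd in readonly_bind_commands("/usr/portage", "/chroot/usr/portage"):
        print(" ".join(cmd))
```

The check runs after the remount, so it does not close the race window between the two commands. It only catches the case where "ro" was silently dropped.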



Jun 11, 2009

Reinventing the wheel

It's been a bit over a week since my last post, and again quite a lot has happened. I had several tasks left from last week:
  • get in touch with the gentoo-infra(structure) team
  • improve the way build logs are handled
  • look at possible ways to create tinderbox chroot environments
  • make it possible to test all versions of dependencies
I managed to get acquainted with the gentoo-infra team a bit, and got a few answers too. Remember last time, when I was talking about the security issues of the pickle module when used over an untrusted connection? Apparently that's not an issue anymore, since we'll be using an encrypted connection in the final version. We can consider the route between Matchbox and the Tinderboxen to be a friendly environment.

A few people suggested that I look into a few similar Gentoo projects, namely catalyst and AutotuA. The main purpose of catalyst is release engineering, not testing per se. But it can also create tinderboxes (effectively chroot environments). Perhaps some of its ideas could be used for my project, but so far it seems that catalyst is not the one and only tool for me. AutotuA is much more similar to Collagen (did I mention that's the name of my project yet?). There is a master server (a web application accepting jobs) and slaves (processing jobs). It contains quite a few interesting design decisions (such as keeping jobs in a git repository), and I will reuse at least some of them. Integration would be possible, but for now I have a feeling that it would be just as complicated as writing my own master/slaves. That is because AutotuA is a generic system for jobs and their processing, not one specific to package compilation and testing. I'll keep both projects in mind during my future endeavours.

As far as build log handling goes, my last POC (proof of concept) code simply grabbed stdout/stderr of the whole install process. It also used higher-level interfaces for installing packages in Gentoo. I have since switched to lower-level APIs because I need to do a few things the higher-level APIs did not offer. Most of these things have to do with dependency handling. The best way to explain what I have to do is with an example. But first, a little introduction to Gentoo package installation. Package "recipes" called ebuilds reside in the so-called portage tree. Most packages have more than one ebuild, because older and newer versions are always supported simultaneously. Each of these package versions has its own set of dependencies, that is, other packages that need to be installed for the package to compile/run. These dependencies look something like this:
=dev-libs/glib-2*
samba? ( >=net-fs/samba-3.0.0 )
This means that the package needs any version of the glib-2 library, and if the samba feature (USE flag) is enabled, then samba version 3 or higher is required as well. My task is to verify that a package can be compiled with ALL allowed versions of ALL dependencies. Now the promised example.
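To make the USE flag part concrete, here is a rough sketch of how the two dependency lines above could be evaluated. It assumes Python 3 and only understands the two shapes shown (a plain atom and a single "flag? ( atom )" group). Real dependency strings are parsed by portage itself, and the function name is made up for illustration.

```python
import re

# Matches only the simple "flag? ( atom )" shape shown above.
CONDITIONAL = re.compile(r"^([A-Za-z0-9_+-]+)\?\s*\(\s*(.+?)\s*\)$")


def active_atoms(depend_lines, enabled_use_flags):
    """Return the dependency atoms that apply for the given USE flags."""
    atoms = []
    for line in depend_lines:
        line = line.strip()
        if not line:
            continue
        match = CONDITIONAL.match(line)
        if match is None:
            atoms.append(line)            # unconditional atom
        elif match.group(1) in enabled_use_flags:
            atoms.append(match.group(2))  # conditional atom, flag enabled
    return atoms


DEPEND = ["=dev-libs/glib-2*", "samba? ( >=net-fs/samba-3.0.0 )"]

if __name__ == "__main__":
    print(active_atoms(DEPEND, set()))
    print(active_atoms(DEPEND, {"samba"}))
```

With the samba flag disabled only the glib atom remains. With it enabled, both atoms have to be satisfied.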

Let's assume that we want to install the package mc (midnight commander). There are currently 2 versions of app-misc/mc in portage: 4.6.2_pre1 and 4.6.1-r4. The list of their dependencies is quite long, but to show you the principle I'll use just one dependency, namely sys-libs/ncurses. Version 4.6.2_pre1 of mc depends on sys-libs/ncurses and version 4.6.1-r4 depends on >=sys-libs/ncurses-5.2-r5. There are currently 2 versions of sys-libs/ncurses in portage: 5.7 and 5.6-r2. Based on these dependencies it should be possible to install mc (both versions) with either ncurses-5.7 or ncurses-5.6-r2. From this point on it is a ping-pong of installing ncurses-5.6-r2, then mc-4.6.1-r4/4.6.2_pre1, followed by uninstalling them all, installing ncurses-5.7 and installing mc-4.6.1-r4/4.6.2_pre1 again. If mc-4.6.2_pre1 fails to compile with ncurses-5.6-r2, we will know that the ebuild needs to be modified to depend on >=sys-libs/ncurses-5.7. All this has to be repeated for every dependency of every version of every package in the portage tree. Currently there are 26623 ebuilds in the portage tree. Now imagine that some of them will have to be compiled 30-50 times to test all dependency versions. Good thing we will have dedicated tinderboxes for compiling all those ebuilds.
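The mc/ncurses ping-pong can be written down as a small build matrix. This is only a sketch, assuming Python 3 and assuming that every package version is tried against every combination of allowed dependency versions. The function name is made up for illustration, and the real job scheduling in Collagen may look quite different.

```python
from itertools import product


def build_matrix(package_versions, dependency_versions):
    """Yield one (package, {dependency: version}) job per combination."""
    names = sorted(dependency_versions)
    for pkg in package_versions:
        for combo in product(*(dependency_versions[n] for n in names)):
            yield pkg, dict(zip(names, combo))


mc_versions = ["app-misc/mc-4.6.1-r4", "app-misc/mc-4.6.2_pre1"]
deps = {"sys-libs/ncurses": ["5.6-r2", "5.7"]}
jobs = list(build_matrix(mc_versions, deps))

if __name__ == "__main__":
    for pkg, dep_versions in jobs:
        print(pkg, dep_versions)
```

With one dependency in two versions this gives four builds. Every additional dependency multiplies the number of combinations, which is where the 30-50 compiles per ebuild come from.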

One more thing for now. Gentoo has a project management website based on redmine for all GSoC students at soc.gentooexperimental.org. From now on I will aggregate all documentation for my project there. This blog will go on with fewer technical details, and I will link to the documentation where needed.




Jun 2, 2009

First commit for GSoC

Recently I finished all of my duties as a student for this term, so I could spend the weekend catching up on GSoC (since I am one week behind schedule). In the end it turned out to be a pretty productive weekend.

I'll summarize the basic architecture without any images (I'll probably create them later this week, when everything settles down). There are two core packages:
  • Matchbox
  • Tinderbox
Matchbox is the master server that knows what still needs to be compiled and collects all information. There is always only one Matchbox. There can, however, be more Tinderboxes. These machines connect to Matchbox and ask for the next package to emerge (compile). After emerging the package they collect information about the files in the package, USE flags, emerge environment and error logs from the compile phase. This information is then sent back to Matchbox. Tinderbox then asks for another package to emerge. Repeat while true.
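The ask/emerge/report cycle can be sketched without any networking. This is a toy model in Python 3, not the actual Collagen code: the class, the method names and the fake emerge function are made up for illustration, and the real Matchbox talks to Tinderboxes over a socket.

```python
from collections import deque


class Matchbox:
    """In-memory stand-in for the master server."""

    def __init__(self, packages):
        self.todo = deque(packages)
        self.results = {}

    def get_next_package(self):
        # None means there is nothing left to compile.
        return self.todo.popleft() if self.todo else None

    def add_package_info(self, package, info):
        self.results[package] = info


def tinderbox_loop(matchbox, emerge):
    """Ask for a package, emerge it, report back, repeat until done."""
    while True:
        package = matchbox.get_next_package()
        if package is None:
            break
        matchbox.add_package_info(package, emerge(package))


def fake_emerge(package):
    # Placeholder for the real compile step.
    return {"files": [], "use_flags": [], "build_log": "pretend build of " + package}


if __name__ == "__main__":
    matchbox = Matchbox(["app-misc/mc", "sys-libs/ncurses"])
    tinderbox_loop(matchbox, fake_emerge)
    print(sorted(matchbox.results))
```

Several Tinderboxes would simply run the same loop against the one Matchbox, each taking whatever package comes next.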

The first thing I did was create a basic data model for storing data about compiled packages: which USE flags were used, error logs and things like that. A lot of things are not in the model yet, for example information about tinderboxes, but for now this will do. The UML diagram is in the following picture:


This model should allow efficient storage of data and a lot of flexibility to boot. There can be multiple versions of the same package (of course), and packages can also change their category (this happens quite often). We can also collect different data sets based on USE flags.

With the basic data model in place it was time for some serious prototyping :-) Naturally I decided to split the implementation into two parts, one for each core module (more to come later). Matchbox is a simple listening server waiting for incoming connections. I wanted to simplify network communication for myself, so I used the Python module pickle. This module is able to create a string representation of classes/functions and basic data types. Because of this I was able to use objects as network messages. Objects representing the Matchbox command set:

class MatchboxCommand(object): pass

class GetNextPackage(MatchboxCommand):
    pass

class AddPackageInfo(MatchboxCommand):
    def __init__(self, package_info):
        self.package_info = package_info

On the other side, Tinderbox understands these replies (for now):
class MatchboxReply(object): pass

class GetNextPackageReply(MatchboxReply):
    def __init__(self, package_name, version, use_flags):
        self.package_name = package_name
        self.version = version
        self.use_flags = use_flags

Communication (simplified) goes something like this:
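# Tinderbox side
msg = GetNextPackage()
msg_pickled = pickle.dumps(msg)
sock.sendall(msg_pickled)

# Matchbox side
data = sock.recv(4096)  # recv() needs a buffer size
command = pickle.loads(data)
if isinstance(command, GetNextPackage):
    # Assumption: get_next_package_to_emerge() returns
    # (package_name, version, use_flags), matching GetNextPackageReply.
    package_name, version, use_flags = get_next_package_to_emerge()
    msg = GetNextPackageReply(package_name, version, use_flags)
    msg_pickled = pickle.dumps(msg)
    sock.sendall(msg_pickled)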
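One thing the simplified version glosses over is that a single recv() call is not guaranteed to return a whole pickled message. A common way around that is a length prefix in front of every message. The following is a minimal sketch of that idea, assuming Python 3. The helper names are made up for illustration, and this is not necessarily how Collagen frames its messages.

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def send_msg(sock, obj):
    """Pickle obj and send it with its length in front."""
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping over short reads."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed message and unpickle it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return pickle.loads(recv_exact(sock, length))


if __name__ == "__main__":
    tinderbox_end, matchbox_end = socket.socketpair()
    send_msg(tinderbox_end, {"command": "GetNextPackage"})
    print(recv_msg(matchbox_end))
```

The same two helpers work in both directions, so Matchbox can send its reply with send_msg() as well.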

There is one BIG caveat to this kind of communication. It is very easily tampered with. This is directly from the pickle documentation:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
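One middle ground that the pickle documentation itself describes is restricting which classes may be loaded, by overriding Unpickler.find_class(). Below is a minimal sketch of that, assuming Python 3. The whitelist and the helper names are made up for illustration, and this reduces the risk rather than making pickle safe against a determined attacker.

```python
import io
import os
import pickle


class GetNextPackage(object):
    pass


# Only these (module, name) pairs may be looked up while unpickling.
ALLOWED = {(GetNextPackage.__module__, GetNextPackage.__name__): GetNextPackage}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        try:
            return ALLOWED[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError("%s.%s is not allowed" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but refuses anything outside ALLOWED."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


def is_rejected(data):
    """Return True if restricted_loads() refuses the data."""
    try:
        restricted_loads(data)
    except pickle.UnpicklingError:
        return True
    return False


if __name__ == "__main__":
    command = restricted_loads(pickle.dumps(GetNextPackage()))
    print(type(command).__name__)
    # A pickle that references os.system is refused instead of loaded.
    print(is_rejected(pickle.dumps(os.system)))
```

Plain data such as lists, strings and numbers never goes through find_class(), so it still loads as usual.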

We will have to decide whether to reimplement this part or trust the Gentoo infrastructure. So what do we have for now?
  • Basic communication between Matchbox/Tinderbox
  • Compiling works, with file list/emerge environment/stdout/stderr/etc. being sent back to Matchbox
There is still much more ahead of us:
  • package selection on Matchbox side
  • block resolution on Tinderbox
  • rest of services (web interface, client, etc)
Since GSoC students haven't got git repositories on Gentoo servers just yet, you can see the code in gentoo-collagen@github. So long and thanks for all the fish (for now).
