Apr 8, 2011

Introduction to packaging Java

Packaging Java libraries and applications in Fedora has been my daily bread for almost a year now. I realized now is the time to share some of my thoughts on the matter and perhaps share a few ideas that upstream developers might find useful when dealing with Linux distributions.

This endeavour is going to be split into several posts, because there are several sub-topics I want to write about. Most of this is based on the talk I gave at FOSDEM 2011. Originally I was hoping to just post the video, but it seems to be taking more time than I expected :-)

If you are not entirely familiar with the status of Java on Linux systems, it would be a good idea to first read a great article by Thierry Carrez called "The real problem with Java in Linux distros". A short quote from that article:
"The problem is that Java open source upstream projects do not really release code. Their main artifact is a complete binary distribution, a bundle including their compiled code and a set of third-party libraries they rely on."
There is no simple solution, and my suggestions are only mid-term workarounds and ways to make each other's (upstream ↔ downstream) lives easier. Sometimes I am quite terse in my suggestions, but if need be I'll expand on them later.


Part 1: General rules of engagement

Today I am going to focus on general rules that apply to all Java projects wishing to be packaged in Linux distributions:
  • Making source releases
  • Handling Dependencies
  • Bugfix releases
For full understanding, here is a short summary of the general requirements a package has to meet to be added to most Linux distributions:
  • All packages have to be built from source
  • No bundled dependencies are used for building or running
  • There is a single version of each library, which all packages use
There are a lot of reasons for these rules and they have been flogged to death multiple times in various places. It mostly boils down to severe maintenance and security problems when these rules are not followed.

Making source releases

As I mentioned previously, most Linux distributions rebuild packages from source even when upstream provides a binary-compatible release. To do this we obviously need the sources :-) Unfortunately, quite a few projects (mostly those using Maven) don't publish source release tarballs. Some projects provide source releases without build scripts (build.xml or pom.xml files). The most notable examples are the Apache Maven plugins: for each and every update of one of these plugins we have to check out the source from the upstream repository and generate the tarball ourselves.
All projects using the Maven build system can make packagers' lives easier simply by adding the following snippet to their pom.xml files:
    <build>
      <plugins>
        ...
        <plugin>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <descriptorRefs>
              <!-- "project" is one of the plugin's predefined descriptors -->
              <descriptorRef>project</descriptorRef>
            </descriptorRefs>
          </configuration>
          <executions>
            <execution>
              <id>make-assembly</id>
              <!-- build the source archives as part of the package phase -->
              <phase>package</phase>
              <goals>
                <goal>single</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
        ...
      </plugins>
    </build>
  
This will create -project.zip and -project.tar.gz archives containing all the files needed to rebuild the package from source. I have no real advice for projects using Ant for now, but I'll come back to Ant next time.

Handling dependencies

I have a feeling that most Java projects don't spend too much time thinking about dependencies. This should change, so here are a few things to think about when adding new dependencies to your project.

Verify that the dependency isn't already provided by the JVM

Packages often carry unnecessary dependencies on functionality that all recent JVMs already provide. Think twice about whether you really need another XML parser.
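
As a minimal sketch of what this means in practice, the following example parses XML using only the JAXP classes that ship with the JVM, with no third-party parser on the classpath. The class name, method name and sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class: name and sample XML are illustrative only.
final class JdkXmlExample {

    // Parses an XML string with the parser bundled in the JVM (JAXP)
    // and returns the name of the root element.
    static String rootElementName(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers get a single unchecked exception.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints "project"
        System.out.println(rootElementName("<project><name>demo</name></project>"));
    }
}
```

If something this simple covers your needs, the extra parser is one less package a distribution has to carry for you.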

Try to pick dependencies from major projects

Major projects (Apache Commons libraries, Eclipse, etc.) are much more likely to be packaged and supported properly in Linux distributions. If you use some unknown small library, packagers will have to package that first, and this can sometimes lead to such frustrating dependency chains that they will give up before packaging your software.

Do NOT patch your dependencies

Sometimes project A does almost exactly what you want, but not quite... So you patch it and ship it with your project B as a dependency. This causes problems for Linux distributions, because you have effectively forked the original project A. What you should do instead is work with the developers of project A to add the features you need or fix those pesky bugs.

Bugfix releases

Every software project has bugs, so sooner or later you will have to do a bugfix release. As always there are certain rules you should try to uphold when doing bugfix releases.

Use correct version numbers

This depends on your versioning scheme. I'll assume you are using standard X.Y.Z versions for your releases. A change in Z is the smallest change you release. Such releases should mostly contain bugfixes, plus unobtrusive and simple feature additions if necessary. If you want to add bigger features, you should change the Y part of the version.
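
To make the X.Y.Z convention concrete, here is a small sketch that tells which part of the version changed between two releases. It assumes purely numeric versions with at most three components (a missing component counts as 0). The class name and the MAJOR/MINOR/MICRO labels for X, Y and Z are my own naming for this illustration.

```java
// Hypothetical helper: classifies the difference between two X.Y.Z versions.
final class VersionBump {

    // MAJOR = X changed, MINOR = Y changed, MICRO = Z changed.
    enum Kind { MAJOR, MINOR, MICRO, NONE }

    // Splits "X.Y.Z" into three numbers; assumes numeric components only.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length > 3) {
            throw new IllegalArgumentException("Expected at most X.Y.Z, got: " + version);
        }
        int[] numbers = new int[3]; // missing components default to 0
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Reports the most significant component that differs.
    static Kind classify(String oldVersion, String newVersion) {
        int[] a = parse(oldVersion);
        int[] b = parse(newVersion);
        if (a[0] != b[0]) return Kind.MAJOR;
        if (a[1] != b[1]) return Kind.MINOR;
        if (a[2] != b[2]) return Kind.MICRO;
        return Kind.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("1.2.3", "1.2.4")); // prints MICRO
        System.out.println(classify("1.2.3", "1.3.0")); // prints MINOR
    }
}
```

A release classified as MICRO here is the kind packagers expect to be able to pick up without surprises.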

Backward compatible

Bugfix releases have to be backwards compatible at all times. No API changes are allowed.

No changes in dependencies

You should not change dependencies or add new ones in bugfix releases. Even updating a dependency to a new version can trigger a massive, recursive chain of updates and new dependencies. The only time it's acceptable to add a dependency or change its version in a bugfix release is when that is required to fix the bug.

An excellent example of how NOT to do things was the Apache Maven update from 3.0 to 3.0.1. This update changed the requirement from Aether 1.7 to Aether 1.8. Aether 1.8 had a new dependency on async-http-client, which in turn depends on netty, jetty 7.x and more libraries. So what should have been a simple bugfix update turned into a major update of 1 package and the addition of 2 new packages. If this update had contained security fixes, it would have been very hard to ship in a timely manner.
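
One way to catch this kind of change before tagging a bugfix release is to compare the dependency lists of the old and new versions. The sketch below is only an illustration: the class is hypothetical, the dependency lists are written by hand as name-to-version maps, and a real project would have to extract them from its build files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: lists dependency changes between two releases.
final class DependencyDiff {

    // Compares two name -> version maps and describes every difference,
    // sorted by dependency name.
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        Set<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());

        List<String> changes = new ArrayList<>();
        for (String name : names) {
            String oldVersion = before.get(name);
            String newVersion = after.get(name);
            if (oldVersion == null) {
                changes.add("added " + name + " " + newVersion);
            } else if (newVersion == null) {
                changes.add("removed " + name + " " + oldVersion);
            } else if (!oldVersion.equals(newVersion)) {
                changes.add("changed " + name + " " + oldVersion + " -> " + newVersion);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Illustrative data modelled on the Maven 3.0 -> 3.0.1 case above.
        Map<String, String> before = Map.of("aether", "1.7");
        Map<String, String> after = Map.of("aether", "1.8");
        // Prints "changed aether 1.7 -> 1.8"
        diff(before, after).forEach(System.out::println);
    }
}
```

An empty result is what you want to see for a bugfix release. Anything else deserves a second look, because each changed entry can drag in further changes through its own dependencies.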

Summary

  • Create source releases containing build scripts
  • Think about your dependencies carefully
  • Handle micro releases gracefully
Next time I'll look into some Ant and Maven specifics that are causing problems for packagers and how to resolve them in your projects.


2 comments:

  1. Thank you for broaching that - problem-rich - issue.

    When I considered what to do about my then bread-and-butter project (~40 packages, 2000 classes, many indirect dependencies on Apache packages) I looked at Maven and Ivy.

    I chose Ivy, because the dependency graph was deep (rather than flat), and Ivy's transitive treatment of dependencies suited that situation uniquely well.

    Do you plan to discuss Ivy in your guide? I, for one, would be interested.
    With best regards
    ghh

  2. Perhaps I am misunderstanding, but Maven is actually ideal for that kind of dependency tree. Maven does transitive dependencies (e.g. if I require package A and package A requires package B, I will get both on the classpath). So I don't see why you should choose Ivy based solely on that aspect.

    Frankly I haven't had THAT much experience with packaging Ivy, and I am viewing it as an extension to Ant that adds dependency solving but also leaves other Ant problems untouched. I plan to touch on the subject, but probably not as much as you'd like. I might have to look into it a bit more...