Python Packaging woes - part 1

This is a spinoff from my EuroPython 2012 talk (video, slides).

Some days ago I got involved in a brief discussion on the topic, so here's what I think about Python packaging and dependency management: it's in a very bad state, requires a lot of manual intervention to work properly and heavily limits code reuse.

Why? Let's see.

The Packaging Conspiracy

Aim

No, it's not a conspiracy to force you to learn something you don't care about. The whole point of packaging is to make code reuse easy enough, both for the developer's own sake (e.g. shared code between different projects) and for sharing it with the public.

Packaging usually involves dependency management; this is true of most Linux distributions as well as other packaging systems (e.g. Apache Maven, RubyGems). You don't just release your code "in the dark": you release version 1.0.0 of your code on January 1st, 2013, and that release matches (or should match) an exact revision in your source control system and a precise set of dependencies, because your own software or library may have its own requirements.

It's overkill!

Some developers think all of this is a waste of their time.

D: Hey Alan, are you really saying I can't just copy and paste some source code into some directory and just go on? How much time should I spend on this tedious task?!?

A: Sure, just copy and paste. And remember to copy and paste all transitive dependencies as well. And since this is Python, start hoping all your transitive dependencies are pure Python, otherwise good luck copy-pasting C source code into your repo and remembering to compile it every time. And whenever you need a bugfix from your upstream dependency, good luck remembering where you fetched that code from, and at which revision. Oh, did you write the URL and the revision somewhere in the VCS commit message? Fine, you're doing by hand the work that a good dependency management tool is supposed to do. If you like that.

D: OK, ok. I got it. So now I've added my dependencies. But what the heck, does the version really matter? I don't care! If a new version of something I rely on is released, it will contain bugfixes and maybe new features, but won't break my code.

A: Dear developer, I think your great trust in the Developer From The Next Door is something that feeds your heart but hinders your brain. You should depend on Only One Version for your production code; when it's time to upgrade to a later one, you need to do a manual test run and verify everything is fine. If you forget about doing that, your imagination will be the only limit to whatever mess can happen.
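To make "Only One Version" concrete, here is a minimal sketch of a check that flags every dependency that is not pinned to one exact version. The `name==version` notation, the `unpinned` helper and the package names are assumptions made up for this example, not something a specific tool mandates.

```python
import re

# Assumption: a dependency is "pinned" when it is written as name==X.Y.Z
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==\d+(\.\d+)*$")


def unpinned(requirements):
    """Return the requirement lines that do not pin exactly one version."""
    offenders = []
    for line in requirements:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not PINNED.match(line):
            offenders.append(line)
    return offenders


# Hypothetical dependency list: only the first entry is pinned.
deps = ["libfoo==1.4.2", "libbar>=2.0", "libbaz"]
print(unpinned(deps))  # → ['libbar>=2.0', 'libbaz']
```

A check like this could run before every production build, so that an open-ended requirement never slips in unnoticed.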

D: OK, ok, now I've set versions for all my deps. What version should I use for my own software, though? Can I just pick any version I like? First version 1.0, then 1.1, until I get to 1.99 and I move to 2.0?

A: Try doing something meaningful, and stick to your plan. You can pick your own versioning policy. Usually your version numbers should distinguish at least between major releases, where you can break backwards compatibility (if you must, not just because you enjoy it) and add a lot of new things, and bugfix releases, where you try to limit the changes to what really needs to be fixed, in order not to mess up client code. A lot of software uses a two-dotted version number, something like MAJOR.MINOR.REVISION, with REVISION being used for bugfixes only, but the distinction between MAJOR and MINOR is blurred and often just something "emotional". If you perform small incremental changes on your codebase - à la Chrome - you can stick with just MAJOR.REVISION. Every feature-adding release increases the MAJOR, every bugfix increases the REVISION. Of course, remember that your MAJOR should be strictly increasing, while REVISION is usually reset to 0 for every MAJOR and then strictly increasing.
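As an illustration of the MAJOR.MINOR.REVISION scheme, the sketch below models a version as a tuple of integers, so that comparison is numeric and every bump resets the less significant parts to 0. The `Version` class and its method names are made up for this example; they are not taken from any packaging tool.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical MAJOR.MINOR.REVISION version, compared numerically."""

    major: int
    minor: int
    revision: int

    @classmethod
    def parse(cls, text):
        # Expects exactly three dot-separated integers, e.g. "1.9.3"
        major, minor, revision = (int(part) for part in text.split("."))
        return cls(major, minor, revision)

    def bump_major(self):
        # New major release: MINOR and REVISION restart from 0
        return Version(self.major + 1, 0, 0)

    def bump_minor(self):
        # New minor release: REVISION restarts from 0
        return Version(self.major, self.minor + 1, 0)

    def bump_revision(self):
        # Bugfix release: only REVISION grows
        return Version(self.major, self.minor, self.revision + 1)

    def __str__(self):
        return "{}.{}.{}".format(*self)


current = Version.parse("1.9.3")
print(current.bump_revision())  # → 1.9.4
print(current.bump_minor())     # → 1.10.0
print(Version.parse("1.10.0") > Version.parse("1.9.3"))  # → True
```

Comparing the plain strings would rank "1.10.0" below "1.9.3", which is one good reason to parse the numbers before comparing anything.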

D: So, what should I do now? Where do I release my code? Should I copypaste my code somewhere? I know of a great pastebin!

A: You should usually aggregate your build into an artifact (= product of the build) and upload it to a shared repository: a place that you and other people agree to upload your software to and to fetch other people's software from.

D: YEAH! I've just released version 0.0.2 to the public. Now I'm deleting 0.0.1, right?

A: No. Never. Ideally, unless you must - by must I mean something like "there's a password for my bank account in the source code" or "I mistakenly copypasted code I didn't hold the copyright for" - you should never, ever, ever, remove a released version. You never know who's using it - if my software works fine with 0.0.1, why should I be forced to upgrade? According to your version policy, you can shout at me if I ever ask you anything about it, but you shouldn't care about old versions otherwise. If tagging is supported on the repository you might want to tag old versions as "deprecated", "bugged", or maybe "donut" if they contain a huge security hole.
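The "never remove, tag instead" policy can be sketched as a toy in-memory repository that refuses to overwrite a released version and offers tagging in place of deletion. Everything here (`Repository`, `ReleaseExists`, the method names, the `mylib` package) is hypothetical and only meant to show the idea; a real repository has to deal with storage, authentication and much more.

```python
class ReleaseExists(Exception):
    """Raised when someone tries to re-upload an already released version."""


class Repository:
    """Toy shared repository: releases are write-once and can only be tagged."""

    def __init__(self):
        self._artifacts = {}  # (name, version) -> artifact bytes
        self._tags = {}       # (name, version) -> set of labels

    def upload(self, name, version, artifact):
        key = (name, version)
        if key in self._artifacts:
            raise ReleaseExists("%s %s is already released" % key)
        self._artifacts[key] = artifact

    def tag(self, name, version, label):
        key = (name, version)
        if key not in self._artifacts:
            raise KeyError(key)
        self._tags.setdefault(key, set()).add(label)

    def fetch(self, name, version):
        """Return the artifact together with its tags, sorted by name."""
        key = (name, version)
        return self._artifacts[key], sorted(self._tags.get(key, ()))


repo = Repository()
repo.upload("mylib", "0.0.1", b"first release")
repo.upload("mylib", "0.0.2", b"second release")
repo.tag("mylib", "0.0.1", "deprecated")  # old version stays available
print(repo.fetch("mylib", "0.0.1"))       # → (b'first release', ['deprecated'])
```

There is deliberately no `delete` method: removing a release would be an exceptional, out-of-band operation, not part of the everyday interface.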

D: I like fancy names. Can I name my next release 2.0.5sarah? You know, she's my first daughter and she was born on the very day of that release.

A: I'm happy to hear that, but please stick with numbers only. Is 2.0.5sarah higher or lower than 2.0.6? No, "lexicographical" is not what you and I want to hear in such a context. You may want to use a suffix/prefix/tag to mark non-final releases, e.g. alpha/beta/development releases, preferably a SINGLE, widely recognized suffix for all of those, and decide once and for all whether it means lower or higher. Usually it means "lower", so that 2.0.0beta is < 2.0.0, but that's not always the case - a very good idea is to forbid the same version number from being used for both a development and a production release. I.e. if 2.1.* is beta, 2.0.5 can be a production release, but not 2.1.0 or 2.1.1.
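Here is a small sketch of the "numbers plus a single suffix" rule: numeric parts are compared as integers, a `beta` suffix sorts lower than the final release with the same numbers, and anything else is rejected. The suffix name, the regular expression and the `sort_key` function are assumptions chosen for this example.

```python
import re

# Assumption: versions look like X.Y.Z, optionally followed by "beta"
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(beta)?$")


def sort_key(text):
    """Build a key that orders versions numerically, pre-releases first."""
    match = _VERSION.match(text)
    if match is None:
        raise ValueError("unsupported version: %r" % text)
    major, minor, revision, suffix = match.groups()
    # A final release gets 1 and a beta gets 0, so 2.0.0beta < 2.0.0
    return (int(major), int(minor), int(revision), 0 if suffix else 1)


versions = ["2.0.6", "2.0.0", "2.0.0beta", "2.0.10"]

print(sorted(versions))
# → ['2.0.0', '2.0.0beta', '2.0.10', '2.0.6']   (lexicographical)

print(sorted(versions, key=sort_key))
# → ['2.0.0beta', '2.0.0', '2.0.6', '2.0.10']   (what we actually mean)
```

With this key, `sort_key("2.0.5sarah")` raises `ValueError`: the question of whether it is higher or lower than 2.0.6 never gets asked.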

D: There are a lot of things to remember about dependency management and packaging. Should I learn everything by heart?

A: What the heck, no. Your packaging tool should guide you in such practices and prevent you from going astray unless it's what you really mean to do! That's why I'm complaining in the first place about the way Python does packaging!

In part 2 I dig into Python-specific packaging issues.