Python Packaging Woes - part 2
Check out part 1 for my thoughts on what packaging and dependency
management should be. I forgot about this post and left it in draft for quite a long time. Sorry!
How Python messed it up
A Diaspora of Tools
There's a plethora of packaging tools for Python! Let's take a look at them.
The original distutils is part of the Python standard library, but
it's not really feature-packed, and it's considered unsuitable for many production uses. As an example,
the keyword for declaring requirements seems to do nothing other than document them: nothing will
happen if you install a package requiring another one; you still need to install dependencies manually.
Distutils is not inherently bad, but as a standard library tool it's quite stable: it changes
very little and in very predictable ways. As a result it went without updates for a long time, got
old and tired, and now needs an overhaul (more on that later).
The original setuptools package, created by a clever guy called Phillip J. Eby, is what people
used for a long time, and it's unmaintained right now. It builds on distutils and
extends/monkeypatches it, adding a lot of functionality.
It includes easy_install, which is probably the
first tool that allowed people to install packages straight from PyPI (or other URLs, more on that later)
on the command line, without a lot of manual intervention. Also, if the downloaded package is
setuptools-aware and correctly specifies its dependencies with install_requires, it will
install all transitive dependencies.
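To make the install_requires point concrete, here is a minimal sketch of a setuptools-aware setup.py. The project name "mypackage" and the listed dependencies are made up for illustration; only the install_requires keyword itself comes from setuptools.

```python
# setup.py - a minimal sketch; "mypackage" and its dependencies are hypothetical.

# Keyword arguments for setuptools.setup(), kept in a dict so they are easy to inspect.
SETUP_KWARGS = {
    "name": "mypackage",
    "version": "0.1.0",
    "packages": ["mypackage"],
    # install_requires is acted upon, not just recorded: the installer fetches
    # these projects, plus whatever they declare as requirements in turn.
    "install_requires": [
        "requests>=2.0",
        "six",
    ],
}


def main():
    # Imported here so the metadata above can be read without setuptools installed.
    from setuptools import setup

    setup(**SETUP_KWARGS)


# In a real setup.py the last line would simply be: main()
```

With something like this in place, installing mypackage should pull in the two listed projects and their own dependencies without any manual step.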
It includes pkg_resources as well; that last
one is one of the great Python underdogs, IMHO. It allows Python programmers to access any kind of
data which is installed in Python libraries, regardless of how it was installed. Is it an egg? A
flat install? Did the install become part of the standard library? Don't worry: just ask for the
resource name and the package name, and pkg_resources will look up the file object for you and open it.
Unluckily the API was not so easy to use (that's something I'm trying to address in
pydenji with its resourceloader URL-parsing based system), and many
people who need to handle resources in Python just "look for them" from the current directory using
__file__. Too bad.
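The difference between the two approaches can be sketched roughly like this. The helper names are mine, not part of any library. The second helper uses pkg_resources.resource_string when pkg_resources can be imported, and falls back to the standard library's pkgutil.get_data otherwise, which is a different tool that answers the same "package name plus resource name" question.

```python
import os


def read_via_file(relative_name):
    """Fragile approach: assumes the package is a plain directory on disk."""
    here = os.path.dirname(os.path.abspath(__file__))
    with open(os.path.join(here, relative_name), "rb") as handle:
        return handle.read()


def read_via_package(package, resource_name):
    """Ask for the resource by package and name; let the library locate it."""
    try:
        import pkg_resources
    except ImportError:
        # Fallback (my assumption, not from the post): stdlib pkgutil.
        import pkgutil

        return pkgutil.get_data(package, resource_name)
    return pkg_resources.resource_string(package, resource_name)


# Example: read a file shipped inside the standard library's json package.
data = read_via_package("json", "__init__.py")
```

The first helper breaks as soon as the code is not sitting in a regular directory; the second one doesn't care where or how the package was installed.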
In the end, setuptools received little maintenance and was forked; distribute is a drop-in
replacement which shares the functions and dysfunctions of its ancestor.
Recently it has been re-merged into setuptools.
This wouldn't inherently be a problem, but sometimes you find projects that are distribute-unaware and try to force
a specific, old version of setuptools, or projects that know that setuptools is "old" and force
the installation of distribute (which is an issue, since setuptools is now actually an updated distribute) :-/
Then there's the Next Big Thing in Python packaging, which has never happened so far: it should have been available
in Python 3.3, but development seems to have stalled.
Pip is among the best tools you could use and works fine, but it's Yet Another Tool. It works
the way its author thinks it should, which is usually a good thing, since Ian Bicking knows what he's doing.
Dependency management is hard with distribute; you can neither handle diamond dependencies nor apply overrides.
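For anyone who hasn't bumped into a diamond dependency yet, here is a toy model of one. The package names, versions and the find_conflicts helper are all invented for illustration; this is not how any real installer resolves dependencies, just a way to show the shape of the problem.

```python
# Hypothetical data: each package maps to the exact versions it demands.
REQUIREMENTS = {
    "app": {"web": "1.0", "db": "1.0"},
    "web": {"utils": "1.0"},
    "db": {"utils": "2.0"},
    "utils": {},
}


def find_conflicts(root, requirements):
    """Walk the graph from root; report packages requested at several versions."""
    requested = {}  # package name -> {version: [packages asking for it]}
    stack = [root]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for dependency, version in requirements.get(current, {}).items():
            requested.setdefault(dependency, {}).setdefault(version, []).append(current)
            stack.append(dependency)
    return {name: versions for name, versions in requested.items() if len(versions) > 1}


conflicts = find_conflicts("app", REQUIREMENTS)
```

Here app needs web and db, and both need utils, but at different versions: that's the diamond. Without a way to detect it or to say "use this version anyway", you're left sorting it out by hand.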
The PyPI repository is mutable (versions can be deleted by package owners) and doesn't have a real API, even though there is
now a way to prevent packages from being fetched from their homepage and rely on PyPI alone: see PEP 438.
One good way of handling dependency management in such cases is to use pip's own requirements.txt file which, of course, is not a standard and may not work in all environments; or the project you love may simply not provide it at all.
It's very easy to find packages on PyPI that don't work out of the box, either because of wrong dependencies,
or because PyPI somehow changed the hosted versions since the time the package was built. Replicating a deployment
is not easy.
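One small thing that helps when trying to replicate a deployment is checking that a requirements.txt pins every entry to an exact version. The sketch below is deliberately naive: the sample content and the unpinned_requirements helper are hypothetical, and it only understands plain "name==version" lines, not everything pip accepts.

```python
SAMPLE_REQUIREMENTS = """\
# hypothetical requirements.txt content
requests==2.0.0
six>=1.4
pydenji
"""


def unpinned_requirements(text):
    """Return requirement lines that are not pinned to an exact version with '=='."""
    loose = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options such as -r or -e
            continue
        if "==" not in line:
            loose.append(line)
    return loose


loose_entries = unpinned_requirements(SAMPLE_REQUIREMENTS)
```

Any entry reported here may resolve to a different version tomorrow than it did today. Pinning doesn't help if the pinned version is later deleted from the index, of course.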
And different tools just don't play well together.
What should be done?
- Focus more on the problem. Packaging and dependency management seem to be underdogs in the Python world.
- Choose a tool or an API (Maven repos became de-facto standard for non-Maven users as well) that can
work, and if it doesn't, evolve it; don't just let people invent their own in-house solutions. It doesn't
matter if it's not part of the standard library.
- Define best practices for packaging and dependencies (package layout, version numbers, metadata,
uploaded package formats, etc.) and make sure users don't override them unless they know what they're
doing and why.
- Have good documentation linked from the official docs. The current packaging
guide is incomplete and isn't linked from anything
"official" - it's just another guide.
There's a light at the end of the tunnel, and it's not a train! We'll talk about it in part 3, which I hope will happen soon enough :-)