[Numpy-discussion] Announcing toydist, improving distribution and packaging situation

Sitting down with Tarek (who is one of the current distutils
maintainers) in Berlin, we had a little discussion about packaging over
pizza and beer... and he was quite mindful of OS packagers' problems
and issues.

This has been said many times on distutils-sig, but no concrete action
has ever been taken in that direction. For example, toydist already
supports the FHS better than distutils, and is more flexible. I have
tried several times to explain why this matters on distutils-sig, but
then the peanut gallery interferes with unrelated nonsense (like
claiming it would break Windows, as if it could not be implemented
independently).

Also, retrofitting support for --*dir in distutils would be *very*
difficult, unless you are ready to break backward compatibility. There
are 6 ways to install data files, and each of them has some corner
cases - it is a real pain to support this correctly in the convert
command of toydist, for example, and you simply cannot recover the
missing information to comply with the FHS in every case.

However these systems were developed by the zope/plone/web crowd, so
they are naturally going to be thinking a lot about zope/plone/web
issues.

Agreed - it is natural that they care about their own problems first;
that's how it works in open source. What I find difficult is when our
concerns are constantly dismissed by people who have no clue about our
issues - and who later claim we are not cooperative.

Debian and Ubuntu packages for them are mostly useless
because of their age.

That's where a build farm comes in. This is a known issue; that's why
the build service and PPAs exist in the first place.

I think
perhaps if toydist included something like stdeb, not as an extension
to distutils but as a standalone tool (like toydist), there would be
fewer problems with it.

That's pretty much how I intend to do things. Currently, in toydist,
you can do something like:

from toydist.core import PackageDescription

pkg = PackageDescription.from_file("toysetup.info")
# pkg now gives you access to metadata, as well as extensions,
# python modules, etc...

I think this gives almost everything needed to implement a sdist_dsc
command. Unlike the Distribution class in distutils, this class would
not need to be subclassed or monkey-patched by extensions, as it only
cares about the description and is 100% uncoupled from the build part.
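
To make this concrete, here is a rough sketch of what a standalone
sdist_dsc tool could look like on top of PackageDescription (the
pkg.name / pkg.version / pkg.summary attributes are placeholders for
the example, not a committed API):

from toydist.core import PackageDescription

def sdist_dsc(info_file="toysetup.info"):
    # the package description is read from a static file - no setup.py
    # execution is involved
    pkg = PackageDescription.from_file(info_file)
    # attribute names below are assumed for the sake of the example
    control = ["Source: %s" % pkg.name,
               "Version: %s" % pkg.version,
               "Description: %s" % pkg.summary]
    return "\n".join(control)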

Yes, I have also battled with distutils over the years. However, it is
simpler than autotools (for me... maybe distutils has perverted my
fragile mind), and works on more platforms for Python than any other
current system.

Autotools certainly works on more platforms (Windows notwithstanding),
if only because Python itself is built with autoconf. Distutils'
simplicity is a trap: it is simpler only if you restrict yourself to
what distutils gives you. Don't get me wrong, autotools are horrible,
but I have never encountered cases where I had to spend hours on
trivial tasks, as has been the case with distutils. NumPy's build
system would be much, much easier to implement with autotools, and
would be much more reliable.

However,
distutils has had more tests and testing systems added, so that
refactoring and cleaning up of distutils can happen more readily.

You can't refactor distutils without breaking backward compatibility,
because distutils has no API: the whole implementation is the API.
That's one of the fundamental disagreements I and other SciPy devs have
with the current contributors on distutils-sig: the starting point
(distutils) and the goal are so far apart that getting there step by
step is hopeless.

I agree with many things in that post, except your conclusion on
multiple versions of packages in isolation. Package isolation is like
processes, and package sharing is like threads - and threads are evil!

I don't find the comparison very helpful (for one, you can share data
between processes, whereas virtualenvs cannot see each other AFAIK).

Science is supposed to allow repeatability. Without the same versions
of packages, repeating experiments is harder. Repeatability is a big
problem in science, and multiple versions of packages in _isolation_
can help get to a solution.

I don't think that's true - at least it does not reflect my experience
at all. But then, I don't pretend to have extensive experience either.
From most of my discussions at SciPy conferences, I know most people
are dissatisfied with the current Python solutions.

Plenty of good work is going on with Python packaging.

That's the opposite of my experience. What I care about is:
- tools which are hackable and easily extensible
- robust install/uninstall
- a real, DAG-based build system
- explicitness and repeatability

None of this is supported by the current tools, and the current
directions move even further away from it. When I have to explain at
length why the command-based design of distutils is a nightmare to
work with, I don't feel very confident that the current maintainers
are aware of the issues. It shows that they have never had to extend
distutils much.

All agreed! I'd add to the list parallel builds/tests (make -j 16),
and outputting to native build systems, e.g. Xcode, MSVC projects, and
makefiles.

Yep - I got quite far with numscons already. It cannot be used as a
general solution, but as a dev tool for my own work on numpy/scipy it
has been a huge time saver, especially given the top-notch dependency
tracking system. It supports parallel builds, and I can do a full
debug build of scipy in under a minute on a fast machine. That's a
real productivity booster.

How will you handle toydist extensions so that multiple extensions do
not have problems with each other? I don't think this is possible
without isolation, and even then it's still a problem.

By doing it mostly the Unix way, through protocols and file formats,
not through an API. A good API is hard, but for build tools it is
much, much harder. When talking about extensions, I mostly think about
the following:
- adding a new compiler or a new platform
- adding a new installer format
- adding a new kind of source file/target (say a ctypes extension,
cython compilation, etc...)

Instead of using classes for compilers/tools, I am considering using
Python modules for each tool, with each tool registered against a
source file extension (associating a function to ".c", for example).
Actual compilation steps would be expressed as command strings ("$CC
...."). The system would be kept simple, because for complex projects
one should forward all of this to a real build system (like waf or
scons).
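
As a very rough sketch of what I have in mind (register_tool, c_compile
and the registry dictionary are made up for the example, nothing here
is settled):

import os

# hypothetical registry: map a source file extension to a plain function
TOOLS = {}

def register_tool(extension, compile_func):
    TOOLS[extension] = compile_func

def c_compile(source, target):
    # the actual step is just a command string, with $CC/$CFLAGS expanded
    # later from the build environment
    return "$CC $CFLAGS -c %s -o %s" % (source, target)

register_tool(".c", c_compile)

def command_for(source, target):
    ext = os.path.splitext(source)[1]
    return TOOLS[ext](source, target)

The point is that a new tool is just a module with a couple of
functions, not a subclass of anything.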

There is also the problem of pre/post hooks and adding new steps to
toymaker: I have not thought much about this, but I like waf's way of
doing it, and it may be applicable. In waf, the main script (called
wscript) defines a function for each build step:

def configure(ctx):
    pass

def build(ctx):
    pass

....

And steps whose functions are not defined simply keep their default
behaviour.

What I know for sure is that the distutils way of extending through
inheritance does not work at all. As soon as two extensions subclass
the same base class, you're done.
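
A contrived illustration of the problem (this is the usual pattern, not
code from any particular project): two independent extensions each
subclass the same command, and setup() can only register one of them,
so there is no sane way to compose the two.

from distutils.command.build_ext import build_ext

class cython_build_ext(build_ext):
    def run(self):
        # would generate .c files from .pyx before building
        build_ext.run(self)

class fortran_build_ext(build_ext):
    def run(self):
        # would handle .f sources before building
        build_ext.run(self)

# setup(..., cmdclass={"build_ext": ???})
# only one subclass fits in cmdclass; the other extension's behaviour
# is silently lost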

Yeah, cool. Many other projects have their own servers too -
pygame.org, Plone, etc. - which meet their own needs. Patches are
accepted for PyPI, btw.

Yes, but how long before the patch is accepted and deployed?

What type of enforcement of metadata, and how would it help? I
imagine this could be done in a number of ways for PyPI:
- a distutils command extension that people could use.
- changing the PyPI source code.
- checking the metadata for certain packages, then emailing their
authors about the issues.

First, packages with malformed metadata would be rejected, and it
would not be possible to register a package without uploading the
sources. I simply do not want a package published which does not even
have a name or a version, for example.

The current way of doing things on PyPI is insane, if you ask me. For
example, if you want to install a package with its dependencies, you
need to download the package, which may be on another website, and you
need to execute setup.py just to know its dependencies. This has so
many failure modes, I don't understand how it can seriously be
considered. Every other system has an index for this kind of thing
(curiously, both EPD and pypm have an index as well, AFAIK). Again, a
typical example of NIH, with inferior solutions implemented in the
case of Python.
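
To illustrate the difference, this is roughly what it looks like with
a static description (the install_requires attribute name is an
assumption for the example):

# with a static description: dependencies come from parsing a file,
# no code execution needed
from toydist.core import PackageDescription

pkg = PackageDescription.from_file("toysetup.info")
requires = pkg.install_requires  # attribute name assumed for illustration

# with the current pypi model, you have to download the sdist and
# execute its setup.py (arbitrary code) before you even know the
# dependencies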

Yeah, cool. That would let you develop things incrementally too, and
still have toydist be useful for the whole development period until it
catches up with the needed distutils features.

Initially, toydist was started to show that writing something
compatible with distutils without being tied to distutils was
possible.

If you execute build tools on arbitrary code, then arbitrary code
execution is easy for someone who wants to do bad things.

Well, you could surely exploit build tool bugs. But at least I can
query metadata and package features in a safe way - and this is very
useful already (cf. my point about being able to query package
metadata in one "query").

and many times I still
get errors on different platforms, despite many years of
multi-platform coding.

Yes, that's a difficult process. We cannot fix this - but having
automatically built (and hopefully tested) installers on the major
platforms would be a significant step in the right direction. That's
one of the killer features of CRAN (whenever you submit a package to
CRAN, a Windows installer is built and tested).

cheers,

David
