[Numpy-discussion] Announcing toydist, improving distribution and packaging situation

Hi,

> In the toydist proposal/release notes, I would address 'what does
> toydist do better' more explicitly.

> **** A big problem for science users is that numpy does not work with
> pypi + (easy_install, buildout or pip) and python 2.6. ****

> Working with the rest of the python community as much as possible is
> likely a good goal.

Yes, but it is hopeless. Most of what is being discussed on
distutils-sig is useless for us, and what matters to us is ignored at
best. I think most people on distutils-sig are misguided, and I don't
think that community is representative of people concerned with
packaging anyway - most of the participants seem to come from web
development, and are mostly dismissive of others' concerns (OS
packagers, etc.).

I want to note that I am not starting this out of thin air - I know
most of the distutils code very well, and I have been mostly the sole
maintainer of numpy.distutils for two years now. I have written
extensive distutils extensions, in particular numscons, which is able
to fully build numpy, scipy and matplotlib on every platform that
matters.

Simply put, the distutils code is horrible (this is an objective fact)
and flawed beyond repair (this is more controversial). IMHO, it has
almost no useful features, except being standard.

If you want a more detailed explanation of why I think distutils and
all tools on top are deeply flawed, you can look here:

> numpy used to work with buildout in python2.5, but not with 2.6.
> buildout lets other team members get up to speed with a project by
> running one command. It installs things in the local directory, not
> system wide. So you can have different dependencies per project.

I don't think it is a very useful feature, honestly. It seems to me
that they created a huge infrastructure to split packages into tiny
pieces, and then try to get them back together, imagining that
multiple installed versions is a replacement for backward
compatibility. Anyone with extensive packaging experience knows that's
a deeply flawed model in general.

> Plenty of good work is going on with python packaging.

That's the opposite of my experience. What I care about is:
  - tools which are hackable and easily extensible
  - robust install/uninstall
  - real, DAG-based build system
  - explicitness and repeatability

None of this is supported by the current tools, and the current
directions move even further away from it. When I have to explain at
length why the command-based design of distutils is a nightmare to
work with, for example, I don't feel very confident that the current
maintainers are aware of the issues. It suggests they have never had
to extend distutils much.
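To make that concrete, here is a minimal sketch of the boilerplate
even a trivial customization requires (the package name and the
pre-build step are of course made up for illustration):

    # even a one-line pre-build step means subclassing a whole
    # command and re-registering it through cmdclass
    from distutils.core import setup
    from distutils.command.build import build

    class my_build(build):
        def run(self):
            # made-up pre-build step, e.g. generating source files
            print("generating sources before the build...")
            build.run(self)

    setup(name='foo', version='0.1', cmdclass={'build': my_build})

And that is the easy case: commands have interdependencies, and the
way they pass finalized options to each other is where it really
breaks down.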

> There are build farms for windows packages and OSX uploaded to pypi.
> Start uploading pre-releases to pypi, and you get these for free
> (once you make numpy compile out of the box on those compile farms).
> There are compile farms for other OSes too... like ubuntu/debian,
> macports, etc. Some distributions even automatically download,
> compile and package new releases once they spot a new file on your
> ftp/web site.

I am familiar with some of those systems (PPA and the opensuse build
service in particular). One of the goals of my proposal is to make it
easier to interoperate with those tools.

I think Pypi is mostly useless. The lack of enforced metadata is a big
no-no IMHO. The fact that Pypi is miles away from CRAN, for example,
is quite significant. I want CRAN for scientific python, and I don't
see Pypi becoming it in the near future.

The point of having our own Pypi-like server is that we could do the
following:
- enforce metadata (see the sketch below for what that could mean)
- make it easy to extend the service to support our needs
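As a rough sketch of what enforcement could look like server-side (the
field list below is purely illustrative, not a proposal for the actual
schema):

    # illustrative only: reject uploads with incomplete metadata
    REQUIRED_FIELDS = ('name', 'version', 'summary', 'license',
                       'platforms')

    def validate_metadata(meta):
        missing = [f for f in REQUIRED_FIELDS if not meta.get(f)]
        if missing:
            raise ValueError("upload rejected, missing fields: %s"
                             % ", ".join(missing))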

> pypm: Perl Package Manager Index (PPM) | ActiveState Code

It is interesting to note that one of the maintainers of pypm has
recently quit the discussion about Pypi, most likely out of
frustration with the other participants.

> Documentation projects are being worked on to document, give
> tutorials and make python packaging be easier all round. As witnessed
> by 20 or so releases on pypi every day (and growing), lots of people
> are using the python packaging tools successfully.

This does not mean much IMO. Uploading to Pypi is almost required to
use virtualenv, buildout, etc. An interesting metric is not how many
packages are uploaded, but how much they are used outside the
developer community.

> I'm not sure making a separate build tool is a good idea. I think
> going with the rest of the python community, and improving the tools
> there is a better idea.

It has been tried, and IMHO it has failed. You can look at the recent
discussion (the one started by Guido in particular).

> pps. some notes on toydist itself.
> - toydist convert is cool for people converting a setup.py. This
> means that most people can try out toydist right away. But what does
> it gain these people who convert their setup.py files?

Not much ATM, except that a toysetup.info is easier to write than a
setup.py IMO, and that it supports a simple way to include data files
(something which is currently *impossible* to do without writing your
own distutils extensions). It also has the ability to build eggs
without using setuptools (I consider not using setuptools a feature,
given the too many failure modes of that package).
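To give an idea, a toysetup.info for a trivial package could look
roughly like this (written from memory; the exact section and field
names may differ from the current draft format):

    Name: foo
    Version: 0.1
    Summary: An example package

    DataFiles: manpages
        TargetDir: /usr/share/man/man1
        Files: doc/foo.1

    Library:
        Packages:
            foo

The point is that data files are declared in one place, instead of
being scattered across setup.py, MANIFEST.in and package_data.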

The main goals, though, are to make it easier to build your own tools
on top of it, and to integrate with real build systems.

> - a toydist convert that generates a setup.py file might be cool :)

toydist started like this, actually: you would write a setup.py file
which loads the package description from toysetup.info and converts
it into the arguments for distutils.core.setup. I have not updated
that recently, but it is definitely on the TODO list for a first
alpha, as it would enable people to benefit from the format with 100%
backward compatibility with distutils.
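The bridge would be something like the following sketch (the import
path and the PackageDescription API are assumptions about how it could
look, not the final interface):

    # hypothetical setup.py bridging toysetup.info and distutils
    from distutils.core import setup
    from toydist.core import PackageDescription  # assumed import path

    pkg = PackageDescription.from_file('toysetup.info')  # assumed API
    setup(name=pkg.name,
          version=pkg.version,
          description=pkg.summary,
          packages=pkg.packages)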

> - arbitrary code execution happens when building or testing with
> toydist.

You are right for testing, but wrong for building. As long as the
build is entirely driven by toysetup.info, you only have to trust
toydist (which is not safe ATM, but that's an implementation detail),
and your build tools of course.

Obviously, if you have a package which uses an external build tool on
top of toysetup.info (as will be required for numpy itself for
example), all bets are off. But I think that's a tiny fraction of the
interesting packages for scientific computing.

Sandboxing is particularly an issue on windows - I don't know a good
solution for sandboxing on windows, outside of full VMs, which are
heavyweight.

> - it should be possible to build this toydist functionality as a
> distutils/distribute/buildout extension.

No, it cannot, at least as far as distutils/distribute are concerned
(I know nothing about buildout). Extending distutils is horrible, and
fragile in general. Even autotools, with its mix of sh scripts
generated through m4 and perl, is a breeze compared to distutils.

> - extending toydist? How are extensions made? There are 175 buildout
> packages which extend buildout, and many that extend
> distutils/setuptools - so extension of build tools is a necessary
> thing.

See my answer earlier about interoperation with build tools.

cheers,

David


On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield <renesd@...149...> wrote: