MEP11: Attempting to deal with the Python library dependency issue

I invite comments for a new MEP about improving the situation with
respect to our bundling of third-party Python dependencies.

In particular, I'd love feedback from the various stakeholders --
those producing binary installers and packages for the various
platforms.

[https://github.com/matplotlib/matplotlib/wiki/MEP11](https://github.com/matplotlib/matplotlib/wiki/MEP11)

Mike

Hi,

Could dateutil, pytz, and pyparsing be made optional dependencies? I just tried it: all of my own scripts work without them installed (one line needed to be removed in axes.py https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes.py#L19). Only about 10 of matplotlib's examples fail (after some additional changes).
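
For readers unfamiliar with the idea, one common way to make a dependency optional is to guard the import and raise an error only when the feature is actually used. The sketch below is illustrative only: the helper names (`optional_import`, `require`, `get_timezone`) are hypothetical and are not taken from matplotlib's source or from the changes described above.

```python
import importlib


def optional_import(module_name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical module-level guards; matplotlib's real layout may differ.
dateutil = optional_import("dateutil")
pytz = optional_import("pytz")


def require(module, name, feature):
    """Raise a clear error only when an optional feature is actually used."""
    if module is None:
        raise ImportError(f"{feature} requires the optional package '{name}'")
    return module


def get_timezone(tz_name):
    """Example helper that needs pytz only at call time, not at import time."""
    return require(pytz, "pytz", "time zone support").timezone(tz_name)
```

With this pattern, scripts that never touch dates or time zones import cleanly, and only a call such as `get_timezone("UTC")` fails when pytz is absent.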

Frankly, I would remove/unbundle all 3rd party Python packages from matplotlib and declare them as dependencies for pip and easy_install, and of course in the documentation.
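
As a rough illustration of what declaring the dependencies could look like, here is a minimal sketch built around the `install_requires` keyword of setuptools. The function name `build_setup_kwargs` is hypothetical, and the lack of version pins is an assumption of this example, not a proposal from the thread.

```python
def build_setup_kwargs():
    """Return keyword arguments intended for setuptools.setup().

    The packages are the ones named in this thread. "python-dateutil" is
    the PyPI distribution name for dateutil; leaving the versions unpinned
    is an assumption of this sketch.
    """
    return {
        "name": "matplotlib",
        "install_requires": [
            "python-dateutil",
            "pytz",
            "pyparsing",
            "six",
        ],
    }


# In a real setup.py this would be passed on as:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

pip and easy_install read this list and fetch whatever is missing, so nothing needs to be bundled in the matplotlib source tree.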

I think that matplotlib, the library, should not attempt to work around Python's distribution/packaging limitations. Please do not use post-install or run-time scripts to detect and install missing dependencies.

Concerning end user experience, the scipy-stack project seems like a better place to address this.

Optionally, for Windows users that won't touch pip or easy_install (like me), matplotlib could provide separate downloads of installers for dateutil, pytz, pyparsing, and six. They are trivial to create.

It is also easy to create EGGs or MSIs for matplotlib, which are occasionally requested.

Also consider a separate package for the matplotlib tests, which would include 35 MB of baseline images that are of little use to end users.

Christoph

···

On 10/3/2012 9:20 AM, Michael Droettboom wrote:

I invite comments for a new MEP about improving the situation with
respect to our bundling of third-party Python dependencies.

In particular, I'd love feedback from the various stakeholders -- those
producing binary installers and packages for the various platforms.

https://github.com/matplotlib/matplotlib/wiki/MEP11

Mike

A bunch of great stuff:

+1 all around

Another use-case is py2exe, py2app, and friends -- at the moment, you
pretty much have to include the whole dang MPL package to get things
to work. Cleaning up some of these dependencies could improve that.

-Chris


------------------------------------------------------------------------------
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...236...

I think that matplotlib, the library, should not attempt to work around
Python's distribution/packaging limitations. Please do not use
post-install or run-time scripts to detect and install missing
dependencies.

I wholeheartedly agree here. There are package managers for this job.
I understand that some people are less package-literate but, as you
point out below, the development team for each dependency can ship
its own binary, though I understand not all do.

Optionally, for Windows users that won't touch pip or easy_install (like
me), matplotlib could provide separate downloads of installers for
dateutil, pytz, pyparsing, and six. They are trivial to create.

Also consider a separate package for the matplotlib tests, which would
include 35 MB of baseline images that are of little use to end users.

I agree here, too. I think most people who want to use the library
won't ever run or touch the tests. Heck, I only ever ran the tests
after I started contributing back to the community. Perhaps they
should be spun off into a matplotlib-tests git submodule that Travis
can use for commit-checking.

···

On Wed, Oct 3, 2012 at 10:08 PM, Christoph Gohlke <cgohlke@...244...> wrote:

On 10/3/2012 9:20 AM, Michael Droettboom wrote:

--
Damon McDougall
http://www.damon-is-a-geek.com
B2.39
Mathematics Institute
University of Warwick
Coventry
West Midlands
CV4 7AL
United Kingdom

To expand on this, there's a discussion underway on the scipy-user and
numfocus mailing lists about standardising a set of packages making up
the 'scipy stack', and pointing people to distributions which ship all
of those.

Matplotlib is of course among that set of packages, so hopefully that
will reduce the need for users to install it separately.

Thanks,
Thomas

···

On 3 October 2012 22:08, Christoph Gohlke <cgohlke@...244...> wrote:

Concerning end user experience, the scipy-stack project seems like a
better place to address this.

I help with the openSUSE packaging of mpl. At least as openSUSE
policies go, bundling dependencies is considered a big no-no. RPM has
its own dependency handling, so as long as the dependencies are
documented (ideally with version numbers) then there is no issue,
either at build-time or at run-time. I think that would likely be the
case for any official Linux packages.

Anyone on Linux who is trying to install matplotlib from source should
be prepared to handle dependency resolution manually. If they aren't,
then they shouldn't be messing with package installation in the first
place. I think the documentation should clearly state this, although
more diplomatically of course :)

So from a Linux standpoint I think bundling is a bad idea. Further,
any solution should be prepared to handle the situation where the
dependencies are already available, and not try to download them in
that case. It should also be able to handle installation with no
internet connection as long as the dependencies are available, so it
can be compatible with automated build systems and HPC environments
which may not support internet access for security reasons.
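
To make the offline requirement concrete, here is a minimal sketch of a check that looks for the dependencies locally and reports what is missing instead of downloading anything. The names `REQUIRED_MODULES`, `find_missing`, and `report` are hypothetical, and the module list is only the set of packages mentioned in this thread.

```python
import importlib.util

# Import names taken from this thread; the list itself is an assumption.
REQUIRED_MODULES = ("dateutil", "pytz", "pyparsing", "six")


def find_missing(modules=REQUIRED_MODULES):
    """Return the import names that cannot be found locally.

    importlib.util.find_spec only consults the local import path, so this
    check needs no network access.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report(missing):
    """Build a message for the user instead of downloading anything."""
    if not missing:
        return "All dependencies found; nothing to install."
    return "Missing dependencies, please install them: " + ", ".join(missing)
```

A build script could call `report(find_missing())` and stop with that message, leaving the actual installation to the system package manager or to pip.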

For Windows, rather than creating independent matplotlib installers,
can't the documentation just point people in the direction of a
pre-existing bundle like Python(x,y)? Since there are groups
dedicated to making it easy to install Python packages on Windows, I
don't see the point of going through all the trouble of making your
own version. If you really wanted to, you might even be able to use
their sources to create your own variant that just installs what you
need.

-Todd

···

On Wed, Oct 3, 2012 at 6:20 PM, Michael Droettboom <mdroe@...31...> wrote:

I invite comments for a new MEP about improving the situation with respect
to our bundling of third-party Python dependencies.

In particular, I'd love feedback from the various stakeholders -- those
producing binary installers and packages for the various platforms.

https://github.com/matplotlib/matplotlib/wiki/MEP11

Mike

Hi,

could dateutil, pytz, and pyparsing be made optional dependencies? I
just tried, all of my own scripts do work without them being installed
(one line needed to be removed in axes.py
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes.py#L19).
Only about 10 of matplotlib's examples fail (after some additional
changes).

I think that sort of unbundling is a good idea to do in any case. Would you mind filing a PR with the changes you made? I think we still have to provide some sort of a hand-holding way to help people get dateutil/pytz/pyparsing etc. installed because if we take those features away people will complain.

Frankly, I would remove/unbundle all 3rd party Python packages from
matplotlib and declare them as dependencies for pip and easy_install,
and of course in the documentation.

If we can determine through experimentation that that's a reliable and convenient approach, that's what I would prefer.

Binary installers make things more complicated. I'd prefer to have the installer automatically download the dependencies there, too, if we can work through the technical obstacles.

I think that matplotlib, the library, should not attempt to work around
Python's distribution/packaging limitations. Please do not use
post-install or run-time scripts to detect and install missing
dependencies.

Certainly -- but I don't want people who are used to getting those dependencies from the installer to suddenly have a lesser experience. If some standard tool gives us what we need, great; otherwise we are going to have to make something that works. (All this is conjecture, because I haven't yet experimented with pip inside a Windows installer etc.)

Concerning end user experience, the scipy-stack project seems like a
better place to address this.

I agree -- but I think that's still a ways off. Also, remember there are a number of non-scientific users of matplotlib (data center monitoring is one such application I'm aware of) and those folks may be wary of installing a large scientific stack. Of course, perhaps that crowd is already using a proper package manager.

Optionally, for Windows users that won't touch pip or easy_install (like
me), matplotlib could provide separate downloads of installers for
dateutil, pytz, pyparsing, and six. They are trivial to create.

That's not a bad idea. But telling people they need to run 5 installers where there used to be 1 may be a hard sell. Any way to build a "meta" installer?

It is also easy to create EGGs or MSIs for matplotlib, which are
occasionally requested.

I'm perhaps showing my ignorance here, but is there any "one best way" among those options? I'd rather have one obvious best approach.

Also consider a separate package for the matplotlib tests, which would
include 35 MB of baseline images that are of little use to end users.

I've been wondering about this myself. I think what we probably want there is a separate Python package (i.e. one with its own setup.py script) that lives in the same repository. It's important to keep the test data and the code in sync, and a separate repository would just add needless complication. But that would easily allow binary installers and tarballs to be built without the test data. I think I'll add this to the MEP as a related task.
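
One way to picture the split: both setup scripts could share a single package list and each take its own half. The sketch below is only an illustration. `split_packages` is a hypothetical helper, and the package list is invented for the example (in particular, treating `baseline_images` as a package is an assumption).

```python
def split_packages(packages, tests_root="matplotlib.tests"):
    """Partition package names into (library, tests) lists."""
    library, tests = [], []
    for name in packages:
        is_test = name == tests_root or name.startswith(tests_root + ".")
        (tests if is_test else library).append(name)
    return library, tests


# Hypothetical package list, as setuptools.find_packages() might return it.
packages = [
    "matplotlib",
    "matplotlib.backends",
    "matplotlib.tests",
    "matplotlib.tests.baseline_images",
]
library_pkgs, test_pkgs = split_packages(packages)
# library_pkgs would feed the main setup.py, test_pkgs the separate one.
```

Because both scripts live in the same repository, the test data stays in sync with the code while binary installers built from the main script leave it out.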

Mike

···

On 10/03/2012 05:08 PM, Christoph Gohlke wrote:

On 10/3/2012 9:20 AM, Michael Droettboom wrote:

Christoph

Just to reiterate a point I made in an earlier e-mail: I agree that the scipy-stack effort is a good one and deserving of support, but not all users of matplotlib want a full scientific stack. There are always going to be those who "just want to plot something", and we need to support that use case at least as well as we do now.

Mike

···

On 10/04/2012 06:16 AM, Thomas Kluyver wrote:

On 3 October 2012 22:08, Christoph Gohlke <cgohlke@...244...> wrote:

Concerning end user experience, the scipy-stack project seems like a
better place to address this.

To expand on this, there's a discussion underway on the scipy-user and
numfocus mailing lists about standardising a set of packages making up
the 'scipy stack', and pointing people to distributions which ship all
of those.

Matplotlib is of course among that set of packages, so hopefully that
will reduce the need for users to install it separately.