Default matplotlib data path

    > Would it be considered cleaner to embed the mpl data into
    > the matplotlib module? This would make it easier to
    > clean a mpl install. The data path could be expressed
    > fairly easily too, as a one-liner:

    > os.sep.join([os.path.split(matplotlib.__file__)[0],
    > 'matplotlib-data'])

Yes, if you can engineer it in a way that works with setup w/ and w/o a
--prefix arg it would be preferable, in my view.

JDH

Alright, I'll give it a shot and let you know.


Hello,

I somehow missed all the action between matplotlib 0.84 and 0.86.2.
Trying to install v0.86.2 I find data files are not installed correctly
in all cases. Specifically support for '--home=' and '--install-data' is
broken. (We use --install-data for our installations.)

I know this is a very tricky issue (in fact if someone can tell me
how to get this done correctly for bdist_wininst, I'll be very grateful)
but here's a suggestion which worked for me. I am copying the relevant
part from setup.py and hope someone has a better solution.

Thanks,
Nadia Dencheva

import os
import sys
import distutils.sysconfig
from distutils.command.install import INSTALL_SCHEMES

# has_setuptools is assumed to be set earlier in setup.py
if has_setuptools: # EGG's make it simple
    datapath = os.path.curdir
    datapath = os.sep.join([datapath, 'matplotlib', 'mpl-data']) # This is where mpl data will be installed
# logic from distutils.command.install.finalize_options
elif os.name == 'posix':
    # sys.version[0:3] would misreport two-digit minor versions
    py_version_short = '%d.%d' % sys.version_info[:2]
    #datapath = INSTALL_SCHEMES['unix_prefix']['platlib']
    #datapath = datapath.replace('$platbase/', '').replace('$py_version_short', py_version_short)
    #datapath = os.sep.join(['mpl-data']) # This is where mpl data will be installed
    # Default location, set before the loop so that an unrelated later
    # argument cannot overwrite a path chosen by an install option.
    pythonlib = distutils.sysconfig.get_python_lib(plat_specific=1)
    datapath = os.path.join(pythonlib, 'matplotlib', 'mpl-data')
    for a in sys.argv:
        if a.startswith('--home='):
            base = os.path.abspath(a.split('=', 1)[1])
            datapath = os.path.join(base, 'lib', 'python', 'matplotlib', 'mpl-data')
        elif a.startswith('--prefix='):
            base = os.path.abspath(a.split('=', 1)[1])
            pythonver = 'python' + py_version_short
            datapath = os.path.join(base, 'lib', pythonver, 'site-packages', 'matplotlib', 'mpl-data')
        elif a.startswith('--install-data='):
            base = os.path.abspath(a.split('=', 1)[1])
            datapath = os.path.join(base, 'mpl-data')
else:
    datapath = INSTALL_SCHEMES[os.name]['platlib'].replace('$base/', '')
    datapath = os.sep.join([datapath, 'matplotlib', 'mpl-data']) # This is where mpl data will be installed

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

Short explanation:
     Use --prefix instead

Long explanation:
     Without explicitly moving around files in cvs and declaring the
mpl data as package_data, it is pretty hard to be 100% compliant. The
code you put below is basically how distutils now determines where to
stick data_files, and that is why I used it as a guide to faking it.
--home is unix specific I think, and I don't know that it gives you
any power over --prefix. --install-data is useless now since the data
is embedded into the matplotlib module itself.
     I think I wrote a little a while back justifying the move, but
I'll restate. If you look at older versions of the
matplotlib._get_data_path() method, it was becoming a huge collection
of special cases. Those cases are still in cvs, but commented out.
It had the approach of trying everything until it found the data. Now you
could actually write this method in one line, "return
os.sep.join([os.path.dirname(__file__), 'mpl-data'])". However it
does a little more by still checking the MATPLOTLIBDATA env variable
first and verifying that the embedded mpl-data folder actually exists.
     Another strong reason for the move is it makes matplotlib a lot
more embeddable. I personally have some plugins I have developed for
applications that ship with their own python, and it is much easier to
just drop matplotlib into one place, and not have to worry about
installing data files outside the scope of the application. It also
allows for multiple versions/instances of matplotlib to live on one
machine, whereas before they would be forced to use a single version
of the dataset unless special care was taken to prevent that.

- Charlie
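
The lookup described in the message above can be sketched roughly as follows. This is an illustration only, not the actual matplotlib source: the function name, the `package_file` and `environ` parameters, and the error messages are assumptions made so the example can run on its own. Only the MATPLOTLIBDATA variable and the embedded mpl-data folder come from the thread.

```python
import os


def get_data_path(package_file, environ=None):
    """Return the directory holding the mpl-data files.

    package_file stands in for matplotlib's own __file__ and environ
    defaults to os.environ; both parameters are hypothetical.
    """
    if environ is None:
        environ = os.environ

    # 1. An explicit MATPLOTLIBDATA setting is checked first.
    override = environ.get("MATPLOTLIBDATA")
    if override is not None:
        if not os.path.isdir(override):
            raise RuntimeError("MATPLOTLIBDATA is not a directory: %r" % override)
        return override

    # 2. Otherwise fall back to the mpl-data folder embedded in the package,
    #    verifying that it actually exists.
    package_dir = os.path.dirname(os.path.abspath(package_file))
    embedded = os.path.join(package_dir, "mpl-data")
    if os.path.isdir(embedded):
        return embedded

    raise RuntimeError("Could not find the matplotlib data files")
```

Inside the package itself the call would presumably be `get_data_path(__file__)`, which reduces to the one-liner whenever the environment variable is unset.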


Charlie Moad wrote:

> Short explanation:
> Use --prefix instead
>
> Long explanation:
> Without explicitly moving around files in cvs and declaring the
> mpl data as package_data, it is pretty hard to be 100% compliant. The
> code you put below is basically how distutils now determines where to
> stick data_files, and that is why I used it as a guide to faking it.
> --home is unix specific I think, and I don't know that it gives you
> any power over --prefix. --install-data is useless now since the data
> is embedded into the matplotlib module itself.

There is another problem with the current approach. The current setup.py assumes
that if you have an egg-capable setuptools that you are building an egg and so
sets the data path for that. However, that's not always the case. For example,
the --single-version-externally-managed option should install matplotlib and
company as regular Python packages into site-packages (or wherever) with a
.egg-info/ directory alongside. This is how Debian (and presumably other
distros) is going to install eggified packages. However, the choice for the data
path ends up being incorrect.

I think a general rule might be to say that the innards of distutils are usually
a bad example for *using* distutils. It makes a lot of assumptions inside, and
the current mechanism in mpl's setup.py is fairly fragile. distutils is a piece
of junk, and really, really violates the "There should be one-- and preferably
only one --obvious way to do it," principle everywhere it possibly can, it seems.

The most robust approach seems to be this:

  http://wiki.python.org/moin/DistutilsInstallDataScattered

> I think I wrote a little a while back justifying the move, but
> I'll restate. If you look at older versions of the
> matplotlib._get_data_path() method, it was becoming a huge collection
> of special cases. Those cases are still in cvs, but commented out.
> It had the approach of trying everything until it found the data. Now you
> could actually write this method in one line, "return
> os.sep.join([os.path.dirname(__file__), 'mpl-data'])". However it
> does a little more by still checking the MATPLOTLIBDATA env variable
> first and verifying that the embedded mpl-data folder actually exists.

Have we ever considered moving to a path-based solution? For example, one would
set MATPLOTLIBDATAPATH to be a list of directories. When something inside
matplotlib needs data, it will go through the list of directories looking for
the file, and finally check os.path.join(os.path.dirname(__file__),
'mpl-data') if the file is not on the path. This would enable users without
privileges to manipulate site-packages or /usr/local/share to make replacements
or additions.
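
A minimal sketch of what such a path-based lookup might look like. The function name `find_data_file`, its parameters, and the use of `os.pathsep` as the separator are assumptions for illustration; only the MATPLOTLIBDATAPATH variable and the fallback to the embedded mpl-data directory come from the proposal above.

```python
import os


def find_data_file(name, package_dir, environ=None):
    """Return the first match for name on the data search path.

    Directories listed in MATPLOTLIBDATAPATH are tried in order; the
    mpl-data folder inside package_dir is always tried last.
    """
    if environ is None:
        environ = os.environ

    raw = environ.get("MATPLOTLIBDATAPATH", "")
    # Drop empty entries so a stray separator does not add the current directory.
    search = [d for d in raw.split(os.pathsep) if d]
    search.append(os.path.join(package_dir, "mpl-data"))

    for directory in search:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    raise IOError("data file not found on the search path: %r" % name)
```

Because the lookup is per file, a user directory only needs to hold the files it replaces or adds; everything else still resolves to the copy shipped inside the package.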

···

--
Robert Kern
robert.kern@...149...

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
  -- Richard Harter

> Short explanation:
> Use --prefix instead

Hmm, I don't want to use --prefix.

I understand the reasoning behind this change but I think it misses one
important case: installations under a user-specified directory (without getting the
unnecessary tree structure from '--home' and '--prefix').
For example, to install matplotlib under /home/user/matplotlib in the past
I would do

python setup.py install --install-lib=/home/user --install-data=/home/user/matplotlib

This, I think, covers two important cases:
- easy support for multiple versions on the system
- installation in a user directory (not having write permissions in site-packages)

Of course this can be done (in an ugly way) with --prefix.
I think you and I need the same kind of installation - data files bundled with
matplotlib, except that I don't want to install them in site-packages and this is
what's missing from the setup file now.

Am I missing something?

Nadia


> Charlie Moad wrote:
>> Short explanation:
>> Use --prefix instead
>>
>> Long explanation:
>> Without explicitly moving around files in cvs and declaring the
>> mpl data as package_data, it is pretty hard to be 100% compliant. The
>> code you put below is basically how distutils now determines where to
>> stick data_files, and that is why I used it as a guide to faking it.
>> --home is unix specific I think, and I don't know that it gives you
>> any power over --prefix. --install-data is useless now since the data
>> is embedded into the matplotlib module itself.

> There is another problem with the current approach. The current setup.py assumes
> that if you have an egg-capable setuptools that you are building an egg and so
> sets the data path for that. However, that's not always the case. For example,
> the --single-version-externally-managed option should install matplotlib and
> company as regular Python packages into site-packages (or wherever) with a
> .egg-info/ directory alongside. This is how Debian (and presumably other
> distros) is going to install eggified packages. However, the choice for the data
> path ends up being incorrect.

I have run into this as well, and it just comes from the game of
trying to make the setup file work with both distutils and setuptools.

> I think a general rule might be to say that the innards of distutils are usually
> a bad example for *using* distutils. It makes a lot of assumptions inside, and
> the current mechanism in mpl's setup.py is fairly fragile. distutils is a piece
> of junk, and really, really violates the "There should be one-- and preferably
> only one --obvious way to do it," principle everywhere it possibly can, it seems.
>
> The most robust approach seems to be this:
>
>   http://wiki.python.org/moin/DistutilsInstallDataScattered

>> I think I wrote a little a while back justifying the move, but
>> I'll restate. If you look at older versions of the
>> matplotlib._get_data_path() method, it was becoming a huge collection
>> of special cases. Those cases are still in cvs, but commented out.
>> It had the approach of trying everything until it found the data. Now you
>> could actually write this method in one line, "return
>> os.sep.join([os.path.dirname(__file__), 'mpl-data'])". However it
>> does a little more by still checking the MATPLOTLIBDATA env variable
>> first and verifying that the embedded mpl-data folder actually exists.

> Have we ever considered moving to a path-based solution? For example, one would
> set MATPLOTLIBDATAPATH to be a list of directories. When something inside
> matplotlib needs data, it will go through the list of directories looking for
> the file, and finally check os.path.join(os.path.dirname(__file__),
> 'mpl-data') if the file is not on the path. This would enable users without
> privileges to manipulate site-packages or /usr/local/share to make replacements
> or additions.

I left the check for this env variable there for this reason. Just in
case someone wants to put the data somewhere else on the system. It
doesn't support a list of directories now, but wouldn't you presume
the user who sets it knows where the data is? Privileges should not
be an issue at all now since the data is embedded in the module.

I still think the best approach is going to be to specify the mpldata
as package_data, like it is, instead of data_files. Then all the
logic in the setup file goes away. I tried this, but distutils would
not respect "../fonts" type directories. We would actually have to
move the data files into the mpl module.

Matplotlib is a python plugin, not an application. I can't think of
any other python modules that dump their data files around the system
during installation. I have seen many projects though with
glade/png/etc. files embedded into the module as package_data and they
avoid all these issues mentioned above.
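
For illustration, a package_data declaration along these lines might look like the sketch below. It assumes the data files have already been moved to lib/matplotlib/mpl-data/ in the source tree; the glob patterns and the `run_setup` helper are hypothetical and are not taken from matplotlib's actual setup.py.

```python
# Keyword arguments for setup(). package_data patterns are resolved
# relative to the package directory, which is why the data has to live
# inside lib/matplotlib/ rather than in a sibling "../fonts" directory.
setup_kwargs = dict(
    name="matplotlib",
    package_dir={"": "lib"},
    packages=["matplotlib"],
    package_data={
        "matplotlib": [
            "mpl-data/*.*",        # hypothetical pattern: top-level data files
            "mpl-data/fonts/*/*",  # hypothetical pattern: font subdirectories
        ],
    },
)


def run_setup():
    """Call this at the bottom of setup.py to perform the install."""
    from setuptools import setup  # distutils.core.setup also accepts package_data
    setup(**setup_kwargs)
```

With the files declared this way they are copied wherever the package itself goes, so no install-scheme logic is needed to predict the data path.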

···

On 1/25/06, Robert Kern <robert.kern@...149...> wrote:

Charlie Moad wrote:

> I left the check for this env variable there for this reason. Just in
> case someone wants to put the data somewhere else on the system. It
> doesn't support a list of directories now, but wouldn't you presume
> the user who sets it knows where the data is? Privileges should not
> be an issue at all now since the data is embedded in the module.

Sure, but he may be putting data in multiple places (fonts in one directory,
colormaps in another, basemap data in a third, etc.). Or only providing a few
new pieces of data, not the complete suite of data.

And once we use a path-based approach, it would be easy to keep fonts and other
data in separate directories inside the package. You simply append both
directories to the end of MATPLOTLIBDATAPATH. AFAICT, that's the only objection
against moving the data into the lib/matplotlib/mpl-data/ in the source
distribution.

> I still think the best approach is going to be to specify the mpldata
> as package_data, like it is, instead of data_files. Then all the
> logic in the setup file goes away. I tried this, but distutils would
> not respect "../fonts" type directories. We would actually have to
> move the data files into the mpl module.

Yes, I agree with you. Believe me, I'm not arguing against installing data in
the package itself.

> Matplotlib is a python plugin, not an application. I can't think of
> any other python modules that dump their data files around the system
> during installation.

Oh, there are plenty, but they all suck for doing so, unless they are
following a particular standard, like Gnome or KDE applications.

···

--
Robert Kern
robert.kern@...149...

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
  -- Richard Harter