File format for plots

sam_tygier · February 25, 2009, 8:35am

I think this topic has come up before, but I don't think anything has resulted from it.

I'd like a way of saving a plot from matplotlib, so that it can be re-rendered later, possibly with a different backend, maybe to a different size, and maybe with changes to the labels. This would save me having to rerun the simulation that generated the plot.

Ideally this would work by having a save_plot() function, that would save all state of the current plot into a file. This could then be loaded by a program to regenerate that plot.

I have made a rough prototype to demonstrate. It is incomplete. It only implements a very small subset of pylab.

I shall attach some files (if these get mangled, then I can upload them somewhere).

example1 and example2 are what the plot files might look like.

plot.py renders the plot files.
eg.
plot.py example1
plot.py example2 example.png

fakepylab.py is a wrapper around pylab that records your plotting and offers a save_plot() function.

test.py is a script that uses fakepylab to create a plot file.
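
The attached files are not reproduced in this thread, so the following is only a minimal sketch of the recording idea, not the actual fakepylab.py. It assumes a JSON file format, and the `Recorder` class and `replay` function are hypothetical names chosen for illustration. Every call made on the recorder is stored as a name plus its arguments, and `replay` re-issues those calls on whatever object is passed in.

```python
import json


class Recorder:
    """Record plotting calls so they can be saved and replayed later.

    Hypothetical helper for illustration; not the attached fakepylab.py.
    """

    def __init__(self):
        # Each entry looks like {"func": name, "args": [...], "kwargs": {...}}.
        self.calls = []

    def __getattr__(self, name):
        # Any attribute not defined on the class (plot, xlabel, title, ...)
        # is turned into a function that records the call instead of drawing.
        def record(*args, **kwargs):
            self.calls.append({"func": name, "args": list(args), "kwargs": kwargs})
        return record

    def save_plot(self, path):
        # Arguments must be JSON-serialisable (lists, numbers, strings).
        with open(path, "w") as f:
            json.dump(self.calls, f, indent=1)


def replay(path, target):
    """Re-issue every recorded call on `target` and return the call count."""
    with open(path) as f:
        calls = json.load(f)
    for call in calls:
        getattr(target, call["func"])(*call["args"], **call["kwargs"])
    return len(calls)
```

Replaying against the real library would then be something like `import matplotlib.pyplot as plt; replay("example.json", plt); plt.savefig("example.png")`. NumPy arrays would need converting to lists before recording, which is one of the gaps a real implementation would have to close.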

So does any of this look useful? What more might it need to be useful?

Any comments on the file format? Is there an existing standard that could be used instead? Would XML be better than plain ASCII?

Sam Tygier

example1 (159 Bytes)

example2 (305 Bytes)

fakepylab.py (913 Bytes)

plot.py (1.41 KB)

test.py (120 Bytes)

_Troels_Kofoed_Jacob · February 25, 2009, 9:23am

I think this is a good idea, but why don't you just save your data to a file
and plot from a different script? If the data is only numbers you can just do
savetxt('data.dat',data) in your simulation script and then
data=loadtxt('data.dat') from your plot script...
Now if you also just use savefig('fig') without suffix, you can just run your
plot script like: python plot.py -DAgg or -DPS or whatever and it will plot to
the default format for that backend.
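
A minimal sketch of that split, assuming NumPy's `savetxt` and `loadtxt` and a two-column array invented for illustration. The temporary directory only keeps the example self-contained; in practice the two halves would live in separate scripts.

```python
import os
import tempfile

import numpy as np

# Simulation side: compute the results once and write them to disk.
x_values = np.linspace(0.0, 1.0, 5)
data = np.column_stack([x_values, x_values ** 2])
path = os.path.join(tempfile.mkdtemp(), "data.dat")
np.savetxt(path, data)

# Plotting side (normally a separate script): read the numbers back.
loaded = np.loadtxt(path)
x, y = loaded[:, 0], loaded[:, 1]
```

The plot script would then pass `x` and `y` to the plotting calls and finish with `savefig`, so it can be rerun as often as needed without touching the simulation.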

Best regards
Troels Kofoed Jacobsen

sam_tygier · February 25, 2009, 11:11pm

Troels Kofoed Jacobsen wrote:

I think this is a good idea, but why don't you just save your data to a file and plot from a different script? If the data is only numbers you can just do savetxt('data.dat',data) in your simulation script and then data=loadtxt('data.dat') from your plot script...
Now if you also just use savefig('fig') without suffix, you can just run your plot script like: python plot.py -DAgg or -DPS or whatever and it will plot to the default format for that backend.

That is one method that I have used, but I don't think it is ideal. My data can be a wide range of things: sometimes the coordinates of a bunch of particles, sometimes the track of one. If I save just an array of numbers it can get a bit confusing. So it would be useful to be able to save everything needed to make the plot.

Sam Tygier

Joao_Luis_Silva · February 26, 2009, 10:03am

sam tygier wrote:

That is one method that i have used, but i don't think it is ideal. My data can be a wide range of things, sometimes the coordinates of a bunch of many particles, sometimes the track of one. If I save just an array of numbers it can get a bit confusing. So it would be useful to be able to save everything needed to make the plot.

You could use a file format made for scientific data storage, such as netCDF or HDF5.

To use netCDF files from Python you can use either ScientificPython ( http://dirac.cnrs-orleans.fr/plone/software/scientificpython/ ) or Pupynere (listed on PyPI as pupynere).
ScientificPython is bigger and more general; Pupynere is lightweight, but you can run into some bugs.

For HDF5 you can use PyTables (http://www.pytables.org/).

These file types can store not only the data itself, but also its type, name, units, and any other property you might like, for an arbitrary number of data sets. For some fields there are naming conventions to guide you (e.g. http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.0/cf-conventions.html ).
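
The idea of keeping names and units next to the numbers can be sketched without either library. The example below is a stand-in, not netCDF or HDF5: it uses NumPy's `.npz` container with a JSON string for the metadata, and `save_dataset`, `load_dataset` and the `metadata_json` key are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile

import numpy as np


def save_dataset(path, arrays, metadata):
    """Store named arrays plus a JSON metadata string in one .npz file."""
    np.savez(path, metadata_json=np.array(json.dumps(metadata)), **arrays)


def load_dataset(path):
    """Return (arrays, metadata) as written by save_dataset."""
    with np.load(path) as f:
        metadata = json.loads(f["metadata_json"].item())
        arrays = {k: f[k] for k in f.files if k != "metadata_json"}
    return arrays, metadata


path = os.path.join(tempfile.mkdtemp(), "run1.npz")
save_dataset(
    path,
    {"x": np.arange(3.0), "y": np.arange(3.0) ** 2},
    {"description": "track of one particle", "x": {"units": "m"}, "y": {"units": "m"}},
)
arrays, metadata = load_dataset(path)
```

A real netCDF or HDF5 file attaches such attributes to each variable directly and is readable from other languages, which this stand-in is not.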

João Silva

_Sandro_Tosi3 · February 28, 2009, 3:49pm

Hi Sam,

···

On Wed, Feb 25, 2009 at 09:35, sam tygier <samtygier@...705...> wrote:

I think this topic has come up before, but i don't think anything has
resulted from it.

I'd like a way for saving a plot from matplotlib, so that it can be
re-rendered later, possibly with a different backend, maybe to a different
size, and maybe with changes to the labels. This would save me having to
rerun the simulation that generated the plot.

Ideally this would work by having a save_plot() function, that would save
all state of the current plot into a file. This could then be loaded by a
program to regenerate that plot.

Can't this be achieved by pickling/unpickling the mpl objects? Didn't
manage to test it, but it should work.

Of course, it might run into incompatibilities between the source (pickling)
environment and the destination (unpickling) one.

Regards,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

Eric_Firing2 · February 28, 2009, 6:07pm

Sandro Tosi wrote:

Hi Sam,

I think this topic has come up before, but i don't think anything has
resulted from it.

Correct, because the capability would require a *lot* of work to implement, and most of us don't see a compelling need; we believe that a better practice is to structure one's work so that plotting is separated from data (result) generation in any cases where the latter is highly time-consuming.

I'd like a way for saving a plot from matplotlib, so that it can be
re-rendered later, possibly with a different backend, maybe to a different
size, and maybe with changes to the labels. This would save me having to
rerun the simulation that generated the plot.

Ideally this would work by having a save_plot() function, that would save
all state of the current plot into a file. This could then be loaded by a
program to regenerate that plot.

Can't this be achieved by pickling/unpickling the mpl objects? Didn't
manage to test it, but it should work.

No, this has been discussed several times. Quite a bit of work would be required to make all the extension code compatible with pickling. More work, more complexity, more difficult code maintenance and testing. It's not worth it, given the developer resources available for mpl.

Of course, it might run into incompatibilities between the source (pickling)
environment and the destination (unpickling) one.

Yes, pickling is fundamentally unreliable, and should be used only under controlled, non-critical circumstances, such as for caching.

Eric

Andrew_Straw5 · February 28, 2009, 6:36pm

Eric Firing wrote:

Sandro Tosi wrote:

Hi Sam,

I think this topic has come up before, but i don't think anything has
resulted from it.

Correct, because the capability would require a *lot* of work to
implement, and most of us don't see a compelling need; we believe that a
better practice is to structure one's work so that plotting is separated
from data (result) generation in any cases where the latter is highly
time-consuming.

One nice benefit, however, would be that the data could be shipped to
another interpreter for plotting without worrying about threads/GIL/etc.
So, having an MPL-native plot description would be useful. But, I agree,
it would be a lot of work.

sam_tygier · March 1, 2009, 2:17pm

Eric Firing wrote:

Sandro Tosi wrote:

Hi Sam,

I think this topic has come up before, but i don't think anything has
resulted from it.

Correct, because the capability would require a *lot* of work to implement,

Would I be right in assuming that it would take roughly the same amount of effort as writing a new backend? I.e. for each matplotlib action it would need a function to store that action and a function to call that action again.

and most of us don't see a compelling need; we believe that a better practice is to structure one's work so that plotting is separated from data (result) generation in any cases where the latter is highly time-consuming.

It might not be essential, but it would offer an additional work flow, that a few people seem to like.

I think it would be especially useful when it comes to putting plots into papers. I often find that I want to tweak something like the font size or labels. Having a modifiable plot format seems the easiest way to achieve that. Maybe there could even be some integration into LaTeX so that if you were to resize your plot in your paper, it would be re-rendered with the fonts adjusted.

I'd like a way for saving a plot from matplotlib, so that it can be
re-rendered later, possibly with a different backend, maybe to a different
size, and maybe with changes to the labels. This would save me having to
rerun the simulation that generated the plot.

Ideally this would work by having a save_plot() function, that would save
all state of the current plot into a file. This could then be loaded by a
program to regenerate that plot.

Can't this be achieved by pickling/unpickling the mpl objects? Didn't
manage to test it, but it should work.

No, this has been discussed several times. Quite a bit of work would be required to make all the extension code compatible with pickling. More work, more complexity, more difficult code maintenance and testing. It's not worth it, given the developer resources available for mpl.

A file format avoids all the issues that pickling causes.

Thanks for all the comments.

sam tygier

Eric_Firing2 · March 1, 2009, 8:02pm

sam tygier wrote:

Eric Firing wrote:

Sandro Tosi wrote:

Hi Sam,

I think this topic has come up before, but i don't think anything has
resulted from it.

Correct, because the capability would require a *lot* of work to implement,

Would I be right in assuming that it would take roughly the same amount of effort as writing a new backend? I.e. for each matplotlib action it would need a function to store that action and a function to call that action again.

It is much more than that; it would take a backend to write out the new format, and an interpreter to turn that format back into mpl objects or API calls.

One of the mpl backends is svg; can you use something like Inkscape to make the plot adjustments you are talking about?

Eric

_Ryan_May · March 2, 2009, 7:49pm

Other than the automatic regeneration from latex, what you want sounds like what we already have: small python scripts.

In general, I’m completely amazed by how many people want to develop a new markup/script language to wrap what is already a simple and expressive language, both for plots and (at least around here) analyses. If there are some spots that require too many lines of code to accomplish something really simple, then maybe we need API additions. But in general, I think we have a format for specifying how to make a plot: python. Now, if we’re taking the output from some monstrous application or set of scripts, it might be nice to get the commands that made the plot, like MayaVi 2 and its ability to record. However, at the end of the day what MayaVi creates is a python script, and I think that’s more useful than any general markup since I can look at that file and figure out what’s going on without learning anything new.

Now, a matplotlib backend that writes out python code could be useful and cool, though it would only matter for the large applications/scripts. In fact, it’s at the application level that such functionality would probably belong.

My 0.02 anyways.

Ryan

–
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Sent from: Norman Oklahoma United States.

Gael_Varoquaux1 · March 2, 2009, 9:52pm

Although I agree with you that reinventing an extra scripting layer is
often a bad solution to a problem which should simply be solved by having
a good scripting API in Python, I believe there is here a fundamental
misconception.

Python is an imperative, Turing-complete language. This is a very good thing for a
scripting language. For describing a static object such as a
plot, this is not a good thing. For instance, if I want to make a plot,
save it, and later blow up all the fonts, I really don't want to be using
an imperative, Turing-complete language for the persistence model, as
static analysis of this persisted object is going to be next to
impossible. Same thing if I want to change colormaps, or just about
anything in my persisted object, for the same reason.

A good rule for most software design is that the state of the
application, or of the object of interest, in our case the plot, should
be fully represented by a fully-static set of values, that I like to call
the model. Although this sounds like a tautology, this design rule is
more often broken than followed. For instance the status of an
application may be entirely dependent on its past, or the important state
variables may be hidden in places where you can't get hold of them (eg
the status of a GUI widget, or inside a generator).

Having a very clean separation between your (fully-static) model and the
logic around it is a very important part of good application design, and I
believe I know this because I have so often made an error and violated
this rule :).

If you have this static model, rather than an imperative language, then
you can have persistence. By the way, Mayavi2 achieves its code
generation by introspection on the model. The generated lines of code are
just a way of expressing the changes.
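
To make the argument concrete, here is a minimal sketch of a plot held as a fully static model. The dictionary layout and the `scale_fonts` helper are assumptions made up for illustration, not a matplotlib or Mayavi2 format. Because the model is only data, "blow up all the fonts" becomes a simple traversal, and no plotting code has to be executed or analysed.

```python
import json

# A plot described purely as data: nothing has to run to inspect or edit it.
model = {
    "title": {"text": "Particle track", "fontsize": 12},
    "xlabel": {"text": "x [m]", "fontsize": 10},
    "ylabel": {"text": "y [m]", "fontsize": 10},
    "lines": [{"x": [0, 1, 2], "y": [0, 1, 4], "color": "blue"}],
}


def scale_fonts(node, factor):
    """Return a copy of the model with every 'fontsize' value multiplied."""
    if isinstance(node, dict):
        return {
            key: value * factor if key == "fontsize" else scale_fonts(value, factor)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [scale_fonts(item, factor) for item in node]
    return node


bigger = scale_fonts(model, 2)
# The model survives a trip through a file format unchanged.
restored = json.loads(json.dumps(bigger))
```

Doing the same to a saved Python script would mean finding every place a font size is set, including ones computed in loops or helper functions, which is the static-analysis problem described above.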

Sorry for being fussy, I am just trying to pass on what I believe I am
learning painfully :).

Gaël

_Ryan_May · March 3, 2009, 5:23pm

Not at all. You made some good points. I hadn’t really thought about the prospect of things changing in the core of the rest of the code. It was probably just a knee-jerk reaction to something I hear a lot around here, regarding making a small language/configuration file for automating analyses in python.

Ryan

–
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Sent from: Norman Oklahoma United States.

_John_Hunter · March 3, 2009, 6:08pm

I don’t think the backend approach would be viable, because the backend doesn’t know the provenance of an object (e.g. that a given line is a tick line). I think to have a proper serialized format, you would want to do it at the artist layer.
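
As a rough illustration of what working at the artist layer means, the sketch below reads properties back from an Axes and its Line2D artists and collects them in a plain dictionary. The `describe_axes` helper is hypothetical and covers only labels and lines; a real serialized format would need to handle every artist type, which is where the bulk of the work lies.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def describe_axes(ax):
    """Collect a plain-dict description of an Axes from its artists."""
    return {
        "title": ax.get_title(),
        "xlabel": ax.get_xlabel(),
        "ylabel": ax.get_ylabel(),
        "lines": [
            {
                "x": [float(v) for v in line.get_xdata()],
                "y": [float(v) for v in line.get_ydata()],
                "color": line.get_color(),
                "linewidth": line.get_linewidth(),
                "label": line.get_label(),
            }
            for line in ax.get_lines()
        ],
    }


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], color="red", label="track")
ax.set_xlabel("x")
description = describe_axes(ax)
plt.close(fig)
```

At this level a data line is still known to be a data line, while a backend only sees anonymous drawing commands in which data lines and tick lines look alike.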

JDH

···

On Sun, Mar 1, 2009 at 2:02 PM, Eric Firing <efiring@…706…29…> wrote:

Would I be right in assuming that it would take roughly the same amount of effort as writing a new backend? I.e. for each matplotlib action it would need a function to store that action and a function to call that action again.

It is much more than that; it would take a backend to write out the new format, and an interpreter to turn that format back into mpl objects or API calls.

_Eric_Bruning · March 3, 2009, 8:44pm

One of the mpl backends is svg; can you use something like Inkscape to
make the plot adjustments you are talking about?

Eric [F]

I'll second this recommendation - indeed, it's my default workflow
(except that I use Illustrator). By definition, vector image formats
contain all the data needed to (re)make the plot. Everything can be
rescaled, line weights changed, colors modified, etc.

-Eric B