Persistent Matplotlib Figures

Does anyone know the status of development for Matplotlib persistent figure saving? I would like to be able to save the figures from matplotlib in an editable form, without flattening down to an image file. The closest thing to this right now is the SVG output, but a native mpl format would be better. I need to be able to save the figure, so that later it can be loaded, edited, and re-saved. I know that this topic has been somewhat discussed in the past, but I believe it is desperately needed, so I thought I would bring it back up.

Let me say why I think this feature is so essential. Anyone who is in research or academia knows that figures often need to be edited when a publication comes back from peer review. It’s already happened to me many times, and I’ve learned that I absolutely have to save my figures for later editing to save myself a lot of time. Some people have argued that a script that generates the plots/figures should be saved, and that if you need to edit the figure, just re-run the script. The problem with this argument is that scientific plots often take hours, days, or even weeks of computation to generate. For example, generating a bit-error-rate curve in communications takes days. Therefore, always re-running from a script is just not practical.

Now, I understand that resources are limited, so I would be willing to raise some money to get this feature added to Matplotlib. It’s desperately needed by me and many others in the community. I would really like to completely replace Matlab with Python, SciPy, and Matplotlib. MPL is an excellent tool, and it could be even more useful/professional with the addition of a figure save feature.

Any thoughts?

-Joey Wilson

span.ece.utah.edu/joey-wilson

2009/12/16 Joey Wilson <doughywilson@...149...>:

Any thoughts?

Leaving entirely aside any question of persistence, do you find
matplotlib plots to be modifiable in the ways you want? I find that for
anything beyond minor changes of axes, I end up rerunning my plotting
command anyway. For example, I suppose it's possible to change a line
on an existing plot from red to black, but I just rerun the plotting
command. What about adding or removing error bars? Changing the number
of bins, the range, or the starting position of your histogram? Plotting
the square root instead of the logarithm of the image values? Removing
bogus data points (or adding back points you previously removed)?
It seems to me that all of these things require me to keep the
original data around.

Since that's the case, I usually generate my plots in one of two ways:
either I just write a script that runs the calculation and generates
the plot, or I write one script to generate the data and save it to
disk, and another to plot the data from disk. This is sometimes mildly
annoying when a script is just a bit slow, but not enough to warrant
saving the data to disk. In those cases, if I must, I can run the script
under IPython and modify the plot, then save the modifications out to
a script.

Now, if you want a very-low-effort way to save your data to disk, I
agree that would be valuable to have, but several options already
exist: in ascending order of complexity and power, the native NumPy
data format, PyFITS, and PyTables/pyhdf.
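
As a rough illustration of the two-script workflow using NumPy's native format, here is a minimal sketch. The function names and the file name `ber_results.npz` are made up for this example, and the closed-form BPSK bit error rate only stands in for whatever slow simulation actually produces the data.

```python
import math

import numpy as np


def compute_ber(snr_db):
    """Stand-in for the slow simulation: theoretical BPSK bit error rate."""
    snr_linear = 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)
    return np.array([0.5 * math.erfc(math.sqrt(s)) for s in snr_linear])


def save_results(path, snr_db, ber):
    """Store the computed arrays in NumPy's native .npz container."""
    np.savez(path, snr_db=snr_db, ber=ber)


def load_results(path):
    """Read the arrays back from disk; nothing is recomputed."""
    with np.load(path) as data:
        return data["snr_db"], data["ber"]


def plot_results(snr_db, ber, out_path, color="black"):
    """Plotting step: cheap to re-run with different styling."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.semilogy(snr_db, ber, color=color, marker="o")
    ax.set_xlabel("Eb/N0 (dB)")
    ax.set_ylabel("Bit error rate")
    fig.savefig(out_path)
    plt.close(fig)


if __name__ == "__main__":
    # "Slow" script: compute once and save.
    snr_db = np.arange(0, 11)
    save_results("ber_results.npz", snr_db, compute_ber(snr_db))

    # "Fast" script: load from disk as often as needed.
    loaded_snr, loaded_ber = load_results("ber_results.npz")
    print(loaded_snr.shape, loaded_ber.shape)
```

The plotting script would then call something like `plot_results(loaded_snr, loaded_ber, "ber.png", color="red")`, and changing the color or markers after peer review means re-running only that call.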

Anne

Ignoring the issue of having saved matplotlib figures, I'd argue you
should separate the parts of the code that do computation from those
that do plotting into separate scripts. Is there anything keeping you
from saving all of the results from the computation into (for
instance) a NetCDF file? Then the plotting script can just read in
the file and do the plotting. This is exactly how my workflow is set
up. I'd be happy to address any concerns you see with doing things
this way.
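
A minimal sketch of that split, assuming SciPy's `scipy.io.netcdf_file` (classic NetCDF 3 only) is an acceptable way to write the file. The function names, variable names, and the file name `results.nc` are invented for illustration, and the numbers in the demo are placeholders, not real results.

```python
import numpy as np
from scipy.io import netcdf_file


def write_results(path, snr_db, ber):
    """Computation side: dump the finished results to a NetCDF file."""
    with netcdf_file(path, "w") as f:
        f.createDimension("point", len(snr_db))
        snr_var = f.createVariable("snr_db", "d", ("point",))
        snr_var[:] = snr_db
        snr_var.units = "dB"  # metadata travels with the data
        ber_var = f.createVariable("ber", "d", ("point",))
        ber_var[:] = ber


def read_results(path):
    """Plotting side: read the arrays back without recomputing anything."""
    # mmap=False plus copy() so the arrays stay valid after the file closes
    with netcdf_file(path, "r", mmap=False) as f:
        snr_db = f.variables["snr_db"][:].copy()
        ber = f.variables["ber"][:].copy()
    return snr_db, ber


if __name__ == "__main__":
    # Placeholder values standing in for days of simulation output.
    write_results("results.nc", np.array([0.0, 2.0, 4.0]),
                  np.array([1e-1, 1e-2, 1e-3]))
    print(read_results("results.nc"))
```

The plotting script only needs `read_results`, so the figure can be restyled as often as necessary without touching the computation.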

Ryan


On Wed, Dec 16, 2009 at 4:45 PM, Joey Wilson <doughywilson@...149...> wrote:

Let me say why I think this feature is so essential. Anyone who is in
research or academia knows that figures often need to be edited when a
publication comes back from peer review. It's already happened to me many
times, and I've learned that I absolutely have to save my figures for later
editing to save myself a lot of time. Some people have argued that a script
that generates the plots/figures should be saved, and that if you need to
edit the figure, just re-run the script. The problem with this argument is
that scientific plots often take hours, days, or even weeks of computation
to generate. For example, generating a bit-error-rate curve in
communications takes days. Therefore, always re-running from a script is
just not practical.

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Joey Wilson wrote:

I would like to be able to save the figures from matplotlib in an editable form, without flattening down to an image file.

Now, I understand that resources are limited, so I would be willing to raise some money to get this feature added to Matplotlib.

I think to do this right, you'd need to completely re-design MPL to be based on a more declarative structure: i.e. you'd define what the objects were in a figure, and MPL would generate the figure from the declaration -- much like how a drawing is generated from SVG. Maybe it's not as big a re-factor as I think it is, but it seems that MPL is built to be used from a scripting interface instead: a series of commands that builds the figure.

Honestly, for your purposes, I don't know that there is much difference. I suppose what you are looking for is a way to get a script that you could edit and re-run, but have it generated from a figure automatically; the figure itself could have been generated by a different script (intermeshed with computational code), or an interactive session, or....

That would be pretty cool, but I think a bit of re-factoring of your process would make it pretty easy to edit and re-run your scripts anyway.
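
To make the declarative idea a little more concrete, here is a toy sketch in which the figure is described by a plain dictionary, stored as JSON, and drawn by a small function. None of this is an existing Matplotlib feature: the spec layout and the names `save_spec`, `load_spec`, and `render` are hypothetical, and the data values are placeholders.

```python
import json


def save_spec(spec, path):
    """Write the figure description (data plus styling) to a JSON file."""
    with open(path, "w") as fh:
        json.dump(spec, fh, indent=2)


def load_spec(path):
    """Read a figure description back for editing or rendering."""
    with open(path) as fh:
        return json.load(fh)


def render(spec, out_path):
    """Build a Matplotlib figure from the declaration and save it."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for line in spec["lines"]:
        ax.plot(line["x"], line["y"], color=line["color"], label=line["label"])
    ax.set_xlabel(spec["xlabel"])
    ax.set_ylabel(spec["ylabel"])
    ax.set_yscale(spec["yscale"])
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)


if __name__ == "__main__":
    spec = {
        "xlabel": "Eb/N0 (dB)",
        "ylabel": "Bit error rate",
        "yscale": "log",
        "lines": [
            {"x": [0, 2, 4], "y": [0.1, 0.01, 0.001],
             "color": "red", "label": "placeholder data"},
        ],
    }
    save_spec(spec, "figure_spec.json")

    # Later: load the saved figure description, edit it, and re-save.
    edited = load_spec("figure_spec.json")
    edited["lines"][0]["color"] = "black"
    save_spec(edited, "figure_spec.json")
```

Calling `render(edited, "figure.png")` would then redraw the figure from the edited description. A real design would have to cover far more than lines and labels, which is why this looks like a large re-factor.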

I would really like to completely replace Matlab with Python, SciPy, and Matplotlib.

How does Matlab handle this? In my Matlab days, I wrote scripts that generated my figures, and when I needed to change them, I edited the scripts and re-ran them -- exactly the workflow we're suggesting for Python/MPL. But that was 10 years ago...

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT
7600 Sand Point Way NE
Seattle, WA 98115

(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception