Saving data used to generate figures (with POC)

I know we should all do 100% reproducible data analysis and save all our data before plotting it, but unfortunately the overhead of doing that is often still too large. So I wanted to automate saving the data used to generate a figure side by side with the figure itself.

I’ve made a little proof of concept of this. It wraps pyplot functions to keep track of the data being plotted. When savefig() is called, it stores the data next to the figure and also produces a short Python script that replots the figure from the data.

The POC is available at

It's a fairly straightforward piece of code, but it works fine with the basic plot, scatter and hist commands.

Any feedback is very much appreciated, and if anybody would like to contribute toward completing the functionality, that would be very welcome. And of course it would be even better if this eventually proves worth integrating into matplotlib itself.

This is super cool! There’s been some conversation about doing something sort of like this: being able to save a figure in a reproducible/serialized manner using a .json or a multi-file compressed format like .doc or .shp, where it can also include Python and the like as needed. I think for long-term serializability, the goal would be more focused on just what goes into the figure (so preliminary steps that aren’t necessarily matplotlib, for example, would get stripped out), so that loading the figure would be more about changing view limits or styling or the like.

Thanks. I’m making this part of my normal workflow and so will keep working on the library. I’ll update on where I get to in a few weeks…

While interesting, I have some concerns about this method of serializing the state of the figure.

This approach, keeping a log of what the user did, has been used to good effect by some plotting applications (I believe ParaView and Photoshop both have functionality like this). However, for it to work you have to be sure to capture everything the user may do. In the context of a GUI application it is clear how to be sure you get everything, but that is less clear to me in the context of a library.

If you are going to go down this path, rather than decorating pyplot functions, I suggest decorating the Figure and Axes classes.
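
As a rough sketch of what that could look like (the helper `recording` and the `_call_log` attribute are made-up names, not matplotlib API), the Axes methods can be patched so that every call is logged on the owning figure. Because pyplot functions such as plt.plot delegate to the current Axes, calls made through pyplot are captured as well.

```python
import functools

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib.axes import Axes


def recording(method_name):
    """Patch Axes.<method_name> so each call is logged on its figure."""
    original = getattr(Axes, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        fig = self.get_figure()
        # `_call_log` is a hypothetical attribute added by this sketch.
        if not hasattr(fig, "_call_log"):
            fig._call_log = []
        fig._call_log.append((method_name, args, kwargs))
        return original(self, *args, **kwargs)

    setattr(Axes, method_name, wrapper)


# Patch only the three methods mentioned in this thread.
for name in ("plot", "scatter", "hist"):
    recording(name)

fig, ax = plt.subplots()
ax.plot([0, 1], [2, 3])    # object-oriented call
plt.scatter([1, 2], [3, 4])  # pyplot call, routed to the same Axes
print([entry[0] for entry in fig._call_log])
plt.close(fig)
```

One thing to watch with this design: some Axes methods call other Axes methods internally, so patching many of them at once can log nested calls that the user never made directly.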

An alternate path would be to work on the Figure objects (which contain all of the state of the final plot) and export based on that.
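
A minimal sketch of that direction, assuming only lines and scatter collections need to be exported: the function name `figure_to_dict` and the layout of the resulting dictionary are invented for illustration. Histogram bars live in `ax.patches` and are not handled here.

```python
import json

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def figure_to_dict(fig):
    """Collect the plotted data and basic view state from a Figure."""
    axes_out = []
    for ax in fig.axes:
        lines = [
            {
                "label": line.get_label(),
                "x": np.asarray(line.get_xdata()).tolist(),
                "y": np.asarray(line.get_ydata()).tolist(),
            }
            for line in ax.get_lines()
        ]
        # Scatter plots are stored as collections; offsets are the (x, y) points.
        scatters = [
            np.asarray(coll.get_offsets()).tolist() for coll in ax.collections
        ]
        axes_out.append(
            {
                "xlim": [float(v) for v in ax.get_xlim()],
                "ylim": [float(v) for v in ax.get_ylim()],
                "xlabel": ax.get_xlabel(),
                "ylabel": ax.get_ylabel(),
                "lines": lines,
                "scatters": scatters,
            }
        )
    return {"axes": axes_out}


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="squares")
ax.scatter([0.5, 1.5], [1.0, 3.0])
ax.set_xlabel("x")
exported = figure_to_dict(fig)
print(json.dumps(exported)[:80])
plt.close(fig)
```

Since this reads the state back from the artists, it does not matter whether the figure was built through pyplot or through the object-oriented interface.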

If you want to write tests (to ensure that you can correctly round-trip plots), have a look at pytest-mpl.