'pyplot' interface and memory management

hakostra · September 9, 2021, 11:27am

I have been using Matplotlib and Python for ~10 years or so and consider myself quite skilled. I just recently realized something that puzzled me and want to share my thoughts:

All Matplotlib tutorials, guides and examples usually start with import matplotlib.pyplot as plt. That’s fine and what I have been doing since day 0. I also very much like the object-oriented way of making plots and a typical plotting code can then be something like:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 1.0)
y = x**2

fig, ax = plt.subplots(1, 1)
ax.plot(x, y)
fig.savefig("parabola.pdf")

Which according to my own observations, is more or less a “textbook example” of very simple Matplotlib usage.

I have a program that generates plots from simulation results. Usually just a few, but on rare occasions there can be hundreds. Recently I got the warning:

RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  fig, ax = plt.subplots(1, 1)

And I was puzzled… Why do I have 20 figures open??? What?

I have my plotting nicely organized in a separate function, and my fig and ax objects are not automatically refcounted and deleted by the interpreter’s garbage collection???

I have been aware that there are the very-old-school and in my opinion extremely cumbersome matlab-like plotting functions like plt.plot(...), plt.xlabel(...) which require an explicit closing, like in Matlab, but I have always been under the impression that the modern object-oriented interfaces through fig and ax were not affected by this.

A solution to the memory leak and warning is apparently to close the figure with plt.close(fig). However, this counteracts some of the beauty of the OO interfaces.
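A minimal sketch of that fix, assuming the non-interactive Agg backend and a hypothetical helper named plot_parabola that writes into an in-memory buffer instead of parabola.pdf. The try/finally makes sure the figure is unregistered from pyplot even if plotting or saving raises:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def plot_parabola(target):
    """Hypothetical helper: draw y = x**2 and write a PNG to `target`."""
    x = np.linspace(0.0, 1.0)
    y = x**2
    fig, ax = plt.subplots(1, 1)
    try:
        ax.plot(x, y)
        fig.savefig(target, format="png")
    finally:
        # Without this call pyplot keeps the figure alive after the function returns.
        plt.close(fig)


buffers = [io.BytesIO() for _ in range(30)]
for buf in buffers:
    plot_parabola(buf)

# Figure numbers that pyplot still tracks after the loop.
open_figures = plt.get_fignums()
print(open_figures)
```

With the plt.close(fig) call in place, the loop goes past 20 figures without pyplot accumulating any of them.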

But the aim of the post is to understand: how could I have been so wrong for 10 years?

When I look at more or less every Matplotlib example I understand: they all begin with import matplotlib.pyplot as plt and none of them ends with plt.close(fig). There are almost no examples in the Matplotlib gallery that end with closing the figure properly. In my opinion this is training users to create memory leaks!

In the end my preferred solution is currently to stop using pyplot and instead do:

import matplotlib.figure as figure
import numpy as np

x = np.linspace(0.0, 1.0)
y = x**2

fig = figure.Figure()
ax = fig.subplots(1, 1)
ax.plot(x, y)
fig.savefig("parabola.pdf")

Then the fig and ax seem to be properly refcounted and deleted when they go out of scope, as any other Python object usually is (am I right??).
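One way to check this, sketched under a few assumptions: the Agg backend, a hypothetical helper named render_parabola, and pyplot imported only to inspect its registry. A Figure and its canvas and axes refer to each other, so they form reference cycles and are reclaimed by Python's cyclic garbage collector rather than by plain reference counting. The sketch therefore calls gc.collect() before counting the survivors, and makes no claim about the count that is printed:

```python
import gc
import io
import weakref

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt  # imported only to look at pyplot's registry
import numpy as np
from matplotlib.figure import Figure


def render_parabola():
    """Hypothetical helper: build a figure without pyplot, return PNG bytes."""
    x = np.linspace(0.0, 1.0)
    fig = Figure()
    ax = fig.subplots(1, 1)
    ax.plot(x, x**2)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    # A weak reference lets us observe the figure without keeping it alive.
    return buf.getvalue(), weakref.ref(fig)


results = [render_parabola() for _ in range(30)]

# pyplot never saw these figures, so it has nothing to retain.
tracked_by_pyplot = plt.get_fignums()
print(tracked_by_pyplot)

# Break any remaining reference cycles, then count figures still alive.
gc.collect()
still_alive = sum(ref() is not None for _, ref in results)
print(still_alive)
```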

In my opinion the latter variant is far more elegant. For anyone who is skilled in OO programming and Python this is way more intuitive than having this external teardown method plt.close(fig) that must be called manually on the fig object (and what about the ax?).

My summary:

Shouldn’t the examples be more correct and close their figures when finished, to educate the users? Why are there almost no examples that close the figure properly?
Why is import matplotlib.figure as figure not the default, modern way of using Matplotlib? Why are there not more examples using this alone, without the pyplot singleton interfaces?

tacaswell · September 10, 2021, 8:48pm

TL;DR: you are correct about basically everything @hakostra but there are good reasons!

You can think of Matplotlib as having three main layers:

The top layer is the functions / methods that take user data and generate Artists of one sort or another. I include in this category most of the Axes methods, the pyplot functions, and third-party libraries (like seaborn, plotnine, the pandas/xarray plotting methods, …). Anything that takes in something a user would call “data” and “style” and instantiates a bunch of Artist instances.
The Artist layer, which is our internal (intermediate) representation of what visualization the user asked us to make. These objects include everything that can be known about that part of the visualization (e.g. the Line2D classes know their data, their transform, the linestyle, the marker shape, a number of colors, …). Because these are objects you can reach in and mutate this state after you create them (e.g. for animation). These Artists are arranged in a tree with the Figure instance at the top. obj.get_children() (see the docs) gets you the next generation down.
The backends, which own the code to output (well, enable artists to output themselves) to a hard copy of some sort (raster format, raster format embedded in a UI, vector format). The API for this is that the Figure has a canvas. When you save a Figure we ask the Canvas for a Renderer which we then pass into Figure.draw(renderer), which will in turn call child.draw on its children and so on. See matplotlib.backend_bases for more details on how the Canvas and Renderer APIs work.
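The three layers can be seen side by side in a short sketch. It assumes the Agg backend's canvas is attached by hand, and iter_artists is a hypothetical helper, not a Matplotlib function. Real save paths do more work than the bare Figure.draw(renderer) call shown here:

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Backend layer: a figure with an explicit raster (Agg) canvas.
fig = Figure()
canvas = FigureCanvasAgg(fig)

# Top layer: an Axes method turns data into Artist instances.
ax = fig.subplots()
(line,) = ax.plot([0, 1, 2], [0, 1, 4])

# Artist layer: state can be changed after creation.
line.set_linestyle("--")


def iter_artists(artist):
    """Hypothetical helper: yield an artist and all of its descendants."""
    yield artist
    for child in artist.get_children():
        yield from iter_artists(child)


descendants = list(iter_artists(fig))
line_in_tree = any(artist is line for artist in descendants)

# Ask the canvas for a renderer and let the figure draw itself and its children.
renderer = canvas.get_renderer()
fig.draw(renderer)
pixels = np.asarray(canvas.buffer_rgba())  # height x width x RGBA

print(type(fig).__name__, len(descendants), line_in_tree, pixels.shape)
```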

When you are using one of the interactive backends that bind to a desktop UI toolkit, what is actually happening is that Matplotlib is becoming a full-blown GUI application (in some sense one of the most amazing technical achievements of Matplotlib is that we have made millions of users cross-toolkit GUI application developers without them knowing). Being a GUI application implies a level of global state that we need to maintain (for example so that plt.show() will be able to bring up all of the windows!) and we want to make sure that if the user creates a Figure in a function, but does not return anything, the window will remain on their screen until they close it.

The bookkeeping for keeping track of all currently existing Figures is done by pyplot (and specifically by matplotlib._pylab_helpers.Gcf), and plt.figure(), plt.subplots() and plt.subplot_mosaic() use this machinery to:

Create a figure with the correct Canvas for your configured backend
Register that Figure with Gcf so that it “does the right thing” from a UX point of view
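The effect of that registration can be observed through the public pyplot API. This is a minimal sketch assuming the Agg backend; make_plot is a hypothetical function that creates a figure and returns nothing, which is the case the registry is designed to keep alive:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

plt.close("all")  # start from an empty registry


def make_plot():
    """Hypothetical function: creates a figure and does not return it."""
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])


for _ in range(3):
    make_plot()

# The local names fig and ax are gone, but pyplot still holds the figures.
open_before = plt.get_fignums()
print(len(open_before))

# Closing removes them from the registry.
plt.close("all")
open_after = plt.get_fignums()
print(len(open_after))
```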

One wrinkle that actually makes the situation worse than you describe is that while you can always create the underlying C++ objects from GUI toolkits, not all of them (notably Qt) will tear them down without spinning the event loop, which can lead to resource leaks even if you do plt.close(fig) or plt.close('all')! Something has changed someplace (maybe in us, maybe in Qt, maybe in the Python bindings for Qt) that has made this much worse recently.

There is also the implicit assumption in many of the tutorials and examples that they are either being run as a short top-to-bottom script with a plt.show(block=True) at the bottom (which will stop blocking when all of the figures are closed and, if it is the last line of your script, exit Python) or the reader is at an interactive command prompt where there is a very natural back-pressure on having too many windows (the user starts to close them because they have too many windows!).

Only recently (starting with mpl 3.1) did we start creating Figure instances with a sufficiently functioning canvas attached. Prior to this, creating a Figure without pyplot was a multi-step process (as shown in the Embedding in Qt example, but with a non-interactive backend’s canvas), so it was not widely documented outside of the GUI embedding examples.

One major downside of using fig = figure.Figure() is that the Canvas class attached will be a FigureCanvasBase, which has enough of an implementation that saving will work, but it knows nothing about GUI frameworks etc., so there is no way to get it onto the screen other than saving it. Given the assumptions above about what our users are doing, not being able to get a GUI up by default is less than ideal. To address this we have started to sketch out code that would allow you to take these naive figures and promote them to GUI-aware figures as you want (tacaswell/mpl-gui on GitHub, a prototype for an mpl-gui module), but that has become (very sadly) stalled because it kept getting pushed down my to-do list.
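A small sketch of what that looks like in practice, assuming Matplotlib 3.1 or later. It saves to an in-memory PNG purely for illustration and checks which canvas class a bare Figure carries and whether any GUI manager is attached:

```python
import io

from matplotlib.backend_bases import FigureCanvasBase
from matplotlib.figure import Figure

fig = Figure()

# The canvas a bare Figure gets, inspected before anything is saved.
canvas_type = type(fig.canvas)
print(canvas_type.__name__)

ax = fig.subplots()
ax.plot([0.0, 0.5, 1.0], [0.0, 0.25, 1.0])

# Saving works even though no GUI toolkit is involved.
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()

# No window manager is attached, so there is nothing to show on screen.
has_manager = getattr(fig.canvas, "manager", None) is not None
print(len(png_bytes) > 0, has_manager)
```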

We have also had some (very long-running) discussion about how to use a context manager to have a “local” version of the current axis to make it a bit more fluid to work with. This is motivated in part by a desire to move away from methods to free functions for much of the API.

So in conclusion, @hakostra, a very good and thoughtful question. I hope my response helps clarify some things for you.