How to do million data-point plots with Matplotlib?

David_Smith · December 10, 2011, 6:12pm

I have been working on a program that uses Matplotlib to plot data
consisting of around one million points. Sometimes the plots succeed but
often I get an exception: OverflowError: Agg rendering complexity exceeded.

I can make this message go away by plotting the data in "chunks" as
illustrated in the demo code below. However, the extra code is a chore
which I don't think should be necessary - I hope the developers will
be able to fix this issue sometime soon. I know that the development
version has some modifications addressing this issue. I wonder if they are
expected to make the problem go away?

By the way, this plot takes about 30 seconds to render on my I7 2600k.
The main program reaches the show() statement quickly and prints
"Done plotting?". Then I see that the program reaches 100% usage
on one CPU core (4 real, 8 virtual on the 2600k) until the plot is
displayed. I wonder if there is any way to persuade Matplotlib to run
some of the chunks in parallel so as to use more CPU cores?

Plotting something other than random data, the plots run faster and
the maximum chunk size is smaller. The maximum chunk size
also depends on the plot size - it is smaller for larger plots. I am
wondering if I could use this to plot coarse and fine versions of the
plots. The coarse plot is a zoomed-in version of the small-sized raster.
That would be better than decimation as all the points would at least
be there.

Thanks in advance,

David

--------------------------- start code ---------------------------------
## Demo program shows how to "chunk" plots to avoid the exception:
##
## OverflowError: Agg rendering complexity exceeded.
## Consider downsampling or decimating your data.
##
## David Smith December 2011.

from pylab import *
import numpy as np

nPts = 600100
x = np.random.rand(nPts)
y = np.random.rand(nPts)

## This seems to always succeed if nPts <= 20000, but fails
## for nPts > 30000. For points between, it sometimes succeeds
## and sometimes fails.
figure(1)
plot(x, y)

## Chunking the plot always succeeds.
figure(2)
chunk_size = 20000
## Start index of each full chunk: 0, chunk_size, 2*chunk_size, ...
iStarts = range(0, nPts - nPts % chunk_size, chunk_size)
for iStart in iStarts:
    print("Plotting chunk starting at %d\n" % iStart)
    ## One extra point per chunk so that consecutive chunks join up.
    plot(x[iStart:iStart+chunk_size+1], y[iStart:iStart+chunk_size+1], '-b')

left_overs = nPts % chunk_size
if left_overs > 0:
    print("Leftovers %d points\n" % left_overs)
    plot(x[-left_overs-1:], y[-left_overs-1:], '-r')

print("done plotting?")
show()
---------------------------------- end code ------------------------
Please don't reply to this post with "It is ridiculous to plot 1 million points
on screen". I am routinely capturing million-point traces from oscilloscopes and
other test equipment, and I need to be able to spot features in the
data (glitches if you will) that may not show up plotting decimated data.
I can then zoom the plot to inspect these features.
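
Editor's note: the concern above is that plain decimation (keeping every Nth sample) can drop a one-sample glitch. One commonly used alternative is min/max decimation, which keeps the smallest and largest sample of each bucket so that narrow spikes survive. The sketch below is not from the thread; `minmax_decimate` and `bucket_size` are hypothetical names chosen for illustration, and it assumes a non-empty 1-D trace.

```python
import numpy as np


def minmax_decimate(x, y, bucket_size):
    """Keep the min and max sample of every bucket of `bucket_size` points.

    Hypothetical helper (not part of Matplotlib). Assumes x and y are
    non-empty 1-D sequences of equal length.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = y.size
    n_buckets = -(-n // bucket_size)  # ceiling division
    pad = n_buckets * bucket_size - n

    # Repeat the last sample so the trace reshapes into whole buckets;
    # a repeated value cannot create a new extreme.
    y_buckets = np.pad(y, (0, pad), mode="edge").reshape(n_buckets, bucket_size)

    offsets = np.arange(n_buckets) * bucket_size
    i_min = y_buckets.argmin(axis=1) + offsets
    i_max = y_buckets.argmax(axis=1) + offsets

    # Map padded positions back onto the last real sample, then sort and
    # de-duplicate so the points stay in their original order.
    idx = np.unique(np.minimum(np.concatenate([i_min, i_max]), n - 1))
    return x[idx], y[idx]


# A smooth million-point trace with a single one-sample glitch.
t = np.arange(1_000_000)
y = np.sin(t * 0.001)
y[123_456] = 5.0

t_dec, y_dec = minmax_decimate(t, y, bucket_size=1000)
print("points kept:", t_dec.size)
```

The reduced arrays can be passed to `plot()` in place of the full trace. After zooming in, the same helper could be re-run on just the visible slice with a smaller `bucket_size`, so that fine detail comes back where it is needed.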

Michael_Droettboom1 · December 16, 2011, 2:11am

> I have been working on a program that uses Matplotlib to plot data
> consisting of around one million points. Sometimes the plots succeed but
> often I get an exception: OverflowError: Agg rendering complexity exceeded.

Are you sure path simplification is running (i.e. the rcParam path.simplify is True)? That generally does a good job of removing excess points on the fly. You shouldn't need a development version for this to work; 0.99.x or later should be adequate. You're not going to "see" a million points at typical screen resolutions anyway.
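
Editor's note: a minimal sketch of how the setting mentioned above could be checked and switched on. It also sets `agg.path.chunksize`, an rcParam that is not mentioned in this thread but which, in Matplotlib versions that provide it, asks the Agg backend to split long unfilled lines into chunks at render time, much like the manual loop in the original post. The value 20000 and the 100,000-point test size are assumptions chosen for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Check whether path simplification is on, then enable it explicitly.
print("path.simplify was:", matplotlib.rcParams["path.simplify"])
matplotlib.rcParams["path.simplify"] = True

# Assumption: this Matplotlib version supports agg.path.chunksize.
# 0 disables chunking; 20000 mirrors the chunk size used in the thread.
matplotlib.rcParams["agg.path.chunksize"] = 20000

# Random data like the demo program, but smaller to keep the run short.
n_pts = 100_000
rng = np.random.default_rng(0)
x = rng.random(n_pts)
y = rng.random(n_pts)

fig, ax = plt.subplots()
ax.plot(x, y, "-b")

# Render to an in-memory PNG; this is where Agg actually draws the line.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
print("rendered", len(png_bytes), "bytes")
```

Both settings can also be placed in a matplotlibrc file so that they apply to every script.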

> I can make this message go away by plotting the data in "chunks" as
> illustrated in the demo code below. However, the extra code is a chore
> which I don't think should be necessary - I hope the developers will
> be able to fix this issue sometime soon. I know that the development
> version has some modifications addressing this issue. I wonder if they are
> expected to make the problem go away?

> By the way, this plot takes about 30 seconds to render on my I7 2600k.
> The main program reaches the show() statement quickly and prints
> "Done plotting?". Then I see that the program reaches 100% usage
> on one CPU core (4 real, 8 virtual on the 2600k) until the plot is
> displayed. I wonder if there is any way to persuade Matplotlib to run
> some of the chunks in parallel so as to use more CPU cores?

That would be great, but very difficult. The Python parts of the problem are tricky to parallelize due to the GIL. The Agg part of the problem will be difficult to parallelize unless there is a trivial way to chunk the plotted lines into parts before stroking -- each chunk could be rendered to its own buffer and then blended together in a final step. But all that is academic at this point -- there's no code to do such a thing now.

> Plotting something other than random data, the plots run faster and
> the maximum chunk size is smaller. The maximum chunk size
> also depends on the plot size - it is smaller for larger plots. I am
> wondering if I could use this to plot coarse and fine versions of the
> plots. The coarse plot is a zoomed-in version of the small-sized raster.
> That would be better than decimation as all the points would at least
> be there.

I think what you're seeing is the effect of the path simplification algorithm. The number of points that it removes depends on the density of the points and the resolution of the output image. It's hard to predict exactly how many points it will remove.

Mike

