Matplotlib performance

> Main issue is Matplotlib's performance. I'm trying to plot a current
> trace from a physics experiment, containing about 300,000 data points.
> In LabVIEW, one can easily browse through a data set like this, but I
> haven't yet been able to get comparable performance with
> IPython+Matplotlib. Scrolling/panning through the data is especially
> sluggish. (Does anyone know how to add a scrollbar for this instead of
> panning with the mouse, btw?)
>

http://matplotlib.sf.net/examples/embedding_in_gtk3.py shows an
example using a scrolled window.
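
A scrollbar can also live inside the figure itself, independent of the
GUI toolkit, via matplotlib.widgets.Slider. A minimal sketch (the
10-unit window width is an arbitrary choice):

  import numpy as npy
  from pylab import figure, show
  from matplotlib.widgets import Slider

  t = npy.arange(0.0, 100.0, 0.0001)
  s = npy.sin(2*npy.pi*t)

  fig = figure()
  ax = fig.add_axes([0.1, 0.25, 0.85, 0.7])
  ax.plot(t, s)
  window = 10.0                       # visible span of the x axis
  ax.set_xlim(0, window)

  # a horizontal slider acting as the scrollbar
  sax = fig.add_axes([0.1, 0.05, 0.85, 0.05])
  slider = Slider(sax, 'offset', 0.0, t[-1] - window, valinit=0.0)

  def update(val):
      ax.set_xlim(val, val + window)  # pan the view; data is untouched
      fig.canvas.draw()

  slider.on_changed(update)
  show()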

You could also use the "clipped line" approach to pass in a custom
class that only plots the data in the current view limits defined by
timemin, timemax. See
http://matplotlib.sf.net/examples/clippedline.py. This example
changes the marker and line style depending on how many points are in
the view port, but you could expand on this idea to do downsampling
when the number of points is too large.

Hi Onno and JDH,

JDH, I have just started using matplotlib and love it. Thanks so much for your
work.

I have come across the same performance issues. My vote is for bringing
the clipped line example back and even making it the default. A check may
be needed in the constructor to make sure the x data is sorted, but I
think it is worth it. If the program is used for its primary original
intent (plotting), the vast majority of data sets are going to be
increasing in x.
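
For reference, the sortedness check can be a single vectorized
comparison rather than a Python loop; a sketch ("sorted" meaning
nondecreasing):

  import numpy as npy

  def is_sorted(x):
      # True if x is nondecreasing; one O(n) vectorized pass
      x = npy.asarray(x)
      return bool(npy.all(x[1:] >= x[:-1]))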

I am including a class based on ClippedLine that does decimation. Please
reply if you have improvements, and please consider putting something
like it in the code. This probably should not be used as the default,
though, because it may not be what the user expects. For example, if Onno
is looking for very short-duration spikes, they will not get plotted.
That is the nature of the decimation beast. Also, the filter requires the
x data to be equally spaced.
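
A quick guard for the equal-spacing assumption might look like this
(a sketch; the tolerance is an arbitrary choice):

  import numpy as npy

  def is_equally_spaced(x, rtol=1e-8):
      # True if every spacing matches the first one to within rtol
      dx = npy.diff(npy.asarray(x))
      return bool(npy.all(npy.abs(dx - dx[0]) <= rtol*npy.abs(dx[0])))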

With decimation you not only get performance increases, but you also get
rid of the smearing that occurs if the data is not monotonic, so you can
actually see something.

Here are the performance results on my computer:
it took 0.511511087418 seconds for matplotlib.lines.Line2D to draw()
it took 0.4196870327 seconds for __main__.ClippedLine to draw()
downsampling plotted line...
it took 0.11829996109 seconds for __main__.DecimatedClippedLine to draw()

from matplotlib.lines import Line2D
import numpy as npy
from pylab import figure, show, draw
import scipy.signal
import time

# adjusted from /usr/share/doc/matplotlib-0.91.2/examples/clippedline.py
class ClippedLine(Line2D):
    """
    Clip the data to the x view limits -- this example assumes x is sorted.
    """

    def __init__(self, ax, *args, **kwargs):
        Line2D.__init__(self, *args, **kwargs)
        ## axes the line is plotted in
        self.ax = ax

    def set_data(self, *args, **kwargs):
        Line2D.set_data(self, *args, **kwargs)
        ## what is plotted, pre-clipping
        self.xorig = npy.array(self._x)
        ## what is plotted, pre-clipping
        self.yorig = npy.array(self._y)

    def draw(self, renderer):
        xlim = self.ax.get_xlim()

        ind0, ind1 = npy.searchsorted(self.xorig, xlim)
        self._x = self.xorig[ind0:ind1]
        self._y = self.yorig[ind0:ind1]

        Line2D.draw(self, renderer)

class DecimatedClippedLine(Line2D):
    """
    Decimate and clip the data so it does not take as long to plot.
    Assumes the data is sorted and equally spaced.
    """

    def __init__(self, ax, *args, **kwargs):
        """
        *Parameters*:
          ax:
            axes the line is plotted on

          *args, **kwargs:
            Line2D arguments
        """
        Line2D.__init__(self, *args, **kwargs)
        ## axes the line is plotted in
        self.ax = ax

    def set_data(self, *args, **kwargs):
        Line2D.set_data(self, *args, **kwargs)
        ## data pre-clipping and pre-decimation
        self.xorig = npy.array(self._x)
        ## data pre-clipping and pre-decimation
        self.yorig = npy.array(self._y)

    def draw(self, renderer):
        bb = self.ax.get_window_extent()
        width = bb.width()

        xlim = self.ax.get_xlim()
        ind0, ind1 = npy.searchsorted(self.xorig, xlim)

        if self.ax.get_autoscale_on():
            ylim = self.ax.get_ylim()
            self.ax.set_ylim(min([ylim[0], self._y.min()]),
                             max([ylim[1], self._y.max()]))

        self._x = self.xorig[ind0:ind1]
        self._y = self.yorig[ind0:ind1]
        # downsample only when the points in view far outnumber the
        # horizontal pixels available to draw them
        if ind1 > ind0 and width / float(ind1 - ind0) < 0.4:
            # low-pass filter first so the decimated trace is not aliased
            b, a = scipy.signal.butter(5, width / float(ind1 - ind0))
            print 'downsampling plotted line...'

            filty = scipy.signal.lfilter(b, a, self._y)

            step = int((ind1 - ind0) / width)
            self._x = self._x[::step]
            self._y = filty[::step]

        Line2D.draw(self, renderer)

t = npy.arange(0.0, 100.0, 0.0001)          # one million samples
s = npy.sin(2*npy.pi*t)
s += (npy.random.rand(len(t)) - 0.5)*3.0    # uniform noise in [-1.5, 1.5)

for i in xrange(3):
    starttime = time.time()
    fig = figure(i)
    ax = fig.add_subplot(111, autoscale_on=False)
    if i == 0:
        line = Line2D(t, s, color='g', ls='-', lw=2)
    elif i == 1:
        line = ClippedLine(ax, t, s, color='g', ls='-', lw=2)
    elif i == 2:
        line = DecimatedClippedLine(ax, t, s, color='g', ls='-', lw=2)
    ax.add_line(line)
    ax.set_xlim(10, 20)
    ax.set_ylim(-3.3, 3.3)
    ax.set_title(str(line.__class__).replace('_', r'\_'))
    draw()
    endtime = time.time()
    print 'it took', endtime - starttime, 'seconds for', str(line.__class__), 'to draw()'
  
show()

thewtex wrote:

> I am including a class based on ClippedLine that does decimation. Please
> reply if you have improvements, and please consider putting something
> like it in the code.

I agree that exploration of large data sets is an important application, and that we need to speed it up. A couple days ago I added automatic subsetting (but not decimation--although this could be added easily) to image drawing, and that made a big difference for panning and zooming using imshow or pcolorfast with regular grids.

An easy, built-in interface makes sense for line/marker plotting as well, but it will take some thought to figure out exactly what that interface should be. The line plotting case (including things like scatter) is more complicated than the image. Probably optimizations should be specified via kwargs, not by default.
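
For illustration only, the kind of kwarg switch meant here might look
like this (plot_big and its downsample kwarg are hypothetical, built on
the classes posted above):

  def plot_big(ax, x, y, downsample=False, **kwargs):
      # choose a line artist based on an optional optimization kwarg
      if downsample:
          line = DecimatedClippedLine(ax, x, y, **kwargs)
      else:
          line = Line2D(x, y, **kwargs)
      ax.add_line(line)
      return line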

Clipping should not be to points inside the xlim, but should include one more point on each side so that lines go to the edge of the box.

Eric

> I agree that exploration of large data sets is an important application,
> and that we need to speed it up. A couple days ago I added automatic
> subsetting (but not decimation--although this could be added easily) to
> image drawing, and that made a big difference for panning and zooming
> using imshow or pcolorfast with regular grids.

Cool.

Low-pass filtering is more work to implement and takes away from the
computational gains, but it's necessary to prevent aliasing a la the
Nyquist-Shannon theorem.
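
A small illustration of the point (a sketch; the 0.43 cycles/sample
tone and step of 10 are arbitrary choices): naive striding folds a
high-frequency tone down into the visible band, while filtering first
suppresses it.

  import numpy as npy
  import scipy.signal

  n = npy.arange(10000)
  hi = npy.sin(2*npy.pi*0.43*n)            # tone near the original Nyquist
  step = 10

  aliased = hi[::step]                     # strided: folds to 0.3 cycles/sample
  b, a = scipy.signal.butter(5, 1.0/step)  # cutoff at the new Nyquist
  filtered = scipy.signal.lfilter(b, a, hi)[::step]

  print npy.abs(aliased).max()   # ~0.95: the alias survives at full amplitude
  print npy.abs(filtered).max()  # tiny: the tone was removed before decimating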

> An easy, built-in interface makes sense for line/marker plotting as
> well, but it will take some thought to figure out exactly what that
> interface should be. The line plotting case (including things like
> scatter) is more complicated than the image. Probably optimizations
> should be specified via kwargs, not by default.

True.

> Clipping should not be to points inside the xlim, but should include one
> more point on each side so that lines go to the edge of the box.

Good point. As I understand npy.searchsorted(), it should then be

  ind0 = max(npy.searchsorted(self.xorig, xlim[0], side='right') - 1, 0)
  ind1 = npy.searchsorted(self.xorig, xlim[1], side='left') + 1

instead of

  ind0, ind1 = npy.searchsorted(self.xorig, xlim)
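
A quick check of the boundary behavior on made-up values:

  import numpy as npy

  x = npy.arange(0.0, 10.0, 1.0)
  xlim = (2.5, 6.5)
  ind0 = max(npy.searchsorted(x, xlim[0], side='right') - 1, 0)
  ind1 = npy.searchsorted(x, xlim[1], side='left') + 1
  print x[ind0:ind1]   # [ 2.  3.  4.  5.  6.  7.] -- one point past each limit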