Large datasets performance....

Hello,

I've been using matplotlib happily for various tasks, but on some occasions
I have to visualize large datasets (in the range of 10M data points, using
imshow or regular plots), and the system starts to choke a bit at that point.

I would like to stay consistent and not use different tools for
basically similar tasks,
so I'd like some pointers regarding rendering performance. I would also be
interested in getting involved in development if there is something to be done.

To the active developers: what's the general feeling? Does matplotlib have room to
spare in its rendering performance,
or is it pretty much tied to the speed of Agg right now?
Is there something to gain from using the multiprocessing module, now
included by default in Python 2.6,
or even from going as far as using something like pyGPU for fast vectorized
computations?

I've seen previous discussions about OpenGL becoming a backend at some point in the
future.
Would it really stand up against the current backends? Are there any clues
about that right now?

Thanks for any input! :smiley:
bye


Hello,

To give you an idea of the performance you can get with OpenGL, you can have a look
at glumpy: http://www.loria.fr/~rougier/tmp/glumpy.tgz
(It requires pyglet for the OpenGL backend).

It is not finished yet, but it is usable. The current version can
display a static numpy float32 array of up to 8000x8000 and a dynamic numpy
float32 array of around 500x500, depending on the GPU hardware (dynamic means
that the image is updated at around 30 fps).

The idea behind glumpy is to translate a numpy array directly into a
texture and to use shaders to perform the colormap transformation and
the filtering (nearest, bilinear or bicubic).
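
As a rough CPU-side illustration of the colormap step mentioned above, here is a minimal numpy sketch of a lookup-table mapping. It is not glumpy's code: glumpy is described as doing this in a shader on the GPU, and the names `apply_colormap` and `GREY_LUT` are made up for this example.

```python
import numpy as np

def apply_colormap(Z, lut, vmin=None, vmax=None):
    """Map a 2-D float array to colours by nearest lookup in a colour table.

    `lut` is an (N, 3) array of RGB rows; this is a hypothetical helper,
    not part of glumpy or matplotlib.
    """
    Z = np.asarray(Z, dtype=np.float32)
    vmin = float(Z.min()) if vmin is None else vmin
    vmax = float(Z.max()) if vmax is None else vmax
    span = vmax - vmin
    if span == 0:
        span = 1.0  # constant image: avoid dividing by zero
    # Normalise to [0, 1]; values outside [vmin, vmax] are clipped.
    t = np.clip((Z - vmin) / span, 0.0, 1.0)
    # Nearest table entry for every pixel, then one fancy-indexing lookup.
    idx = np.rint(t * (len(lut) - 1)).astype(np.intp)
    return lut[idx]

# A 256-entry grey ramp standing in for a real colormap.
ramp = np.linspace(0.0, 1.0, 256, dtype=np.float32)
GREY_LUT = np.stack([ramp, ramp, ramp], axis=1)  # shape (256, 3)
```

Calling `apply_colormap(Z, GREY_LUT)` on an array of shape (h, w) returns an array of shape (h, w, 3). A shader would perform the same per-pixel normalise-and-lookup, but on the texture already resident on the GPU.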

Nicolas

···

On Wed, 2009-06-17 at 07:02 -0700, vehemental wrote:

> I've been using matplotlib happily for various tasks, but on some occasions
> I have to visualize large datasets (in the range of 10M data points, using
> imshow or regular plots), and the system starts to choke a bit at that point.
> [...]

vehemental wrote:

> Hello,
>
> I've been using matplotlib happily for various tasks, but on some occasions
> I have to visualize large datasets (in the range of 10M data points, using
> imshow or regular plots), and the system starts to choke a bit at that point.

The first thing I would check is whether your system becomes starved for memory at this point and virtual memory swapping kicks in.

A common technique for faster plotting of image data is to downsample it before passing it to matplotlib. Same with line plots -- they can be decimated. There is newer/faster path simplification code in SVN trunk that may help with complex line plots (when the path.simplify rcParam is True). I would suggest starting with that as a baseline to see how much performance it already gives over the released version.
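
A minimal numpy sketch of the two techniques named above: block-averaging an image, and decimating a line while keeping each bin's extremes. The function names and the min/max strategy are choices made for this illustration; plain striding (`y[::k]`) is simpler but can drop narrow spikes.

```python
import numpy as np

def downsample_image(img, factor):
    """Block-average a 2-D array by an integer factor along both axes."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    # Drop the ragged edge so the array splits into whole blocks.
    trimmed = img[:h2 * factor, :w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def decimate_minmax(y, n_bins):
    """Reduce a 1-D signal to 2 * n_bins samples: min and max of each bin.

    The output alternates (min, max) per bin, so it traces the envelope of
    the signal rather than the exact order of samples within a bin.
    """
    y = np.asarray(y)
    size = len(y) // n_bins
    if size == 0:
        raise ValueError("need at least n_bins samples")
    blocks = y[:size * n_bins].reshape(n_bins, size)
    out = np.empty(2 * n_bins, dtype=y.dtype)
    out[0::2] = blocks.min(axis=1)
    out[1::2] = blocks.max(axis=1)
    return out
```

For example, `downsample_image(img, 4)` turns a 4000x4000 array into 1000x1000 before it is handed to imshow, and `decimate_minmax(y, 2000)` reduces a multi-million-point line to 4000 points, which is usually more than a figure a couple of thousand pixels wide can show.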

> I would like to stay consistent and not use different tools for
> basically similar tasks,
> so I'd like some pointers regarding rendering performance. I would also be
> interested in getting involved in development if there is something to be done.
>
> To the active developers: what's the general feeling? Does matplotlib have room to
> spare in its rendering performance?

I've spent a lot of time optimizing the Agg backend (which is already one of the fastest software-only approaches out there), and I'm out of obvious ideas. But a fresh set of eyes may find new things. An advantage of Agg that shouldn't be overlooked is that it works identically everywhere.

> Or is it pretty much tied to the speed of Agg right now?
> Is there something to gain from using the multiprocessing module, now
> included by default in Python 2.6?

Probably not. If the work of rendering were to be divided among cores, that would probably be done at the C++ level anyway to see any gains. As it is, the problem with plotting many points generally tends to be limited by memory bandwidth anyway, not processor speed.

> Or even from going as far as using something like pyGPU for fast vectorized
> computations?

Perhaps. But again, the computation isn't the bottleneck -- it's usually a memory bandwidth starvation issue in my experience. Using a GPU may only make matters worse. Note that I consider that approach distinct from just using OpenGL to colormap and render the image as a texture. That approach may bear some fruit -- but only for image plots. Vector graphics acceleration with GPUs is still difficult to do in high quality across platforms and chipsets while also beating software for speed.

> I've seen previous discussions about OpenGL becoming a backend at some point in the
> future.
> Would it really stand up against the current backends? Are there any clues
> about that right now?
>
> Thanks for any input! :smiley:

Hope this helps,
Mike

···

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

2009/6/17 Michael Droettboom <mdroe@…31…>

> The first thing I would check is whether your system becomes starved for memory at this point and virtual memory swapping kicks in.

The Python process is sitting at around 300 MB of memory consumption, so there should be plenty of memory left,
but I will look more closely at what's happening.
I would assume the memory bandwidth is not very high, given how cheap the computer I'm using is :smiley:

> A common technique for faster plotting of image data is to downsample it before passing it to matplotlib. Same with line plots -- they can be decimated. There is newer/faster path simplification code in SVN trunk that may help with complex line plots (when the path.simplify rcParam is True). I would suggest starting with that as a baseline to see how much performance it already gives over the released version.

Yes, that totally makes sense: no need to visualize 3 million points if you can only display 200,000.
I'm already doing that to some extent, but it takes time of its own. At least I have ways to reduce that time if needed.

I'll try the SVN version and see if I can get some improvement out of it.

> Perhaps. But again, the computation isn't the bottleneck -- it's usually a memory bandwidth starvation issue in my experience. Using a GPU may only make matters worse. Note that I consider that approach distinct from just using OpenGL to colormap and render the image as a texture. That approach may bear some fruit -- but only for image plots. Vector graphics acceleration with GPUs is still difficult to do in high quality across platforms and chipsets and beat software for speed.

So if I hear you correctly, the matplotlib/Agg combination is not much slower than a C plotting library that also used Agg for rendering would be,
and we are talking more about hardware limitations, right?

>> I've seen previous discussions about OpenGL becoming a backend at some point in the
>> future. Would it really stand up against the current backends? Are there any clues
>> about that right now?

Thanks Nicolas, I'll take a closer look at glumpy.
I can probably gather some info by comparing an imshow to the equivalent in OpenGL.
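
One way such a comparison could be set up on the matplotlib side is sketched below. `best_time` and `time_imshow` are hypothetical helper names; the default array size is an arbitrary assumption. The glumpy half of the comparison is left out because its API is not described in this thread.

```python
import time

def best_time(fn, repeats=3):
    """Return the shortest wall-clock time (seconds) of `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def time_imshow(shape=(2000, 2000)):
    """Time a full Agg render of imshow on random float32 data.

    numpy and matplotlib are imported here so that best_time() stays
    usable on its own; the default shape is only an example.
    """
    import numpy as np
    import matplotlib
    matplotlib.use("Agg")  # off-screen software rendering, no GUI involved
    import matplotlib.pyplot as plt

    data = np.random.rand(*shape).astype(np.float32)
    fig, ax = plt.subplots()
    ax.imshow(data)
    elapsed = best_time(fig.canvas.draw)  # each draw re-renders the figure
    plt.close(fig)
    return elapsed
```

Taking the best of a few runs reduces noise from whatever else the machine is doing. Timing `fig.canvas.draw` alone keeps data generation and figure setup out of the measurement.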

> Hope this helps,

It did! Thanks,
jimmy


On Wed, Jun 17, 2009 at 9:25 AM, Nicolas Rougier <Nicolas.Rougier@…466…> wrote:

> To give you some hints on performances using OpenGL, you can have a look
> at glumpy: http://www.loria.fr/~rougier/tmp/glumpy.tgz
> (It requires pyglet for the OpenGL backend).
> [...]

Nicolas,

How do you run the demo scripts in glumpy?

I get errors both with IPython's run and with python script_name.py:

In [1]: run demo-simple.py
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

/home/gsever/glumpy/demo-simple.py in <module>()
     20 #
     21 # -----------------------------------------------------------------------------
---> 22 import glumpy
     23 import numpy as np
     24 import pyglet, pyglet.gl as gl

/home/gsever/glumpy/glumpy/__init__.py in <module>()
     23 import colormap
     24 from color import Color
---> 25 from image import Image
     26 from trackball import Trackball
     27 from app import app, proxy

/home/gsever/glumpy/glumpy/image.py in <module>()
     25
     26
---> 27 class Image(object):
     28     ''' '''
     29     def __init__(self, Z, format=None, cmap=colormap.IceAndFire, vmin=None,

/home/gsever/glumpy/glumpy/image.py in Image()
    119         return self._cmap
    120
--> 121     @cmap.setter
    122     def cmap(self, cmap):
    123         ''' Colormap to be used to represent the array. '''

AttributeError: 'property' object has no attribute 'setter'
WARNING: Failure executing file: <demo-simple.py>

[gsever@…730… glumpy]$ python demo-cube.py
Traceback (most recent call last):
  File "demo-cube.py", line 22, in <module>
    import glumpy
  File "/home/gsever/glumpy/glumpy/__init__.py", line 25, in <module>
    from image import Image
  File "/home/gsever/glumpy/glumpy/image.py", line 27, in <module>
    class Image(object):
  File "/home/gsever/glumpy/glumpy/image.py", line 121, in Image
    @cmap.setter
AttributeError: 'property' object has no attribute 'setter'

I have Python 2.5.2.

I think the property setter method is only available in Python 2.6 and later. I modified
the sources and put them at the same place. It should be OK now.
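
For readers hitting the same error, here is a minimal sketch of a property written without the `@cmap.setter` decorator, which also works on Python 2.5. The `Image` class below is a toy stand-in written for this illustration; it is an assumption about what such a change could look like, not the actual modification made to glumpy.

```python
class Image(object):
    """Toy stand-in for a class with a read/write `cmap` attribute."""

    def __init__(self, cmap=None):
        self._cmap = cmap

    def _get_cmap(self):
        return self._cmap

    def _set_cmap(self, cmap):
        self._cmap = cmap

    # property(fget, fset) predates the 2.6 decorator form
    # (@property followed by @cmap.setter) and behaves the same way.
    cmap = property(_get_cmap, _set_cmap,
                    doc="Colormap to be used to represent the array.")
```

With this definition, `img.cmap = value` goes through `_set_cmap` and `img.cmap` goes through `_get_cmap`, exactly as with the decorator syntax.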

Nicolas

···

On Wed, 2009-06-17 at 10:10 -0500, Gökhan SEVER wrote:

> How do you run the demo scripts in glumpy?
> [...]
> AttributeError: 'property' object has no attribute 'setter'
>
> Have Python 2.5.2...

demo-animation.py worked beautifully out of the box at 150 fps.
I bumped the array size up to 1200x1200 and it still runs at around 40 fps.

Very interesting.
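
To relate these frame rates to the memory bandwidth point raised earlier in the thread, here is a back-of-the-envelope calculation. It assumes the whole float32 array is re-sent to the GPU on every frame, which the thread does not confirm, and the helper name is made up.

```python
def upload_rate_mb_per_s(width, height, fps, bytes_per_value=4):
    """Data volume per second, in MB (10**6 bytes), if the full array
    is transferred once per frame. float32 means 4 bytes per value."""
    return width * height * bytes_per_value * fps / 1e6

# Figures quoted in the thread: 500x500 at about 30 fps, 1200x1200 at about 40 fps.
small = upload_rate_mb_per_s(500, 500, 30)     # 30.0 MB/s
large = upload_rate_mb_per_s(1200, 1200, 40)   # 230.4 MB/s
```

Under that assumption, the 1200x1200 case moves roughly 230 MB of array data per second.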

jimmy

2009/6/17 Jimmy Paillet <jimmy.paillet@…149…>

> [...]
> Thanks Nicolas, I'll take a closer look at glumpy.
> I can probably gather some info by comparing an imshow to the equivalent in OpenGL.