Large datasets performance....

Hi,

On this subject, one program that has pretty impressive interactive
visualisation is the venerable snd
(http://ccrma.stanford.edu/software/snd/). It displays hours of audio
in a flash and lets you pan and zoom the signal without a hitch. It
only plots an envelope of the audio signal at first, and shows more
and more detail as you zoom in.

Jimmy's comment that there's no need to visualize 3 million points if
you can only display 200 000 is even more true for time signals, where
you can typically only display 1000 to 2000 samples (i.e. the number
of horizontal pixels).
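
For illustration, the envelope idea can be sketched in a few lines of plain Python. This is only my guess at the general technique, not snd's actual code: the function name `minmax_envelope` and the choice of one bin per horizontal pixel are assumptions made for the example.

```python
def minmax_envelope(samples, n_pixels):
    """Reduce `samples` to at most `n_pixels` (low, high) pairs.

    Hypothetical helper: each pair is the min and max of the samples
    that would fall into one horizontal pixel column.
    """
    n = len(samples)
    if n == 0 or n_pixels <= 0:
        return []
    # Never use more bins than samples, so every bin is non-empty.
    n_bins = min(n, n_pixels)
    envelope = []
    for i in range(n_bins):
        # Integer bin edges cover every sample exactly once.
        start = i * n // n_bins
        stop = (i + 1) * n // n_bins
        chunk = samples[start:stop]
        envelope.append((min(chunk), max(chunk)))
    return envelope


# A long signal collapses to one (low, high) pair per pixel column.
signal = [(k % 100) - 50 for k in range(100_000)]
overview = minmax_envelope(signal, 1000)

# Zooming in is the same call on the visible slice only, so more
# detail appears as the slice gets shorter.
zoomed = minmax_envelope(signal[2000:2500], 1000)
```

Drawing a vertical line from low to high in each column would then give the envelope view, and once the visible slice holds fewer samples than pixels, every sample is shown individually.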

Does the new path simplification code use a similar approach to snd?
I've always wanted something like that in matplotlib... :)

Regards,
Ludwig

Ludwig Schwardt wrote:

> Does the new path simplification code use a similar approach to snd?
> I've always wanted something like that in matplotlib... :)

Not knowing the details of what snd is doing, I would say "probably". The general idea is to remove points on-the-fly that do not change the appearance of the plot at the given resolution. Spending the time to do this up front speeds up the path stroking immensely, as the path then has fewer vertices and therefore fewer self-intersections to compute. I suspect what matplotlib is doing is a little more general, and therefore not quite as efficient as snd, because it can't assume a 1-dimensional time series.
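
To make the general idea concrete, here is a minimal sketch of tolerance-based simplification for an arbitrary 2-D path. It uses the well-known Ramer-Douglas-Peucker algorithm as a stand-in, which is not the algorithm matplotlib implements; the names `simplify_path` and `tolerance` are hypothetical, and the points are assumed to be in display (pixel) coordinates already.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_path(points, tolerance):
    """Drop vertices lying within `tolerance` of the simplified path.

    Iterative Ramer-Douglas-Peucker (illustrative stand-in only).
    """
    n = len(points)
    if n < 3:
        return list(points)
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]  # explicit stack avoids deep recursion
    while stack:
        first, last = stack.pop()
        worst_dist, worst_idx = 0.0, None
        for i in range(first + 1, last):
            d = _point_segment_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst_dist, worst_idx = d, i
        # Keep the farthest vertex only if it is visibly off the segment.
        if worst_idx is not None and worst_dist > tolerance:
            keep[worst_idx] = True
            stack.append((first, worst_idx))
            stack.append((worst_idx, last))
    return [p for p, kept in zip(points, keep) if kept]


# A straight run of 100 vertices collapses to its two end points.
line = [(x, 2.0 * x) for x in range(100)]
simplified = simplify_path(line, tolerance=0.1)
```

Because nothing here assumes the x values are sorted, it works for any path, which is also why it does more work per point than an envelope computed for a 1-dimensional time series.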

To give credit where it is due, the path simplification was originally written by Allan Haldane and has been in matplotlib for some time. The recent work has been to fix some bugs in degenerate cases, improve its performance, greatly improve the clipping algorithm, and allow the tolerance to be user-configurable.

Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA