Is Gtk draw() slow?

Hi,

I don't have conclusive proof, but I suspect that draw() on a graph in a pyGTK application is orders of magnitude slower than plotting the same data in the default Tk graphing widget.

i.e. 5 sec in tk, ... and >1 minute in gtk

Obvious question: is this a known issue?
Are there any tricks in gtk to speed up the draw()?

Attached is a heavily snipped example of bits and pieces of my code. Maybe I have structured my program incorrectly?

Thanks for any comments
Steve

SnippedPlotExample.py (5.08 KB)

--
NO to the Microsoft Office format as an ISO standard
http://www.noooxml.org/petition

In my own tests, using the built-in GUI windows I get the following numbers on the simple_plot_fps.py speed test (which essentially tests redrawing speed, which is pretty GUI-backend dependent, as opposed to the first drawing operation which involves more common code):

GtkAgg:
wallclock: 3.73636507988
user: 2.9
fps: 26.7639799276

Gtk:
wallclock: 1.99883008003
user: 1.99
fps: 50.0292651181

TkAgg:
wallclock: 4.55140709877
user: 4.41
fps: 21.9712273216

So you can see that GtkAgg is actually slightly faster than TkAgg, and Gtk (if you can accept the lower rendering quality), is almost 2x as fast.
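
For anyone who wants to collect numbers of the same shape for their own embedding, here is a minimal sketch of a redraw timing loop. It assumes the test simply calls a redraw function a fixed number of times and divides the frame count by the elapsed wall-clock time; the figures above are consistent with 100 frames per run (100 / 3.736 s is about 26.76 fps). The names `time_redraws` and `fake_redraw` are made up for illustration, and this is not the actual simple_plot_fps.py script.

```python
import os
import time


def time_redraws(redraw, frames=100):
    """Call redraw() `frames` times and return (wallclock, user, fps)."""
    wall_start = time.perf_counter()
    user_start = os.times()[0]  # user CPU time of this process
    for _ in range(frames):
        redraw()
    wallclock = time.perf_counter() - wall_start
    user = os.times()[0] - user_start
    fps = frames / wallclock if wallclock > 0 else float("inf")
    return wallclock, user, fps


def fake_redraw():
    # Hypothetical stand-in for something like canvas.draw().
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    wallclock, user, fps = time_redraws(fake_redraw)
    print("wallclock:", wallclock)
    print("user:", user)
    print("fps:", fps)
```

Passing your own redraw callable in place of `fake_redraw` would let you compare the same data in the built-in window and in the embedded one.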

So, if you're certain the same amount of data is being plotted in the default Tk window and your custom Gtk window, it seems to suggest that the slowdown is probably something in how you're embedding it. (Obviously the number of data points has a significant impact on speed regardless of backend.) I'm not enough of a Gtk expert that anything in what you're doing jumps out at me. What triggers the call to "plotFrictionProfile"? Is it possible that it is getting called more times than you expect?
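
One cheap way to check that last question is to count the calls. The sketch below is only an illustration: `count_calls` is a made-up helper, and the body of `plotFrictionProfile` here is a placeholder, not the code from the attachment.

```python
import functools


def count_calls(func):
    """Wrap func so that every call is counted and reported."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print("%s call #%d" % (func.__name__, wrapper.calls))
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def plotFrictionProfile():
    # Placeholder body; the real function would rebuild the plot
    # and call draw() on the canvas.
    pass


if __name__ == "__main__":
    # Simulate one user action that accidentally triggers three redraws.
    for _ in range(3):
        plotFrictionProfile()
    print("total calls:", plotFrictionProfile.calls)
```

If a single user action prints more than one line, the redraw is being triggered more often than intended (for example from several signal handlers).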

Cheers,
Mike

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
matplotlib-users List Signup and Options

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

Ah -- just thought of something else.

If I adjust simple_plot_fps.py to have 100,000 data points rather than 1,000 I see something that starts to match with what you're seeing:

GtkAgg:
wallclock: 4.23297405243
user: 3.33
fps: 23.6240522057

Gtk:
wallclock: 15.0203828812
user: 14.92
fps: 6.65761990165

TkAgg:
wallclock: 4.8252530098
user: 4.67
fps: 20.7243018754

You can see that the Gtk time is starting to explode. If I go to 1,000,000 points, Gtk runs out of memory before the first plot, whereas the other two continue to chug along at a reasonable pace.

From looking at the code, I suspect the crucial difference is that the Gdk backend uses the Python sequence API (rather slow) to access the data as it gets rendered, whereas GtkAgg uses the numpy array interface which is essentially raw access to a C array.
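
To make the suspected difference concrete, here is a small pure-Python sketch. It is an analogy only: the standard library's `array` module and the buffer protocol stand in for numpy and its array interface, and `CountingArray`, `read_via_sequence` and `read_via_buffer` are made-up names. The real cost difference would sit in C, but the call pattern is the same: one item lookup per point versus one bulk read of the underlying memory.

```python
import array


class CountingArray(array.array):
    """array.array subclass that counts Python-level item lookups."""
    lookups = 0

    def __getitem__(self, index):
        CountingArray.lookups += 1
        return super().__getitem__(index)


def read_via_sequence(seq):
    # One __getitem__ call per point, as a sequence-API consumer makes.
    return [seq[i] for i in range(len(seq))]


def read_via_buffer(seq):
    # One bulk export of the underlying C buffer; no per-item lookups.
    return memoryview(seq).tolist()


if __name__ == "__main__":
    data = CountingArray("d", [float(i) for i in range(100000)])
    read_via_sequence(data)
    print("lookups via sequence access:", CountingArray.lookups)
    CountingArray.lookups = 0
    read_via_buffer(data)
    print("lookups via buffer access:", CountingArray.lookups)
```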

So -- try using GtkAgg if you can get away with it. The only real advantage of the raw Gtk (Gdk specifically) backend is when running over a remote X connection. If that's not an option for you, I don't have any easy solution that comes to mind. It's sort of a pygtk issue -- it would have to be rewritten to take numpy arrays which is probably unlikely to happen in the official codebase. Matplotlib has a little more control over what happens in the Agg backend, since the Python wrapper is included in matplotlib.

Hope that information helps.

Cheers,
Mike

Michael Droettboom wrote:

> It's sort of a pygtk issue -- it would have to be rewritten to take numpy arrays

Not quite -- it would have to be rewritten to use the array interface, which is different, as that can be done without requiring numpy or its headers.

> which is probably unlikely to happen in the official codebase.

That was true before the array interface, when supporting arrays essentially meant a dependency on numpy. That's no longer true, so it's quite likely that the pygtk folks would accept a patch -- someone would still need to write that patch, though!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...259...

This is not likely to be the culprit -- for drawing markers, the old
matplotlib API made a separate call to draw_polygon for every marker,
with a new gc each time. Many moons ago, we implemented draw_markers
as a renderer method to avoid this problem. For hundreds of thousands
of markers, we saw performance benefits of 25x to 100x. The backends
which implement draw_markers (Agg and PS) get the benefits, but the
other backends which did not are still slow. Basically it is a problem
with a lot of redundant function call overhead. The backend_bases
renderer method _draw_markers discusses this a little bit (it is
underscore hidden).
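
A rough sketch of the call-count difference described above. `ToyRenderer` is a made-up class that only counts calls; the method names echo the ones mentioned in this message, but the signatures are simplified assumptions and not the real matplotlib renderer API.

```python
class ToyRenderer:
    """Made-up renderer that only counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def new_gc(self):
        self.calls += 1
        return {}

    def draw_polygon(self, gc, vertices):
        self.calls += 1

    def draw_markers(self, gc, marker, positions):
        # One call covers every position; a real backend would loop in C.
        self.calls += 1


def draw_one_by_one(renderer, marker, positions):
    # Old style: a fresh gc and a separate draw_polygon call per marker.
    for x, y in positions:
        gc = renderer.new_gc()
        shifted = [(x + dx, y + dy) for dx, dy in marker]
        renderer.draw_polygon(gc, shifted)


def draw_batched(renderer, marker, positions):
    # Newer style: one gc and one draw_markers call for all markers.
    gc = renderer.new_gc()
    renderer.draw_markers(gc, marker, positions)


if __name__ == "__main__":
    triangle = [(0.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
    positions = [(float(i), float(i)) for i in range(100000)]
    old, new = ToyRenderer(), ToyRenderer()
    draw_one_by_one(old, triangle, positions)
    draw_batched(new, triangle, positions)
    print("per-marker calls:", old.calls)
    print("batched calls:", new.calls)
```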

My guess is this difference will not be so pronounced on the trunk.

JDH

On Jan 15, 2008 7:46 AM, Michael Droettboom <mdroe@...86...> wrote:

From looking at the code, I suspect the crucial difference is that the
Gdk backend uses the Python sequence API (rather slow) to access the
data as it gets rendered, whereas GtkAgg uses the numpy array interface
which is essentially raw access to a C array.

Christopher Barker wrote:

> Michael Droettboom wrote:
>
>> It's sort of a pygtk issue -- it would have to be rewritten to take numpy arrays
>
> not quite -- it would have to be re-written to use the array interface, which is different, as that can be done without requiring numpy, or its headers.

Of course, that's what I meant. It is passed numpy arrays now -- but they are accessed with all of the function call overhead of the Python sequence API, rather than the numpy array interface.

>> which is probably unlikely to happen in the official codebase.
>
> That was true before the array interface, when supporting arrays essentially meant a dependency on numpy. That's no longer true, so it's quite likely that the pygtk folks would accept a patch -- someone still would need to write that patch, though!

Unless I misunderstand, I thought that functionality was slated for inclusion in Python 3.0 -- still a long way off in terms of adoption rate. That patch would only make sense on a pygtk branch specifically intended for Python 3.0.

Cheers,
Mike

John Hunter wrote:

> This is not likely to be the culprit -- for drawing markers, the old
> matplotlib API made a separate call to draw_polygon for every marker,
> with a new gc each time. Many moons ago, we implemented draw_markers
> as a renderer method to avoid this problem. For hundreds of thousands
> of markers, we saw performance benefits of 25x to 100x. The backends
> which implement draw_markers (Agg and PS) get the benefits, but the
> other backends which did not are still slow. Basically it is a problem
> with a lot of redundant function call overhead. The backend_bases
> renderer method _draw_markers discusses this a little bit (it is
> underscore hidden).

Markers are not the issue here. These benchmarks were done with lines. There are markers for the ticks, of course, but the number of those is fixed. I agree it's function call overhead, but I believe it's in the overhead of PySequence_GetItem vs. array[index]. In both cases, the line is still getting drawn with a single Python -> C function call.

> My guess is this difference will not be so pronounced on the trunk.

Actually, I'm getting surprising results there. Numbers are in fps.

                        Gtk  GtkAgg
0.91.2, 1000 points      50      26
0.91.2, 10000 points      6      23
trunk, 1000 points       38      31
trunk, 10000 points       3       9

So, yes, the ratio between Gtk and GtkAgg on the trunk is not as pronounced. I'm a little disappointed by the timings on the trunk -- while one could say that Agg is a little better on the trunk with 1000 points, it doesn't scale nearly as well. That's certainly something to look into -- and I don't have any thoughts offhand. I would expect the trunk to do better since it doesn't perform a memory copy on the data with each call to draw_line/draw_path.
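
To put a number on "doesn't scale nearly as well", the snippet below divides the fps at 1000 points by the fps at 10000 points for each row of the table. The figures are copied from the table; `slowdown` is just a helper name chosen for this sketch.

```python
# fps figures copied from the table above: (Gtk, GtkAgg)
fps = {
    ("0.91.2", 1000): (50, 26),
    ("0.91.2", 10000): (6, 23),
    ("trunk", 1000): (38, 31),
    ("trunk", 10000): (3, 9),
}


def slowdown(version, backend_index):
    """fps at 1000 points divided by fps at 10000 points."""
    small = fps[(version, 1000)][backend_index]
    large = fps[(version, 10000)][backend_index]
    return small / large


for version in ("0.91.2", "trunk"):
    print("%s: Gtk %.1fx slower, GtkAgg %.1fx slower"
          % (version, slowdown(version, 0), slowdown(version, 1)))
```

By this measure GtkAgg slows down by roughly 1.1x on 0.91.2 but roughly 3.4x on the trunk, while Gtk goes from roughly 8.3x to roughly 12.7x.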

Cheers,
Mike

Michael Droettboom wrote:

>> not quite -- it would have to be re-written to use the array interface, which is different, as that can be done without requiring numpy, or its headers.
>
> Of course, that's what I meant. It is passed numpy arrays now -- but they are accessed with all of the function call overhead of the Python sequence API, rather than the numpy array interface.

Right, which is slow, slower than lists. In fact, it might make sense for MPL to convert to a list of tuples (using numpy) first, then pass that to pyGTK:

l = array.tolist()

It might speed things up a bit -- it did with wxPython a while back.
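
A slightly fuller sketch of that idea, with the standard library's `array` type standing in for a numpy array (numpy arrays have a `tolist()` method as well). `to_point_list` is a made-up helper name, and whether this actually helps with pyGTK would need to be measured.

```python
import array


def to_point_list(xs, ys):
    """Return a list of (x, y) tuples of plain Python floats."""
    # tolist() converts each whole buffer in one C-level loop, so the
    # consumer afterwards only ever touches native lists and tuples.
    return list(zip(xs.tolist(), ys.tolist()))


xs = array.array("d", [0.0, 1.0, 2.0])
ys = array.array("d", [0.0, 1.0, 4.0])
points = to_point_list(xs, ys)
print(points)  # [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
```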

>> That was true before the array interface, when supporting arrays essentially meant a dependency on numpy. That's no longer true, so it's quite likely that the pygtk folks would accept a patch -- someone still would need to write that patch, though!
>
> Unless I misunderstand, I thought that functionality was slated for inclusion in Python 3.0 -- still a long ways off in terms of adoption rate. That patch would only make sense on a pygtk branch specifically intended for Python 3.0.

The array interface is slated to be built in for Py3k, but you can use it in the meantime. You just need to make sure your code understands it. PIL, for instance, uses it for its fromarray() and toarray() methods. Travis O. and others on the numpy list have been very helpful to folks trying to add it to their packages.
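
As a rough illustration of "making your code understand it", here is a Python-level sketch that produces and consumes the `__array_interface__` dictionary without importing numpy. `PointBuffer` and `read_doubles` are made-up names, the consumer only handles 1-D native float64 data, and a real extension such as pygtk would do the equivalent in C.

```python
import array
import ctypes
import sys


class PointBuffer:
    """Made-up container exposing __array_interface__ without numpy."""

    def __init__(self, values):
        self._data = array.array("d", values)

    @property
    def __array_interface__(self):
        address, length = self._data.buffer_info()
        byteorder = "<" if sys.byteorder == "little" else ">"
        return {
            "version": 3,
            "shape": (length,),
            "typestr": byteorder + "f8",  # native 8-byte float
            "data": (address, False),     # (pointer, read-only flag)
        }


def read_doubles(obj):
    """Toy consumer: read a 1-D float64 block through the interface."""
    info = obj.__array_interface__
    if info["typestr"][1:] != "f8" or len(info["shape"]) != 1:
        raise TypeError("this sketch only handles 1-D float64 data")
    address, _readonly = info["data"]
    count = info["shape"][0]
    # Raw access to the memory block, with no per-item Python lookups.
    block = (ctypes.c_double * count).from_address(address)
    return list(block)


if __name__ == "__main__":
    buf = PointBuffer([1.5, 2.5, 3.5])
    print(read_doubles(buf))
```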

-Chris
