collection efficiency improvement

John_Hunter1 · June 15, 2006, 1:55pm

"Eric" == Eric Firing <efiring@...229...> writes:

    > Based on a quick look, I think it would be easy to make
    > LineCollection and PolyCollection accept a numerix array in
    > place of [(x,y), (x,y), ...] for each line segment or
    > polygon; specifically, this could be replaced by an N x 2
    > array, where the first column would be x and the second
    > would be y. Backwards compatibility could be maintained
    > easily. This would eliminate quite a bit of useless
    > conversion back and forth among lists, tuples, and arrays.
    > As it is, each sequence of sequences is converted to a pair
    > of arrays in backend_bases, and typically it started out as
    > either a 2-D numerix array or a pair of 1-D arrays in the
    > code that is calling the collection constructor.

I think this is a useful enhancement. I would think that representing
each segment as (x,y), where x and y are 1D arrays, might be slightly
more natural than using an Nx2 array, but others may disagree.

How often does it come up that we want a homogeneous line collection,
i.e. a bunch of line segments with the same properties (color,
linewidth...)? The most expensive part of the agg line collection
renderer is probably the multiple calls to render_scanlines, which is
necessary every time we change the linewidth or color.

If all of the lines in a collection shared the same properties, we
could draw the entire path with a combination of lineto/moveto, and
just stroke and render it once. Agg has an upper limit on path length
though, which is why at some point I added the following to draw_lines:

    if ((i%10000)==0) {
      //draw the path in chunks
      _render_lines_path(path, gc);
      path.remove_all();
      path.move_to(thisx, thisy);
    }

I.e., I render it every 10000 points.
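
As an illustration of the chunking idea only, here is a minimal Python sketch. The helper name `iter_chunks` and its `chunk_size` parameter are hypothetical and not part of matplotlib or agg. Each chunk starts at the last point of the previous one, mirroring the `move_to(thisx, thisy)` call above, so the stroked line stays connected.

```python
def iter_chunks(points, chunk_size=10000):
    """Yield consecutive runs of points, each spanning at most chunk_size segments.

    Every run after the first begins at the final point of the previous
    run, so drawing the runs one after another produces a connected line.
    """
    start = 0
    last = len(points) - 1
    while start < last:
        stop = min(start + chunk_size, last)
        yield points[start:stop + 1]
        start = stop


# Hypothetical usage: a renderer would stroke each chunk separately.
points = [(i, i * i) for i in range(25001)]
chunks = list(iter_chunks(points))
```

With 25001 points this gives three chunks, and the first point of each later chunk equals the last point of the chunk before it.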

Actually, as I type this I realize the case of homogeneous lines (and
polys) can be handled by the backend method "draw_path". One
possibility is for the LineCollection to detect the homogeneous case
len(linewidths)==1 and len(colors)==1 and call out to draw_path
instead of draw_line_collection (the same could be done for a regular
poly collection). Some extra extension code would probably be
necessary to build the path efficiently from numerix arrays, and to
handle the "chunking" problem to avoid extra long paths, but for
certain special cases (scatters and quiver w/o color mapping) it would
probably be a big win. The downside is that not all backends implement
draw_path, but the Collection front-end could detect this and fall
back on the old approach if draw_path is not implemented.
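
A rough sketch of the dispatch described above, assuming simplified and hypothetical signatures for `draw_path` and `draw_line_collection` (the real backend methods take different arguments). The two renderer classes and the `build_path` helper are stand-ins invented for the example.

```python
class PathRenderer:
    """Hypothetical backend that implements draw_path."""

    def __init__(self):
        self.calls = []

    def draw_path(self, path, linewidth, color):
        self.calls.append(("draw_path", len(path)))

    def draw_line_collection(self, segments, linewidths, colors):
        self.calls.append(("draw_line_collection", len(segments)))


class LegacyRenderer:
    """Hypothetical backend without draw_path."""

    def __init__(self):
        self.calls = []

    def draw_line_collection(self, segments, linewidths, colors):
        self.calls.append(("draw_line_collection", len(segments)))


def build_path(segments):
    """Flatten segments into (command, x, y) steps, one moveto per segment."""
    path = []
    for seg in segments:
        for k, (x, y) in enumerate(seg):
            path.append(("moveto" if k == 0 else "lineto", x, y))
    return path


def draw_lines(renderer, segments, linewidths, colors):
    """Use a single path when properties are shared and the backend allows it."""
    homogeneous = len(linewidths) == 1 and len(colors) == 1
    if homogeneous and hasattr(renderer, "draw_path"):
        renderer.draw_path(build_path(segments), linewidths[0], colors[0])
    else:
        # Fall back on the per-segment approach.
        renderer.draw_line_collection(segments, linewidths, colors)


segments = [[(0, 0), (1, 1)], [(2, 0), (3, 1), (4, 0)]]
fast, slow = PathRenderer(), LegacyRenderer()
draw_lines(fast, segments, [1.0], ["k"])
draw_lines(slow, segments, [1.0], ["k"])
```

The `hasattr` check is only one way to detect backend support; a base-class method that raises `NotImplementedError` would work too.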

JDH

_Helge_Avlesen1 · June 15, 2006, 2:28pm

Hi,
for b&w PS publication-quality plotting, this must be a common thing to draw:
contour lines, vectors, xy plots, the axes, tick marks, even fonts
can all be constructed from disjoint line segments, no?
If matplotlib could pass numerix arrays more or less directly to gtk, it could
perhaps also become the speed king of plotting packages.

Helge

On 6/15/06, John Hunter <jdhunter@...5...> wrote:

How often does it come up that we want a homogeneous line collection,
i.e. a bunch of line segments with the same properties (color,
linewidth...)?

Eric_Firing2 · June 19, 2006, 6:40pm

John Hunter wrote:

"Eric" == Eric Firing <efiring@...229...> writes:

    > Based on a quick look, I think it would be easy to make
    > LineCollection and PolyCollection accept a numerix array in
    > place of [(x,y), (x,y), ...] for each line segment or
    > polygon; specifically, this could be replaced by an N x 2
    > array, where the first column would be x and the second
    > would be y. Backwards compatibility could be maintained
    > easily. This would eliminate quite a bit of useless
    > conversion back and forth among lists, tuples, and arrays.
    > As it is, each sequence of sequences is converted to a pair
    > of arrays in backend_bases, and typically it started out as
    > either a 2-D numerix array or a pair of 1-D arrays in the
    > code that is calling the collection constructor.

I think this is a useful enhancement. I would think that representing
each segment as (x,y) where x and y are 1D arrays, might be slightly
more natural than using an Nx2 but others may disagree.

John,

I have been working on this and I can probably commit something in the next few days. I have been pursuing the Nx2 representation for the following reasons:

1) It is highly compatible with the present sequence of tuples, so that the two representations can coexist peacefully:

    a = [(1,2), (3,4), (5,6)]   # present style
    aa = numerix.array(a)       # new style

In most places, a and aa work the same with no change to the code. The exception is where code does something like "a.append(b)". This occurs in the contour labelling code. I haven't fixed it yet, but I don't see any fundamental problem in doing so.

2) The Nx2 representation streamlines code because it involves one 2-D object, "XY", in place of two 1-D objects, X and Y. This also eliminates the need to check that the lengths of X and Y match. Logically, X and Y must go together, so why not keep them glued together in a single array?
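
The two points above can be sketched as follows. This assumes NumPy as a stand-in for numerix, and `path_length` is a hypothetical helper written only to show code that accepts either representation unchanged.

```python
import numpy as np  # assumption: NumPy stands in for numerix here

a = [(1, 2), (3, 4), (5, 6)]   # present style: sequence of (x, y) tuples
aa = np.array(a)               # new style: one N x 2 array


def path_length(xy):
    """Sum of segment lengths; indexing and unpacking work for both styles."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(xy[:-1], xy[1:]):
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return total


len_list = path_length(a)
len_array = path_length(aa)

# The exception: a list grows in place, while an array has no append()
# method, so the array version has to build a new array instead.
a.append((7, 8))
aa = np.concatenate([aa, [(7, 8)]])

# X and Y stay glued together, so there is no length mismatch to check;
# the columns are still available when needed.
x, y = aa[:, 0], aa[:, 1]
```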

Because of the compatibility, there is very little code that actually has to be changed to support the numerix array. There is a potential for breakage of user code, however. This is a concern. I don't know of any way of eliminating it entirely while retaining the efficiency benefits of using numerix arrays when possible. One thing that might help is to have the transform seq_xy_tups method handle both input forms, and return the form corresponding to the input. I can do this; I now have a transform method that handles both "a" and "aa", but presently it returns a numerix array in either case.
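
One possible shape for a transform that returns the form it was given is sketched below. `transform_xy` and its `scale` and `offset` parameters are hypothetical, NumPy again stands in for numerix, and this is not the actual seq_xy_tups implementation.

```python
import numpy as np  # assumption: NumPy stands in for numerix here


def transform_xy(xy, scale=(2.0, 3.0), offset=(10.0, 20.0)):
    """Apply a simple per-axis scale and offset to a sequence of points.

    Array input returns an N x 2 array; any other sequence returns a
    list of (x, y) tuples, so existing user code sees the type it passed.
    """
    is_array = isinstance(xy, np.ndarray)
    arr = np.asarray(xy, dtype=float).reshape(-1, 2)
    out = arr * scale + offset
    if is_array:
        return out
    return [tuple(row) for row in out.tolist()]


from_list = transform_xy([(1, 2), (3, 4)])
from_array = transform_xy(np.array([(1, 2), (3, 4)]))
```

The conversion back to tuples costs time on the list path, which is the trade-off: array callers keep the efficiency gain, and list callers keep their old return type.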

The optimization you describe (drawing homogeneous collections through draw_path) sounds good, but I want to finish stage 1, above, first.

Eric