> I made a first (and second) attempt at implementing
> draw_markers and draw_lines in the postscript backend. The
> changes are in CVS, although I left draw_markers masked as
> _draw_markers, it needs to be unmasked if you want to try
> it out.
Hey Darren, thanks for working on this.
> I found some places for speed/memory/ps-filesize
> improvements. With draw_markers masked, the script below
> took 2.43 seconds to generate and write the 1.5MB eps
> file. With draw_markers unmasked, it took 0.69 seconds to
> make a 350KB eps file.
A good start. You might be able to get this number down a bit
more, as I discuss below.
> 1) Circles are being drawn with draw_markers, but
> agg.path_storage has no curve information in it? Circles
> are faithfully reproduced in ps output, but it takes 50
> line segments to draw each circle in
> plot(arange(10000),'-o').
This is a wart slated for destruction. We plan to draw circles and
ellipses with splines rather than vertices. It just hasn't been done
yet.
> 2) I think each tickmark is listed in agg.path_storage
> twice, and therefore gets rendered twice in PS.
Why do you think this? Which ticks?
> 3) I expected marker paths to be terminated with the
> agg.path_cmd_end_poly code. This is not the case. What is
> the purpose of path_cmd_end_poly?
Only marker paths that are polygons have an end_poly code (e.g.
draw_circle). A lot of the paths (e.g. tick marks) are not polygons
and so don't have an end_poly code.
> 4) I am getting an unrecognized agg.path_commands_e
> code. They should be one of 0,1,2,3,4,6,0x0F, and I am
> getting a value of 70. ?? I just ignore it and PS seems to
> render fine.
I had to track this one down myself. lines.py calls

    path.end_poly()

agg_path_storage::end_poly calls

    add_vertex(0.0, 0.0, path_cmd_end_poly | flags);

where flags is agg_basics path_flags_e::path_flags_close = 0x40. You
can test for end poly using the agg module with
>>> 0x40 | 6
70
>>> from matplotlib.agg import path_storage, is_end_poly
>>> is_end_poly(71)
False
>>> is_end_poly(70)
True
> 5) Im not doing anything with vec6 =
> transform.as_vec6_val(). I'm not sure what it is used for.
This is in case you want to do the affine transformation yourself.
The transform is a nonlinear part plus an affine. Note that
backend_ps is currently doing

    if transform.need_nonlinear():
        x,y = transform.nonlinear_only_numerix(x, y)
    x, y = transform.numerix_x_y(x, y)

which is wrong -- it will fail for nonlinear transforms like log
because the numerix_x_y call does the nonlinear and the affine part
and so you will be doing the nonlinear part twice. The motivation for
separating out the nonlinear and affine parts was to let the backend
machinery do the affine part (in the great majority of cases, the
transforms are pure affine anyway). So you might want to do

    if transform.need_nonlinear():
        x,y = transform.nonlinear_only_numerix(x, y)
    vec6 = transform.as_vec6_val()

and then set the current ps affine to vec6.
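
Roughly, the split could look like the sketch below: do the nonlinear
part in Python, then hand the six affine coefficients to PostScript
with concat. Everything here is illustrative. ps_concat is a made-up
helper, math.log10 stands in for transform.nonlinear_only_numerix on
a log axis, the numbers in vec6 are placeholders, and I am assuming
the six values are in the same (a, b, c, d, tx, ty) order PostScript
uses for a matrix, which you should check.

```python
import math


def ps_concat(vec6):
    """Format six affine coefficients as a PostScript concat call.

    Hypothetical helper; assumes vec6 is ordered (a, b, c, d, tx, ty).
    """
    return '[%s] concat' % ' '.join('%1.3f' % v for v in vec6)


# Nonlinear part, done once in Python (stand-in for a log axis).
y = [1.0, 10.0, 100.0]
y_nonlinear = [math.log10(v) for v in y]

# Placeholder coefficients; the real ones would come from
# transform.as_vec6_val().
vec6 = (2.0, 0.0, 0.0, 50.0, 72.0, 36.0)

# Emitted once, before the path data, so PS applies the affine part.
header = ps_concat(vec6)
```

The point is that the path coordinates written after the header only
need the nonlinear part applied.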
> 6) draw_lines is getting a long pathlist from agg. Rather
> than draw a straight line between two points, it is doing
> something like
> 50.106 249.850 moveto 53.826 249.850 lineto 57.546 249.850
> lineto 61.266 249.850 lineto
> and thats just for the line in the legend! The straight
> line in the actual plot has many, many intermediate
> points.
That is not surprising. matplotlib plots what you give it. If you
specify a straight line of 10000 points as you did in your example

    plot(arange(10000),'-s')

matplotlib will plot all 10000 vertices of the line. It's incumbent
on the user not to pass in redundant data.
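
If you do want to thin a line like that before plotting it, something
along these lines would work on the user side. It is a sketch, not
anything matplotlib provides: drop_collinear is a made-up name, and
it only drops interior points that lie on a straight run between
their neighbours (within a tolerance), keeping corners and direction
reversals.

```python
def drop_collinear(x, y, tol=1e-9):
    """Return x, y with redundant collinear interior points removed.

    Hypothetical helper for illustration only.
    """
    if len(x) < 3:
        return list(x), list(y)
    keep = [0]
    for i in range(1, len(x) - 1):
        k = keep[-1]
        # Vector from the last kept point to this one, and on to the next.
        ax, ay = x[i] - x[k], y[i] - y[k]
        bx, by = x[i + 1] - x[i], y[i + 1] - y[i]
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        # Keep the point if the line bends here or doubles back.
        if abs(cross) > tol or dot < 0:
            keep.append(i)
    keep.append(len(x) - 1)
    return [x[i] for i in keep], [y[i] for i in keep]


# The straight line from the example collapses to its two end points.
xs, ys = drop_collinear(list(range(10000)), list(range(10000)))
```

Note that this only makes sense for the line itself; with '-s' you
would lose the markers at the dropped points.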
Now, onto the subject of how you might be able to make this faster.
One of the primary motivations of draw_markers is that you should only
have to set the graphics context state once. In the current
implementation, we have

    while start < len(x):
        to_draw = izip(x[start:end],y[start:end])
        ps = ['%1.3f %1.3f marker' % point for point in to_draw]
        self._draw_ps("\n".join(ps), gc, None)
        start = end
        end += 1000

and _draw_ps sets the gc state. Now this isn't really a huge deal,
since you are chunking the data into buckets of 1000 points. But for
very large data sets (500k markers) it will result in 500 superfluous calls
to set the gc state. It might be worth implementing a push_gc method
that sets the current gc state, and then calling this at the top of
draw_markers and not inside the loop. We'll probably want to
implement this as a default gc method across backends anyway in the
near term, so it would be a worthwhile change.
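
To make the idea concrete, here is a toy version of what I have in
mind. None of this is the real backend_ps code: SketchRenderer, the
dict used as a gc, and the gc_sets counter are all invented for the
illustration, and push_gc only writes the line width to keep it
short.

```python
class SketchRenderer:
    """Stand-in for a PS renderer; all names here are hypothetical."""

    def __init__(self):
        self.out = []      # chunks of PostScript text, in order
        self.gc_sets = 0   # how many times the gc state was written

    def push_gc(self, gc):
        # Write the graphics state once (line width only, for brevity).
        self.out.append('%1.3f setlinewidth' % gc['linewidth'])
        self.gc_sets += 1

    def draw_markers(self, gc, x, y, chunk=1000):
        self.push_gc(gc)  # once, at the top, not inside the loop
        for start in range(0, len(x), chunk):
            pts = zip(x[start:start + chunk], y[start:start + chunk])
            self.out.append('\n'.join('%1.3f %1.3f marker' % p for p in pts))


r = SketchRenderer()
n = 5000
r.draw_markers({'linewidth': 1.0}, list(range(n)), list(range(n)))
```

Here the 5000 markers go out in five chunks but the gc state is
written only once, which is the behavior I'd like draw_markers to
have.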
Hope this helps, thanks again.
JDH