In dealing with the profiler output from some of Fernando's log plots,
I was reminded of the very inefficient way matplotlib handles marker
plots -- see my last post "log scaling fixes in backend_ps.py" for
details. Some of these problems were fixed for scatter plots using
collections, but line markers remained inefficient.

On top of this inefficiency, there have been three lingering problems
with backend design that have bothered me. 1) No path operations
(MOVETO, LINETO, etc), 2) transforms are being done in the front end
which is inefficient (some backends have transformations for free, eg
postscript) and can lead to plotting artifacts (eg Agg, which has a
concept of subpixel rendering), and 3) backends have no concept of
state or a gc stack, which can lead to lots of repetitive code and
needless function calls.

I've begun to address some of these concerns with a new backend method
"draw_markers". Currently the backend has too many drawing methods,
and this is yet another one. My goal is to define a core set, many
fewer than we have today, and do away with most of them. Eg
draw_pixel, draw_line, draw_point, draw_rectangle, draw_polygon, can
all be replaced by draw_path, with paths comprised solely of MOVETO,
LINETO and (optionally) ENDPOLY.
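
To make the consolidation concrete, here is a minimal sketch of how lines,
polygons and rectangles could all be expressed as path lists. The numeric
values of the path codes and the helper names (line_path, polygon_path,
rectangle_path) are assumptions for illustration only; the element layout
(code first, then x,y, with ENDPOLY carrying a fill flag and rgba) follows
the template further down.

```python
# Hypothetical path codes; the real constants live in matplotlib.paths.
MOVETO, LINETO, ENDPOLY = 1, 2, 3


def line_path(x1, y1, x2, y2):
    """Path for a single line segment."""
    return [(MOVETO, x1, y1), (LINETO, x2, y2)]


def polygon_path(points, fill=False, rgba=(0.0, 0.0, 0.0, 1.0)):
    """Closed path through points; ENDPOLY carries the fill flag and color."""
    x0, y0 = points[0]
    path = [(MOVETO, x0, y0)]
    path.extend((LINETO, x, y) for x, y in points[1:])
    path.append((ENDPOLY, int(fill)) + tuple(rgba))
    return path


def rectangle_path(x, y, width, height, fill=False):
    """A rectangle is just a four-vertex polygon."""
    corners = [(x, y), (x + width, y),
               (x + width, y + height), (x, y + height)]
    return polygon_path(corners, fill=fill)
```

With helpers like these, a backend would only need to know how to walk one
kind of list, rather than implement a separate method per shape.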

Which leads me to question one for the backend maintainers: can you
support a draw_path method? I'm not sure that GTK and WX can. I have
no idea about FLTK, and QT, but both of these are Agg backends so it
doesn't matter. All the Agg backends automagically get these for
free. I personally would be willing to lose the native GTK and WX
backends.

I've implemented draw_markers for agg in CVS. lines.py tests for this
on the renderer so it doesn't break any backend code right now.
Basically if you implement draw_markers, it calls it, otherwise it
does what it always did. This leads to complicated code in lines.py
that I hope to flush when the port to the other backends is complete.
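
The test in lines.py amounts to something like the sketch below. This is
not the actual lines.py code: the two renderer classes and the function
draw_marker_line are stand-ins, and in the fallback branch "transform" is
assumed to be a plain callable mapping data (x, y) to display (x, y).

```python
class LegacyRenderer:
    """Stand-in for a backend that has not been ported yet."""

    def __init__(self):
        self.calls = []

    def draw_point(self, gc, x, y):
        self.calls.append(("draw_point", x, y))


class PortedRenderer(LegacyRenderer):
    """Stand-in for a backend that implements the new method."""

    def draw_markers(self, gc, path, x, y, transform):
        self.calls.append(("draw_markers", len(x)))


def draw_marker_line(renderer, gc, path, x, y, transform):
    # The presence of the method is the test for a ported backend.
    if hasattr(renderer, "draw_markers"):
        renderer.draw_markers(gc, path, x, y, transform)
        return
    # Old behavior: transform in the front end, one call per point.
    for xi, yi in zip(x, y):
        dx, dy = transform(xi, yi)
        renderer.draw_point(gc, dx, dy)
```

A ported backend gets one call for the whole data set; an unported one
still gets one call per marker, exactly as before.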

draw_markers is the start of fixing backend problems 1 and 2 above.
Also, I will extend the basic path operations to support splines,
which will allow for more sophisticated drawing and better
representations of circles (eg Knuth uses splines to draw circles in
TeX, which are a very close approximation to real circles).

I'm not putting this in backend_bases yet since I'm using the presence
of the method as the test for whether a backend is ported yet in
lines.py:

    def draw_markers(self, gc, path, x, y, transform):

path is a list of path elements -- see matplotlib.paths. Right now
path is only a data structure, which suffices for my simple needs
right now, but we can provide user friendly methods to facilitate the
generation of these lists down the road.

The coordinates of the "path" argument in draw_markers are display (eg
points for PS) and are simply points*dpi (this could be generalized if
need be with its own transform, but I don't see a need right now --
markers in matplotlib are by definition in points). x and y are in
data coordinates, and transform is a matplotlib.transforms
Transformation instance. There are a few types of transformations
(separable, nonseparable and affine) but all three have a consistent
interface -- there is an (optional) nonlinear component, eg log or
polar -- and all provide an affine vec6. Thus the transformation can
be done like this:

    if transform.need_nonlinear():
        x,y = transform.nonlinear_only_numerix(x, y)
    # the a,b,c,d,tx,ty affine which transforms x and y
    vec6 = transform.as_vec6_val()
    # apply an affine transformation of x and y
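
For the last step, here is a minimal sketch of applying the vec6 in pure
Python. The helper name apply_affine is hypothetical, and the element
order is an assumption: I am taking a,b,c,d,tx,ty in the PostScript matrix
convention, ie x' = a*x + c*y + tx and y' = b*x + d*y + ty. Check this
against the transform code before relying on it.

```python
def apply_affine(vec6, xs, ys):
    """Apply an affine (a, b, c, d, tx, ty) to sequences of x and y.

    Assumes the PostScript matrix convention:
        x' = a*x + c*y + tx
        y' = b*x + d*y + ty
    """
    a, b, c, d, tx, ty = vec6
    xt = [a * x + c * y + tx for x, y in zip(xs, ys)]
    yt = [b * x + d * y + ty for x, y in zip(xs, ys)]
    return xt, yt
```

For example, apply_affine((2, 0, 0, 3, 10, 20), [1, 2], [1, 2]) scales x
by 2 and y by 3, then offsets by (10, 20).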

This setup buys us a few things -- for large data sets, it can save
the cost of doing the transformation for backends that have
transformations built in (eg ps, where the transformation happens at
rendering). For agg, it reduces the number of passes through the data
since the transformation happens in the rendering loop, which it has
to make anyway. It also allows agg to try/except the nonlinear
transformation part, and drop data points which throw a domain_error
(nonpositive log). This means you can toggle log/linear axes with the
'l' command and won't raise even if you have nonpositive data on the
log axes.
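
The try/except idea can be sketched in Python as follows. The agg
implementation does this in extension code; the function name
log10_drop_invalid is made up for illustration, and math.log10 stands in
for the nonlinear part of the transform.

```python
import math


def log10_drop_invalid(xs, ys):
    """Log-transform points, silently dropping any that fail.

    math.log10 raises ValueError (a math domain error) for zero or
    negative input, so a point with a nonpositive coordinate is skipped.
    """
    kept_x, kept_y = [], []
    for x, y in zip(xs, ys):
        try:
            lx, ly = math.log10(x), math.log10(y)
        except ValueError:
            continue  # nonpositive value on a log axis: drop the point
        kept_x.append(lx)
        kept_y.append(ly)
    return kept_x, kept_y
```

The point is that the bad data never reaches the drawing call, so nothing
is raised when an axis is switched to log scale.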

Most importantly it buys you speed, since the graphics context and
marker path only need to be set once, outside the loop, and then you
can iterate over the x,y position vertices and draw that marker at
each position. This results in a roughly 10x performance boost for
large numbers of markers in agg:

Old:
N=001000: 0.24 s
N=005000: 0.81 s
N=010000: 1.30 s
N=050000: 5.97 s
N=100000: 11.46 s
N=500000: 56.87 s

New:
N=001000: 0.13 s
N=005000: 0.19 s
N=010000: 0.28 s
N=050000: 0.66 s
N=100000: 1.04 s
N=500000: 4.51 s
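
The per-size speedup can be read off the two tables with a few lines of
Python. The numbers are copied from the timings above; nothing here is
newly measured.

```python
# Timings in seconds, copied from the Old and New tables above.
old = {1000: 0.24, 5000: 0.81, 10000: 1.30,
       50000: 5.97, 100000: 11.46, 500000: 56.87}
new = {1000: 0.13, 5000: 0.19, 10000: 0.28,
       50000: 0.66, 100000: 1.04, 500000: 4.51}

# Ratio of old to new time for each N.
speedup = {n: old[n] / new[n] for n in sorted(old)}

for n in sorted(speedup):
    print("N=%06d: %.1fx" % (n, speedup[n]))
```

The ratio grows with N, from under 2x at N=1000 to more than 10x at
N=100000 and above, which is what you would expect if the saving is
per-marker overhead.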

agg implements this in extension code, which might be harder for
backend writers to follow as an example. So I wrote a template in
backend_ps, which I named _draw_markers -- the underscore prevents it
from actually being called by lines.py. It is basically there to show
other backend writers how to iterate over the data structures and use
the transform:

    def _draw_markers(self, gc, path, x, y, transform):
        """
        I'm underscore hiding this method from lines.py right now
        since it is incomplete

        Draw the markers defined by path at each of the positions in x
        and y.  path coordinates are points, x and y coords will be
        transformed by the transform
        """
        if debugPS:
            self._pswriter.write("% markers\n")

        if transform.need_nonlinear():
            x,y = transform.nonlinear_only_numerix(x, y)

        # the a,b,c,d,tx,ty affine which transforms x and y
        vec6 = transform.as_vec6_val()

        # this defines a single vertex.  We need to define this as ps
        # function, properly stroked and filled with linewidth etc,
        # and then simply iterate over the x and y and call this
        # function at each position.  Eg, this is the path that is
        # relative to each x and y offset.
        ps = []
        for p in path:
            code = p[0]
            if code==MOVETO:
                mx, my = p[1:]
                ps.append('%1.3f %1.3f m' % (mx, my))
            elif code==LINETO:
                mx, my = p[1:]
                ps.append('%1.3f %1.3f l' % (mx, my))
            elif code==ENDPOLY:
                fill = p[1]
                if fill:  # we can get the fill color here
                    rgba = p[2:]

        vertfunc = 'some magic ps function that draws the marker relative to an x,y point'

        # the gc contains the stroke width and color as always
        for i in xrange(len(x)):
            # for large numbers of markers you may need to chunk the
            # output, eg dump the ps in 1000 marker batches
            thisx = x[i]
            thisy = y[i]
            # apply affine transform x and y to define marker center
            #draw_marker_here

        print 'I did nothing!'

For PS specifically, ideally we would define a native postscript
function for the path, and call this function for each vertex. Can
you insert PS functions at arbitrary points in PS code, or do they
have to reside in the header? If the former, we may want to buffer
the output with StringIO to save the functions we need, since we don't
know until runtime which functions we'll be defining.
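
As a rough sketch of the "define once, call per vertex" idea, assuming a
procedure may be defined inline before its first use: the function below
builds the PS text for one marker procedure and then one call per
position. Everything here is illustrative. The path code values, the name
marker_ps and the procedure name "mk" are made up, it uses the standard
moveto/lineto/closepath operators rather than backend_ps's abbreviations,
it ignores fill, and it assumes the vec6 order x' = a*x + c*y + tx,
y' = b*x + d*y + ty.

```python
# Hypothetical path codes; the real constants live in matplotlib.paths.
MOVETO, LINETO, ENDPOLY = 1, 2, 3


def marker_ps(path, xs, ys, vec6, name="mk"):
    """Return PS text: one procedure definition, then one call per point."""
    a, b, c, d, tx, ty = vec6

    # Build the marker body once, in coordinates relative to its center.
    body = []
    for p in path:
        code = p[0]
        if code == MOVETO:
            body.append("%1.3f %1.3f moveto" % (p[1], p[2]))
        elif code == LINETO:
            body.append("%1.3f %1.3f lineto" % (p[1], p[2]))
        elif code == ENDPOLY:
            body.append("closepath")  # fill flag and rgba ignored here

    # The procedure consumes x y from the stack via translate.
    out = ["/%s { gsave translate newpath %s stroke grestore } bind def"
           % (name, " ".join(body))]

    # One short line per marker: transformed center, then the call.
    for x, y in zip(xs, ys):
        out.append("%1.3f %1.3f %s"
                   % (a * x + c * y + tx, b * x + d * y + ty, name))
    return "\n".join(out)
```

If definitions turn out to be required in the header, the same text could
be collected in a buffer and written out ahead of the page content.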

OK, give it a whirl. Nothing is set in stone so feel free to comment
on the design. I think we could get away with just a few backend
methods:

    # draw_lines could be done with a path but we may want to special
    # case this for maximal performance
    draw_lines
    draw_markers
    draw_path

... and I'll probably leave the collection methods.

Ted Drain mentioned wanting draw_ellipse for high resolution ellipse
drawing (eg rather than using discrete vertices). I'm not opposed to
it, but I wonder if the spline method of drawing ellipses referred to
above might not suffice here, in which case draw_ellipse would be
subsumed under draw_path.

Although what I've done is incomplete, I thought it might be better to
get something in CVS to give other backend writers time to implement
it, and to get some feedback before finishing the refactor.

Also, any feedback on the idea of removing GD, native GTK and native
WX is welcome. I'll bounce this off the user list in any case.

JDH