= Data copying =
Push the data to the backend only once, or only when required. Update
the transforms in the backend, but do not push transformed data on
every draw. This is potentially a major win, because we currently
move the data around on every draw.
Does the backend keep a copy of the untransformed data around, so that it can easily create new transformed data when its transform is updated? If so, is there a coherent mechanism for invalidating a piece of data that is being graphed in multiple plots? If not, then how does hittesting determine the correct index into the data (since, presumably, hittesting will require the exact transform in the backend)?
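One way to picture an answer to these questions is a backend-side store that holds the raw data once and bumps a version number whenever it changes, so every plot sharing that data can tell its cached screen points are stale. This is only a sketch under assumptions: `BackendDataStore`, `PlotView` and all of their methods are hypothetical names, not anything in matplotlib, mpl1 or Chaco, and the transform is reduced to a per-axis scale and offset.

```python
class BackendDataStore:
    """Hypothetical backend-side store: raw data is pushed once per key."""

    def __init__(self):
        self._data = {}      # key -> list of (x, y) in data space
        self._version = {}   # key -> int, bumped whenever the data changes
        self.push_count = 0

    def push(self, key, points):
        self._data[key] = list(points)
        self._version[key] = self._version.get(key, 0) + 1
        self.push_count += 1

    def version(self, key):
        return self._version[key]

    def points(self, key):
        return self._data[key]


class PlotView:
    """One plot of a shared dataset; it owns only a scale/offset transform."""

    def __init__(self, store, key, sx=1.0, sy=1.0, tx=0.0, ty=0.0):
        self.store, self.key = store, key
        self.sx, self.sy, self.tx, self.ty = sx, sy, tx, ty
        self._cache = None
        self._cache_state = None

    def set_transform(self, sx, sy, tx, ty):
        # Only the transform changes; nothing is pushed to the store.
        self.sx, self.sy, self.tx, self.ty = sx, sy, tx, ty

    def screen_points(self):
        # Recompute only if the data version or the transform changed.
        state = (self.store.version(self.key),
                 self.sx, self.sy, self.tx, self.ty)
        if state != self._cache_state:
            self._cache = [(self.sx * x + self.tx, self.sy * y + self.ty)
                           for x, y in self.store.points(self.key)]
            self._cache_state = state
        return self._cache

    def hit_test(self, px, py, tol=3.0):
        """Return the data index nearest (px, py) within tol, else None."""
        best, best_d2 = None, tol * tol
        for i, (x, y) in enumerate(self.screen_points()):
            d2 = (x - px) ** 2 + (y - py) ** 2
            if d2 <= best_d2:
                best, best_d2 = i, d2
        return best
```

Because the screen points are always derived from the stored raw data in order, the index returned by `hit_test` is directly an index into the original data, whichever plot the click landed in.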
= Transformations =
Support a normal transformation architecture. The current draft
implementation assumes one nonlinear transformation, which happens at
a high layer, and all transformations after that are affines. In the
mpl1 draft, there are three affines: the transformation from view
limits -> axes units (AxesCoords.affineview), the transformation from
axes units to normalized figure units (AxesCoords.affineaxes), and the
transformation from normalized figure units to display
(Renderer.affinerenderer).
Do we want to use 3x3 or 4x4 to leave the door open for 3D developers?
I admit the temptation of having basic 3D support, but the problem is that it really doesn't scale well in software. Even the simple blits that we do in Chaco start to hit their limits on big, high-res LCDs that are getting cheaper every day. The approach that I think we're going to have to take in Chaco is to only let 3D be available when using the OpenGL backend, and to restrict the Agg-based backends to be 2D only.
Of course, I'm thinking about all this from an interactive standpoint, so if speed is not a concern, then there's no reason not to build in 3D support from the get-go.
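For concreteness, the three-affine chain from the mpl1 draft can be sketched with plain 3x3 homogeneous matrices. The variable names `affineview`, `affineaxes` and `affinerenderer` follow the text above, but the helper functions, their arguments and the sample numbers are assumptions made up for illustration, not the mpl1 code.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale_translate(sx, sy, tx, ty):
    """Affine that scales by (sx, sy) and then translates by (tx, ty)."""
    return [[sx, 0.0, tx],
            [0.0, sy, ty],
            [0.0, 0.0, 1.0]]


def apply_affine(m, x, y):
    """Apply a 3x3 affine to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def view_to_axes(xmin, xmax, ymin, ymax):
    """View limits -> axes units (the unit square)."""
    sx, sy = 1.0 / (xmax - xmin), 1.0 / (ymax - ymin)
    return scale_translate(sx, sy, -xmin * sx, -ymin * sy)


def axes_to_figure(left, bottom, width, height):
    """Axes units -> normalized figure units."""
    return scale_translate(width, height, left, bottom)


def figure_to_display(width_px, height_px):
    """Normalized figure units -> display pixels."""
    return scale_translate(width_px, height_px, 0.0, 0.0)


affineview = view_to_axes(0.0, 10.0, -1.0, 1.0)
affineaxes = axes_to_figure(0.1, 0.1, 0.8, 0.8)
affinerenderer = figure_to_display(640, 480)

# Compose once; the rightmost matrix is applied first.
total = matmul3(affinerenderer, matmul3(affineaxes, affineview))
```

Moving to 4x4 would only change the matrix size and add a z row and column; the composition order stays the same.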
How do transformations (linear and nonlinear) play with Axis features
(ticking and gridding)? The ideal is a framework in which ticking,
gridding and labeling work intelligently with arbitrary, user-supplied
transformations. What is the proper transformation API?
This is something we've been puzzling over for Chaco as well. Dave Kammeyer pointed out long ago that the problem with trying to write a generic axis/grid renderer while supporting arbitrary transformations is that straight lines become curves under arbitrary transforms. The basic idea is that the backend (or GraphicsContext, in the Chaco world) needs to provide transformation-aware implementations of line_to() that automatically convert line segments into bezier curves while at the same time providing drawing methods that are guaranteed to be "straight" or aligned with screen coordinates. This way, you can get curved axes in a hyperbolic space "for free", while your ticks stay perfectly straight and the label text is screen-aligned. (Of course, to be perfectly accurate, you would need to handle polar coordinates in a special way anyway.)
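A rough sketch of that idea follows, with two loud caveats: it approximates the curve by subdividing each segment into a polyline rather than fitting bezier curves, and `SketchContext`, `line_to`, `device_line` and `polar_to_screen` are hypothetical names, not the Kiva GraphicsContext API.

```python
import math


def polar_to_screen(theta, r):
    """Example nonlinear transform: (theta, r) in data space -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))


class SketchContext:
    """Hypothetical context whose line_to() is aware of the transform."""

    def __init__(self, transform, max_segment=0.05):
        self.transform = transform
        self.max_segment = max_segment  # largest step, in data units
        self.path = []                  # accumulated screen-space vertices
        self._cur = None                # current point, in data space

    def move_to(self, u, v):
        self._cur = (u, v)
        self.path.append(self.transform(u, v))

    def line_to(self, u, v):
        # Subdivide in data space so a "straight" data-space segment
        # follows the curve the transform bends it into.
        u0, v0 = self._cur
        n = max(1, math.ceil(math.hypot(u - u0, v - v0) / self.max_segment))
        for i in range(1, n + 1):
            t = i / n
            self.path.append(
                self.transform(u0 + t * (u - u0), v0 + t * (v - v0)))
        self._cur = (u, v)

    def device_line(self, u, v, dx, dy):
        """Straight screen-space segment anchored at a data point (a tick)."""
        x, y = self.transform(u, v)
        return [(x, y), (x + dx, y + dy)]
```

With the polar transform, a `line_to` along constant radius comes out as an arc, while `device_line` gives a tick that stays straight in screen space.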
= Objects that talk to the backend "primitives" =
Have just a few, fairly rich objects that the backends need to
understand. Clear candidates are a Path, Text and Image, but despite
their names, don't confuse these with the eponymous
matplotlib Artists, which are higher level than what I'm thinking of
here (eg matplotlib.text.Text does *a lot* of layout, and this would
be offloaded to the backend in this conception of the Text primitive).
Each of these will carry their metadata, eg a path will carry its
stroke color, facecolor, linewidth, etc..., and Text will carry its
font size, color, etc.... We may need some optimizations down the
road, but we should start small. For now, let's call these objects
"primitives".
= How much of an intermediate artist layer do we need? =
Do we want to create high level objects like Circle, Rectangle and
Line, each of which manages a Path object under the hood? Probably,
for user convenience and general compatibility with matplotlib. By
using traits properly here, many current matplotlib Artists will be
thin interfaces around one or more primitives.
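To make the split between primitives and thin artists concrete, here is a small sketch using plain dataclasses instead of traits. The field names, the defaults and the `primitives()` method are assumptions for illustration only, and the circle is approximated by a closed polygon.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]


@dataclass
class Path:
    """Primitive: geometry plus the metadata the backend needs to draw it."""
    vertices: List[Tuple[float, float]]
    strokecolor: RGBA = (0.0, 0.0, 0.0, 1.0)
    facecolor: RGBA = (1.0, 1.0, 1.0, 0.0)
    linewidth: float = 1.0
    closed: bool = False


@dataclass
class Text:
    """Primitive: layout is left to the backend in this conception."""
    text: str
    x: float
    y: float
    fontsize: float = 12.0
    color: RGBA = (0.0, 0.0, 0.0, 1.0)


class Circle:
    """Thin artist: user-level parameters in, one Path primitive out."""

    def __init__(self, center, radius,
                 facecolor=(0.0, 0.0, 1.0, 1.0), segments=32):
        self.center = center
        self.radius = radius
        self.facecolor = facecolor
        self.segments = segments

    def primitives(self):
        # Approximate the circle with a closed polygon of `segments` sides.
        cx, cy = self.center
        step = 2.0 * math.pi / self.segments
        verts = [(cx + self.radius * math.cos(i * step),
                  cy + self.radius * math.sin(i * step))
                 for i in range(self.segments)]
        return [Path(verts, facecolor=self.facecolor, closed=True)]
```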
I included these two together because I think they both concern a very fundamental matter, which is the drawing model. If you are going to create higher-level "primitives" which encapsulate state (e.g. color, dash style, line width), then you are moving significantly away from the model of a Canvas as just a place to dump pixels/vector drawing commands, and more towards the model of Canvas as a container of stateful objects. But as soon as you do this, a whole host of questions pops up... Are the Circle, Rectangle, etc. in the "intermediate artist layer" objects in their own right, with parameters like 'radius', 'position', etc., or are they just convenience functions to create more low-level primitives on the Canvas? If the former, then you suddenly have a hierarchical Canvas. If the latter, then is there any structure to how these primitives are held in the Canvas? Even if they are just held in a simple list, are they drawn in the same order as they appear in that list? If they were inserted by a single Circle or Rectangle intermediate artist, is there any way to ensure that they maintain coherency when re-ordering that draw order?
Also, if you have a list of these primitives, it seems natural to hittest against them for picking and interaction. Does this also happen in the same order that they appear in the list? Is there a straightforward way to make them process events in a different order? If you have lots of these little primitives, are there optimizations you can design in so that you don't have to hittest thousands of little primitives on each mouse_move event?
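One possible set of answers, sketched under assumptions: items are drawn in list order, hittested in reverse order so the topmost item wins, and a cheap bounding-box check rejects most items before the exact test runs. `Canvas`, `Rect`, `Disc`, `pick` and `raise_to_top` are hypothetical names for this example only.

```python
class Rect:
    def __init__(self, name, x, y, w, h):
        self.name, self.x, self.y, self.w, self.h = name, x, y, w, h

    def bbox(self):
        return (self.x, self.y, self.x + self.w, self.y + self.h)

    def contains(self, px, py):
        x0, y0, x1, y1 = self.bbox()
        return x0 <= px <= x1 and y0 <= py <= y1


class Disc:
    def __init__(self, name, cx, cy, r):
        self.name, self.cx, self.cy, self.r = name, cx, cy, r

    def bbox(self):
        return (self.cx - self.r, self.cy - self.r,
                self.cx + self.r, self.cy + self.r)

    def contains(self, px, py):
        return (px - self.cx) ** 2 + (py - self.cy) ** 2 <= self.r ** 2


class Canvas:
    """Flat list of items: list order is draw order (last drawn is on top)."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return item

    def draw_order(self):
        return [item.name for item in self.items]

    def raise_to_top(self, item):
        self.items.remove(item)
        self.items.append(item)

    def pick(self, px, py):
        # Hittest topmost first, i.e. in the reverse of the draw order.
        for item in reversed(self.items):
            x0, y0, x1, y1 = item.bbox()
            if not (x0 <= px <= x1 and y0 <= py <= y1):
                continue  # cheap reject before the exact test
            if item.contains(px, py):
                return item
        return None
```

For thousands of primitives the linear scan would still be the bottleneck; a spatial index over the bounding boxes would be the natural next step, which this sketch does not attempt.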
= Where do the plot functions live? =
In matplotlib, the plot functions are matplotlib.axes.Axes methods and
I think there is consensus that this is a poor design. Where should
these live, what should they create, etc?
Well, you can probably guess my answer to this question. It seems to me that if you're going to have a drawing model that supports stateful graphics on a canvas, plot renderers should just be glorified graphics that live on the canvas, no different from a Circle or a Rectangle or whatnot. In this case, the plot functions then just become convenience functions that create these graphics and stick them on a canvas.
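As a sketch of what that could look like (all names here are hypothetical, not the Chaco or matplotlib API): a plot renderer is just another graphic, and `plot()` is a convenience function that builds one and sticks it on the canvas.

```python
class Graphic:
    """Anything that lives on the canvas and knows how to draw itself."""

    def draw(self, commands):
        raise NotImplementedError


class Circle(Graphic):
    def __init__(self, x, y, radius):
        self.x, self.y, self.radius = x, y, radius

    def draw(self, commands):
        commands.append(("circle", (self.x, self.y), self.radius))


class LinePlot(Graphic):
    """A plot renderer is a graphic like any other."""

    def __init__(self, xs, ys, color="blue"):
        self.xs, self.ys, self.color = list(xs), list(ys), color

    def draw(self, commands):
        commands.append(("polyline", list(zip(self.xs, self.ys)), self.color))


class Canvas:
    def __init__(self):
        self.graphics = []

    def add(self, graphic):
        self.graphics.append(graphic)
        return graphic

    def render(self):
        # Collect drawing commands from every graphic, in insertion order.
        commands = []
        for graphic in self.graphics:
            graphic.draw(commands)
        return commands


def plot(canvas, xs, ys, **style):
    """Convenience function: create a LinePlot and put it on the canvas."""
    return canvas.add(LinePlot(xs, ys, **style))
```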
I think the whole matplotlib.collections module is poorly designed,
and should be chucked wholesale, in favor of faster, more elegant
optimizations and special cases. Just having the right Path object
will reduce the need for many of these, eg LineCollection,
PolygonCollection, etc... Also, everything should be numpy enabled,
and the sequence-of-python-tuples approach that many of the
collections take should be dropped. Obviously some of the more useful
things there, like quad meshes, need to be ported and retained.
In Chaco we can get interactive speeds just by having a few fast drawing calls at the Kiva layer: lines(), line_set(), draw_marker_at_points(). These were quite easy to implement in both the Agg and Quartz backends. We've talked about introducing another set of drawing commands that allow passing in a "style index" with every point, so that we can speed up our colormapped scatter plots and eventually do colormapped lines and such.
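The "style index" idea might work roughly as follows. This is purely a guess at the shape of such a call, not an existing Kiva function: `batch_by_style`, `draw_styled_markers` and the `emit` callback are all hypothetical. Points are grouped by style index so the drawing state is set once per style rather than once per point.

```python
from collections import defaultdict


def batch_by_style(points, style_indices):
    """Group points by their style index."""
    batches = defaultdict(list)
    for point, style in zip(points, style_indices):
        batches[style].append(point)
    return dict(batches)


def draw_styled_markers(points, style_indices, palette, emit):
    """Emit one state change and one marker batch per distinct style.

    `emit` stands in for whatever the backend's drawing calls would be.
    """
    for style, pts in sorted(batch_by_style(points, style_indices).items()):
        emit(("set_fill", palette[style]))
        emit(("markers", pts))
```

One design consequence: batching by style changes the order in which overlapping markers are drawn, so it only suits plots where that order does not matter.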
= Z-ordering, containers, etc =
Peter has been doing a lot of nice work on z-order and layers for
chaco, stuff that looks really useful for picking, interaction, etc...
We should look at this approach, and think carefully about how this
should be handled. Paul may be a good candidate for this, since he
has been working recently on the picking API.
I think that you should really consider integrating the event propagation model with the drawing model. The Chaco model of "containers of graphical components" is pretty straightforward and even though we've implemented it in pure python, it is responsive enough for interactivity. The nice thing about it is that there's nothing in the container/component model that is intrinsically related to plotting, so you can use it to build simple widgets that play nicely with the rest of your plot because they use the same event propagation and component drawing model. I can put together some more thorough documentation on all this, if folks are interested.
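A stripped-down sketch of a "containers of graphical components" model is shown here. It is not Chaco's actual classes: `Component`, `Container`, `Button`, `dispatch` and `handle` are hypothetical names, and all bounds are in one shared coordinate space for simplicity. Drawing walks the tree front to back in child order, and events are offered to children in the reverse of that order.

```python
class Component:
    def __init__(self, name, x, y, w, h):
        self.name = name
        self.bounds = (x, y, w, h)

    def contains(self, px, py):
        x, y, w, h = self.bounds
        return x <= px <= x + w and y <= py <= y + h

    def draw(self, log):
        log.append(self.name)

    def handle(self, event, px, py):
        """Override to consume events; return True if handled."""
        return False

    def dispatch(self, event, px, py):
        return self.contains(px, py) and self.handle(event, px, py)


class Container(Component):
    def __init__(self, name, x, y, w, h):
        super().__init__(name, x, y, w, h)
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def draw(self, log):
        # The container draws itself first, then its children in order.
        log.append(self.name)
        for child in self.children:
            child.draw(log)

    def dispatch(self, event, px, py):
        if not self.contains(px, py):
            return False
        # Topmost child (drawn last) gets the first chance at the event.
        for child in reversed(self.children):
            if child.dispatch(event, px, py):
                return True
        return self.handle(event, px, py)


class Button(Component):
    """A simple widget that shares the same draw and event model."""

    def __init__(self, name, x, y, w, h):
        super().__init__(name, x, y, w, h)
        self.clicks = 0

    def handle(self, event, px, py):
        if event == "click":
            self.clicks += 1
            return True
        return False
```

Nothing in these classes knows about plotting, which is the point: a plot and a button can sit in the same container and share one event path.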
I also plan to use the SWIG
agg wrapper, so this gets rid of _backend_agg. If we can enhance the
SWIG agg wrapper, we can also do images through there, getting rid of
_image.cpp. Having a fully featured, python-exposed agg wrapper will
be a plus in mpl and beyond. But with the agg license change, I'm
open to discussion of other approaches.
How exactly are you guys wrapping Agg? I guess I need to take a look at that stuff in more detail... Kiva has been fairly stable, even though we don't do much maintenance on it, and the DisplayPDF drawing model has worked out fairly well. After Robert put together the Quartz backend for it, we can nicely verify that our Agg-based implementation of DisplayPDF is fairly good, since our plots render the same on Windows and Mac. If we had just put in a little more effort on optimization for Linux and cleaning up some outdated cruft, I think it would be in really good shape. Additionally, Phil Thompson is going to be working on porting Kiva and Enable to Qt. Kiva's Agg backend is based on Agg 2.4, which is still BSD.
The major missing piece is ft2font, which is a pretty elaborate CXX
module. Michael may want to consider alternatives, including looking
at the agg support for freetype, and the kiva/chaco approach.
Unfortunately, Chaco's font handling isn't anything to write home about. I think the world is crying out for a nice Python library for font lookup and font metrics.
= Traits =
I think we should make a major commitment to traits and use them from
the ground up. Even without the UI stuff, they add plenty to make
them worthwhile, especially the validation and notification features.
With the UI (wx only), they are a major win for many GUI developers.
Compare the logic for sharing an x-axis using matplotlib transforms
with Axes.sharex with the approach used in mpl1.py with sync_trait-ed
affines.
Once you start using trait events and notifications extensively, you won't want to go back. It encourages a very componentized model of development that is a world apart from normal OOP while at the same time feeling very natural.
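To give a flavor of validation plus notification without depending on the Traits package, here is a plain-Python stand-in. It is deliberately not the Traits API: `Observable`, `FloatAttr`, `ViewLimits` and `sync` are hypothetical names, with `sync` playing a role loosely similar to the sync_trait-ed affines mentioned above.

```python
class Observable:
    """Minimal change-notification base class (a stand-in, not Traits)."""

    def __init__(self):
        self._observers = {}

    def observe(self, name, callback):
        self._observers.setdefault(name, []).append(callback)

    def _notify(self, name, old, new):
        for callback in self._observers.get(name, []):
            callback(self, name, old, new)


class FloatAttr:
    """Descriptor that validates to float and notifies on change."""

    def __set_name__(self, owner, name):
        self.name = name
        self.private = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private, 0.0)

    def __set__(self, obj, value):
        new = float(value)  # validation: raises on non-numeric input
        old = getattr(obj, self.private, 0.0)
        setattr(obj, self.private, new)
        if new != old:      # notify only on a real change
            obj._notify(self.name, old, new)


class ViewLimits(Observable):
    xmin = FloatAttr()
    xmax = FloatAttr()


def sync(a, b, name):
    """Keep attribute `name` equal on both objects, in both directions."""
    a.observe(name, lambda obj, attr, old, new: setattr(b, attr, new))
    b.observe(name, lambda obj, attr, old, new: setattr(a, attr, new))
```

The two-way sync terminates because a setter only notifies when the value actually changes, so the echo back from the second object is a no-op.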
= Axis handling =
The whole concept of the Axes object needs to be rethought, in light
of the fact that we need to support multiple axis objects on one Axes.
The matplotlib implementation assumes 1 xaxis and 1 yaxis per Axes,
and we hack two y-axis support (examples/two_scales.py) with some
transform shenanigans via twinx and multiple Axes where one is hidden,
but the approach is not scalable and is unwieldy.
This will require a fair amount of thought, but we should aim for
supporting an arbitrary number of axis objects, presumably associated
with individual artists or primitives.
...
The other important feature for axis support is that, for the most
part, they should be arbitrarily placeable (eg a "detached" axis).
I think you should consider separating the two concerns that are being overloaded onto the Axis object: (1) an axis represents a range in data space that controls the transforms/mappings between data and screen space, and (2) an axis is a visual component that needs to be rendered at a particular place on the screen and receives events from the user (e.g. double-clicking to set its parameters).
If you create a separate DataRange object, then you can use it to drive one or more Transforms as well as multiple Axis objects. This is basically how Chaco gets synchronized axes for "free". The actual graphical Axis objects can render themselves however they want to, and their actual layout on the screen (on opposite sides of a plot, piled up in a stack on the left or right, etc.) is determined by a layout mechanism that is completely orthogonal to the issues of mapping. This also allows for "detached" axes and such.
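The separation can be sketched like this. `DataRange` is the name used above, but the class bodies, `LinearMapper`, `AxisGraphic` and every method name are assumptions for illustration rather than Chaco's implementation.

```python
class DataRange:
    """Concern (1): a range in data space, shared by mappers and axes."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self._listeners = []

    def on_change(self, callback):
        self._listeners.append(callback)

    def set_bounds(self, low, high):
        self.low, self.high = low, high
        for callback in self._listeners:
            callback(self)


class LinearMapper:
    """Maps data values to a screen interval, driven by a DataRange."""

    def __init__(self, data_range, screen_low, screen_high):
        self.range = data_range
        self.screen_low, self.screen_high = screen_low, screen_high

    def map_screen(self, value):
        r = self.range
        frac = (value - r.low) / (r.high - r.low)
        return self.screen_low + frac * (self.screen_high - self.screen_low)


class AxisGraphic:
    """Concern (2): a visual component; where it sits is a layout matter."""

    def __init__(self, mapper, position):
        self.mapper, self.position = mapper, position
        self.redraws = 0
        mapper.range.on_change(self._range_changed)

    def _range_changed(self, data_range):
        self.redraws += 1  # a real component would request a repaint here

    def tick_positions(self, n=3):
        r = self.mapper.range
        step = (r.high - r.low) / (n - 1)
        return [self.mapper.map_screen(r.low + i * step) for i in range(n)]
```

Two axes built on the same `DataRange` stay synchronized without knowing about each other, and a "detached" axis is just an `AxisGraphic` whose position the layout puts somewhere else.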
= Chaco and Kiva =
It is a good idea for an enterprising developer to take a careful look
at the current Chaco and Kiva to see if we can further integrate with
them. I am gun shy because they seem formidable and complex, and one
of my major goals here is to streamline and simplify, but they are
incredible pieces of work and we need to carefully consider them,
especially as we integrate other parts of the enthought suite into our
core, eg traits, increasing the possibility of synergies.
I'm really glad to read this, because I think there are clearly a lot of common problems that we all have to solve. At its core, Chaco is not *that* complex - it's just rather poorly documented, and that is no one's fault but mine. The structure, however, is really pretty straightforward. Its container/component model is not much more complicated than what a minimal solution to some of the problems I've outlined in previous paragraphs would entail.
I guess the key question I would ask is this: What is the vision, or driving purpose, behind mpl1? Is it to develop a better backend architecture for pylab, or something more? I ask this because some of the designs you have proposed for various pieces of mpl1 look very much like they are trying to solve the same problems that we're trying to solve in Chaco; if you really are quite prepared to "break the hell out of matplotlib", I think that now would be a really good time for collaboration.
-Peter
On Jul 19, 2007, at 12:18 PM, John Hunter wrote: