Reconfiguring transforms

> So, I feel like I'm going in a bit of a circle here, and I might need a
> reality check. I thought I'd better check in and see where you guys
> (who've thought about this a lot longer than I have) see this going. A
> statement of objectives of this part of the task would be helpful.
> (e.g. what's the biggest problem with how transforms work now, and what
> model would be a better fit). John, I know you've mentioned some to me
> before, e.g. the LazyValue concept is quirky and relies on C and the PDF
> stateful transforms model is close, but not quite what we need, etc. I
> feel I have a better sense of the overall code structure now, but you
> guys may have a better "gut" sense of what will fit best.

Here is a brief summary of what I see some of the problems to be with
the existing approach to transformations, and what I would like to see
improved in a refactoring. The three major objectives are clarity,
extensibility and efficiency.

Clarity:

  The existing transformation framework, written in C++ and
  making extensive use of deferred evaluation of binary operation
  trees and values by reference, is difficult for most developers to
  understand (and hence enhance). Additionally, since all the heavy
  lifting is done in C++, python developers who are not versed in C++
  have an additional barrier to making contributions.

Indeed!

Extensibility:

  We would like to make it fairly easy for users to add additional
  non-linear transformations. The current framework requires adding a
  new function at the C++ layer, and hacking into axes.py to support
  additional functions. We would like the existing nonlinear
  transformations (log and polar) to be part of a general
  infrastructure where users could supply their own nonlinear
  functions which map (possibly nonseparable) (xhat, yhat) ->
  separable (x, y). There are two parts to this: one pretty easy and
  one pretty hard.

  The easy part is supporting a transformation which has a separation
  callable that takes, e.g., an Nx2 array and returns an Nx2 array. For
  log, this will simply be log(XY), for polar, it will be
  r*cos(X[:,0]), r*sin(X[:,1]). Presumably we will want to take
  advantage of masked arrays to support invalid transformations, eg
  log of nonpositive data.
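
A minimal sketch of what such separation callables could look like, assuming NumPy. The function names `log_separation` and `polar_separation` are hypothetical, and the polar version assumes the input columns are ordered (theta, r), which the message above does not state:

```python
import numpy as np


def log_separation(xy):
    """Map an Nx2 array to log10 space, masking nonpositive entries."""
    xy = np.asarray(xy, dtype=float)
    # Mask invalid inputs first so the log never sees them.
    masked = np.ma.masked_less_equal(xy, 0.0)
    return np.ma.log10(masked)


def polar_separation(theta_r):
    """Map Nx2 rows of (theta, r) to Nx2 rows of (x, y).

    The (theta, r) column order is an assumption made for this sketch.
    """
    theta_r = np.asarray(theta_r, dtype=float)
    theta, r = theta_r[:, 0], theta_r[:, 1]
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])
```

Both callables share the same Nx2 in, Nx2 out signature, so a caller would not need to know which nonlinear mapping it was handed. Masked entries could then be skipped by whatever draws the result.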

  The harder part is to support axis, tick and label layout
  generically. Currently we do this by special casing log and polar,
  either with special tick locators and formatters (log) or special
  derived Axes (polar).

Another hard part is grids. More generally, a straight line in
x,y becomes curved in x',y'. Ideally, a sequence of points plotted
on a straight line should lie directly on the transformed line. This
would make the caps on the polar_bar demo follow the arcs of the grid.
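
One way to get that behaviour, sketched here under assumptions, is to subdivide the segment in the untransformed space and push every sample through the transform, rather than transforming only the two endpoints. The helper names are hypothetical and the (theta, r) column order is assumed:

```python
import numpy as np


def polar_to_xy(theta_r):
    """Hypothetical nonlinear transform: rows of (theta, r) to rows of (x, y)."""
    theta, r = theta_r[:, 0], theta_r[:, 1]
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])


def transformed_segment(p0, p1, transform, n=50):
    """Sample n points on the straight segment p0 -> p1, then transform them."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    t = np.linspace(0.0, 1.0, n)[:, None]  # column vector of parameters
    points = (1.0 - t) * p0 + t * p1       # straight line before transforming
    return transform(points)


# A bar cap at constant r = 2 between theta = 0 and theta = pi/2
# comes out as points on the circular arc, not on the chord.
arc = transformed_segment((0.0, 2.0), (np.pi / 2, 2.0), polar_to_xy)
```

A fixed sample count is the simplest choice; an adaptive subdivision based on screen-space error would likely be needed in practice.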

The extreme case is map projections, where for some projections, a
straight line will not even be connected.

Another issue is zooming and panning. For amusement, try it with
polar_demo.

Efficiency:

  There are three parts to the efficiency question: the efficiency of
  the transformation itself, the efficiency with which transformation
  data structures are updated in the presence of viewlim changes
  (panning and zooming, window resizing) and the efficiency in getting
  transformed data to the backends. My guess is that the new design
  may be slower or not dramatically faster for the first two (which
  are not the bottleneck in most cases anyhow) but you might get
  significant savings on the 3rd.

Changing the internal representation of things like collections so that
the transform can be done using numpy vectors will help a lot.
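
As a rough illustration of the idea, assuming the vertices of a collection are stored as a single Nx2 array and the affine part of the transform as a 3x3 matrix (both are assumptions of this sketch, and `affine_transform` is a hypothetical name), the whole collection can be transformed in one vectorized call:

```python
import numpy as np


def affine_transform(points, matrix):
    """Apply a 3x3 affine matrix to an Nx2 array of points in one shot."""
    points = np.asarray(points, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # Linear part acts on each row; the last column holds the translation.
    return np.dot(points, matrix[:2, :2].T) + matrix[:2, 2]


# Scale by 2, then translate by (1, -1).
m = [[2.0, 0.0, 1.0],
     [0.0, 2.0, -1.0],
     [0.0, 0.0, 1.0]]
result = affine_transform([[0.0, 0.0], [1.0, 1.0]], m)
```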

  - Paul

···

On Wed, Sep 12, 2007 at 01:11:54PM -0500, John Hunter wrote:

On 9/12/07, Michael Droettboom <mdroe@...31...> wrote:

Extensibility:

  We would like to make it fairly easy for users to add additional
  non-linear transformations. The current framework requires adding a
  new function at the C++ layer, and hacking into axes.py to support
  additional functions. We would like the existing nonlinear
  transformations (log and polar) to be part of a general
  infrastructure where users could supply their own nonlinear
  functions which map (possibly nonseparable) (xhat, yhat) ->
  separable (x, y). There are two parts to this: one pretty easy and
  one pretty hard.

  The easy part is supporting a transformation which has a separation
  callable that takes, e.g., an Nx2 array and returns an Nx2 array. For
  log, this will simply be log(XY), for polar, it will be
  r*cos(X[:,0]), r*sin(X[:,1]). Presumably we will want to take
  advantage of masked arrays to support invalid transformations, eg
  log of nonpositive data.

  The harder part is to support axis, tick and label layout
  generically. Currently we do this by special casing log and polar,
  either with special tick locators and formatters (log) or special
  derived Axes (polar).

Another hard part is grids. More generally, a straight line in
x,y becomes curved in x',y'. Ideally, a sequence of points plotted
on a straight line should lie directly on the transformed line. This
would make the caps on the polar_bar demo follow the arcs of the grid.

The extreme case is map projections, where for some projections, a
straight line will not even be connected.

Just wanted to chime in because I've done some thinking on this problem for Chaco. Right now chaco's coordinate transformation process ("mapping") is handled by explicit objects that subclass from 1D and 2D mapper base classes. We're talking about moving to a scheme where the DisplayPDF GraphicsContext is extended into a MathematicalCanvas that is both aware of the transformation stack and is also aware of "screen" properties such as subpixel alignment and such. You would then be able to hand off dataspace coordinates to methods like move_to(), line_to(), rect(), etc., so you could move_to() a dataspace coordinate and then draw a screen-aligned box. The MathCanvas would also have additional methods like geodesic_to() for rendering manifold-aware grids and axes. (Of course, grids aren't necessarily geodesics all the time.)

I don't know if discontinuous map projections could be handled cleanly in such a framework, without the renderer querying the canvas about screenspace limits of the current transformation.

Another issue is zooming and panning. For amusement, try it with
polar_demo.

Yes, one of the problems with non-linear transformations is that panning is very much a screen space interaction, and you have to map it back into data space to do proper data clipping and transformation. Unfortunately (and this is a problem even with logarithmic plots), the user may sometimes want to view things on the screen that are outside the valid domain of the coordinate transform, in which case the code handling the interaction (the "tool", in chaco parlance) has to be smart enough to maintain screen-space coordinates only.
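
To make the screen-space to data-space mapping concrete, here is a small sketch for a logarithmic axis. It is not Chaco or matplotlib code; `pan_log_limits` and its parameters are hypothetical. The drag is measured in pixels, converted to a shift in log space, and only then mapped back to data limits:

```python
import numpy as np


def pan_log_limits(lo, hi, dx_pixels, width_pixels):
    """Return new (lo, hi) data limits of a log axis after a horizontal drag.

    lo and hi must be positive; dx_pixels > 0 means the content was
    dragged to the right, so the visible range moves to smaller values.
    """
    log_lo, log_hi = np.log10(lo), np.log10(hi)
    # Fraction of the axis width dragged, expressed in log units.
    shift = dx_pixels / width_pixels * (log_hi - log_lo)
    return 10.0 ** (log_lo - shift), 10.0 ** (log_hi - shift)


# Dragging half the axis width to the right on a view of [1, 100].
new_lo, new_hi = pan_log_limits(1.0, 100.0, 100, 200)
```

A pan that is a constant offset on screen becomes a multiplicative change in data space. This sketch always stays inside the valid domain, so it does not address the harder case described above, where the user wants to see screen regions that have no valid data coordinates.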

-Peter

···

On Sep 12, 2007, at 3:27 PM, Paul Kienzle wrote: