Feature request: nice grouping in postscript output

    > Hi John et al., I just had a chance to play with
    > matplotlib for a few minutes, and I'm very encouraged!

Glad to hear it. Keep me informed of bugs and feature requests.

    > A feature request: I frequently use Adobe Illustrator to
    > touch up postscript files that contain my figures. In this
    > case, it is REALLY handy when the postscript files group
    > nicely. Knowing little-to-nothing about PostScript and
    > Illustrator, I have no idea how hard the behavior is to
    > implement, but it would be fantastic if it did.

I know nothing about Illustrator, and have been learning postscript as
I go, so bear with me.

    > I just tried a few things with the axes_demo and the
    > errorbar_demo in the examples directory. I liked that the
    > points grouped together. I didn't like that in the
    > errorbar_demo that the points and the errorbars grouped
    > together almost inseparably.

What do you mean by "grouped together"? I assume this has something
to do with editing in Illustrator, but can you explain in more detail?

Learning (a little bit of) postscript has been a mind-opening
experience. I know a lot of programming languages, and postscript
introduced me to several new ideas. It is difficult to take a
(somewhat) state-independent OO representation of a graphical object
and translate it into the postscript state machine efficiently,
especially when the postscript backend has to act like the other
backends at the interface level.

Simple example: suppose you want to draw all the axis tick labels,
each of which has the same font information. The abstract interface
makes a separate call for each label, which causes the postscript
backend to generate the same font information over and over again. A
smart postscript backend would keep track of this information so it
wouldn't needlessly regenerate it, which leads to file bloat. I would
like to make these improvements, but my first goal was to get
something that works.
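
A minimal sketch of the font bookkeeping described above. The class
FontTrackingWriter and its methods are hypothetical and are not taken
from backend_ps.py; only the PostScript operators findfont, scalefont,
setfont, moveto and show are real. The writer remembers the last font
it emitted and skips the font selection when nothing has changed.

```python
class FontTrackingWriter:
    """Hypothetical helper: emit a PostScript font selection only on change."""

    def __init__(self):
        self.lines = []
        self._font = None  # (name, size) most recently written to the output

    def set_font(self, name, size):
        # Skip the findfont/scalefont/setfont sequence if it would be a no-op.
        if (name, size) != self._font:
            self.lines.append(f"/{name} findfont {size} scalefont setfont")
            self._font = (name, size)

    def draw_text(self, x, y, text, name, size):
        self.set_font(name, size)
        # Escape the characters that are special inside a PostScript string.
        escaped = (
            text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        )
        self.lines.append(f"{x} {y} moveto ({escaped}) show")


# Three tick labels that share one font produce a single setfont line.
writer = FontTrackingWriter()
for i, label in enumerate(["0.0", "0.5", "1.0"]):
    writer.draw_text(50 + 100 * i, 20, label, "Helvetica", 10)
ps_output = "\n".join(writer.lines)
```

The abstract interface can keep making one call per label; the saving
happens entirely inside the writer.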

Most of the improvements I've envisioned for the PS backend have been
in the realm of file size efficiency (I've seen some damn large PS
files in my day). So I'm interested to get your feedback about these
other areas that I don't yet understand.

    > With the demos tested, the primary curve or points grouped
    > with a rectangle around the plotting region that had no
    > fill or stroke but seemed to clip the contents to within
    > that box. I wonder if it would be nicer to produce
    > postscript output where the clipping is done before
    > rendering to a file, thus eliminating the need for this
    > rather strangely behaved box?

Could you also give me some detail here? Is the "box" the rectangular
border of the axes? With regard to a specific demo, what is "the
primary curve" and "rectangle"?

I do use postscript clipping of lines and other objects etc so that
they do not extend beyond the axes borders. Generally, I think this
is *a good thing*. The general organization of matplotlib is figures
contain axes which contain lines, patches and text. Normally, I don't
want lines, patches and text spilling out of their axes containers.
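
For illustration, here is a rough sketch of how a clipped line can be
written in PostScript. The function clipped_polyline is hypothetical
and is not the code in backend_ps.py; the operators (gsave, clip,
grestore and the path operators) are standard PostScript. The clip
rectangle is a path that is neither stroked nor filled, which is
plausibly the invisible box that shows up when such a file is opened
in an editor.

```python
def clipped_polyline(points, clip_rect):
    """Return PostScript lines that stroke `points` clipped to `clip_rect`.

    clip_rect is (left, bottom, width, height) in PostScript points.
    """
    left, bottom, width, height = clip_rect
    ps = [
        "gsave",  # save the graphics state, including the current clip path
        "newpath",
        f"{left} {bottom} moveto",
        f"{width} 0 rlineto",
        f"0 {height} rlineto",
        f"{-width} 0 rlineto",
        "closepath",
        "clip",  # intersect the clip path with the rectangle
        "newpath",  # clip does not clear the current path, so start a new one
    ]
    (x0, y0), rest = points[0], points[1:]
    ps.append(f"{x0} {y0} moveto")
    ps.extend(f"{x} {y} lineto" for x, y in rest)
    ps.append("stroke")
    ps.append("grestore")  # restore the previous clip path
    return ps


ps_lines = clipped_polyline([(0, 0), (100, 150), (300, 50)], (50, 50, 200, 100))
```

The line itself is written out in full; the interpreter discards
whatever falls outside the rectangle when it renders the page.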

Can you explain a little more what you are trying to achieve in
Illustrator so I can get a better idea of what is missing? What
exactly is the 'strangely behaved box'?

    > Also, the generated plots have two boxes, one with a
    > white stroke and one with a white fill, surrounding the
    > figure. These, too, seem unnecessary.

Yes, this is a holdover from the GUI. In a GUI presentation, the
plots look nicer with a boundary -- see e.g.
http://matplotlib.sourceforge.net/screenshots/subplot_demo_large.png
where the gray border is the default figure background -- matlab does
this. So the figure (which contains the axes) renders a rectangular
border with a fill color. For the postscript backend, I simply made
these white and when I print on white paper, I never see them. They
can easily be done away with by commenting out the line

   self._figurePatch.draw(drawable)

in backends/backend_ps.py. I don't really have a problem removing it
entirely as I don't see much need for it in the PS backend, unless
someone wants to frame their plots with a background rectangle. I
mainly left it in there for vestigial compatibility with the other
backends. But, so I can get a better understanding of the twisted
mind of Illustrator, could you explain to me what kind of problem this
is causing you?

JDH

John Hunter wrote:

    > I just tried a few things with the axes_demo and the
    > errorbar_demo in the examples directory. I liked that the
    > points grouped together. I didn't like that in the
    > errorbar_demo that the points and the errorbars grouped
    > together almost inseparably.

> What do you mean by "grouped together"? I assume this has something
> to do with editing in Illustrator, but can you explain in more detail?

Basically, there are two select modes in Illustrator. The first, the "Selection tool", selects a whole group of paths. The second, the "Direct selection tool", selects the path segment or otherwise smallest path portion possible.

Let's take the example of 5 circles which have been drawn and then grouped in Illustrator. Clicking on one with the selection tool will select all of them, because they are all grouped. Clicking one with the direct select tool will only get one (or actually probably only a single path component between anchor points or the anchor point itself if it was clicked).

There is some way Illustrator extracts this information from all PostScript files, but it probably just makes intelligent guesses when it's dealing with "foreign" PS. (I think it must embed Illustrator-specific comments or other directives when it saves an "Illustrator .eps".)

> Learning (a little bit of) postscript has been a mind-opening
> experience. I know a lot of programming languages, and postscript
> introduced me to several new ideas. It is difficult to take a
> (somewhat) state-independent OO representation of a graphical object
> and translate it into the postscript state machine efficiently,
> especially when the postscript backend has to act like the other
> backends at the interface level.

(Sounds like OpenGL!) I know less about PS than you, but it seems one way to go about doing what you describe is to build a virtual PS engine and render to it, and have it spit out only the state-changing instructions it received.
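
One possible reading of that idea, as a sketch only: treat the backend
output as a stream of (operator, args) pairs and pass it through a
filter that models part of the graphics state. The function
drop_redundant_state, the STATE_OPS set and the tuple representation
are all assumptions made for this example; the operator names
themselves are standard PostScript. The filter drops any state-setting
operator that would not change the modelled state, and it follows
gsave/grestore so that a restored state is not mistaken for the
current one.

```python
# Operators whose only effect is to set one graphics-state parameter.
STATE_OPS = {"setlinewidth", "setrgbcolor", "setlinecap", "setlinejoin"}


def drop_redundant_state(ops):
    """Yield every op except state-setting ops that repeat the current value."""
    state = {}  # operator name -> args last applied; empty means unknown
    stack = []  # saved copies of `state`, one per open gsave
    for name, args in ops:
        if name == "gsave":
            stack.append(dict(state))
        elif name == "grestore":
            # Unbalanced grestore: fall back to "unknown" rather than guess.
            state = stack.pop() if stack else {}
        elif name in STATE_OPS:
            if state.get(name) == args:
                continue  # no change, so nothing needs to be written
            state[name] = args
        yield name, args


ops = [
    ("setlinewidth", (1.0,)),
    ("moveto", (0, 0)), ("lineto", (10, 10)), ("stroke", ()),
    ("setlinewidth", (1.0,)),  # redundant: the width is already 1.0
    ("moveto", (0, 10)), ("lineto", (10, 0)), ("stroke", ()),
]
filtered = list(drop_redundant_state(ops))

nested = [
    ("setlinewidth", (1.0,)),
    ("gsave", ()),
    ("setlinewidth", (2.0,)),
    ("grestore", ()),
    ("setlinewidth", (1.0,)),  # redundant: grestore already brought back 1.0
]
filtered_nested = list(drop_redundant_state(nested))
```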

> A smart postscript backend would keep track of this information so it
> wouldn't needlessly regenerate it, which leads to file bloat. I would
> like to make these improvements, but my first goal was to get
> something that works.

Always a good first step! :) Seriously, I realized that things were probably not very baroque yet, so I thought I'd pipe up to let you know about what is, IMHO, an important feature of good PS rendering.

    > With the demos tested, the primary curve or points grouped
    > with a rectangle around the plotting region that had no
    > fill or stroke but seemed to clip the contents to within
    > that box. I wonder if it would be nicer to produce
    > postscript output where the clipping is done before
    > rendering to a file, thus eliminating the need for this
    > rather strangely behaved box?

> Could you also give me some detail here? Is the "box" the rectangular
> border of the axes?

Yes.

> With regard to a specific demo, what is "the
> primary curve" and "rectangle"?

The "primary curve" consists of the main data points, either plotted as points/circles (in the case of the errorbar demo) or as line segments (in the axes_demo). "box" == "rectangle".

> I do use postscript clipping of lines and other objects etc so that
> they do not extend beyond the axes borders.

Yes, I see what you mean -- with this clipping box, the leftmost circle in the errorbars demo does not extend beyond the axes, and is therefore half cut-off. I'm not sure if this is desirable or not, but at least with the current behavior I could just go in and remove the clipping box. FYI, the circles, the errorbars (vertical lines), the "caps" on the bars, and the clipping rectangle all group together in Illustrator.

There is something that seems inconsistent to me with the current behavior -- the lower error "caps" that are completely beyond the clipping rectangle aren't present in the PS file at all. However, the errorbar does extend below the clipping rectangle to the position where the cap would be. Would things be more consistent if, when a clipping rectangle is used to do the clipping, all primitives get rendered and only the clipping rectangle handles clipping?

> Generally, I think this
> is *a good thing*. The general organization of matplotlib is figures
> contain axes which contain lines, patches and text. Normally, I don't
> want lines, patches and text spilling out of their axes containers.

Yes, I just wonder about the explicit-ness of a decision about whether it's matplotlib or PS that does clipping. I don't know enough to feel strongly, but if file-size is a factor, it should presumably be done by matplotlib. On the other hand, I think optimizations (even for file size) should happen later and for now maybe rendering everything to PS and letting it handle clipping is best. On the third(!) hand, huge files are clearly undesirable and perhaps the best plan is what seems to already be done -- any primitives totally outside the clipping area aren't drawn, but otherwise, they are drawn with PS itself doing the clipping. This point is just food for thought.
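
To make that last option concrete, here is a small sketch of the "skip what is totally outside, let PS clip the rest" strategy. The helper names (bbox, overlaps, cull) and the representation of primitives as lists of (x, y) points are assumptions for this example, not matplotlib code. The test uses bounding boxes only and ignores line width, so a thick stroke lying just outside the rectangle would be dropped even though part of it might be visible.

```python
def bbox(points):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def overlaps(points, clip_rect):
    """True if the bounding box of `points` touches clip_rect = (x0, y0, x1, y1)."""
    px0, py0, px1, py1 = bbox(points)
    cx0, cy0, cx1, cy1 = clip_rect
    return px0 <= cx1 and px1 >= cx0 and py0 <= cy1 and py1 >= cy0


def cull(primitives, clip_rect):
    """Keep primitives that may be visible; PostScript clips the survivors."""
    return [p for p in primitives if overlaps(p, clip_rect)]


clip = (0, 0, 100, 100)
errorbar = [(10, -20), (10, 30)]   # crosses the bottom edge: kept, PS clips it
lower_cap = [(5, -20), (15, -20)]  # entirely below the axes: dropped
kept = cull([errorbar, lower_cap], clip)
```

This reproduces the behavior noted earlier: the errorbar that crosses the axes border stays in the file and is clipped at render time, while the cap that lies wholly outside never gets written.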

> Can you explain a little more what you are trying to achieve in
> Illustrator so I can get a better idea of what is missing? What
> exactly is the 'strangely behaved box'?

"strangely behaved" == if you remove one corner from a clipping rectangle, it then becomes a clipping triangle that leaves half of your plot normal and the other half disappears. This happens up to some distance away from the corner you just deleted. That's why I call it strangely behaved, but I think I do understand it.

    > Also, the generated plots have two boxes, one with a
    > white stroke and one with a white fill, surrounding the
    > figure. These, too, seem unnecessary.

> Yes, this is a holdover from the GUI. In a GUI presentation, the
> plots look nicer with a boundary -- see e.g.
> http://matplotlib.sourceforge.net/screenshots/subplot_demo_large.png
> where the gray border is the default figure background -- matlab does
> this. So the figure (which contains the axes) renders a rectangular
> border with a fill color. For the postscript backend, I simply made
> these white and when I print on white paper, I never see them. They
> can easily be done away with by commenting out the line
>
>    self._figurePatch.draw(drawable)
>
> in backends/backend_ps.py. I don't really have a problem removing it
> entirely as I don't see much need for it in the PS backend, unless
> someone wants to frame their plots with a background rectangle. I
> mainly left it in there for vestigial compatibility with the other
> backends. But, so I can get a better understanding of the twisted
> mind of Illustrator, could you explain to me what kind of problem this
> is causing you?

No real problem, I'm just (mildly) against the idea of invisible primitives in PS files. (This probably stems from me dealing with PS output from matlab5 many years ago when I remember sorting through layer after layer after layer of "strangely behaved rectangles" just to manipulate my data. It's quite funny to me that matplotlib produces the most matlab-like PS files I've seen in a while! Still nowhere near the number of layers, though!)

Anyhow, I'd love to dive into the code and help you with the PS/Illustrator improvements, but I have no time at the moment...

Cheers!
Andrew