large postscript files

John_Hunter · September 6, 2006, 1:49pm

> which is 17 bytes long. 17*80000 = 1.36MB. Maybe we don't
> need as many sig figs, that could cut the size down by
> maybe 25%.

We could make the fmt string for PS and SVG output floats a
configurable parameter and be done with it. As you know, we've spent
a lot of time trying to get the right string that doesn't produce
visual errors and I'm not inclined to change the default. But we
could make it an rc param for those who want to trade accuracy for
space.
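The following is a minimal sketch of the size/accuracy trade-off, assuming each vertex is written as one `x y l` line, with `l` bound to `lineto` in the prolog. The helper name and the default format string are hypothetical, not matplotlib's actual ones. The sketch shows how the per-vertex byte count, and therefore the total for 80,000 points, depends on the float format string:

```python
def ps_lineto_commands(points, fmt="%1.3f %1.3f l\n"):
    """Render vertices as PostScript 'x y l' commands.

    Hypothetical helper: `fmt` stands in for a configurable float format
    string, and `l` is assumed to be defined as `lineto` in the prolog.
    """
    return "".join(fmt % (x, y) for x, y in points)


points = [(123.456789, 654.321098)] * 80000  # illustrative data only

for fmt in ("%1.3f %1.3f l\n", "%1.1f %1.1f l\n"):
    size = len(ps_lineto_commands(points, fmt))
    print(repr(fmt), size, "bytes")
```

For these coordinates, the three-decimal format produces 18 bytes per vertex and the one-decimal format produces 14. That is a saving of roughly 22%, which is in line with the 25% estimate above.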

JDH

_Darren_Dale1 · September 6, 2006, 2:08pm

Good idea. Should we make it a kwarg or an rc parameter? The number of rc
parameters continues to grow...
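One common way to offer both is sketched here under assumptions: a keyword argument that falls back to an rc-style default when omitted. The `RC` dictionary and the `ps.float_fmt` key are hypothetical stand-ins, not existing matplotlib settings.

```python
# Hypothetical rc-style defaults; the key name is illustrative only.
RC = {"ps.float_fmt": "%1.3f"}


def format_coord(value, fmt=None):
    """Format one coordinate for output.

    An explicit `fmt` keyword wins. Otherwise the rc-style default is
    used, so most users never need to touch it.
    """
    if fmt is None:
        fmt = RC["ps.float_fmt"]
    return fmt % value


print(format_coord(1.23456))               # uses the rc-style default
print(format_coord(1.23456, fmt="%1.1f"))  # per-call override
```

With this pattern the rc entry only sets the default. A per-call override therefore needs no additional global setting.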


On Wednesday 06 September 2006 09:49, John Hunter wrote:

> > which is 17 bytes long. 17*80000 = 1.36MB. Maybe we don't
> > need as many sig figs, that could cut the size down by
> > maybe 25%.
>
> We could make the fmt string for PS and SVG output floats a
> configurable parameter and be done with it. As you know, we've spent
> a lot of time trying to get the right string that doesn't produce
> visual errors and I'm not inclined to change the default. But we
> could make it an rc param for those who want to trade accuracy for
> space.

Alan_G_Isaac1 · September 6, 2006, 2:43pm

Does anyone really care about 25% enough to make this
worthwhile? Just wondering.

Cheers,
Alan Isaac


On Wed, 6 Sep 2006, John apparently wrote:

> could make it an rc param for those who want to trade
> accuracy for space.

_Chris_Barker · September 6, 2006, 4:42pm

Alan G Isaac wrote:

> Does anyone really care about 25% enough to make this worthwhile? Just wondering.

I tend to think not. If you put 80,000 points in a PS file, it's going to be big. That's all there is to it; it's the nature of PostScript.

I do think clipping is a good idea though.
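The following is a minimal sketch of clipping a polyline to the visible axes box before writing it out. It assumes the coordinates are already in output units, and the function name is hypothetical. A vertex is kept if it, or either of its neighbours, lies inside the box. The kept vertices are split into separate runs (separate subpaths) so that no new connecting segments are invented. As a simplification, a segment that crosses the box with both endpoints outside is dropped. A full clipper, such as Cohen-Sutherland or Liang-Barsky, would handle that case.

```python
def clip_polyline(points, xmin, xmax, ymin, ymax):
    """Split a polyline into runs of vertices relevant to the box."""

    def inside(p):
        x, y = p
        return xmin <= x <= xmax and ymin <= y <= ymax

    n = len(points)
    runs, current = [], []
    for i, p in enumerate(points):
        # Keep a vertex if it, or either neighbour, is inside, so every
        # segment with at least one visible endpoint survives intact.
        keep = (
            inside(p)
            or (i > 0 and inside(points[i - 1]))
            or (i < n - 1 and inside(points[i + 1]))
        )
        if keep:
            current.append(p)
        elif current:
            # A dropped vertex ends the current run (start a new subpath).
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs
```
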

What is the maximum precision in PostScript? It seems unlikely that you could plot 80,000 points and not have a number of them overlap, unless it's clipped, so removing essentially redundant points may be another way to go.
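A small, lossless variant of removing redundant points is sketched here. It assumes coordinates are written with a fixed format string; the `%1.3f` default and the helper name are hypothetical. Consecutive vertices that format to identical text would only produce zero-length segments in the file, so dropping them should not change the rendered line in practice.

```python
def drop_repeated_vertices(points, fmt="%1.3f"):
    """Drop consecutive vertices that would be written out identically.

    Two vertices that format to the same text give a zero-length segment
    in the output file, so only the first of each such run is kept.
    """
    kept, last_text = [], None
    for x, y in points:
        text = (fmt % x, fmt % y)
        if text != last_text:
            kept.append((x, y))
            last_text = text
    return kept
```

Only *consecutive* repeats are removed. A path that returns to an earlier point is left intact.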

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...259...

Joris_De_Ridder · September 7, 2006, 3:52pm

[CB]: What is the maximum precision in Postscript? It seems unlikely that you
[CB]: could plot 80,000 points and not have a number of them overlap, unless
[CB]: it's clipped, so removing essentially redundant points may be another
[CB]: way to go.

This seems to be the way that xmgrace does it. I have been comparing the
PostScript output of mpl and xmgrace, and found that mpl includes all
80000 points while xmgrace retains only roughly 17000 of them.

So I downloaded the xmgrace source code and tried to figure out why this is
the case. Their PostScript driver routines are in psdrv.c and are called by routines
in draw.c. Looking into the latter revealed that they do indeed have a purge_dense_points()
function, which removes the points that you wouldn't see on the plot anyway.

As far as I understand, the algorithm works as follows:

1) Define what you mean by "dense" points
    a) choose an initial value for far_away
    b) start at the first point, and loop over the points until you find one far_away
    c) hop to that one, continue looping until you again find a point far_away
    d) hop to that one, continue looping... etc. until you have looped over all points
    e) count the number of hops.
        If all points turn out to be far away from each other, far_away
        was too conservative -> increase far_away and repeat steps b)-e).
        Otherwise, you have found a definition of "dense".

2) Sift out the dense points
Similar to 1) but only keep the points you hopped to, i.e. the ones that
are far_away from each other.
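The two steps can be sketched in Python as follows. This is a reading of the description above, not a port of xmgrace's `purge_dense_points()` in draw.c. Several details are assumptions: the function names `hop` and `purge_dense`, the growth factor of 2, the iteration cap, and the interpretation of "far away" as a Euclidean distance strictly greater than `far_away` from the last point hopped to. "Too conservative" is read as "too small", because the remedy is to increase the threshold.

```python
import math


def hop(points, far_away):
    """Indices of the points hopped to: start at point 0, then jump to
    each next point farther than far_away from the current one."""
    kept = [0]
    for i in range(1, len(points)):
        if math.dist(points[i], points[kept[-1]]) > far_away:
            kept.append(i)
    return kept


def purge_dense(points, far_away, growth=2.0, max_iter=50):
    """Step 1: grow far_away until not every point is a hop.
    Step 2: keep only the points hopped to with that threshold."""
    if far_away <= 0:
        raise ValueError("far_away must be positive")
    if len(points) < 2:
        return list(points)
    for _ in range(max_iter):
        kept = hop(points, far_away)
        if len(kept) < len(points):
            break  # some points are "dense": threshold found
        far_away *= growth  # too conservative: every point was a hop
    return [points[i] for i in kept]
```

In this reading the loop stops only once at least one point is dropped. As a result, it discards something even from data that is already sparse, unless the iteration cap is reached. It also does not force the last point to be kept.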

It seems to work well for xmgrace; I have never encountered problems with too many
points being purged. I do think it needs to be coded in C to work efficiently, though;
Python would be too slow. Perhaps the mpl developers might be interested in including
it one day? The gain in PostScript file size would be huge...

Cheers,
Joris

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

_Chris_Barker · September 7, 2006, 4:56pm

Joris De Ridder wrote:

> As far as I understand, the algorithm works as follows:
>
> 1) Define what you mean by "dense" points
>     a) choose an initial value for far_away
>     b) start at the first point, and loop over the points until you find one far_away
>     c) hop to that one, continue looping until you again find a point far_away
>     d) hop to that one, continue looping... etc. until you have looped over all points
>     e) count the number of hops.
>         If all points turn out to be far away from each other, far_away
>         was too conservative -> increase far_away and repeat steps b)-e).
>         Otherwise, you have found a definition of "dense".

This is an odd way to do it. It seems to me that the only way to define "dense" is by resolution. What resolution do you want to be able to display? As PS is scalable, this isn't an easy question, but it is at least limited by the resolution of PS -- does PS use floating or fixed point, and what precision? Resolution could also be a user-settable property.
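The following sketches how "dense" could be defined by a user-settable resolution instead of a searched threshold. PostScript's default user-space unit is 1/72 inch, so a target resolution in dots per inch corresponds to a grid spacing of 72/dpi units. The helper name and the `dpi` parameter are hypothetical. Each vertex is snapped to that grid, which moves it by at most half a cell in x and y. Consecutive vertices that land in the same cell are then dropped.

```python
def snap_polyline(points, dpi=300):
    """Snap vertices to a 1/dpi-inch grid and drop consecutive repeats.

    Coordinates are assumed to be in PostScript points (1/72 inch).
    """
    cell = 72.0 / dpi  # grid spacing in PostScript units
    out, last = [], None
    for x, y in points:
        key = (round(x / cell), round(y / cell))
        if key != last:
            out.append((key[0] * cell, key[1] * cell))
            last = key
    return out
```

Unlike a search over far_away, the threshold here is fixed directly by the chosen resolution, so sparse data passes through unchanged apart from the snapping.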

Another way to define resolution is by how you are drawing the points. If you are drawing dots 1pt in diameter, there is no reason to plot two separate dots that are only 0.1pt from each other.
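For unconnected markers, this idea could be sketched as follows. A marker is dropped if its centre lies closer than some fraction of the marker diameter to a marker that has already been kept. The function name and the `fraction` parameter are hypothetical. The default of 0.25 is an assumption, chosen so that 1pt dots only 0.1pt apart are merged. Kept centres are stored in grid cells of side `tol`, so each new centre only needs to be compared with the 3x3 block of neighbouring cells rather than with every kept marker.

```python
import math


def thin_markers(centers, diameter, fraction=0.25):
    """Drop marker centres closer than fraction*diameter to a kept one."""
    tol = fraction * diameter
    if tol <= 0:
        return list(centers)
    grid = {}  # (cell_x, cell_y) -> kept centres in that cell
    kept = []
    for x, y in centers:
        cx, cy = math.floor(x / tol), math.floor(y / tol)
        # Any centre within tol lies in this cell or one of its 8 neighbours.
        too_close = any(
            math.dist((x, y), q) < tol
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for q in grid.get((cx + dx, cy + dy), ())
        )
        if not too_close:
            grid.setdefault((cx, cy), []).append((x, y))
            kept.append((x, y))
    return kept
```
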

-Chris
