eps export files crash text processors

Martin_Manns · July 13, 2006, 2:32pm

Hi all:

> When I use matplotlib for a scatter plot with both dots and connecting
> lines, the exported eps file is huge if the distances between many points
> are small. I think of this as a bug, since no tiff preview is included in
> the generated eps and a variety of text processing applications (including
> OpenOffice) crash when I try to import the eps. Ghostscript takes forever,
> too. Is there anything that I can do in order to export reasonable eps
> files?

> I suggest upgrading to 0.87.3.

However, I still don't see a working Gentoo ebuild. Is there one out there?

> The bigger problem is that each file format has basic characteristics
> and limitations. If you draw a million markers and line segments, you
> are inevitably going to have a big postscript file, unless the
> postscript backend somehow detects the fact that almost all of your
> points are indistinguishable and therefore deletes most of them--and
> this is really asking too much of a plotting backend, I think. (An
> alternative is to generate a pixel image and make the postscript from
> that; this is what matlab does under some circumstances, but it can
> result in big files of poor quality.)

The crashes do not occur if only dots are printed. Moreover, ghostscript
only keeps processing at 100% CPU load and consumes gigabytes of RAM once
line lengths shrink to infinitesimal values. Even if this may be a
ghostscript bug, I think that a postscript backend for scientific plotting
should look for lines shorter than some epsilon (perhaps 1e-6 inch or so)
and exclude them from the eps. Looking for dots that lie close together
would surely be too much to ask, but closely spaced dots do not make
ghostscript, OpenOffice, etc. fail. (Printers normally time out.)

> Finally, we don't include tiff previews in our eps files, so this is not a bug.

At least for me, including a tiff preview would really be beneficial.
I did not want to call this behavior a bug, but please consider such an
option.

> Your options include: filter your points beforehand so you only plot
> points that are distinct; or use a pixel-based format like png, which
> keeps the file size under control.

Unfortunately, this is much easier in the test case that I sent
you than in my actual problem, since there I would have to create a copy
of the data just for plotting.
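The suggestion to filter points beforehand can be sketched without touching matplotlib. The hypothetical helper below (pure Python, not a matplotlib API) keeps only the first point that falls in each square grid cell of side `cell`, in data units; choosing `cell` around the size of one marker drops markers that would overlap on the page anyway. It does build a filtered copy of the data, which is exactly the cost mentioned above.

```python
import math


def distinct_points(xs, ys, cell):
    """Keep only the first point that falls in each cell x cell grid square.

    Hypothetical helper, not part of matplotlib. `cell` is in data units
    and is assumed to be roughly the size of one marker. The order of the
    surviving points is preserved.
    """
    if cell <= 0:
        raise ValueError("cell must be positive")
    seen = set()
    out_x, out_y = [], []
    for x, y in zip(xs, ys):
        # Integer grid coordinates of the cell containing (x, y).
        key = (math.floor(x / cell), math.floor(y / cell))
        if key not in seen:
            seen.add(key)
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

This suits the markers. For the connecting line it would change the path whenever a later point revisits an earlier cell, so there only consecutive near-duplicates should be dropped.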

So for now I use png images, but I would still prefer eps.

Thank you for all of your answers.

Martin

_Darren_Dale1 · July 13, 2006, 8:40am

> The bigger problem is that each file format has basic characteristics
> and limitations. If you draw a million markers and line segments, you
> are inevitably going to have a big postscript file, unless the
> postscript backend somehow detects the fact that almost all of your
> points are indistinguishable and therefore deletes most of them--and
> this is really asking too much of a plotting backend, I think. (An
> alternative is to generate a pixel image and make the postscript from
> that; this is what matlab does under some circumstances, but it can
> result in big files of poor quality.)

> The crashes do not occur if only dots are printed. Moreover, ghostscript
> only keeps processing at 100% CPU load and consumes gigabytes of RAM once
> line lengths shrink to infinitesimal values. Even if this may be a
> ghostscript bug, I think that a postscript backend for scientific plotting
> should look for lines shorter than some epsilon (perhaps 1e-6 inch or so)
> and exclude them from the eps.

I spent quite a bit of effort streamlining eps file creation. Adding a step to
compare each point with the last one written would add a lot of overhead just
to deal with a relatively rare case such as yours. There are lots of users who
want mpl to run as fast as possible, so please try to understand that there
have to be some tradeoffs.

> Finally, we don't include tiff previews in our eps files, so this is not a
> bug.

> At least for me, including a tiff preview would really be beneficial.
> I did not want to call this behavior a bug, but please consider such an
> option.

I don't think we can consider this until we bring the file size down by
improving the font handling. But you can submit a feature request at the
SourceForge site, although a patch would be preferred.
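For reference, the usual way to attach a TIFF preview is the DOS EPS binary wrapper from Adobe's EPSF specification (Technical Note 5002): a 30-byte header followed by the PostScript and TIFF sections. The sketch below assumes both sections already exist as bytes (rendering the TIFF itself is left out); the function name is hypothetical and this is not how matplotlib writes eps files.

```python
import struct

# DOS EPS binary header: 4-byte magic, then (offset, length) pairs for the
# PostScript, WMF and TIFF sections, then a 2-byte checksum. All integers
# are little-endian, and the header itself is 30 bytes long.
_MAGIC = bytes([0xC5, 0xD0, 0xD3, 0xC6])  # "EPSF" with the high bit set
_HEADER_SIZE = 30


def wrap_eps_with_tiff(ps: bytes, tiff: bytes) -> bytes:
    """Return a DOS EPS binary file containing `ps` plus a TIFF preview.

    Hypothetical helper, not part of matplotlib. `ps` is the plain EPS text
    as bytes and `tiff` is an already-encoded TIFF image of the same figure.
    """
    ps_offset = _HEADER_SIZE
    tiff_offset = ps_offset + len(ps)
    header = _MAGIC + struct.pack(
        "<IIIIIIH",
        ps_offset, len(ps),       # PostScript section
        0, 0,                     # no WMF preview
        tiff_offset, len(tiff),   # TIFF preview section
        0xFFFF,                   # 0xFFFF tells readers to ignore the checksum
    )
    return header + ps + tiff
```

Applications that understand the wrapper show the TIFF as a placeholder and pass the PostScript section to the printer; this does nothing about the file size itself.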

> Your options include: filter your points beforehand so you only plot
> points that are distinct; or use a pixel-based format like png, which
> keeps the file size under control.

> Unfortunately, this is much easier in the test case that I sent
> you than in my actual problem, since there I would have to create a copy
> of the data just for plotting.

Then only make the filtered copy for the postscript backend.
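A minimal sketch of that idea: pick the data per output format, so the reduced copy is only built for PostScript output. The helper name is hypothetical, and `thin` is assumed to be any function that returns a reduced `(xs, ys)` pair.

```python
def data_for_format(fmt, xs, ys, thin):
    """Return the points to plot for output format `fmt`.

    Hypothetical helper: vector PostScript output gets a thinned copy made
    by `thin`; raster formats such as png get the original data unchanged,
    so no copy is made for them.
    """
    if fmt.lower() in {"eps", "ps"}:
        return thin(xs, ys)
    return xs, ys
```

For example, plot `data_for_format("eps", x, y, thin)` in the figure you pass to `fig.savefig("plot.eps")`, and the original `x, y` for the png.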

Darren


On Thursday 13 July 2006 10:32, Martin Manns wrote:

PGM · July 13, 2006, 3:00pm

> I suggest upgrading to 0.87.3.

> However, I still don't see a working Gentoo ebuild. Is there one out there?

You can find one at http://bugs.gentoo.org/show_bug.cgi?id=136429

The problem is that you'd have to get the whole gentooscience overlay, so here's
the matplotlib ebuild and the corresponding files. Just uncompress them in
$OVERLAY/dev-python, run ebuild matplotlib-0.87.4.ebuild digest, and emerge.
(You may want to get the whole overlay anyway, as you'll probably need some
other packages; or just let me know off-list.)

Word of advice: there have been some changes in the USE keywords, so check them
with emerge -pv first. The "wxwindows" flag is still buggy, but I was able to get
a working installation with USE="agg gtk tcltk" emerge -v matplotlib

matplotlib-0.87.4.tar.gz (3.88 KB)