pcolor and imshow PDF sizes

Hi,
In a previous email I pointed out that I was having problems with pcolormesh
output as a PDF: the files are really big and impractical even for smallish
arrays (1000x1000 pixels). I don't have that problem using imshow, which
presumably resamples the image or some such wizardry :) Here's an example:

import pylab

data = pylab.randn(512*512).reshape((512, 512))
# First imshow. I use the dpi keyword "just in case"
pylab.imshow(data, interpolation='nearest')
pylab.savefig("/tmp/imshow_72.pdf", dpi=72)
pylab.savefig("/tmp/imshow.pdf")
# Now pcolormesh (note: this draws into the same axes, so the
# imshow image is still part of the figure saved below)
pylab.pcolormesh(data)
pylab.savefig("/tmp/pcolor_72.pdf", dpi=72)
pylab.savefig("/tmp/pcolor.pdf")

This results in the following files:
166K /tmp/imshow_72.pdf
307K /tmp/imshow.pdf
2.6M /tmp/pcolor_72.pdf
2.7M /tmp/pcolor.pdf

So in the imshow case the dpi keyword makes a difference (good!), but if you
compare the pcolormesh and imshow file sizes you immediately notice a large
difference. Rendering the pcolor files is also very slow; they draw line by
line. I presume that the individual patches are stored as vectors, and that's
why setting dpi to 300 or to 72 makes no difference.
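The vector guess can be checked without writing to /tmp. This is a minimal sketch, assuming a recent matplotlib and NumPy (not the 2008 versions in this thread); the helper `pdf_size` is hypothetical, and a 100x100 array keeps it quick. Unlike the script above, each plot gets its own figure, so neither PDF contains the other's artist. If the patches really are stored as vectors, only the imshow size should change with dpi. Absolute byte counts depend on the matplotlib version, so none are quoted.

```python
import io

import matplotlib
matplotlib.use("Agg")  # no GUI needed; savefig(format="pdf") still uses the PDF backend
import matplotlib.pyplot as plt
import numpy as np


def pdf_size(fig, dpi):
    """Hypothetical helper: size in bytes of `fig` saved as a PDF at `dpi`."""
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf", dpi=dpi)
    return len(buf.getvalue())


rng = np.random.default_rng(0)
data = rng.standard_normal((100, 100))

# Separate figures, so each PDF contains only one kind of plot.
fig_im, ax_im = plt.subplots()
ax_im.imshow(data, interpolation="nearest")

fig_pc, ax_pc = plt.subplots()
ax_pc.pcolormesh(data)

sizes = {
    (kind, dpi): pdf_size(fig, dpi)
    for kind, fig in [("imshow", fig_im), ("pcolormesh", fig_pc)]
    for dpi in (72, 300)
}
for (kind, dpi), n in sorted(sizes.items()):
    print(f"{kind:10s} dpi={dpi:3d}: {n / 1024:8.1f} KiB")
```

The imshow image is resampled at the requested dpi before being embedded, while the mesh is written as one filled path per cell, which is why its size should not track dpi.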

Is this the expected behaviour?
Thanks,
J

pcolormesh outputs the data as vectors: since the mesh can be non-rectilinear, that's really the only representation PDF supports. Besides, it's the only way to get a truly resolution-independent PDF. imshow, by contrast, is limited to uniform, rectilinear images, and since PDF has built-in support for those, the file is much smaller and the drawing more efficient.
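To make the "non-rectilinear" point concrete, here is a minimal sketch, assuming a recent matplotlib and NumPy; the sheared, bent grid is made up purely for illustration. pcolormesh takes the full 2-D arrays of cell corners, so every quad can have a different shape, whereas imshow places its array with only a four-number extent and always draws uniform, axis-aligned pixels.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import QuadMesh
from matplotlib.image import AxesImage

ny, nx = 20, 30
# Cell-corner coordinates: one more corner than cells in each direction.
u, v = np.meshgrid(np.linspace(0, 1, nx + 1), np.linspace(0, 1, ny + 1))
X = u + 0.3 * v                       # shear the columns
Y = v + 0.1 * np.sin(2 * np.pi * u)   # bend the rows
C = np.random.default_rng(0).standard_normal((ny, nx))

fig, (ax_mesh, ax_img) = plt.subplots(1, 2)
# Arbitrary quadrilateral cells: this is what forces vector output in PDF.
mesh = ax_mesh.pcolormesh(X, Y, C)
# imshow only knows the array and (left, right, bottom, top).
img = ax_img.imshow(C, extent=(0, 1, 0, 1), origin="lower")

print(type(mesh).__name__, type(img).__name__, img.get_extent())
```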

That said, there has for some time been experimental functionality for drawing some elements "pre-rasterized" (i.e., as images) to save on file size. This already works in some backends (including PDF); it just hasn't been exposed to the user in a nice way yet. Eric Bruning had an elegant solution to add pre/post draw callbacks that would have really helped with this [1], but I don't know where all that ended up. It would be great to pick that ball up and get it going again. If nothing else, it should be an easy fix to add a "rasterized" kwarg to pcolormesh, but I don't recall if that's the interface we arrived at the last time this came up.

[1] http://www.mail-archive.com/matplotlib-devel@lists.sourceforge.net/msg03490.html
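For readers on later matplotlib releases, pcolormesh does accept a `rasterized=True` keyword. The following minimal sketch assumes such a release (it will not run on the versions discussed in this thread); the helper `pcolormesh_pdf_size` is hypothetical. Only the mesh is embedded as an image at the savefig dpi; the axes, ticks and labels remain vectors.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).standard_normal((100, 100))


def pcolormesh_pdf_size(rasterized):
    """Hypothetical helper: PDF size in bytes of a pcolormesh of `data`."""
    fig, ax = plt.subplots()
    # rasterized=True asks the vector backend to embed this artist as an image.
    ax.pcolormesh(data, rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf", dpi=72)  # dpi sets the raster resolution
    plt.close(fig)
    return len(buf.getvalue())


vector_size = pcolormesh_pdf_size(rasterized=False)
raster_size = pcolormesh_pdf_size(rasterized=True)
print(f"vector: {vector_size / 1024:.1f} KiB, rasterized: {raster_size / 1024:.1f} KiB")
```

The trade-off is the one described above: the rasterized file is smaller and faster to render, but the mesh is no longer resolution-independent.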

Cheers,
Mike

Jose Gómez-Dans wrote:


_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA


The precursor thread to the one Mike linked:
http://www.mail-archive.com/matplotlib-devel@lists.sourceforge.net/msg02659.html

I had proposed get/set_rasterized methods on each artist, plus some
internal changes to make sure the rasterized property was checked
before the artist was drawn. That allowed raster rendering on a
per-artist basis. The patch in the thread above shows the changes
needed, which weren't many, so you could try applying them if you
build matplotlib yourself.
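Later matplotlib releases expose per-artist methods of this kind as `Artist.set_rasterized()` and `Artist.get_rasterized()`. A minimal sketch, assuming such a release, of toggling the property on an artist after it has been created; this shows the later public API, not necessarily the patch from the thread above.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
mesh = ax.pcolormesh(np.arange(12.0).reshape(3, 4))
line, = ax.plot([0, 4], [0, 3])

# The property is per artist: rasterize the mesh, keep the line as vectors.
mesh.set_rasterized(True)
print(mesh.get_rasterized(), line.get_rasterized())

# In the PDF, the mesh is embedded as an image while the line stays a path.
buf = io.BytesIO()
fig.savefig(buf, format="pdf")
```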

I think the discussion trailed off with nothing merged to trunk,
since some broader design decisions were needed on event handling,
etc.

-Eric
