Millions of data points saved to pdf

nertskull <nertskull@...287...> writes:

If I change that line to "if True:" then I get MUCH better results.
But I also get enormous file sizes.

That's interesting! It means that your pdf viewing program (which one,
by the way? Adobe Reader or some alternative?) is slow at compositing a
large number of prerendered markers, or perhaps it just renders each of
them again and again instead of prerendering, and does so more slowly
than if they were part of the same path.

I've taken a subset of 10 of my 750 graphs.

Those 10, before changing the backend, would make file sizes of about
290KiB. After changing the backend, if I use plot(x, y, '-') I still
get a file size of about 290KiB.

But after changing the backend, if I use plot(x, y, '.') for my markers,
my file size is now 21+ MB. Just for 10 of my graphs. I'm afraid making
all 750 in the same pdf may be impossible at that size.

Does using ',' (comma) instead of '.' (full stop) as the marker help? I
think the '.' marker is a circle, just at a small size, while the ','
marker is just two very short lines in the pdf backend. If the ','
marker produces an acceptable file size but its shape is not good
enough, we could experiment with creating a marker of intermediate
complexity.
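A quick way to try this is to write the same point cloud with both markers and compare the resulting file sizes. The following is only a sketch, not code from the thread: the helper name pdf_size, the point count and the random data are invented for illustration, and the numbers from a stock pdf backend may differ from those of the modified ("if True:") backend discussed above.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless; savefig(format="pdf") still goes through the pdf backend
import matplotlib.pyplot as plt
import numpy as np


def pdf_size(marker, n_points=20000, seed=0):
    """Return the size in bytes of a PDF containing n_points drawn with `marker`."""
    rng = np.random.default_rng(seed)
    x, y = rng.standard_normal((2, n_points))
    fig, ax = plt.subplots()
    ax.plot(x, y, marker)  # '.' or ',' as the format string: markers only, no line
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf")
    plt.close(fig)
    return buf.getbuffer().nbytes


sizes = {marker: pdf_size(marker) for marker in (".", ",")}
for marker, size in sizes.items():
    print(f"marker {marker!r}: {size / 1024:.0f} KiB")
```

Running it against both the original and the edited backend should show whether the marker shape or the sheer number of points dominates the file size.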

One thing that I never thought about much is the precision in the
numbers the pdf backend outputs in the file. It seems that they are
being output with a fixed precision of ten digits after the decimal
point, which is probably overkill. There is currently no way to change
this except by editing the source code - the critical line is

        r = ("%.10f" % obj).encode('ascii')

where 10 is the number of digits used. The same precision is used for
all floating-point numbers, including various transformation matrices,
so I can't offer a simple rule for how large deviations you will cause
by reducing the precision - you could experiment by making one figure
with the existing code and another with '%.3f', and see if the latter
looks good enough at the kind of zoom levels you are going to use (and
if it really reduces the file size much - there's a compression layer on
top of the ASCII representation).
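Before editing the backend, the likely gain can be estimated outside matplotlib. The sketch below uses only the standard library: it formats the same made-up coordinates with ten and with three decimals and compresses both with zlib, which stands in here for the compression layer of the pdf file. The helper encoded_sizes and the value range are assumptions for illustration, and real pdf content streams contain operators as well as numbers.

```python
import random
import zlib


def encoded_sizes(values, digits):
    """Return (raw, compressed) byte counts for values written with `digits` decimals."""
    text = " ".join("%.*f" % (digits, v) for v in values).encode("ascii")
    return len(text), len(zlib.compress(text, 6))


random.seed(0)
coords = [random.uniform(0, 600) for _ in range(100_000)]

for digits in (10, 3):
    raw, packed = encoded_sizes(coords, digits)
    print(f"%.{digits}f: {raw} bytes raw, {packed} bytes compressed")
```

Trailing digits of measured data are close to random, so they compress poorly; that is why dropping them can still matter after compression.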

That reminds me: one thing that could have an effect is the
pdf.compression setting, which defaults to 6 but you can set it to 9
to make the compressed size a little bit smaller, at the expense of
spending more time when writing the file. That's not going to be a major
difference, though.
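For reference, the setting can be changed per figure without touching matplotlibrc. This is a sketch under assumptions: the helper name pdf_bytes_at_level and the random data are invented for illustration, and the levels tried (0, 6, 9) are just examples of the allowed range.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless; the pdf backend is selected by format="pdf"
import matplotlib.pyplot as plt
import numpy as np


def pdf_bytes_at_level(level, n_points=20000):
    """Save the same scatter plot as PDF with the given pdf.compression level."""
    rng = np.random.default_rng(0)
    x, y = rng.standard_normal((2, n_points))
    # rc_context restores the previous setting when the block exits
    with matplotlib.rc_context({"pdf.compression": level}):
        fig, ax = plt.subplots()
        ax.plot(x, y, ".")
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf")
        plt.close(fig)
    return buf.getbuffer().nbytes


level_sizes = {level: pdf_bytes_at_level(level) for level in (0, 6, 9)}
for level, size in level_sizes.items():
    print(f"pdf.compression={level}: {size / 1024:.0f} KiB")
```

Level 0 switches compression off, which makes it easy to see how much the compression layer already saves compared with the difference between 6 and 9.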

Is there anyway to have reasonable pdf sizes as well as this improved
performance for keeping them in vector format?

Like others have recommended, rendering huge clouds of single points is
a problematic task. I think it's an entirely valid thing to ask for, but
it's not likely that there will be a perfect solution, and some other
way of visualizing the data may be needed. Bokeh (suggested by Benjamin
Root) looks like something that could fit your needs better than a pdf
file in a viewer.
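One possible reading of "some other way of visualizing the data" is a binned density plot, where the file size depends on the number of non-empty bins rather than on the number of points. This is an illustrative assumption and not something proposed in the thread; the data, the grid size and the variable names below are made up.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless; the pdf backend is selected by format="pdf"
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal((2, 1_000_000))  # one million made-up points

fig, ax = plt.subplots()
# Count points per hexagonal cell; log colour scale, empty cells left blank
hb = ax.hexbin(x, y, gridsize=100, bins="log", mincnt=1)
fig.colorbar(hb, ax=ax, label="points per bin")

buf = io.BytesIO()
fig.savefig(buf, format="pdf")
plt.close(fig)
density_pdf_size = buf.getbuffer().nbytes
print(f"density plot: {density_pdf_size / 1024:.0f} KiB")
```

The plot stays fully vector, but individual points can no longer be picked out, so whether it is acceptable depends on what the graphs are meant to show.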


--
Jouni K. Seppänen
http://www.iki.fi/jks

Dear colleagues,
I had a similar issue with a large plot containing several thousand
elements, produced under Linux with the Qt4Agg backend. In the rendered
PDF I got overlapping vector elements and distorted markers, so I changed
the plotting output into a two-step process: first generate a
high-resolution .png file, then use the Python Imaging Library to compress
it into a much smaller .jpeg image. The result is a browser-friendly file
that can also serve as an input source for .pdf editors like OpenOffice.
Source:
import matplotlib.pyplot as plt
from PIL import Image  # Pillow; the original PIL used "import Image"

# The PNG written below exceeds Pillow's default pixel limit, so lift it
Image.MAX_IMAGE_PIXELS = None

fig = plt.gcf()  # the figure that has already been plotted

# figure size in inches; at 400 dpi the PNG is 32000 x 24000 pixels
w = 80
h = 60
dpi_resolution = 400
fig.set_size_inches(w, h)
print("DPI:", fig.get_dpi())
print("Size in inches:", fig.get_size_inches())
print("Supported formats are:", fig.canvas.get_supported_filetypes())
print("Backend used is:", plt.get_backend())

# save a high-resolution PNG copy of the figure
fig.savefig('myplot.png', format='png', dpi=dpi_resolution)

# downsample to 16000 x 12000 pixels and save as a strongly compressed JPEG
myimage = Image.open('myplot.png').convert('RGB')  # JPEG has no alpha channel
myimage = myimage.resize((16000, 12000), Image.LANCZOS)  # ANTIALIAS in old PIL
# quality = 10: very high compression with some blurring
quality_val = 10
myimage.save('myplot.jpg', 'JPEG', quality=quality_val)
The visual result looks acceptable, with no distortion. This process gives
some control over compression and quality.

Hope this is useful.
Regards,
Claude


Claude Falbriard
Certified IT Specialist L2 - Middleware
AMS Hortolândia / SP - Brazil
phone: +55 13 9 9760 0453
cell: +55 13 9 8117 3316
e-mail: claudef@…3779…

From: Jouni K. Seppänen <jks@…397…>
To: matplotlib-users@…1544…ceforge.net
Date: 02/05/2014 12:55
Subject: Re: [Matplotlib-users] Millions of data points saved to pdf

