Millions of data points saved to pdf

I am trying to create a multipage pdf of about 750 different graphs.

Each graph has around 5,000 - 15,000 data points, giving me roughly 7
million points across the pdf. I use a large page, about 20 inches long,
and plot about 10 graphs to a page, so I end up with about 75 pages in
my pdf. Each graph is basically a line of XY data points.

The problem is that the pdf is unbearably slow to load when I plot the
data as a scatter plot or as a line with markers.

If I make a regular line plot, with no markers, just a single line, it
plots fine and the pdf is fine. But then it connects my points, which I
don't want.

I assume this is all because it's making the pdf in vector format. When
I plot single lines, I only have ~750 line vectors. But when I use a
scatter plot, or a line plot with markers, I end up with millions of
vectors.

I've tried 'rasterized=True' and that definitely works, but the quality
is really bad. I need to be able to zoom in close on the pdf and still
see rough resolution of the points.

For clarity, I don't actually need to see each individual point. The
graphs have two lines on them, and I just need to be able to distinguish
between the two lines. The two lines are just made up of thousands of
points each.

Is there any way to keep scalable vectors and do this? Or will I be
forced to go to a rasterized image file in order to load the pdf in a
reasonable time?

Thanks.
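
On the 'rasterized=True' quality complaint above: the resolution of rasterized artists in a pdf follows the dpi passed to savefig, so a higher dpi may already be enough. The following is only a minimal sketch under that assumption; the sample data, figure size, and the two dpi values are made up for illustration and are not from the original post. Only the markers are rasterized, so the axes, ticks, and labels stay as vectors.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Illustrative data only: 5,000 noisy points along a curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5000)
y = np.sin(6.0 * x) + 0.05 * rng.standard_normal(x.size)


def render(dpi):
    """Write a one-page pdf with rasterized markers and return its bytes."""
    buf = io.BytesIO()
    with PdfPages(buf) as pdf:
        fig, ax = plt.subplots(figsize=(8, 2))
        # Only this artist is rasterized; axes and text remain vector.
        ax.plot(x, y, ".", markersize=1, rasterized=True)
        # The dpi given here sets the resolution of the rasterized part.
        pdf.savefig(fig, dpi=dpi)
        plt.close(fig)
    return buf.getvalue()


low = render(72)    # coarse raster, blurry when zoomed
high = render(600)  # finer raster, larger file
print(len(low), len(high))
```

The trade-off is file size against zoom quality, so the dpi value would need some experimenting for 75 pages.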

···

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/Millions-of-data-points-saved-to-pdf-tp43338.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

Hi,

Looking at the number of points you have in each plot, I have to ask
why you need so many plotted data points. If you plot, e.g., only every
10th or 50th data point, you reduce the number of points by a factor
of 10 (or 50). This should make the PDF smaller and faster, and even if
you zoom into each plot you should still be able to see enough detail
(of course, if there are one or two outliers you might not see them).
You probably cannot distinguish two data points that are very close to
each other anyway, so you probably don't need every data point.

Cheers,

Dominik
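
The "every 10th point" idea above can be sketched with plain NumPy slicing. This is a minimal illustration, not code from the thread; the helper name `decimate` and the sample data are hypothetical.

```python
import numpy as np


def decimate(x, y, step=10):
    """Keep every `step`-th sample of paired x/y arrays (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    return x[::step], y[::step]


# Illustrative data: 10,000 points on a sine curve.
x = np.linspace(0.0, 1.0, 10_000)
y = np.sin(2.0 * np.pi * x)

xd, yd = decimate(x, y, step=10)
print(len(x), "->", len(xd))  # 10000 -> 1000
```

The reduced arrays can then be plotted with markers as before; the pdf holds one marker per kept point, so the file shrinks by about the same factor.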

_______________________________________________ Matplotlib-users
mailing list Matplotlib-users@lists.sourceforge.net
matplotlib-users List Signup and Options

- --
Dominik Klaes
Deputy student representative of the AIfA
Argelander-Institut für Astronomie
Room 2.027a
Auf dem Hügel 71
53121 Bonn

Telefon: 0228/73-5773
E-Mail: dklaes@...1721...
Homepage: http://www.astro.uni-bonn.de/~dklaes/

···

On 05/01/2014 02:09 PM, nertskull wrote:
Sent from the matplotlib - users mailing list archive at Nabble.com.

How about different line styles or colors instead of markers?


Suppose each data point is only 1 point (1/72") in diameter.
A solid line across a 20" page is then fewer than 1500 points
(20 x 72 = 1440). You're using a fraction of a page per graph
and trying to plot 5,000-15,000 points per graph. This is
pointless (pun intended) for visual display, especially since
you do not care about the individual points. What happens if you
decimate the points? Is the result acceptable?

Perhaps you could do even better than that, given your
posted description. Fit a line to the points, and only
plot the fitted line. Or use something like `hexbin`.

Alan Isaac
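
As a rough illustration of the `hexbin` suggestion above: binning replaces one marker per data point with one hexagon per occupied bin, so the number of vector objects is bounded by the grid size rather than by the data. This is a sketch under assumed values; the data, the `gridsize`, and the colormap are illustrative and not from the thread.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Illustrative data only: 10,000 noisy points along a curve.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10_000)
y = np.sin(6.0 * x) + 0.05 * rng.standard_normal(x.size)

fig, ax = plt.subplots(figsize=(8, 2))
# mincnt=1 draws only hexagons that contain at least one point.
hb = ax.hexbin(x, y, gridsize=(200, 20), mincnt=1, cmap="Blues")
counts = hb.get_array()  # number of points in each drawn hexagon

buf = io.BytesIO()
fig.savefig(buf, format="pdf")  # the hexagons are written as vectors
plt.close(fig)

print(len(counts), "hexagons drawn for", x.size, "points")
```

A finer `gridsize` keeps more detail when zooming at the cost of more hexagons in the file.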

What do you consider a gap?
Perhaps, if you know that, you can find the gaps in your data and, if you really want to visualize them, plot those instead of the data.
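
If a gap can be defined numerically, one option is to keep the cheap plain line plot and break it at the gaps. The sketch below assumes, purely for illustration, that x is sorted and that a gap is any step in x larger than a threshold `max_dx`; that definition and the helper name `break_at_gaps` are hypothetical, not from the thread. It inserts NaN at each gap, and matplotlib does not draw a line segment through NaN values.

```python
import numpy as np


def break_at_gaps(x, y, max_dx):
    """Insert NaN wherever consecutive x values are more than max_dx apart.

    Hypothetical helper; assumes x is sorted in increasing order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Index of the last point before each gap.
    gap_after = np.flatnonzero(np.diff(x) > max_dx)
    # np.insert places the new value before the given (original) index.
    xg = np.insert(x, gap_after + 1, np.nan)
    yg = np.insert(y, gap_after + 1, np.nan)
    return xg, yg


# Illustrative data with two gaps: between 2 and 10, and between 11 and 30.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 30.0])
y = np.array([5.0, 6.0, 7.0, 8.0, 9.0, 4.0])

xg, yg = break_at_gaps(x, y, max_dx=1.5)
print(xg)
```

Plotting `xg, yg` with a plain line style such as `ax.plot(xg, yg, "-")` then gives one path per graph with visible breaks. One caveat: a run consisting of a single isolated point (like the last value here) has no segment to draw, so it would need a marker to remain visible.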


No, we definitely aren't really interested in the gaps themselves. Gaps
are just where we were unable to collect the data.

I don't know if we can attach pictures to this thread or not, but I'm going
to try.

The attached is roughly what I want, but with all 750 as vectors.

I want to see the 'movement' of the line, but I need the gaps to remain, so
I know where they are.

The problem with plotting a reduced data set is that I lose some of the
very small sections of line. I'll play around with that idea, but we want
to be able to zoom in on a vector file and see the tiny areas of fewer
than 10 points that would be lost if we plot a reduced data set.

But it sounds like this is unlikely to work in vector graphics form.
It's just too much to do without reducing the dataset.

<http://matplotlib.1069221.n5.nabble.com/file/n43344/figure_1.png>
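
One possible way around losing the short sections when reducing the data is to decimate each contiguous run separately and always keep its first and last point. The sketch below is only an illustration under assumptions: x is sorted, a gap is a step in x larger than `max_dx`, and the helper name `decimate_runs` is hypothetical.

```python
import numpy as np


def decimate_runs(x, y, max_dx, step=10):
    """Decimate each gap-free run, keeping every run's first and last point.

    Hypothetical helper; assumes x is sorted in increasing order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size == 0:
        return x, y
    # Start index of each run; a new run begins right after every gap.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(x) > max_dx) + 1))
    ends = np.concatenate((starts[1:], [x.size]))  # exclusive end indices
    keep = []
    for s, e in zip(starts, ends):
        idx = np.arange(s, e, step)  # always contains the run's first point
        if idx[-1] != e - 1:
            idx = np.append(idx, e - 1)  # make sure the last point survives
        keep.append(idx)
    keep = np.concatenate(keep)
    return x[keep], y[keep]


# Illustrative data: a long run, a tiny 3-point run, and a medium run.
x = np.concatenate((np.arange(0, 100), [200, 201, 202], np.arange(300, 350)))
y = np.sqrt(x)

xd, yd = decimate_runs(x, y, max_dx=1.5, step=10)
print(len(x), "->", len(xd))
```

With this approach even a run of two or three points keeps its endpoints, so it remains visible after zooming in, while long runs are thinned by roughly the chosen step.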

···

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/Millions-of-data-points-saved-to-pdf-tp43338p43344.html
Sent from the matplotlib - users mailing list archive at Nabble.com.

This makes me wonder if you would be better served with something like bokeh:

http://bokeh.pydata.org/

Cheers!

Ben Root
