Millions of data points saved to pdf

nertskull <nertskull@...287...> writes:

> The problem is the pdf is unbearably slow when plotting as a scatter plot
> or as a line with markers.
>
> If I make a regular line plot, with no markers, just a single line, it is
> plotted and the pdf is fine. But then it connects my points which I don't
> want.

Others have commented on the volume of data, but that paragraph makes
me curious: are you saying that the results are acceptable if you do
something like

  plot(x, y, '-')

but not if you do

  plot(x, y, 'o') or plot(x, y, '-o')?

The amount of data in the pdf file should be within a constant factor in
all cases, but in the '-' case there are only moveto and lineto commands,
while the two other cases render markers as something called an XObject,
which is repeated a lot of times on the page. I wonder if the overhead
from using an XObject is making the rendering application slow.
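To make the "within a constant factor" point concrete, here is a simplified sketch of the two kinds of content-stream text. It is an illustration under assumptions, not matplotlib's backend code: the helper names and the XObject name `/M0` are hypothetical, while `m`, `l`, `S`, `cm`, `Do`, `q` and `Q` are the standard PDF operators for moveto, lineto, stroke, concatenate matrix, paint XObject, and save/restore graphics state.

```python
def line_stream(points):
    """Content stream for one polyline (points must be non-empty):
    a moveto for the first point, then one lineto per remaining point."""
    (x0, y0), rest = points[0], points[1:]
    ops = [f"{x0:.2f} {y0:.2f} m"]
    ops += [f"{x:.2f} {y:.2f} l" for x, y in rest]
    ops.append("S")  # stroke the path
    return "\n".join(ops)


def marker_stream(points, name="/M0"):
    """Content stream that stamps a (hypothetical) marker XObject at every point.

    Inside q ... Q each 'cm' accumulates, so every placement translates by
    the offset from the previous point and then paints the XObject with 'Do'.
    """
    ops = ["q"]  # save graphics state
    last_x, last_y = 0.0, 0.0
    for x, y in points:
        dx, dy = x - last_x, y - last_y
        ops.append(f"1 0 0 1 {dx:.2f} {dy:.2f} cm {name} Do")
        last_x, last_y = x, y
    ops.append("Q")  # restore graphics state
    return "\n".join(ops)


pts = [(i * 0.5, (i % 7) * 3.0) for i in range(1000)]
print("line:", len(line_stream(pts)), "chars; markers:", len(marker_stream(pts)), "chars")
```

Both outputs grow linearly with the number of points, so for the same data the marker version is only a roughly fixed multiple of the line version. What differs is the work a viewer does for each `Do`, which is the overhead suspected above.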

Does it help at all to use a simpler marker, e.g. plot(x, y, ',')? One
change you could try if you're feeling adventurous is the following
function in lib/matplotlib/backends/backend_pdf.py:

    def draw_markers(self, gc, marker_path, marker_trans, path, trans,
                     rgbFace=None):
        # For simple paths or small numbers of markers, don't bother
        # making an XObject
        if len(path) * len(marker_path) <= 10:
            RendererBase.draw_markers(self, gc, marker_path, marker_trans,
                                      path, trans, rgbFace)
            return
        # ...
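For a sense of which branch the quoted test takes, it can be mirrored in a couple of lines. This is a hypothetical helper, not part of matplotlib; it assumes len(path) counts roughly one marker per data vertex and len(marker_path) counts the vertices of the marker shape.

```python
def uses_xobject(n_markers, n_marker_vertices, threshold=10):
    """Return True if the quoted test would take the XObject branch.

    n_markers stands in for len(path) (about one marker per data vertex),
    n_marker_vertices for len(marker_path) (complexity of the marker shape).
    """
    return n_markers * n_marker_vertices > threshold


# Hypothetical sizes: only a tiny plot with a tiny marker skips the XObject.
for n_markers, n_shape in [(2, 5), (11, 1), (1, 11), (1_000_000, 1)]:
    print(n_markers, n_shape, uses_xobject(n_markers, n_shape))
```

With a million data points the product is far above 10, so the XObject branch runs whatever the marker shape.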

The comment is not quite right: only if the path is short *and* the
number of markers is small does the XObject code get skipped. You could
just change the if statement to "if True:" and rerun your code (possibly
with the ',' marker style). If that helps, it's evidence that we need to
revisit the condition for using XObjects for markers.


--
Jouni K. Seppänen
http://www.iki.fi/jks

That definitely helps. Here’s what I did.

First: yeah, the results are totally acceptable if I use '-' as my
line/marker. The pdf renders and loads just fine.

If I use 'o' or even ',' as my marker, then the pdf is horrendously
slow. I'm talking minutes to render a page.

So, I tried your idea of altering the backend.

If I change that line to "if True:" then I get MUCH better
results. But I also get enormous file sizes.

I've taken a subset of 10 of my 750 graphs.

Those 10, before changing the backend, made a file of about
290KiB. After changing the backend, if I use plot(x, y, '-')
I still get a file size of about 290KiB.

But after changing the backend, if I use plot(x, y, '.') for my
markers, my file size is now 21+ MB, just for 10 of my graphs. I'm
afraid making all 750 in the same pdf may be impossible at those
sizes.

BUT, at least now I can render those 10 in vector format. Before, it
took the pdf minutes to load a page. Now it only takes maybe 15-20
seconds to load a page of 10 graphs.

So that definitely helped.  Thanks!

Is there any way to do this even better? At this rate I'd have to
split my pdf file into multiple chunks, and it really isn't ideal to
have to send people 70 pdf files.

Is there any way to have reasonable pdf sizes as well as this
improved performance for keeping them in vector format?

Thanks again.



In reply to Jouni K. Seppänen's message of 05/01/2014 01:19 PM.


Re: Millions of data points saved to pdf

As others tried to explain to you, plotting that many points in a plot
does not make any sense. The only thing that makes sense is to
down-sample your data to a manageable size. Depending on which features
of your data you are interested in, there are different methods for
doing that.
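As a minimal sketch of one such method, chosen here for illustration rather than named in this thread: if the goal is a scatter plot that looks the same at its printed size, points landing in the same output pixel are indistinguishable, so coordinates can be snapped to a grid about as fine as the rendered axes and one point kept per occupied cell. The function name and grid sizes are hypothetical; other goals, such as preserving peaks in a time series, call for other methods (e.g. keeping the min and max of each bin).

```python
import random


def thin_to_grid(xs, ys, nx=1000, ny=1000):
    """Keep at most one point per cell of an nx-by-ny grid over the data range.

    Returns two lists with the retained x and y values, in first-seen order.
    """
    if not xs:
        return [], []
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values along an axis are equal.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    seen = set()
    kept_x, kept_y = [], []
    for x, y in zip(xs, ys):
        # Map each coordinate to an integer cell index in 0..nx-1 / 0..ny-1.
        cell = (int((x - x_min) / x_span * (nx - 1)),
                int((y - y_min) / y_span * (ny - 1)))
        if cell not in seen:
            seen.add(cell)
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y


random.seed(0)
raw_x = [random.gauss(0.0, 1.0) for _ in range(100_000)]
raw_y = [random.gauss(0.0, 1.0) for _ in range(100_000)]
thin_x, thin_y = thin_to_grid(raw_x, raw_y, nx=300, ny=300)
print(len(raw_x), "->", len(thin_x), "points")
```

The thinned lists go to plot(thin_x, thin_y, ',') exactly as before, and the PDF then holds at most nx * ny markers per axes however many raw points there are. The trade-off is that density is lost: a cell with one point looks like a cell with a thousand, which is also what opaque overlapping markers show.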

PS: which viewer are you using to render the PDF? I believe different
renderers may have substantially different performance in rendering such
PDFs...

Cheers,
Daniele


On 01/05/2014 19:50, nertskull wrote:

> Is there any way to have reasonable pdf sizes as well as this improved
> performance for keeping them in vector format?