Debian + mpl 0.98.5 - Can we reduce the size of generated doc?

Hello,
the problem is this:

$ ls -l python-matplotlib-doc_0.98.5-1_all.deb
-rw-r--r-- 1 morph morph 91141234 2008-12-16 10:39 python-matplotlib-doc_0.98.5-1_all.deb

A 90M doc package is a "little bit" too much... and unpacked it is

$ du . -hs
119M

In this package we install: doc/build/html/
doc/build/latex/Matplotlib.pdf examples/*

So I dug into it a bit to identify the cause of such a big jump (0.98.3
has a 16M doc package), and the biggest directory seems to be
html/_static/plot_directive/mpl_examples/pylab_examples with >50M (the
full output of "find examples/ html/ -type d -exec du -s {} \;" is
attached; if you need it, I can add a full file list). In that particular
directory (as in many others) I see a png, a hi-res png and a pdf file:
are all of those needed?
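
To see how much each of the three image variants contributes, the per-directory "du" listing can be complemented by a per-kind total. The following is only a sketch: the function names are made up for illustration, and it assumes the hi-res images can be recognised by a ".hires.png" suffix, which should be checked against the actual build tree.

```python
import os
from collections import defaultdict


def classify(filename):
    """Map a file name to a coarse kind.

    Assumption: hi-res images end in '.hires.png'; adjust if the
    build tree names them differently.
    """
    name = filename.lower()
    if name.endswith(".hires.png"):
        return "hires png"
    if name.endswith(".png"):
        return "png"
    if name.endswith(".pdf"):
        return "pdf"
    return "other"


def size_by_kind(root):
    """Walk `root` and return {kind: total bytes} over all regular files."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue  # skip symlinks so their targets are not counted twice
            totals[classify(filename)] += os.path.getsize(path)
    return dict(totals)
```

Calling size_by_kind("html/_static/plot_directive") from the unpacked package directory would then give one byte count per kind.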

In the end, is there a way we packagers can reduce the size of this package?
(A quick check showed it would be the second biggest -doc package in
Debian.)

Thanks,

find_du_mpl-doc_0.98.5.txt (2.2 KB)

--
Sandro Tosi (aka morph, Morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

One question to ask is whether we need both the .pdf and .html manuals, or whether that could be broken up into two packages. That seems like an easy one if that fits Debian policies (which I know nothing about) -- and wouldn't degrade the documentation experience at all.

Is there any way to transparently gzip compress the html files (as I believe Debian does with manpages)?
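
As a rough way to judge whether compressing the html would be worth pursuing at all, one could measure the potential saving first. This is a sketch with a hypothetical helper name; it only sums sizes and says nothing about whether a browser can open the compressed files. The png and pdf files are already compressed internally, so only the html is considered.

```python
import gzip
import os


def html_gzip_savings(root):
    """Return (raw_bytes, gzipped_bytes) summed over all .html files under root."""
    raw = 0
    packed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".html"):
                continue
            with open(os.path.join(dirpath, filename), "rb") as handle:
                data = handle.read()
            raw += len(data)
            # Compress in memory only; nothing on disk is modified.
            packed += len(gzip.compress(data))
    return raw, packed
```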

Another option would be to not generate the high-res png and pdf examples. (Either or both). Just removing them after the fact won't work, since one would end up with broken links from the docs. We would also have to change the html to not include those links. It should be fairly simple to provide an option to the doc build system to do this, and I'm happy to implement that if we all agree that's the direction to take.
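
The broken-link concern can be checked mechanically after any pruning step. The sketch below uses only the Python standard library; the class and function names are illustrative and not part of the matplotlib doc build. It reports relative href/src targets that no longer exist on disk and ignores external URLs and pure fragments.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class _LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def broken_local_links(html_root):
    """Return sorted (html_file, target) pairs whose relative target is missing."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(html_root):
        for filename in filenames:
            if not filename.endswith(".html"):
                continue
            page = os.path.join(dirpath, filename)
            collector = _LinkCollector()
            with open(page, encoding="utf-8", errors="replace") as handle:
                collector.feed(handle.read())
            for target in collector.targets:
                url, _fragment = urldefrag(target)
                parsed = urlparse(url)
                if not url or parsed.scheme or parsed.netloc or url.startswith("/"):
                    continue  # external, absolute, or fragment-only link
                local = os.path.normpath(os.path.join(dirpath, unquote(parsed.path)))
                if not os.path.exists(local):
                    broken.append((page, target))
    return sorted(broken)
```

Running it on the html tree before and after deleting the hi-res and pdf files would list exactly the links the doc build would have to stop emitting.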

Beyond that, we'd be looking at selectively including certain examples, which I worry would add additional maintenance burden if there's too much divergence between the "full" and "small" manuals. I think that road should be a last resort.
Mike

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

This is fine with me. Just add an rc option

  doc.minimal_footprint

or something like that which drops the high res and pdf. This will
probably reduce the size 50-60%.
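
One way such a switch could work is sketched below. Everything here is hypothetical: the format table, the dpi values and the function names are illustrative and are not taken from the actual doc build code; only the idea of a "minimal footprint" flag comes from the message above. The point is that the same flag drives both which files get rendered and which links get written, so the html never points at a file that was not built.

```python
# Hypothetical sketch: the (suffix, dpi) table and the function names are
# illustrative assumptions, not matplotlib's real plot directive code.
ALL_FORMATS = [("png", 80), ("hires.png", 200), ("pdf", 50)]


def formats_to_build(minimal_footprint):
    """Return the (suffix, dpi) pairs to render for each example figure."""
    if minimal_footprint:
        # Keep only the normal-resolution png that is shown inline.
        return [fmt for fmt in ALL_FORMATS if fmt[0] == "png"]
    return list(ALL_FORMATS)


def download_links(basename, minimal_footprint):
    """List the alternative-format files the page should link to."""
    inline = "png"
    return [
        f"{basename}.{suffix}"
        for suffix, _dpi in formats_to_build(minimal_footprint)
        if suffix != inline
    ]
```

With the flag set, download_links returns an empty list, so no hi-res or pdf links would be emitted for a figure.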

I have also removed the mpl_data symlink in my doc tree (I am still
testing before I commit), because it is causing our binaries to get
much larger on platforms which do not properly support symlinks.
Instead, we'll refer explicitly to ../lib/matplotlib/mpl-data in the
docs. I am leaving the mpl_examples symlink because of all the
relative path woes in pyplot.

Other than simple optimizations like this, I am disinclined to try
and build smaller manuals simply to reduce their size. The feature
that caused the explosion in size between 98.3 and 98.5 is the
gallery with its image-enhanced examples, which is arguably the most
useful feature on the site.

···

It seems like the documentation should be a separately installable package as far as package managers are concerned.

···

> One question to ask is whether we need both the .pdf and .html manuals, or
> whether that could be broken up into two packages. That seems like an easy
> one if that fits Debian policies (which I know nothing about) -- and
> wouldn't degrade the documentation experience at all.

Well, if you're talking about Matplotlib.pdf, the whole manual in pdf
format, then maybe it isn't worth it: we already have a separate
package for documentation, so I think that all the docs should be there
(with an adequate size :) ).

> Is there any way to transparently gzip compress the html files (as I believe
> Debian does with manpages)?

Sadly, no: the compression of manpages is handled directly by the "man"
executable, which uncompresses the manpage and pipes it into a PAGER. html
pages are rendered by a web browser, and in the case of local documentation
the pages are read from the local disk (so there is no web server to
"uncompress" them on the fly).

> Another option would be to not generate the high-res png and pdf examples.
> (Either or both).

That would be great. The pdf images are not that useful (at least
from my pov: I'm browsing web pages, I don't want to spawn a pdf
viewer to look at an image), and the hires ones could be "dropped" because
even the "normal-res" images seem to give a clear idea of the power of mpl.

> Just removing them after the fact won't work, since one
> would end up with broken links from the docs. We would also have to change
> the html to not include those links. It should be fairly simple to provide
> an option to the doc build system to do this, and I'm happy to implement
> that if we all agree that's the direction to take.

FWIW, I would really like to see such an option :)

> Beyond that, we'd be looking at selectively including certain examples,
> which I worry would add additional maintenance burden if there's too much
> divergence between the "full" and "small" manuals. I think that road should
> be a last resort.

If the size is reasonable, then I would like to generate all the
examples, because... well, I like them! :)

Cheers,

···

On Tue, Dec 16, 2008 at 15:10, Michael Droettboom <mdroe@...31...> wrote:
--
Sandro Tosi (aka morph, Morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

>> Another option would be to not generate the high-res png and pdf
>> examples. (Either or both). Just removing them after the fact won't
>> work, since one would end up with broken links from the docs. We would
>> also have to change the html to not include those links. It should be
>> fairly simple to provide an option to the doc build system to do this,
>> and I'm happy to implement that if we all agree that's the direction to
>> take.
>
> This is fine with me. Just add an rc option
>
>   doc.minimal_footprint
>
> or something like that which drops the high res and pdf. This will
> probably reduce the size 50-60%

Yes, yes, please :) It would be really great to have it:

full pylab_examples: 53M
pylab_examples no hires: 26M
pylab_examples no hires and no pdf: 12M
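
Expressed as percentages, using only the three figures above (they cover the pylab_examples directory, not the whole package), the arithmetic is:

```python
def percent_saved(full_mb, reduced_mb):
    """Relative size reduction, in percent, when going from full to reduced."""
    return 100.0 * (full_mb - reduced_mb) / full_mb


full = 53             # full pylab_examples, in MB
no_hires = 26         # without the hi-res png files
no_hires_no_pdf = 12  # without hi-res png and pdf files

print(f"no hires: {percent_saved(full, no_hires):.0f}% smaller")
# prints "no hires: 51% smaller"
print(f"no hires, no pdf: {percent_saved(full, no_hires_no_pdf):.0f}% smaller")
# prints "no hires, no pdf: 77% smaller"
```

For this one directory, dropping both variants saves more than the 50-60% estimated for the documentation as a whole.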

A completely different story! ;)

> I have also removed the mpl_data symlink in my doc tree, and am still
> testing before I commit, because that is causing our binaries to get
> much larger on platforms which do not properly support linking.
> Instead, we'll refer explicitly to ../lib/matplotlib/mpl-data in the
> docs. I am leaving the mpl_examples symlink because of all the
> relative path woes in pyplot.

Another patch I can drop as soon as the next release is out :)

> Other than simple optimizations like this, I am disinclined to try
> and build smaller manuals simply to reduce their size. The feature

Exactly the same goal I have in mind: I don't want to reduce the
information, only to drop those parts that add "little" for end users
while taking up a lot of space (if you get what I mean).

> that caused the explosion in size between 98.3 and 98.5 is the
> gallery, and image enhanced examples, which is arguably the most
> useful feature on the site.

And even as a local reference! I once had to display dates and I was
sure I had seen an example that did it, but I didn't remember which
one, so I ran all of them to find it. Now I would just have to look at a
nice page full of images: much better!

Cheers,

···

On Tue, Dec 16, 2008 at 15:39, John Hunter <jdh2358@...149...> wrote:

On Tue, Dec 16, 2008 at 8:10 AM, Michael Droettboom <mdroe@...31...> wrote:

--
Sandro Tosi (aka morph, Morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

Hello Darren,

> It seems like the documentation should be a separately installable package
> as far as package managers are concerned.

We have already separated the docs from the other mpl parts; to be precise,
we have these packages:

python-matplotlib      - the real module - 2.3M
python-matplotlib-data - data pkg        - 1.1M
python-matplotlib-dbg  - debug symbols   - 11M
python-matplotlib-doc  - documentation   - 87M

So reducing the -doc package is something I'd like to achieve before
uploading the package to the Debian archive.

Cheers,

···

On Tue, Dec 16, 2008 at 16:03, Darren Dale <dsdale24@...149...> wrote:
--
Sandro Tosi (aka morph, Morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

Thanks, Sandro, for working on it. Btw, having some form of the gallery
as a Debian package would be useful: I missed it recently when
I was hacking without an internet connection.

Ondrej

···

On Tue, Dec 16, 2008 at 10:57 PM, Sandro Tosi <morph@...12...> wrote:
