Bus error related to ft2font on Mac OS X (10.6), gcc-4.2, apparently 0.99 branch related

Hi,

as announced on the devel list, here is my report on "my" Bus error.

I first noticed the Bus error with a freshly compiled version from
today's git. An ``import matplotlib.figure`` was sufficient to produce
the bus error. ``python2.6 -v`` showed that it appears when
matplotlib.ft2font is imported, dynamically loaded from
matplotlib/ft2font.so.

I don't know much about this C++ stuff (ft2font.cpp), but I did a
bisect. Unfortunately, it seems (to me) that bisect acts on the
timeline, not respecting the branch structure, hence it gets it a bit
wrong, at least not right enough to enable me to find the offending
commit.

``git bisect`` finds "some" first bad commit, but due to the commits
in other branches after the first real bad commit it gets it a bit
wrong. The binary search then skips too far.

Nevertheless, I found that e05c2fa32f0fc31 fails, its parent cb609d5
fails too. The very first ancestor of this tree in 2011 I can find is
05631088 (2011-02-20): That one succeeds. But it has some nonstandard
setupext.py. So my test script for ``git bisect run`` cannot be
applied. Its only child is df25e31309b, with a standard setupext.py:
It succeeds.

git bisect seems to work on the full timeline, so it's useless here.
Manual bisecting (using gitk on cb609d5415e): 9a93a5c4 (2011-02-24)
fails; 2ab8582f (2011-02-21) fails; df25e3130 (2011-02-20, the merge
of 0.98 into 0.99) succeeds (see above). 13894992 (2011-02-20, the
merge of 0.99 into 1.0) fails.

So I conclude the failure is introduced somewhere in the 0.99 branch.

Compiling randomly while searching the history: e38440f2 (2010-08-18) fails.

A git blame _src/ft2font.cpp shows that most lines are due to Michael
Droettboom in 6b643862. Unfortunately this is just "Standardizing
formatting of C/C++ code."

The 1.0.0 release 668a769fb fails.

Finally testing the 6b643862 ("Standardizing formatting [...]",
2010-06-24): fails

The next one with modifications to ft2font.cpp is b5ce84214f2
(2010-06-10): fails, its predecessor 97b98e33c: fails.

Next with modifications to ft2font.cpp is 7c228264e (2010-06-04):
fails, its predecessor e8f143c78: fails.

Next one is 857adaee2 (2010-04-16): fails, its predecessor 5a9d580b81: fails.

There is no other modification to ft2font.cpp apparently in 2010 on this branch.

Btw, python2.6-32 signals "Bus error", while python-64 exits with
"Abort trap". The Python is self-compiled Python 2.6:

Python 2.6.5 (r265:79063, Jul 18 2010, 12:14:53)
[GCC 4.2.1 (Apple Inc. build 5659)] on darwin

I'm building by patching setupext.py to include /usr/local/.

Can anyone give me a pointer on what I should try to sort this
out, aside from updating my freetype2 (which shouldn't count as a
solution; it should also just work with a not fully recent freetype2)?
I'm not sure whether I'm doing something stupid, but since it
succeeds before the 0.99 branch is merged in, I suspect something
non-trivial.

I wonder why I did not notice this before on my machine. Admittedly,
I did not compile in 2011 at all, I think. But in Autumn 2010, I did,
and with success. So I wonder how this error went unnoticed
on my machine. I remember that the git mirror of the svn was no
longer maintained at the time I last worked on matplotlib.

Friedrich

Friedrich, just curious: your Git mpl repo is a clean clone from github.com/matplotlib and not from astraw’s experimental repo, right? I haven’t had issues with bisect before, so I wonder whether you might have rebased astraw’s repo onto mpl’s repo, which could have introduced issues.

Just speculating out loud.

Ben Root


Can you get a traceback from gdb? The following should do it:

     gdb python2.6

at the gdb prompt, type "run", then at the Python prompt, reproduce the error using "import matplotlib.figure". It should crash, then type "bt" to get a traceback. That may illustrate the source of the error.

Also of note, when using bisect -- the distutils build doesn't always rebuild enough if only header files change. I recommend clearing out the build directory before each compile when using bisect to track down a C++-related change.
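
A minimal sketch of a test script for ``git bisect run`` that follows this advice. The build and test commands are assumptions and have to be adapted to the local setup. The one detail worth spelling out is the exit code: ``git bisect run`` treats 0 as good, 125 as "skip this commit", any other code from 1 to 127 as bad, and aborts on everything else. A process killed by a signal is reported by ``subprocess`` as a negative number (and by a shell as 128 plus the signal number), so a Bus error has to be mapped to a plain "bad" code.

```python
import shutil
import subprocess
import sys

SKIP = 125  # exit code that makes `git bisect run` skip the current commit

# Assumed commands; adjust to however the tree is really built and tested.
BUILD_CMD = [sys.executable, "setup.py", "install", "--user"]
TEST_CMD = [sys.executable, "-c", "import matplotlib.figure"]


def verdict(build_rc, test_rc):
    """Map build/test return codes to an exit code for `git bisect run`."""
    if build_rc != 0:
        return SKIP  # the commit does not build here, so it cannot be judged
    if test_rc == 0:
        return 0     # the import worked: good
    return 1         # any failure, including death by signal: bad


def main():
    # Remove stale build products so that every commit is compiled from scratch.
    shutil.rmtree("build", ignore_errors=True)
    build_rc = subprocess.call(BUILD_CMD)
    test_rc = subprocess.call(TEST_CMD) if build_rc == 0 else None
    return verdict(build_rc, test_rc)


# An explicit flag is required so that merely loading the file never starts a build.
if __name__ == "__main__" and sys.argv[1:] == ["--run"]:
    sys.exit(main())
```

Saved under a hypothetical name such as ``bisect_test.py``, it would be started with ``git bisect run python bisect_test.py --run`` after marking one good and one bad commit.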

Mike


_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
matplotlib-users List Signup and Options

2011/11/10 Benjamin Root <ben.root@...1304...>:

``git bisect`` finds "some" first bad commit, but due to the commits
in other branches after the first real bad commit it gets it a bit
wrong. The binary search then skips too far.

Friedrich, just curious. Is your Git mpl repo a clean clone from
github.com/matplotlib and *not* from astraw's experimental repo, right? I
haven't had issues with bisect before and so I wonder if somehow you might
have rebased astraw's repo with mpl's repo, which could have introduced
issues?

No issues like that, clean clone (although I forked it and then cloned that).

For the bisect, without further reading it'll be speculation. I
guess bisecting on the basis of branches is difficult; just imagine
you have merged in some branch. Since you can specify only one "good"
commit as starting point, if the merge occurred later, the whole other
branch would have to be considered for bisecting. I guess that's not
what bisect does.

The mechanism, as I imagine it, that makes bisect not work is like this:
The good commit is on branch A, bad commits are on branch B, and they
are intermingled in the timeline. So bisect might, up to some point,
always hit the good commits, concluding that everything between
them is good too, which is wrong (because only those from A are good,
the B ones are not).
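
For comparison, the git-bisect documentation describes the search as operating on the commit graph, not on commit dates: the candidates are the commits reachable from the bad commit that are not reachable from any good commit. A toy sketch of that rule follows; the six-commit history and all names in it are made up for illustration.

```python
def ancestors(parents, commit):
    """Return commit plus everything reachable from it via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def bisect_candidates(parents, good, bad):
    """Commits that may have introduced the bug: reachable from bad, not from good."""
    return ancestors(parents, bad) - ancestors(parents, good)


# Hypothetical toy history: branch A (a1, a2) and branch B (b1, b2) fork from
# "root"; "merge" joins them again.
PARENTS = {
    "root": [],
    "a1": ["root"], "a2": ["a1"],
    "b1": ["root"], "b2": ["b1"],
    "merge": ["a2", "b2"],
}

if __name__ == "__main__":
    print(sorted(bisect_candidates(PARENTS, good="a2", bad="merge")))
```

In this toy model, marking a2 (on branch A) as good still leaves b1 and b2 in the candidate set, whatever their commit dates are, so a merged-in branch is searched as a whole.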

Furthermore, Michael is right, while bisecting I didn't ``rm build/``
properly; I just did ``python2.6 setup.py clean``. Later on I did
that properly, after I noticed that the offending commit reported by
bisect actually runs cleanly. I then wrote a test script for ``git
bisect run`` that applies all those steps, so I couldn't keep
forgetting it any longer :-)

Friedrich


2011/11/10 Michael Droettboom <mdroe@...86...>:

Can you get a traceback from gdb? The following should do it:

gdb python2.6

For some reason I cannot load python2.6 from gdb:

This GDB was configured as "x86_64-apple-darwin"...Reading symbols for
shared libraries .. done

(gdb) run
Starting program:
/Library/Frameworks/Python.framework/Versions/2.6/bin/python2.6
Reading symbols for shared libraries +. done

Program received signal SIGTRAP, Trace/breakpoint trap.
0x8fe01030 in __dyld__dyld_start ()
(gdb) The program is running. Exit anyway? (y or n) y

The result is the same when run as superuser.

I instead fell "back" to dtrace (``dapptrace -b 100m -U -p <pid>``,
which requires superuser privileges, hence my note above). dtrace is
a pretty decent tool to trace function calls etc. on the kernel level,
and ships with OS X 10.6. I sent the gzip'ed output attached to an
off-list mail; the full log is 4.8 MB even gzip'ed. On first
inspection I found it useful to grep for ft2font.so and libfreetype.
Draw your own conclusions. From what my naked eye can see, freetype
itself seems not to be the problem. The last thing freetype appears to
do is to return from FT_Get_Sfnt_Name.

Friedrich

Running bisect in this way, did you arrive at a more conclusive determination about which commit may have introduced the problem?

Mike


2011/11/11 Michael Droettboom <mdroe@...86...>:

Running bisect in this way, did you arrive at a more conclusive
determination about which commit may have introduced the problem?

No, I didn't, but I found it manually (kind of), while trying to find
anchor points for git bisect:

If you use gitk on cb609d5415e, and scroll down to "Merge branch
'v0.99.x' into v1.0.x" (13894992d8), you'll see a couple of merges.
Here, up to the merge into v1.0.x, things work. In the v1.0.x branch,
everything I tested, down to the beginning of 2010 [sic], failed,
including the 1.0.0 release 668a769fb.

I was wrong to conclude in my first mail that it's the v0.99.x
branch which introduces the bug; it's apparently the v1.0.x branch.

I was planning to check some early commit after some merges in 2009 on
the v1.0.x branch, after 1982fba643, and the first commit in 2010 on
the v1.0.x branch, bbcb85a663bbb. If one had been good and the other
bad, I'd have let bisect run overnight.

1982fba643 (the first unmerged, see above) is not properly updated for
the new libpng. The first commit of 10/2009 does not work either, for
the same reason. The first of 11/2009 does not work either. The first
of 12/2009: neither. The first in 01/2010 fails to compile too. First
of 02/2010: fails compiling. First of 03/2010: compiles, and fails on
the import level with Bus error.

So I'm screwed for today. I have to dig out my patch for that libpng
issue and incorporate it into the test script.

So far: the bug arose before 03/2010. sic. sigh.

Friedrich


2011/11/11 Michael Droettboom <mdroe@...86...>:

Running bisect in this way, did you arrive at a more conclusive
determination about which commit may have introduced the problem?

Yes, do you know Final Fantasy? "You gonna loose it ... Tracking ...
Tracking ... Found it." af9954d46e.

I don't know which part of that commit breaks it; maybe you can have a
look? It's a commit by you. Maybe it's just the evil font cache. :-)

It's not the ft2font, notably, this was apparently imported properly;
it's just some initialisation code of matplotlib that seems to fail
while importing matplotlib.figure.

I verified clearly; the commit mentioned fails, and its predecessor succeeds.

I did patch the _png.cpp to make it work; it didn't comply with
libpng-1.4 that time.

I can upload the branches for testing the two commits to my repo.

So far,
Friedrich

Very odd. Given there's no C++ changes here, I'm very surprised. Shooting in the dark here: does deleting ~/.matplotlib/fontList.cache help at all?

Mike


2011/11/11 Michael Droettboom <mdroe@...86...>:

Very odd. Given there's no C++ changes here, I'm very surprised. Shooting
in the dark here: does deleting ~/.matplotlib/fontList.cache help at all?

Nope :-(

I'm pretty much surprised too. I wonder why no one else has this issue?

I replaced the font_manager.py with that of the good commit and it still fails.

I then reverted back to the bad font_manager.py, and replaced the
mathtext.py with that of the good one. And it fails ....

I then replaced both font_manager.py as well as mathtext.py with the
good ones, and it .... still fails!

I could not believe this and checked out the good commit once more,
and this one .... fails now too ...

I have no idea what's going on here. To me this looks like black
magic. I didn't confuse the commits, I have logs where the good
commit succeeds. Are there any other caches involved in matplotlib?
I don't know of any.

I cleaned the build/ directory properly. I also nuked the
site-packages/matplotlib in between.

The build+run logs of the "good" commit before and after the "magic
trigger" diff as exactly the same, with the exception that in one
python2.6 -v continues with lines.py and in the other it does not. :-(

Friedrich

To give the valuable information in the beginning: It appears it
cannot handle /Library/Fonts/NISC18030.ttf. It tries to load it via
ft2font.FT2Font() but that gives the Bus error. The ttf file dates to
28 Jan 2010. It is 7108232 bytes large. I don't know why it cannot
be loaded.

Until it had to recreate the fontcache, it never tried to load that.
It appears to me I used matplotlib since before that file appeared, or
at least matplotlib never tried to load it, or succeeded before in
loading it. The "first bad" commit mentioned in the last email(s) was
that one introducing a mechanism to throw the fontcache away if the
matplotlib version number does not match the version number stored in
the fontcache.
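
The mechanism described here can be sketched as follows. This is not matplotlib's actual code; the function name, the cache layout and the version strings are made up for illustration. The point is only that a cache with a matching version is returned without opening a single font file, while a version mismatch forces a full rescan, and a crash during that rescan means no new cache is ever written.

```python
import os
import pickle
import tempfile


def load_or_rebuild(cache_path, current_version, rebuild):
    """Return the cached font list if its stored version matches, else rebuild it."""
    try:
        with open(cache_path, "rb") as fh:
            cached = pickle.load(fh)
        if cached.get("version") == current_version:
            return cached["fonts"]  # cache reused: no font file is opened
    except (OSError, EOFError, pickle.UnpicklingError, AttributeError, KeyError):
        pass  # missing or unreadable cache: fall through to a rebuild
    fonts = rebuild()  # if this crashes, the cache file is never (re)written
    with open(cache_path, "wb") as fh:
        pickle.dump({"version": current_version, "fonts": fonts}, fh)
    return fonts


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "fontList.cache")
    scans = []

    def scan():
        scans.append(1)  # count how often the expensive scan runs
        return ["Arial Narrow.ttf"]

    load_or_rebuild(path, "0.99", scan)  # no cache yet: scans
    load_or_rebuild(path, "0.99", scan)  # same version: cache is reused
    load_or_rebuild(path, "1.0", scan)   # version changed: scans again
    print("number of scans:", len(scans))
```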

2011/11/12 Friedrich Romstedt <friedrichromstedt@...287...>:

2011/11/11 Michael Droettboom <mdroe@...86...>:

Very odd. Given there's no C++ changes here, I'm very surprised. Shooting
in the dark here: does deleting ~/.matplotlib/fontList.cache help at all?

I guess it might have to do with it: Removing the font cache might
have made the "good" commit 8c200dab4680efd5201 fail. Or rather
keeping the old font cache might have made the "good" commit not fail
in the beginning. Whatever the causal relation is, I will try to
investigate playing with the existence of the font cache.

I want to verify that the existence of the fontcache file influences
the test result.

-== Trying to verify the influence of the fontcache file ==-

I could not believe this and checked out the good commit once more,
and this one .... fails now too ...

Verifying that there's no further magic, after a clean reboot (you
never know, and I went asleep), I'm trying both commits again, without
the font cache in action:

"good" 8c200dab4680efd5201: Bus error.
"bad" af9954d46e5d: Bus error.

So everything like yesterday evening, without the font cache.

Putting the font cache back into action now (from the moved file).
Keeping the moved file for reference (i.e., copying it).

"good" 8c200dab4680efd5201: Succeeding.
"bad" af9954d46e5d: Bus error.

So the existence of the font cache file makes the apparently "good"
commit succeed, although it probably shouldn't succeed. It is a
pity that it's not vice versa: That the existence of the font cache
file would make the "bad" commit fail, so that it (and the current
matplotlib) would succeed without it.

-== Bisecting again, this time without font cache file ==-

Removing the font cache file again (keeping the copy).

-= Trying to find some good commit in the past =-

Trying 1982fba643 (one from 2009): Bus error. This commit's test run
differs from the previous Bus errors by the following additional lines
from python2.6 -v:

# /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/commands.pyc
matches /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/commands.py
import commands # precompiled from
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/commands.pyc

This happens directly after importing ft2font.so.

Nevertheless it fails, so going further back into the past ...

Trying the v0.99.0rc1 ac387d18b: Bus error.
This cannot be! It is a commit from back in 2009, and I used matplotlib at that time!

Still trying to find a commit not exhibiting the Bus error ...

-== The first commit from 2009 ==-

Trying the first from 2009 1dcaee87fc: Bus error.

I grep'ed for "import .*commands" in the lib/ folder, and found that
it is only used by font_manager.py, and in that file, on non-win32
platforms, only the body of get_fontconfig_fonts() imports it. I
will augment that function body with prints to track it down. It
appears it is the font manager that crashes.

Probably it issues some command that makes Python crash. When the
fontcache existed and was not version-checked (before the former "bad"
commit), it was simply loaded. Now matplotlib tries to rebuild it and
fails in that. Back when I used it, it presumably ran because I had a
working font cache at that time.

For some reason the log output did not appear in the test. Running
the test manually shows the log output. What? Apparently it is
missing because of some buffering issue. If I pass the output through
a pipe, like when logging, apparently Python switches buffering.

It might well be that the buffering truncates the whole -v output.

Apparently the -v output goes to sys.stderr, and the sys.stdout is
buffered when piping.

Patching sys.stderr: ``python2.6 -v -c "import sys; sys.stderr =
open('x.txt', 'w'); import matplotlib.figure"``. The -v output up to
the Python 2.6.5 statement goes to stdout, after that it goes to
stderr apparently. The output to x.txt is truncated in the middle of
a sentence:

# /Library/Frameworks/Python.framework/Versions/2.6/lib/python

and does not contain the ``commands`` import log. Turning buffering off:

python2.6 -v -c "import sys; sys.stderr = open('x.txt', 'w', 0);
import matplotlib.figure"

it ends again at the ``commands`` import.
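
The effect can be reproduced in isolation. The sketch below is written for Python 3, where text files cannot be opened with buffering 0, so line buffering (``buffering=1``) takes the place of the unbuffered ``open('x.txt', 'w', 0)`` used above. ``os._exit()`` stands in for the Bus error: like a crash, it ends the process without flushing Python's file buffers.

```python
import os
import subprocess
import sys
import tempfile

# Child program: write one line to a log file, then die without any cleanup.
CHILD = """
import os, sys
log = open(sys.argv[1], "w", buffering=int(sys.argv[2]))
log.write("before crash\\n")
os._exit(1)
"""


def surviving_log(buffering):
    """Run the child with the given buffering mode and return what reached the disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        subprocess.call([sys.executable, "-c", CHILD, path, str(buffering)])
        with open(path) as fh:
            return fh.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(repr(surviving_log(-1)))  # default buffering: the line is lost
    print(repr(surviving_log(1)))   # line buffering: the line survives
```

With default buffering the last log lines before a crash never reach the file, which matches the truncated x.txt seen above.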

It is noteworthy that the function augmented by the logging statements
exits cleanly. The function is called only once in the whole of
matplotlib, in font_manager.py, in findSystemFonts(). There
findSystemFonts() loops over the return value, which is {}, so no
looping at all.

The crash appears in FontManager.__init__() somewhere between loading
the ttffiles and loading the afmfiles.

The crash appears in createFontList().

Setting matplotlib's ``matplotlib.verbose`` to level 'debug-annoying'
via the cmdline script shows that the Bus error occurs after the
following last log message:

createFontDict: /Library/Fonts/NISC18030.ttf

Augmenting the createFontList() function by print statements yields this:

createFontDict: /Library/Fonts/Arial Narrow.ttf
Friedrich: ft2font.FT2Font(/Library/Fonts/Arial Narrow.ttf) ...
Friedrich: ft2font.FT2Font(/Library/Fonts/Arial Narrow.ttf) succeeded.
createFontDict: /Library/Fonts/NISC18030.ttf
Friedrich: ft2font.FT2Font(/Library/Fonts/NISC18030.ttf) ...
./runtest.sh: line 1: 7150 Bus error python2.6 -v -u -c
"import matplotlib; matplotlib.verbose.set_level('debug-annoying');
import matplotlib.figure" 2>&1

So it appears it cannot handle /Library/Fonts/NISC18030.ttf.
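
Instead of adding print statements, the offending file can also be isolated by loading every font in a fresh interpreter, so that a crash only kills the child process. A rough sketch: ``ft2font.FT2Font(path)`` is the call from the log above; the helper name and the default loader string are made up here.

```python
import subprocess
import sys

# Assumed loader: open one font file with ft2font.FT2Font and exit.
FT2FONT_LOADER = (
    "import sys; from matplotlib import ft2font; ft2font.FT2Font(sys.argv[1])"
)


def failing_files(paths, loader_code=FT2FONT_LOADER):
    """Load each file in its own interpreter; return (path, returncode) for failures."""
    failures = []
    for path in paths:
        rc = subprocess.call(
            [sys.executable, "-c", loader_code, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if rc != 0:  # on POSIX a negative value means "killed by signal -rc"
            failures.append((path, rc))
    return failures
```

Called with the list of files under /Library/Fonts, it would return every font whose loading does not end cleanly, together with the return code of the child.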

Any ideas?

Friedrich

A quick googling of "NISC18030.ttf matplotlib" yields this interesting
result: http://groups.google.com/group/sage-devel/browse_thread/thread/2c538915abc99946

From there:

"[...] and hope for John Hunter to be able
to replicate the problem and come up with something better in the next
few weeks (or I'll come back to it later)."

So I think we have at least replicated it.

What troubles me is that I have been a 10.6 user from the beginning,
since, say, mid 2010. So my initial working fontcache was built at that
time, on 10.6. How do I check whether the respective TTF is in the font
cache (file)?

OK, I loaded the font manager from the pickle just via matplotlib from
2009; and /Library/Fonts/NISC18030.ttf is not amongst
``matplotlib.font_manager.fontManager.ttffiles``.

I didn't know until today that the CXX appearing in the build process
actually refers to a package and is not just an alteration of C++ to
make it more shell-friendly, as I believed until now.

Maybe someone with some insight into CXX can help? I see that I can do
that too, but it'll probably take much longer than when you, dear
recipient, do it.

From the post referenced above it *seems* that it might have something
to do with creating a Python Int from NULL? But since my knowledge of
CXX is low, as mentioned, I would not give my word for this.

The explanation why it didn't try to index that file follows:

$ stat -f "...." /Library/Fonts/NISC18030.ttf
Last accessed or modified: 1321107464 = 12 Nov 2011
Last changed: 1264652963 = 28 Jan 2010
Time of Birth: 1292365840 = 14 Dec 2010

There you go. I guess some Mac OS X 10.6 update (probably a combo
update) installed it. I will not go into details here of checking the
pax files or something, I just think we see that it was born on my Mac
later than I started using matplotlib. I never deleted the fontcache
while using matplotlib, but I vaguely remember that I had a problem
with another user than "me".
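
The same check can be done from Python. A small sketch, assuming only the standard library: ``st_birthtime`` exists on OS X and the BSDs but not on every platform, hence the ``getattr``. The helper names are made up.

```python
import os
from datetime import datetime, timezone


def utc_date(epoch_seconds):
    """Convert seconds since the epoch to an ISO date (UTC)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date().isoformat()


def file_dates(path):
    """Return the dates stat reports for path; the birth time only where available."""
    st = os.stat(path)
    dates = {
        "accessed": utc_date(st.st_atime),
        "modified": utc_date(st.st_mtime),
        "changed": utc_date(st.st_ctime),
    }
    birth = getattr(st, "st_birthtime", None)  # OS X/BSD only
    if birth is not None:
        dates["birth"] = utc_date(birth)
    return dates


if __name__ == "__main__":
    # The three epoch values from the stat output above.
    for label, value in [
        ("Last accessed or modified", 1321107464),
        ("Last changed", 1264652963),
        ("Time of Birth", 1292365840),
    ]:
        print(label, utc_date(value))
```

``file_dates("/Library/Fonts/NISC18030.ttf")`` would then show directly whether the file is younger than the font cache.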

I remember also some post on a failing matplotlib on OS X 10.6; which
we were not able to solve, but I'll look into that now.

Friedrich

2011/11/12 Friedrich Romstedt <friedrichromstedt@...287...>:

A quick googling of "NISC18030.ttf matplotlib" yields this interesting
result: http://groups.google.com/group/sage-devel/browse_thread/thread/2c538915abc99946

And this: http://trac.sagemath.org/sage_trac/ticket/7022. Actually I
got the above from that.

From there (username is "was", probably William Stein): "All it does
is take the plane vanilla matplotlib-0.99.1.spkg spkg and add a little
script that simply rebuilds f2font.so again using *exactly* the same
command lines used by distutils to build that extension. That's it.
For some reason -- probably involving environment variables (?) --
this fixes the problem. I consider this a temporary 1-sage release
solution until the matplotlib developers (or me) come up with a real
fix."

I downloaded that script, and reproduced the functionality with my
framework Python 2.6. The resulting ft2font.so differs at the byte
level from the original ft2font.so, and ... indeed it runs smoothly
with that ft2font.so.

What the hell is happening here? Is it really CXX related? What
environment variables are set by the mighty distutils?

Feel free to start a new thread on -devel as soon as you have some
solution or idea :-)

Friedrich


I wrote a bash wrapper script standing in for gcc-4.2 that logs the
``env`` output together with the command to run, into log files named
by timestamp.

The result is that the only differences in environment variables are:

1) PLAT=macosx-10.5-intel
2) MACOSX_DEPLOYMENT_TARGET=10.5

I don't know anything about (1). (2) was set at compile time and is
correct, but I will check if it affects the thing.
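
Comparing two such ``env`` dumps is easy to automate. A small sketch with made-up helper names and made-up sample dumps; only the two differing variables are taken from the list above. It assumes one NAME=value pair per line, so values containing newlines would be parsed incorrectly.

```python
def parse_env(dump):
    """Parse the text printed by `env` into a dict (one NAME=value per line)."""
    env = {}
    for line in dump.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def env_diff(first, second):
    """Return {name: (value_in_first, value_in_second)} for all differing variables."""
    a, b = parse_env(first), parse_env(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Made-up dumps for illustration.
    first_dump = "HOME=/Users/me\nPLAT=macosx-10.5-intel\nMACOSX_DEPLOYMENT_TARGET=10.5\n"
    second_dump = "HOME=/Users/me\n"
    for name, values in env_diff(first_dump, second_dump).items():
        print(name, values)
```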

I will try to reproduce the original ft2font.so generated by distutils
by manual commands, to see what ingredience makes it fail in the end.

PLAT has apparently no effect on the byte file size at least.
MACOSX_DEPLOYMENT_TARGET makes the file size increase to about the
size of the original file.

The files are not byte-identical; I guess there's a time stamp
somewhere and some compression involved. Even when the file sizes
matched to the byte, the contents still differed. I will focus on
whether MACOSX_DEPLOYMENT_TARGET breaks it or not.
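
A size and content comparison of two builds can be sketched like this.
It is only an illustration of the check described above; the function
name is hypothetical.

```python
import os


def compare_files(path_a, path_b, chunk_size=65536):
    """Return (size_a, size_b, offset of the first differing byte or None)."""
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                common = min(len(chunk_a), len(chunk_b))
                # Locate the first differing byte inside this chunk.
                for i in range(common):
                    if chunk_a[i] != chunk_b[i]:
                        return size_a, size_b, offset + i
                # One file is a prefix of the other.
                return size_a, size_b, offset + common
            if not chunk_a:
                return size_a, size_b, None   # both files ended: identical
            offset += len(chunk_a)
```

Called on the distutils-built and the manually built ft2font.so, equal
sizes together with a non-``None`` offset correspond to the situation
described above: same length, different contents.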

Unsetting MACOSX_DEPLOYMENT_TARGET, and hence using the default
``10.6``, makes it work.

Recompiling manually with MACOSX_DEPLOYMENT_TARGET=10.5 and removing
the fontcache generated by the last run makes it fail.

So to me this looks pretty much like a gcc-4.2 bug.

MACOSX_DEPLOYMENT_TARGET has nothing to do with the source code. It
*should* just add a legacy layer. What it apparently does is to
compile for 10.5 instead, and maybe add a legacy layer for 10.6? Just
speculating.

So I think we found it, but we cannot solve it apparently.

Only thing is to build libraries for 10.6 with the python.org OS X
10.6-only version, so that we can set the deployment target to 10.6
when building the library (matplotlib).

I'm cc'ing the sage people manually since I'm not on sage-devel and
don't need it at all. William, Ondrej, FYI.

So far,
Friedrich

This is my summary of what I found out.

Hi Mike,

I think it might be that now, in 2011, with OS X 10.6, there is no
"good" commit anymore. The mechanism was as follows. For those
commits which were apparently "good", the fontcache was loaded without
any change. For the "bad" commits, matplotlib attempted to recreate
it, but this led to the Bus error, and hence the cache was never
written. When the fontcache is missing, all commits that contain the
code which reads that ttf file fail. They didn't fail until the
deployment target bug was introduced into gcc-4.2. They also didn't
fail until a ttf file was present that probably triggers a special
code path.
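
The mechanism described above can be sketched as follows. This is not
matplotlib's actual font_manager code, only an illustration of the
load-or-rebuild logic; the function name and the use of pickle are
assumptions.

```python
import os
import pickle


def load_or_rebuild(cache_path, scan_fonts):
    """Return the font list, using the cache file when it exists."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)       # "good" case: no font file is opened
    fonts = scan_fonts()                # "bad" case: a crash in here ...
    with open(cache_path, "wb") as f:   # ... means this is never reached
        pickle.dump(fonts, f)
    return fonts
```

As long as an old cache file exists, ``scan_fonts`` is never called, so
a build that would crash while scanning looks "good". Once the cache is
gone, every run crashes before a new cache can be written.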

It might even work with gcc-4.0? I assume that the source code which
triggers the bug has been in matplotlib from the beginning, as it
apparently isn't a programming mistake. But it might still be that
you find something that triggers it and that can be fixed.

If it did not appear with gcc-4.0, that would explain why we have so
few reports on this issue. It seems that using the python.org Python,
which is compiled with gcc-4.0 (probably with the exception of the
10.6-only Python), suffices to circumvent the bug. I'm not interested
in using gcc-4.0, since I compiled libpng, libjpg, libtiff,
libfreetype etc. using gcc-4.2. For my own purposes, I will probably
just recompile Python without the 10.5 target. This will sort it out.
But I don't know whether that is always a solution for packagers.

I think the offending binary instruction is either in ft2font.so or in
libfreetype.dylib. In the former case, it might result from
ft2font.cpp or from the CXX stuff I didn't understand. In the latter
case, upgrading libfreetype might help, but that is unlikely: for the
error to end up there, it would have to depend on the deployment
target used to compile ft2font.so (since the whole Bus error depends
on that). So it is not probable that the offending instruction is in
libfreetype.dylib.

Friedrich

2011/11/12 Friedrich Romstedt <friedrichromstedt@...287...>:

This is my summary of what I found out.

Some small follow-up regarding what might trigger the bug:

http://comments.gmane.org/gmane.comp.python.matplotlib.general/1115 is
a report by Chris Barker indicating as a side-effect that
NISC18030.ttf was present even in 2005. It "could not be loaded" that
time. I.e. it didn't cause a Bus error. That it was attempted to be
loaded indicates that the fontcache was to be rebuilt that time, so
the file must be present.

A second report, hosted on Google Code (now in the Google Code
Archive), indicates that, in 2008, on OS X 10.5, the file could "not
be loaded" either. Again, the mere attempt implies that the fontcache
was being rebuilt. So the file must have been present, unless the
font_manager.py logic of early 2009 is the result of a dramatic change
since then.

It appears very probable that the Bus error is not triggered on 10.5,
but only on 10.6, when building with MACOSX_DEPLOYMENT_TARGET=10.5.
It remains unclear starting from which patch version of 10.6 it
appears, and also if it is a gcc-4.2 only issue. In the case it is
gcc-4.2 related, it would explain the rarity, because gcc-4.2 was
introduced in 10.6, so who would build with 10.5 deployment target?
If 10.5 is targeted, you mostly need to use gcc-4.0 anyway. (This is
something I overlooked myself for my own decision until now.)

Friedrich

2011/11/12 Friedrich Romstedt <friedrichromstedt@...287...>:

$ stat -f "...." /Library/Fonts/NISC18030.ttf
Last accessed or modified: 1321107464 = 12 Nov 2011
Last changed: 1264652963 = 28 Jan 2010
Time of Birth: 1292365840 = 14 Dec 2010
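
The conversion from the epoch values to calendar dates can be
reproduced with a short sketch like the following. It converts in UTC,
which is an assumption; the original output may have used local time.

```python
from datetime import datetime, timezone


def epoch_to_date(seconds):
    """Convert seconds since the Unix epoch to an ISO date string (UTC)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()


# The three values reported by stat above.
timestamps = [
    ("Last accessed or modified", 1321107464),
    ("Last changed", 1264652963),
    ("Time of Birth", 1292365840),
]

for label, seconds in timestamps:
    print("%s: %s" % (label, epoch_to_date(seconds)))
```

In UTC, the three values fall on the same calendar days as quoted
above.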

The file might have been created earlier; 14 Dec 2010 is the day on
which I reinstalled my Mac from backup after a HDD crash.

I have checked whether I have backups older than that on one of the
Time Machine disks, but I don't. However, since Time Machine uses
hardlinks to link files between different backups, the file in the
oldest backup, from 27 Dec 2010, might still carry the date of birth
we're looking for, assuming Time Machine didn't start a completely new
backup after restoring from the old one.

I'm interested in this because I wonder how I ever got a working fontcache.

It might be that I first compiled matplotlib differently, with the
python.org Python and hence gcc-4.0. If we assume that it works under
gcc-4.0, I would have ended up with a proper fontcache and was then
free to compile with gcc-4.2 + 10.5 deployment target. The fontcache
then lived on untouched all those years since mid-2009. Until now,
when matplotlib, built with gcc-4.2 + 10.5 target, attempted to
recreate it and failed.

I guess that the NISC18030.ttf in the backup has the date of birth of
the first backup ever, meaning that it was probably present from the
very beginning. This is suggested by the posts back to 2005, where
the file existed on that ``bsd`` machine of William Stein, iirc. I
strongly believe I just got a working intermediate matplotlib, which
created the everlasting (or not) fontcache.

Friedrich

Thanks for all the time you've devoted to this. It does look like possibly some kind of compiler bug. The font loads and renders fine on Linux, for what it's worth (just as a data point).

To confirm this theory: if you move NISC18030.ttf somewhere temporary, delete ~/.matplotlib/fontList.cache and then import matplotlib, do you get the crash? That at least confirms that loading this font file triggers the bug (wherever the bug may be). Test with matplotlib 1.1.0 or git master so we have a sense of the current behavior.

Mike


2011/11/14 Michael Droettboom <mdroe@...86...>:

Thanks for all the time you've devoted to this. It does look like possibly
some kind of compiler bug. The font loads and renders fine on Linux, for
what it's worth (just as a data point).

To confirm this theory: if you move NISC18030.ttf somewhere temporary, delete
~/.matplotlib/fontList.cache and then import matplotlib, do you get the
crash? That at least confirms that loading this font file triggers the bug
(wherever the bug may be). Test with matplotlib 1.1.0 or git master so we
have a sense of the current behavior.

Hi Mike,

the following fonts on my system are offending:

/Library/Fonts/NISC18030.ttf
/Library/Fonts/AppleMyungjo.ttf
/Library/Fonts/Gungseouche.ttf

With these fonts made unfindable by matplotlib (:file:`*.ttf_`) it
exits cleanly.
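
A search for offending fonts can be sketched by loading each font in a
separate child process, so that a Bus error kills only the child. This
assumes that ``FT2Font(path)`` from ``matplotlib.ft2font`` is the call
that opens a font file; the function names are hypothetical.

```python
import subprocess
import sys

# Assumption: constructing FT2Font on the path is what triggers the crash.
LOAD_SNIPPET = ("import sys; from matplotlib.ft2font import FT2Font; "
                "FT2Font(sys.argv[1])")


def crashes_on_load(font_path, snippet=LOAD_SNIPPET):
    """Load one font in a child process; True if a signal killed the child."""
    returncode = subprocess.call([sys.executable, "-c", snippet, font_path])
    return returncode < 0   # on POSIX, a negative code means death by signal


def offending_fonts(font_paths, snippet=LOAD_SNIPPET):
    """Return the fonts whose loading kills the child process."""
    return [path for path in font_paths if crashes_on_load(path, snippet)]
```

A font that merely cannot be loaded raises a Python exception in the
child and yields a positive exit code, so it is not reported as a
crash.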

I will provide a patch to matplotlib for an rc setting
"fonts.bus-error : ...", e.g. ``fonts.bus-error : NISC18030.ttf,
AppleMyungjo.ttf, Gungseouche.ttf``, in the next few days.
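
What such a setting could do is sketched below. The ``fonts.bus-error``
key is only the proposal made here, not an existing matplotlib rc
parameter, and both helper functions are hypothetical.

```python
import os


def parse_exclusions(value):
    """Split a comma-separated rc value into a list of font file names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def filter_fonts(font_paths, excluded_names):
    """Drop every font whose file name is on the exclusion list."""
    excluded = set(excluded_names)
    return [path for path in font_paths
            if os.path.basename(path) not in excluded]


excluded = parse_exclusions(
    "NISC18030.ttf, AppleMyungjo.ttf, Gungseouche.ttf")
fonts = [
    "/Library/Fonts/NISC18030.ttf",
    "/Library/Fonts/AppleMyungjo.ttf",
    "/Library/Fonts/Gungseouche.ttf",
    "/Library/Fonts/Example.ttf",   # placeholder for a harmless font
]
print(filter_fonts(fonts, excluded))
```

Matching on the file name only keeps the setting independent of the
directory in which the font is installed.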

It was clear from the beginning (well, from the point where I got a
handle on it) that loading the font makes the 2009 matplotlib crash.
The only unanswered question is where the code path is that triggers
this compiler bug (I think the compiler bug hypothesis is not
disproven and works well atm). If the code path is in ft2font.cpp, we
could (you could) rewrite ft2font.cpp in an equivalent way that
differs only in not crashing. You might want to add printf() calls or
similar to ft2font.cpp to see whether the crash occurs inside a call
to libfreetype or whether all those calls return cleanly.

To my understanding, since recompiling ft2font.so with
MACOSX_DEPLOYMENT_TARGET left at its default of 10.6 helps,
ft2font.cpp should be the culprit, or rather the victim. The only
alternative I see would be that it has to do with the load mechanism
of the dylib, but I deem this rather unlikely. Well, unlikely is not
the best word in this context, since all these things here were pretty
unlikely.

If the codepath is in libfreetype this would be an issue for their list. ...

Friedrich

Hi,

2011/11/14 Michael Droettboom <mdroe@...86...>:

Thanks for all the time you've devoted to this. It does look like possibly
some kind of compiler bug. The font loads and renders fine on Linux, for
what it's worth (just as a data point).

To confirm this theory: if you move NISC18030.ttf somewhere temporary, delete
~/.matplotlib/fontList.cache and then import matplotlib, do you get the
crash? That at least confirms that loading this font file triggers the bug
(wherever the bug may be). Test with matplotlib 1.1.0 or git master so we
have a sense of the current behavior.

Hi Mike,

the following fonts on my system are offending:

/Library/Fonts/NISC18030.ttf
/Library/Fonts/AppleMyungjo.ttf
/Library/Fonts/Gungseouche.ttf

With these fonts made unfindable by matplotlib (:file:`*.ttf_`) it
exits cleanly.

I will provide a patch to matplotlib for an rc setting
"fonts.bus-error : ...", e.g. ``fonts.bus-error : NISC18030.ttf,
AppleMyungjo.ttf, Gungseouche.ttf``, in the next few days.

I just took the time to recompile the whole thingy, including
supporting libraries. I used:

– libfreetype-2.4.9
– matplotlib-1.1.0
– MACOSX_DEPLOYMENT_TARGET=10.5
– The files noted in the citation above are in place (i.e.,
accessible as .ttf files)

My theory was that a compiler error triggers the crash with the font
files in question. Because recompiling ft2font.so with a different
MACOSX_DEPLOYMENT_TARGET made the crash disappear, I supposed that
ft2font would trigger that compiler error. It had to be a compiler
error, because that environment variable was the only change that made
the crash disappear. The question now is whether that error still
persists with more recent software. I have found that it does not.
I recompiled with the libraries noted above (all compiled from
source), and I can successfully import matplotlib.figure. This import
previously provoked the crash. So I believe that either I was wrong
in some respect, or the more recent software toolchain no longer
provokes the crash because its code changed. Since it now works
flawlessly on my system, I see little need to implement the
mechanism for excluding font files from being loaded; if it is not
needed, I will not code it.

Friedrich

P.S.: Of course I moved the font cache before, so that it is recreated
when importing matplotlib.figure for the first time.
P.P.S.: One more difference is that the current Python is not a
framework Python anymore, but a regular Python.
