unicode and thoughts on mathtext

Just wanted to let you know that I've finished adding unicode support
for agg and postscript. The changes are in CVS; see
examples/unicode_demo.py for an example.

I'm not a big consumer of unicode, so this is lightly tested, but it
does work with the western unicode strings and font file names I
tested on agg and ps.

In the process of getting text layout right in PS, I discovered that
glyph.horiAdvance doesn't do what I thought: it effectively "snaps to
pixel". This was causing all kinds of layout badness in postscript
unicode (postscript doesn't support unicode, so you have to lay out
the strings "by hand", character by character). The trick was to
expose glyph.linearHoriAdvance, which is the device-independent
version; likewise, I discovered that there are several kerning modes,
some of which are more appropriate for device-independent layout. I
think this error also underlies some of the current layout badness in
mathtext, which was also using glyph.horiAdvance.
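To illustrate why the pixel snapping hurts hand layout, here is a
minimal Python sketch. The glyph widths and the single kerning pair
are invented numbers, and `layout`, `snapped`, and `linear` are
hypothetical helpers, not matplotlib or FreeType API. Rounding each
advance to a whole pixel introduces a small per-glyph error that
accumulates along the string, while the linear (fractional) advances
keep the positions where the font designer intended.

```python
# Sketch of laying out a string "by hand", glyph by glyph, as the
# PostScript backend has to. All widths and kerns below are made-up
# illustrative values in pixels at some fixed size, not real font data.

# Hypothetical unhinted (device-independent) advances, in pixels.
LINEAR_ADVANCE = {"t": 3.33, "e": 5.56, "s": 5.0}

# Hypothetical kerning adjustments for ordered glyph pairs, in pixels.
KERNING = {("t", "e"): -0.25}


def layout(text, advance_of):
    """Return (x offset of each glyph, total width), applying pair kerns."""
    x = 0.0
    positions = []
    previous = None
    for ch in text:
        if previous is not None:
            # Kerning adjusts the gap between the previous glyph and this one.
            x += KERNING.get((previous, ch), 0.0)
        positions.append(x)
        x += advance_of(ch)
        previous = ch
    return positions, x


def snapped(ch):
    # Mimics a hinted advance that has been rounded to whole pixels.
    return float(round(LINEAR_ADVANCE[ch]))


def linear(ch):
    # Uses the fractional, device-independent advance.
    return LINEAR_ADVANCE[ch]


text = "test" * 5
_, snapped_width = layout(text, snapped)
_, linear_width = layout(text, linear)
print(f"snapped width: {snapped_width:.2f}px, linear width: {linear_width:.2f}px")
```

Only the advance source differs between the two runs, so the gap
between the widths is purely the accumulated rounding error.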

I made a tentative attempt to add kerning to mathtext, but then
discovered that the cm truetype fonts have no kerning information in
them at all. I took Robert Kern's (no pun intended) advice to get the
kerning information from the tfm files using tftopl, but the values
are in "display device" coordinates, so I am not sure how to use them
properly (multiply by an EM??).
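One plausible reading, offered as an assumption rather than a verified
answer: tftopl writes real-valued dimensions in units of the font's
design size (scaled by DESIGNUNITS, which defaults to 1.0), so a KRN
value scales linearly with the point size the font is used at. Under
the further assumption that a TrueType em corresponds to one TeX
design size, converting to the font's own units would indeed mean
multiplying by its units-per-em. The sample below only mimics the
shape of tftopl output; its numbers are made up, and `parse_kerns`,
`kern_in_points`, and `kern_in_font_units` are hypothetical helpers.
The parser only handles `C`-style character codes.

```python
import re

# Made-up fragment in the shape of tftopl's property-list (.pl) output.
# The values are illustrative only, not taken from a real cmr10.pl.
SAMPLE_PL = """
(DESIGNSIZE R 10.0)
(DESIGNUNITS R 1.0)
(LIGTABLE
   (LABEL C A)
   (KRN C t R -0.025)
   (KRN C v R -0.05)
   (STOP)
   )
"""


def parse_kerns(pl_text):
    """Return {(left, right): kern} with kerns in design-size units."""
    units = re.search(r"\(DESIGNUNITS R ([-\d.]+)\)", pl_text)
    design_units = float(units.group(1)) if units else 1.0
    kerns = {}
    labels = []  # characters that share the current lig/kern program
    for raw in pl_text.splitlines():
        line = raw.strip()
        if line.startswith("(STOP)"):
            labels = []  # end of this program
            continue
        label = re.match(r"\(LABEL C (\S)\)", line)
        if label:
            labels.append(label.group(1))
            continue
        kern = re.match(r"\(KRN C (\S) R ([-\d.]+)\)", line)
        if kern:
            for left in labels:
                kerns[(left, kern.group(1))] = float(kern.group(2)) / design_units
    return kerns


def kern_in_points(kern, point_size):
    # Assumption: design-size units scale linearly with the size in use.
    return kern * point_size


def kern_in_font_units(kern, units_per_em):
    # Assumption: the TrueType em square equals one TeX design size.
    return kern * units_per_em


kerns = parse_kerns(SAMPLE_PL)
kern = kerns[("A", "v")]
print("at 12pt:", kern_in_points(kern, 12.0), "pt")
print("in a 2048 units-per-em font:", kern_in_font_units(kern, 2048))
```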

But I'm kind of down on the Bakoma cm truetype fonts in any case,
because of their noncommercial license restrictions and because some
of the glyphs look terrible. For example, check out the "t" in

  title(r'$\rm{this\ is\ a\ test}$')

Also, there is the unresolved problem of how exactly the vertical
offset works in the cmex file, which neither Paul nor I could figure
out despite days of banging our heads against it.

Now that I am getting my head around unicode, I'm considering a new
solution for mathtext, parts of which we've touched on in previous
threads:

* ship the umbelleck fonts with mpl (no license restrictions)

* rebuild the data tables to map TeX names to unicode code points (I
   think Robert pointed out a link to an existing map, but it was GPL'd
   and there was some discussion of whether we could rip out the
   tables). Right now, mathtext maps TeX symbols to (fontname,
   glyphindex) tuples, which is just plain dumb. Hmm, it occurs to me
   suddenly that I can use the existing tables to build the unicode
   tables, since I can use the font module to map glyphindex -> unicode.

* Rather than hardcode the font names with the symbols, query all the
   font files on the system to see which unicode characters they
   provide. Thus one could do simple mathtext (e.g., super/subscripts,
   equations) with the default font (e.g., Vera) if one were only using
   symbols provided by Vera.

* Fix the basic layout problems: some of this resulted from the
   glyph.horiAdvance problem, some from not handling kerning, and some
   of it is still hard, e.g., cross-font kerning. If we modify text.py
   to support embedded mathtext, this would be less of an issue,
   particularly now that we have unicode. For example, you can use
   unicode text in the font of your choice to do accents and many
   special characters, and fall back on mathtext only for the
   super/subscripting and other equation-like stuff.
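The table-building idea in the second bullet amounts to composing two
maps: TeX name -> (fontname, glyphindex) from the existing tables, and
glyphindex -> code point obtained by inverting each font's charmap.
Here is a minimal sketch with hypothetical stand-in tables (the font
names and glyph indices are made up, and `build_tex_to_unicode` is not
an existing function). It only gives meaningful results if the fonts'
charmaps carry the real unicode code points for these glyphs; a font
whose charmap uses a TeX-specific encoding would produce a wrong
table.

```python
# Hypothetical stand-ins for mathtext's existing table and for each
# font's charmap (unicode code point -> glyph index). Values are made up.
TEX_TO_GLYPH = {
    r"\alpha": ("cmmi10", 11),
    r"\infty": ("cmsy10", 49),
}

CHARMAPS = {
    "cmmi10": {0x03B1: 11},
    "cmsy10": {0x221E: 49},
}


def build_tex_to_unicode(tex_to_glyph, charmaps):
    """Compose TeX name -> (font, glyph) with glyph -> code point per font."""
    # Invert each charmap. If several code points share one glyph,
    # the last one seen wins; a real tool would want to flag that.
    glyph_to_code = {
        font: {glyph: code for code, glyph in charmap.items()}
        for font, charmap in charmaps.items()
    }
    table = {}
    missing = []  # TeX names whose glyph has no code point in its font
    for name, (font, glyph) in tex_to_glyph.items():
        code = glyph_to_code.get(font, {}).get(glyph)
        if code is None:
            missing.append(name)
        else:
            table[name] = code
    return table, missing


table, missing = build_tex_to_unicode(TEX_TO_GLYPH, CHARMAPS)
for name, code in sorted(table.items()):
    print(f"{name} -> U+{code:04X} {chr(code)}")
```

Keeping the `missing` list makes it easy to see which symbols would
still need the external map or a hand-written entry.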
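The font-query idea in the third bullet can be sketched as a
per-character fallback: use the default font when it provides the code
point, otherwise pick any other font that does. The coverage sets
below are invented and simplified (in practice they would be built by
reading each font file's charmap), "SomeMathFont" is a hypothetical
name, and `fonts_for` is not an existing function.

```python
# Hypothetical, simplified coverage sets: code points each font provides.
FONT_COVERAGE = {
    "Vera": set(range(0x20, 0x7F)) | {0x00B2},
    "SomeMathFont": set(range(0x20, 0x7F)) | {0x03B1, 0x221E, 0x2211},
}


def fonts_for(text, coverage, default):
    """Map each character to the default font if it has it, else a fallback.

    Returns a list of (char, font) pairs; font is None when no font
    provides the character.
    """
    # Sort the other fonts so the choice of fallback is deterministic.
    others = sorted(font for font in coverage if font != default)
    result = []
    for ch in text:
        code = ord(ch)
        if code in coverage.get(default, set()):
            result.append((ch, default))
            continue
        fallback = next((font for font in others if code in coverage[font]), None)
        result.append((ch, fallback))
    return result


print(fonts_for("x\u00b2 + \u03b1", FONT_COVERAGE, "Vera"))
```

With this arrangement, an expression that only uses characters the
default font provides never leaves that font, which is the "simple
mathtext in Vera" case.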
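Supporting embedded mathtext in text.py, as the last bullet suggests,
would start with splitting a label into plain-text and math segments,
so the plain parts can be drawn as ordinary unicode text and only the
`$...$` parts handed to mathtext. Below is a minimal sketch under an
assumed convention (an unescaped `$` toggles math mode and `\$` is a
literal dollar sign); `split_mathtext` is a hypothetical helper, not
what text.py currently does.

```python
import re

# Assumed convention: an unescaped "$" starts or ends a math segment.
_DOLLAR = re.compile(r"(?<!\\)\$")


def split_mathtext(s):
    """Split s into (is_math, chunk) pairs in reading order.

    Raises ValueError if the unescaped dollar signs are unbalanced.
    """
    parts = _DOLLAR.split(s)
    # Balanced delimiters always leave an odd number of parts:
    # text, math, text, ..., text.
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced $ in %r" % s)
    segments = []
    for i, chunk in enumerate(parts):
        if not chunk:
            continue  # skip empty text or math between adjacent delimiters
        is_math = i % 2 == 1
        if not is_math:
            chunk = chunk.replace(r"\$", "$")  # unescape literal dollars
        segments.append((is_math, chunk))
    return segments


for is_math, chunk in split_mathtext(r"Temp (°C) vs $x_i^2$ cost \$5"):
    print("mathtext" if is_math else "text", repr(chunk))
```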

JDH