> Let me make sure I understand this. If we map mathtext
> characters to unicode, and use freefont for now, will that
> help prepare MPL for STIX fonts? If there is an option
> available now that moves MPL in the direction of a
> permanent solution, then it seems like the decision is
> already made.
What follows is a long post about getting unicode fonts to work with
mathtext, which is a very important goal. But there is another goal,
also important, that may serve your thesis needs well: the ability to
farm out text handling to TeX/LaTeX, either for ps or for png using
dvipng.
Now, on to the unicode question.
In principle we should be able to substitute any set of unicode fonts
for any other, since they all use the same encoding. The last time I
looked into replacing the bakoma fonts, I spent a while looking at the
umbelleek fonts, but I came to the conclusion that they do not use a
unicode encoding, despite their author's later advocacy of unicode:
http://www.tug.org/TUGboat/Articles/tb19-3/tb60kinch.pdf
So I think freefont is a better path to pursue (I wasn't aware of
these fonts until reading Baptiste's post); even though they are GPLd,
they will ease the path to integration with other unicode font sets
later.
> Can we come up with some kind of a plan or
> design document for what steps we need to take? I will
> pick at it after work, if I understand what needs to be
> done.
I am happy to help, offer advice and pointers and so on, but there is
no definite set of steps I can lay out. The person who has their
boots on and is wading through the mud will have to make many of these
decisions. There is no 1-to-1 mapping between TeX symbols and
unicode. Most unicode symbols (e.g., Ancient Cypriot) have no TeX
equivalent, and many TeX symbols have no unicode equivalent (e.g.,
there is no separate unicode symbol for each of \sqrt, \Sqrt and
\SQRT).
So some creative ways to handle these cases will have to be devised;
a good start would be to google "tex unicode" and do a little reading
to get the lay of the land. There have been
previous efforts at mapping characters between TeX and unicode, and
I've worked on this before (see below). Also, search the archives for
any posts by Robert Kern on the issue of mathtext --- they are all
filled with sage advice and wonderful links that you will never find
even if you google for 1000 years. Unfortunately, the sourceforge
search engine is as sucky as their stats engine, so finding these
posts may be difficult.
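One possible "creative way" to handle TeX symbols with no unicode codepoint of their own is a fallback table that maps them onto an existing codepoint plus a rendering hint. The sketch below assumes, purely for illustration, that \sqrt, \Sqrt and \SQRT all draw U+221A (SQUARE ROOT) at different scales; the names `tex2uni`, `FALLBACKS` and `resolve`, and the scale values, are hypothetical and not part of mathtext.

```python
# Hypothetical direct TeX -> unicode mappings (a tiny subset).
tex2uni = {
    r'\alpha': 0x3B1,
    r'\beta': 0x3B2,
}

# Hypothetical fallbacks for TeX symbols that have no unicode codepoint
# of their own: (codepoint to draw, scale factor). Scales are made up.
FALLBACKS = {
    r'\sqrt': (0x221A, 1.0),
    r'\Sqrt': (0x221A, 1.5),
    r'\SQRT': (0x221A, 2.0),
}


def resolve(sym):
    """Return (codepoint, scale) for a TeX symbol, or raise KeyError."""
    if sym in tex2uni:
        return tex2uni[sym], 1.0
    if sym in FALLBACKS:
        return FALLBACKS[sym]
    raise KeyError('no unicode mapping for %s' % sym)


print(resolve(r'\alpha'))  # (945, 1.0)
print(resolve(r'\Sqrt'))   # (8730, 1.5)
```

Keeping direct mappings and fallbacks in separate tables makes it easy to see which symbols still need special handling.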
> Now that the new formatter is complete, I have to find new
> ways to procrastinate. I will defend in August.
Hmm, in my experience, having nothing to do is only the second-best
motivator for working on an open source project. The best one, of
course, is having a dissertation you should be working on. I'll try
to keep up with you.
Included below is a hodge-podge of some stuff I dredged up from my
examples and test directories related to fonts, mathtext and unicode
-- collectively they provide the tools required to put all these
pieces together.
The following is a script to parse a unicode <-> TeX mapping found at
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/ -- grab the file TeX.txt
and run this script on it. The code parses that file and builds a
dictionary from TeX -> unicode:
items = []
with open('TeX.txt', encoding='utf-8') as fh:
    for line in fh:
        line = line.strip()
        if '\\' not in line:
            continue  # no TeX command on this line
        # each tab-separated field should look like "<char> <\texname>"
        for val in line.split('\t'):
            tup = val.split(' ')
            if len(tup) != 2:
                continue
            code, sym = tup
            if not sym.startswith('\\') or len(code) != 1:
                continue
            items.append((sym, code))

for sym, char in items:
    # print a dict entry: TeX name -> unicode code point
    print("    r'%s' : %d," % (sym, ord(char)))
and generates output like
peds-pc311:~/python/mplsupport/test> python parse_tex.py
r'\alpha' : 945,
r'\iota' : 953,
r'\varrho' : 1009,
r'\beta' : 946,
r'\kappa' : 954,
r'\sigma' : 963,
which you can use to create a dictionary mapping TeX symbols to unicode
indices. You can save this dictionary in the _mathtext_data module,
for use by the mathtext module.
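A minimal sketch of what the saved dictionary and a lookup might look like, using the entries printed above. The names `tex2uni` and `tex_to_char` are assumptions for illustration, not existing mathtext names.

```python
# Entries copied from the parse_tex.py output above.
tex2uni = {
    r'\alpha': 945,
    r'\iota': 953,
    r'\varrho': 1009,
    r'\beta': 946,
    r'\kappa': 954,
    r'\sigma': 963,
}


def tex_to_char(sym):
    """Return the unicode character for a TeX symbol name."""
    return chr(tex2uni[sym])


for sym in (r'\alpha', r'\sigma'):
    # TeX name, decimal code point, hex code point
    print(sym, tex2uni[sym], hex(tex2uni[sym]))
```

Storing plain integer code points keeps the table independent of any particular font; picking a font comes later.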
The next task is to take a set of fonts and build a mapping from
unicode index to (font name, glyph index). This will require some
mastery of ft2font. Last time I was working on this I wrote
examples/font_indexing.py, mainly as a reminder to myself, on how to
use the module to extract the relevant information from font files:
character names, glyph indices and character codes. I now wish I had
added more comments <wink>. You may want to try this example, read
over it, and make sure you understand what it is doing (add comments
as you learn and commit the updates to CVS).
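The mapping itself is a simple data structure. The sketch below assumes you have already extracted, for each font file, a dict from unicode code point to glyph index (e.g. with ft2font from the unicode charmap), and builds one lookup from code point to (font file, glyph index), preferring fonts earlier in a priority list. `build_unicode_index` is a hypothetical name; the glyph indices for 945 come from the freefont listing in this post, while the U+221A entry is a made-up placeholder.

```python
def build_unicode_index(charmaps, priority):
    """Map unicode code point -> (font file, glyph index).

    charmaps: {font file: {code point: glyph index}}
    priority: font files in order of preference; the first font that
    contains a code point wins.
    """
    index = {}
    for fname in priority:
        for code, gind in charmaps.get(fname, {}).items():
            # setdefault keeps the entry from the higher-priority font
            index.setdefault(code, (fname, gind))
    return index


charmaps = {
    'FreeSerif.ttf': {945: 566},
    'FreeSans.ttf': {945: 570, 0x221A: 123},  # 123 is a placeholder
}
index = build_unicode_index(charmaps, ['FreeSerif.ttf', 'FreeSans.ttf'])
print(index[945])     # ('FreeSerif.ttf', 566)
print(index[0x221A])  # ('FreeSans.ttf', 123)
```

A priority list is one way to let a preferred font supply most glyphs while other unicode fonts fill the gaps.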
Many fonts have multiple character maps. Normally charmap 0 is the
unicode map, if the font has one. Let's look at the freefont
files and see how we can use the ft2font to find the font with \alpha
(should be at character code 945 from the results above). Below is
some code I wrote to iterate over a list of ttf files and print the
character codes, glyph indices and character names contained in those
files. I'm running this over all the fonts in the freefont dirs and
grepping for 945 and alpha to eliminate the noise:
> python find_unicode_texsyms.py /usr/share/fonts/truetype/freefont/*.ttf|grep 945|grep alpha
produces the following output
FreeMonoBoldOblique.ttf 0 447 945 alpha
FreeMonoBoldOblique.ttf 2 447 945 alpha
FreeMonoBold.ttf 0 612 945 alpha
FreeMonoBold.ttf 2 612 945 alpha
FreeMonoOblique.ttf 0 651 945 alpha
FreeMonoOblique.ttf 2 651 945 alpha
FreeMono.ttf 0 679 945 alpha
FreeMono.ttf 2 679 945 alpha
FreeSansBoldOblique.ttf 0 394 945 alpha
FreeSansBoldOblique.ttf 2 394 945 alpha
FreeSansBold.ttf 0 438 945 alpha
FreeSansBold.ttf 2 438 945 alpha
FreeSansOblique.ttf 0 457 945 alpha
FreeSansOblique.ttf 2 457 945 alpha
FreeSans.ttf 0 570 945 alpha
FreeSans.ttf 2 570 945 alpha
FreeSerifBoldItalic.ttf 0 546 945 alpha
FreeSerifBoldItalic.ttf 2 546 945 alpha
FreeSerifBold.ttf 0 530 945 alpha
FreeSerifBold.ttf 2 530 945 alpha
FreeSerifItalic.ttf 0 527 945 alpha
FreeSerifItalic.ttf 2 527 945 alpha
FreeSerif.ttf 0 566 945 alpha
FreeSerif.ttf 2 566 945 alpha
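If you'd rather filter in Python than with grep, the sketch below parses lines in the format printed by find_unicode_texsyms.py (file, charmap number, glyph index, character code, glyph name) and checks that charmaps 0 and 2 give the same glyph index for \alpha in each file. Only a few lines of the output above are included, and `glyphs_for_code` is a hypothetical helper.

```python
from collections import defaultdict

# A few lines of the find_unicode_texsyms.py output above.
OUTPUT = """\
FreeMono.ttf 0 679 945 alpha
FreeMono.ttf 2 679 945 alpha
FreeSans.ttf 0 570 945 alpha
FreeSans.ttf 2 570 945 alpha
FreeSerif.ttf 0 566 945 alpha
FreeSerif.ttf 2 566 945 alpha
"""


def glyphs_for_code(text, code):
    """Return {font file: {charmap number: glyph index}} for one code."""
    found = defaultdict(dict)
    for line in text.splitlines():
        fname, cmap, gind, ccode, name = line.split()
        if int(ccode) == code:
            found[fname][int(cmap)] = int(gind)
    return dict(found)


alpha = glyphs_for_code(OUTPUT, 945)
for fname, cmaps in sorted(alpha.items()):
    # charmaps 0 and 2 agree, consistent with both being unicode maps
    print(fname, cmaps, cmaps[0] == cmaps[2])
```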
As mentioned, selecting charmap 0 is supposed to select a unicode
character map, and apparently charmap 2 is also such a map. So you
have \alpha in a bunch of different styles (plain, bold, italic,
etc.); how to deal with all of this choice in the context of
TeX/mathtext fonts like rm, it, tt, etc. is where some of the
artistry referred to above comes in.
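One simple starting point for the style question is a table from mathtext font commands to freefont files. The assignment below (serif for rm/it/bf, sans for sf, mono for tt) is only an assumption for illustration; `FONT_FOR_STYLE` and `font_for` are hypothetical names, not part of mathtext.

```python
import os

FREEFONT_DIR = '/usr/share/fonts/truetype/freefont'  # path used in this post

# Hypothetical choice of freefont file for each mathtext font command.
FONT_FOR_STYLE = {
    'rm': 'FreeSerif.ttf',
    'it': 'FreeSerifItalic.ttf',
    'bf': 'FreeSerifBold.ttf',
    'sf': 'FreeSans.ttf',
    'tt': 'FreeMono.ttf',
}


def font_for(style, fontdir=FREEFONT_DIR):
    """Return the font file path for a mathtext style; unknown -> rm."""
    name = FONT_FOR_STYLE.get(style, FONT_FOR_STYLE['rm'])
    return os.path.join(fontdir, name)


print(font_for('it'))
```

Falling back to rm for unknown styles is one design choice; raising an error instead would surface missing mappings sooner.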
Below is the code that generated this output -- hopefully it will give
you some more insight into how to use ft2font [BTW, if you take this
on, it would be really helpful if right now you open a notes file and
start a tutorial to self about what you are learning. I have to
relearn this stuff myself every time I work on it (and I wrote most of
the font code and the examples). There is no better time to write
helpful documentation than when learning. Someone may have to do this
again one day, and that someone may be you!]
import os
import sys

from matplotlib.ft2font import FT2Font

for path in sys.argv[1:]:
    font = FT2Font(path)
    print('loaded', path, font.num_charmaps)
    fname = os.path.basename(path)
    for i in range(font.num_charmaps):
        font.set_charmap(i)
        # Recent matplotlib returns {char code: glyph index}; the version
        # used for the output above returned {glyph index: char code}.
        for code, gind in sorted(font.get_charmap().items()):
            name = font.get_glyph_name(gind)
            # file, charmap number, glyph index, char code, glyph name
            print(fname, i, gind, code, name)
OK, so now we have some mappings from TeX -> unicode and some idea of
how to map unicode symbols to font names and glyph indices. Another
tool you can look at to understand font handling and glyph rendering
is in the mpl examples dir. The following builds a standard font
table in a plot window:
> ./font_table_ttf.py /usr/share/fonts/truetype/freefont/FreeSans.ttf
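Reading such a table is mostly hex arithmetic. Assuming a 16x16 grid of the first 256 character codes (16 codes per row), a code's row and column are its high and low hex digits. The helpers below are hypothetical and only sketch that layout.

```python
def table_cell(code, ncols=16):
    """Return (row, col) of a character code in an ncols-wide table."""
    return divmod(code, ncols)


def row_label(row, ncols=16):
    """Hex label for the first code in a row, e.g. row 4 -> '0x40'."""
    return '0x%02X' % (row * ncols)


row, col = table_cell(ord('A'))
print(row, col, row_label(row))  # 4 1 0x40
```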
This should be enough for tonight :-). We can talk by phone tomorrow
if you think it would help, or you can post questions here. It's good
to get some of this on record. I've spent many hours working on this
problem, but have never had the time and stamina to see it through.
mathtext in matplotlib has a lot of promise but the current
implementation is not satisfactory. Getting a good set of unicode
fonts working would be a significant step forward.
Thanks!
JDH