Since you asked
I may not have mentioned this, but the style conventions for mpl code
are:
  functions                : lower or lower_score_separated
  variables and attributes : lower or lowerUpper
  classes                  : Upper or MixedUpper
OK
Also, I am not too fond of the dict of dicts -- why not use variable
names?
I used a dict of dicts because this allowed me to generate separate
pickle files (one for each of the dicts in the top-level dict) and
anything else (see the final script) by their corresponding top-level
dict name. I thought it was better, for practical/speed reasons, to
have a separate pickle file for every dict.
for line in file(fname):
    if line[:2]!=' 0': continue # using continue avoids unnecessary indent
Thanks for the tip!
    uninum = line[2:6].strip().lower()
    type1name = line[12:37].strip()
    texname = line[83:110].strip()
    uninum = int(uninum, 16)
I thought that the idea was to allow users to write unicode strings
directly in TeX (OK, this isn't much of an excuse :). That's why I
used the eval approach, to get the dict keys (or values) to be unicode
strings. I'm also aware that indexing by ints is faster, and that the
underlying FT2 functions work with ints... OK, I'm now convinced that
your approach is better.
pickle.dump((uni2type1, type12uni, uni2tex, tex2uni), file('unitex.pcl','w'))
# An example
unichar = int('00d7', 16)
print uni2tex.get(unichar)
print uni2type1.get(unichar)
Also, I am a little hesitant to use pickle files for the final
mapping. I suggest you write a script that generates the python code
containing the dictionaries you need (that is how much of
_mathtext_data was generated).
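A minimal sketch of that suggestion, written for a current Python 3 rather than the Python 2 used elsewhere in this thread. The helper name render_dict_module and the sample entries are made up for illustration; this is not a record of how _mathtext_data was actually produced.

```python
import pprint


def render_dict_module(dicts):
    """Return Python source text defining one module-level dict per entry.

    `dicts` maps a variable name (e.g. 'uni2tex') to the dict it should hold.
    """
    lines = ['# Automatically generated file. Do not edit by hand.', '']
    for name in sorted(dicts):
        # pprint.pformat yields a valid Python literal for dicts of ints/strs
        lines.append('%s = %s' % (name, pprint.pformat(dicts[name])))
        lines.append('')
    return '\n'.join(lines)


# Hypothetical sample data: U+00D7 (multiplication sign)
sample = {
    'uni2tex': {0x00d7: r'\times'},
    'tex2uni': {r'\times': 0x00d7},
}
source = render_dict_module(sample)

# The generated text is ordinary Python, so it could be written to a .py
# file and imported; here it is executed in a scratch namespace instead.
namespace = {}
exec(source, namespace)
```

The generated module needs no unpickling step at import time and can be read and diffed as plain text, which is the main trade-off against the pickle approach.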
The reason why I used pickle - from the Python docs:
On 6/22/06, John Hunter <jdhunter@...5...> wrote:
Strings can easily be written to and read from a file. Numbers take a
bit more effort, since the read() method only returns strings, which
will have to be passed to a function like int(), which takes a string
like '123' and returns its numeric value 123. However, when you want
to save more complex data types like lists, dictionaries, or class
instances, things get a lot more complicated.
Rather than have users be constantly writing and debugging code to
save complicated data types, Python provides a standard module called
pickle. This is an amazing module that can take almost any Python
object (even some forms of Python code!), and convert it to a string
representation; this process is called pickling. Reconstructing the
object from the string representation is called unpickling. Between
pickling and unpickling, the string representing the object may have
been stored in a file or data, or sent over a network connection to
some distant machine.
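As a small illustration of the round trip described in that passage, here is a sketch in current Python 3 (where pickle.dumps returns bytes; in the Python 2 of this thread it returns a str). The sample mapping is hypothetical.

```python
import pickle

# Hypothetical sample mapping: code point -> TeX name
uni2tex = {0x00d7: r'\times', 0x00f7: r'\div'}

# Pickling: object -> byte string
payload = pickle.dumps(uni2tex)

# Unpickling: byte string -> an equal, but distinct, object
restored = pickle.loads(payload)
```

The int keys survive the round trip unchanged, so a dict indexed by code point can be stored and reloaded without any conversion code.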
So I thought that pickling was the obvious way to go. And, of course,
unpickling with cPickle is very fast. I also think that no human being
should change the automatically generated dicts. Rather, we should
provide a separate python file (e.g. _mathtext_manual_data.py) where
anybody who wants to manually override the automatically generated
values, or add new (key, value) pairs, can do so.
The idea:
_mathtext_manual_data.py:
    uni2tex = {key1:value1, key2:value2}
    tex2uni = {}
    uni2type1 = {}
    type12uni = {}
uni2tex.py:
    from cPickle import load
    uni2tex = load(open('uni2tex.pcl'))
    try:
        import _mathtext_manual_data
        uni2tex.update(_mathtext_manual_data.uni2tex)
    except (TypeError, SyntaxError): # Just these exceptions should be raised
        raise
    except: # All other exceptions should be silent
        pass
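The override step can also be sketched on its own, in current Python 3. Note that this is a variation, not the code above: it only tolerates a missing override module (ImportError) instead of silencing every exception except TypeError and SyntaxError. The function name apply_manual_overrides and the demo module name are hypothetical.

```python
import importlib
import sys
import types


def apply_manual_overrides(generated, module_name, attr):
    """Return a copy of `generated` updated with `module_name.attr`, if present."""
    merged = dict(generated)
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # No override module installed: keep the generated values as they are
        return merged
    merged.update(getattr(module, attr, {}))
    return merged


# In-memory stand-in for a _mathtext_manual_data.py file on disk
fake = types.ModuleType('_mathtext_manual_data_demo')
fake.uni2tex = {0x00d7: r'\cross'}  # hypothetical manual override
sys.modules[fake.__name__] = fake

generated = {0x00d7: r'\times', 0x00f7: r'\div'}
merged = apply_manual_overrides(generated, '_mathtext_manual_data_demo', 'uni2tex')
untouched = apply_manual_overrides(generated, '_no_such_override_module', 'uni2tex')
```

Working on a copy keeps the generated dict intact, so the manual values always win without the generated data ever being edited.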
Finally, I added lines for automatically generating pretty much
everything that can be automatically generated:
stix-tbl2py.py:
'''A script for seamlessly copying the data from the stix-tbl.ascii*
file to a set of python dicts. Dicts are then pickled to corresponding
files, for later retrieval.
Currently used table file:
http://www.ams.org/STIX/bnb/stix-tbl.ascii-2005-09-24
'''
import pickle
tablefilename = 'stix-tbl.ascii-2005-09-24'
dictnames = ['uni2type1', 'type12uni', 'uni2tex', 'tex2uni']
dicts = {}
# initialize the dicts
for name in dictnames:
    dicts[name] = {}
for line in file(tablefilename):
    if line[:2]!=' 0': continue
    uninum = int(line[2:6].strip().lower(), 16)
    type1name = line[12:37].strip()
    texname = line[83:110].strip()
    if type1name:
        dicts['uni2type1'][uninum] = type1name
        dicts['type12uni'][type1name] = uninum
    if texname:
        dicts['uni2tex'][uninum] = texname
        dicts['tex2uni'][texname] = uninum
template = '''# Automatically generated file.
from cPickle import load
%(name)s = load(open('%(name)s.pcl'))
try:
    import _mathtext_manual_data
    %(name)s.update(_mathtext_manual_data.%(name)s)
except (TypeError, SyntaxError): # Just these exceptions should be raised
    raise
except: # All other exceptions should be silent
    pass
'''
# pickling the dicts to corresponding .pcl files
# automatically generating .py module files, used by importers
for name in dictnames:
    pickle.dump(dicts[name], open(name + '.pcl','w'))
    file(name + '.py','w').write(template%{'name':name})
# An example
from uni2tex import uni2tex
from uni2type1 import uni2type1
unichar = u'\u00d7'
uninum = ord(unichar)
print uni2tex[uninum]
print uni2type1[uninum]
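To show what the column slices in the script pick out, here is a sketch in current Python 3 that parses a single line. The sample line is synthetic: it is padded so the fields fall into the slices used above and is not copied from the real stix-tbl.ascii file, whose other columns I am not reproducing. The function name parse_table_line is hypothetical.

```python
def parse_table_line(line):
    """Return (uninum, type1name, texname), or None for non-data lines."""
    if line[:2] != ' 0':
        return None
    uninum = int(line[2:6].strip(), 16)
    type1name = line[12:37].strip()
    texname = line[83:110].strip()
    return uninum, type1name, texname


# Synthetic line, padded so each field lands in the slice the script reads
sample = (' 000D7'.ljust(12)      # columns 0-11: code point
          + 'multiply'.ljust(25)  # columns 12-36: Type 1 name
          + ' ' * 46              # columns 37-82: fields not used here
          + '\\times'.ljust(27))  # columns 83-109: TeX name

parsed = parse_table_line(sample)
skipped = parse_table_line('# a line that does not start with " 0"')
```

Returning None for lines that fail the ' 0' prefix test mirrors the continue in the loop, and empty name fields come back as empty strings, which is what the if type1name / if texname checks rely on.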
Cheers,
Edin