Questions about mathtext, unicode conversion etc.

Hi all,

Is the code in the mathtext module really ugly, or is it just me
not understanding it?
Also, if anyone knows some good online resources about parsing etc.,
I would really appreciate it.

Consider the following code (picked at random from mathtext.py):

# Excerpt: sys, handler, expression, Element and BakomaTrueTypeFonts are
# module-level names (imports and globals) of mathtext.py, not shown here.
def math_parse_s_ft2font(s, dpi, fontsize, angle=0):
    """
    Parse the math expression s and return the (width, height, fonts)
    tuple needed to render it.

    fontsize must be in points.
    """

    if sys.version_info[:2] == (2, 2):
        raise SystemExit('mathtext broken on python2.2. '
                         'We hope to get this fixed soon')

    cacheKey = (s, dpi, fontsize, angle)
    s = s[1:-1] # strip the $ from front and back
    if cacheKey in math_parse_s_ft2font.cache:
        w, h, bfonts = math_parse_s_ft2font.cache[cacheKey]
        return w, h, bfonts.fonts.values()

    bakomaFonts = BakomaTrueTypeFonts()
    Element.fonts = bakomaFonts
    handler.clear()
    expression.parseString( s )

    handler.expr.set_size_info(fontsize, dpi)

    # set the origin once to allow w, h computation
    handler.expr.set_origin(0, 0)
    xmin = min([e.xmin() for e in handler.symbols])
    xmax = max([e.xmax() for e in handler.symbols])
    ymin = min([e.ymin() for e in handler.symbols])
    ymax = max([e.ymax() for e in handler.symbols])

    # the bounding box of all symbols gives the width and height
    w, h = xmax-xmin, ymax-ymin
    # a small pad for the canvas size
    w += 2
    h += 2

    # now set the true origin; this doesn't affect width and height
    handler.expr.set_origin(0, h-ymax)

    Element.fonts.set_canvas_size(w,h)
    handler.expr.render()
    handler.clear()

    math_parse_s_ft2font.cache[cacheKey] = w, h, bakomaFonts
    return w, h, bakomaFonts.fonts.values()

math_parse_s_ft2font.cache = {}
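The cache at the end is a common Python idiom rather than anything mathtext-specific: the function object itself carries a dict keyed on every input that changes the result, so a repeated call with the same string, dpi, font size and angle skips parsing and layout entirely. A minimal stand-alone sketch of the same pattern follows; layout and its call counter are hypothetical, and the arithmetic is only a placeholder for the real work. On Python 3, functools.lru_cache gives the same behaviour without the manual bookkeeping:

```python
import functools


def layout(s, dpi, fontsize, angle=0):
    """Hypothetical stand-in for math_parse_s_ft2font: the same caching
    idiom, with trivial arithmetic in place of parsing and rendering."""
    key = (s, dpi, fontsize, angle)          # every input that affects the result
    if key in layout.cache:                  # 'in' instead of the old dict.has_key()
        return layout.cache[key]
    layout.calls += 1                        # counts how often the slow path runs
    result = (len(s) * fontsize, fontsize)   # placeholder for the real work
    layout.cache[key] = result
    return result


layout.cache = {}   # the cache lives on the function object itself
layout.calls = 0


@functools.lru_cache(maxsize=None)
def layout_lru(s, dpi, fontsize, angle=0):
    """The same memoization, done by the standard library decorator."""
    return (len(s) * fontsize, fontsize)
```

Like the original, both versions keep every entry forever; lru_cache(maxsize=128) would bound memory use at the cost of occasional recomputation.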

I don't understand, for example, what the statement

expression.parseString( s )

does.

"expression" is defined globally, and only one of its methods is
called, just once, in the function above. What does that particular
line actually do?!?
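Judging from the function itself, parseString is called only for its side effects: its return value is discarded, yet on the very next line handler.expr (and later handler.symbols) is used. That matches the usual pyparsing style, which mathtext uses to build expression: the grammar is assembled from smaller pieces, parse actions are attached to the pieces with setParseAction, and parseString(s) matches s against the grammar, running each action as its piece matches. Here the actions are presumably methods of the global handler, which build up the expression tree. To stay self-contained, the sketch below hand-writes a tiny recursive-descent parser with the same shape instead of using pyparsing; Handler, Expression and the three-rule grammar are hypothetical and far smaller than mathtext's:

```python
import re

# One token: a backslash command or any single non-space character.
TOKEN = re.compile(r'\\[A-Za-z]+|\S')


class Handler:
    """Hypothetical stand-in for mathtext's global handler: it is filled
    in as a side effect of parsing, not through a return value."""

    def __init__(self):
        self.clear()

    def clear(self):
        self.symbols = []   # every symbol seen, in input order
        self.expr = None    # root of the parsed tree

    def symbol(self, name):
        self.symbols.append(name)
        return name

    def superscript(self, base, exponent):
        return ('sup', base, exponent)

    def group(self, items):
        # Inner groups finish first, so the outermost group is stored last.
        self.expr = ('group', items)
        return self.expr


handler = Handler()


class Expression:
    """Hypothetical stand-in for the global expression grammar:

        expr := item*
        item := atom ('^' atom)?
        atom := '{' expr '}' | command | char
    """

    def parseString(self, s):
        self.tokens = TOKEN.findall(s)
        self.pos = 0
        result = self._expr()
        if self.pos != len(self.tokens):
            raise ValueError('unexpected %r' % self.tokens[self.pos])
        return result

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _expr(self):
        items = []
        while self._peek() not in (None, '}'):
            items.append(self._item())
        return handler.group(items)          # "parse action" for expr

    def _item(self):
        base = self._atom()
        if self._peek() == '^':
            self.pos += 1
            return handler.superscript(base, self._atom())  # action for item
        return base

    def _atom(self):
        tok = self._peek()
        if tok in (None, '}', '^'):
            raise ValueError('expected a symbol, got %r' % tok)
        self.pos += 1
        if tok == '{':
            inner = self._expr()
            if self._peek() != '}':
                raise ValueError('missing }')
            self.pos += 1
            return inner
        return handler.symbol(tok)           # action for a single symbol


expression = Expression()
```

Calling handler.clear() before parsing, as math_parse_s_ft2font does, matters because the handler keeps its state between calls; after expression.parseString(r'\alpha^{2}') everything downstream reads handler, not the return value.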

------
Regarding Unicode support: mathtext currently uses the following
dictionary to get the glyph info out of the font files:

latex_to_bakoma = {

    r'\oint' : ('cmex10', 45),
    r'\bigodot' : ('cmex10', 50),
    r'\bigoplus' : ('cmex10', 55),
    r'\bigotimes' : ('cmex10', 59),
    r'\sum' : ('cmex10', 51),
    r'\prod' : ('cmex10', 24),
...
}

I managed to build the following dictionary (a little more remains to be done):
tex_to_unicode = {
r'\S' : u'\u00a7',
r'\P' : u'\u00b6',
r'\Gamma' : u'\u0393',
r'\Delta' : u'\u0394',
r'\Theta' : u'\u0398',
r'\Lambda' : u'\u039b',
r'\Xi' : u'\u039e',
r'\Pi' : u'\u03a0',
r'\Sigma' : u'\u03a3',
r'\Upsilon' : u'\u03a5',
r'\Phi' : u'\u03a6',
r'\Psi' : u'\u03a8',
r'\Omega' : u'\u03a9',
r'\alpha' : u'\u03b1',
r'\beta' : u'\u03b2',
r'\gamma' : u'\u03b3',
r'\delta' : u'\u03b4',
r'\varepsilon' : u'\u03b5',
r'\zeta' : u'\u03b6',
r'\eta' : u'\u03b7',
r'\theta' : u'\u03b8',
r'\iota' : u'\u03b9',
r'\kappa' : u'\u03ba',
r'\lambda' : u'\u03bb',
r'\mu' : u'\u03bc',
r'\nu' : u'\u03bd',
r'\xi' : u'\u03be',
r'\pi' : u'\u03c0',
r'\rho' : u'\u03c1',
r'\varsigma' : u'\u03c2',
r'\sigma' : u'\u03c3',
r'\tau' : u'\u03c4',
r'\upsilon' : u'\u03c5',
r'\varphi' : u'\u03c6',
r'\chi' : u'\u03c7',
r'\psi' : u'\u03c8',
r'\omega' : u'\u03c9',
r'\ell' : u'\u2113',
r'\wp' : u'\u2118',
# r'\Omega' : u'\u2126',  # OHM SIGN: a repeated key would override u'\u03a9' above
r'\Re' : u'\u211c',
r'\Im' : u'\u2111',
# r'\aleph' : u'\u05d0',  # HEBREW LETTER ALEF: a repeated key; u'\u2135' below is the math symbol
r'\aleph' : u'\u2135',
r'\spadesuit' : u'\u2660',
r'\heartsuit' : u'\u2661',
r'\diamondsuit' : u'\u2662',
r'\clubsuit' : u'\u2663',
r'\flat' : u'\u266d',
r'\natural' : u'\u266e',
r'\sharp' : u'\u266f',
r'\leftarrow' : u'\u2190',
r'\uparrow' : u'\u2191',
r'\rightarrow' : u'\u2192',
r'\downarrow' : u'\u2193',
r'\Rightarrow' : u'\u21d2',
r'\Leftrightarrow' : u'\u21d4',
r'\leftrightarrow' : u'\u2194',
r'\updownarrow' : u'\u2195',
r'\forall' : u'\u2200',
r'\exists' : u'\u2203',
r'\emptyset' : u'\u2205',
# r'\Delta' : u'\u2206',  # INCREMENT: a repeated key would override u'\u0394' above
r'\nabla' : u'\u2207',
r'\in' : u'\u2208',
r'\ni' : u'\u220b',
r'\prod' : u'\u220f',
r'\coprod' : u'\u2210',
r'\sum' : u'\u2211',
r'-' : u'\u2212',
r'\mp' : u'\u2213',
r'/' : u'\u2215',
r'\ast' : u'\u2217',
r'\circ' : u'\u2218',
r'\bullet' : u'\u2219',
r'\propto' : u'\u221d',
r'\infty' : u'\u221e',
r'\mid' : u'\u2223',
r'\wedge' : u'\u2227',
r'\vee' : u'\u2228',
r'\cap' : u'\u2229',
r'\cup' : u'\u222a',
r'\int' : u'\u222b',
r'\oint' : u'\u222e',
r':' : u'\u2236',
r'\sim' : u'\u223c',
r'\wr' : u'\u2240',
r'\simeq' : u'\u2243',
r'\approx' : u'\u2248',
r'\asymp' : u'\u224d',
r'\equiv' : u'\u2261',
r'\leq' : u'\u2264',
r'\geq' : u'\u2265',
r'\ll' : u'\u226a',
r'\gg' : u'\u226b',
r'\prec' : u'\u227a',
r'\succ' : u'\u227b',
r'\subset' : u'\u2282',
r'\supset' : u'\u2283',
r'\subseteq' : u'\u2286',
r'\supseteq' : u'\u2287',
r'\uplus' : u'\u228e',
r'\sqsubseteq' : u'\u2291',
r'\sqsupseteq' : u'\u2292',
r'\sqcap' : u'\u2293',
r'\sqcup' : u'\u2294',
r'\oplus' : u'\u2295',
r'\ominus' : u'\u2296',
r'\otimes' : u'\u2297',
r'\oslash' : u'\u2298',
r'\odot' : u'\u2299',
r'\vdash' : u'\u22a2',
r'\dashv' : u'\u22a3',
r'\top' : u'\u22a4',
r'\bot' : u'\u22a5',
r'\bigwedge' : u'\u22c0',
r'\bigvee' : u'\u22c1',
r'\bigcap' : u'\u22c2',
r'\bigcup' : u'\u22c3',
r'\diamond' : u'\u22c4',
r'\cdot' : u'\u22c5',
r'\lceil' : u'\u2308',
r'\rceil' : u'\u2309',
r'\lfloor' : u'\u230a',
r'\rfloor' : u'\u230b',
r'\langle' : u'\u27e8',
r'\rangle' : u'\u27e9',
r'\dag' : u'\u2020',
r'\ddag' : u'\u2021',
}
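Python does not complain when a key appears twice in a dict literal; it silently keeps the last value. For a hand-built table of this size it may be worth building it from a list of pairs and comparing each code point with its official Unicode name. A minimal sketch using only the standard unicodedata module; build_table and describe are hypothetical helper names:

```python
import unicodedata


def build_table(pairs):
    r"""Build a dict from (tex, char) pairs, rejecting conflicting repeats.

    A dict literal would silently keep only the last value for a repeated
    key such as r'\Omega', so conflicts are reported instead."""
    table = {}
    for tex, char in pairs:
        if tex in table and table[tex] != char:
            raise ValueError('%s maps to both U+%04X and U+%04X'
                             % (tex, ord(table[tex]), ord(char)))
        table[tex] = char
    return table


def describe(table):
    """Return sorted (tex, 'U+XXXX', Unicode name) rows for checking by eye."""
    return [(tex, 'U+%04X' % ord(char), unicodedata.name(char, '?'))
            for tex, char in sorted(table.items())]
```

Listing the names makes look-alike code points easy to spot, for example GREEK CAPITAL LETTER OMEGA (U+03A9) versus OHM SIGN (U+2126), or GREEK CAPITAL LETTER DELTA (U+0394) versus INCREMENT (U+2206).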

Building unicode_to_tex from it is straightforward.
Am I on the right track? What should I do next?
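Inverting really is nearly a one-liner, {v: k for k, v in tex_to_unicode.items()}, with one catch: if two TeX names ever share a character, the inverse silently keeps only one of them. A minimal sketch that checks for that, plus a hypothetical to_tex helper illustrating one detail of converting text back: a control word such as \alpha needs a following space, otherwise an alpha followed by x would become the undefined command \alphax:

```python
def invert(table):
    """Build unicode_to_tex from tex_to_unicode, refusing ambiguous entries.

    If two TeX names share one character, a plain dict inversion would
    quietly keep only one of them, so that case raises instead."""
    inverse = {}
    for tex, char in table.items():
        if char in inverse:
            raise ValueError('U+%04X is both %s and %s'
                             % (ord(char), inverse[char], tex))
        inverse[char] = tex
    return inverse


def to_tex(text, inverse):
    r"""Hypothetical helper: replace mapped characters by their TeX names.

    A control word such as \alpha gets a trailing space so that a following
    letter cannot merge with it (alpha then x must not become \alphax)."""
    out = []
    for char in text:
        tex = inverse.get(char)
        if tex is None:
            out.append(char)                 # unmapped: copy through
        elif tex.startswith('\\') and tex[1:].isalpha():
            out.append(tex + ' ')            # control word, e.g. \alpha
        else:
            out.append(tex)                  # plain characters such as '-' or ':'
    return ''.join(out)
```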

I also noticed that some TeX commands (commands in the sense that they
can take arguments enclosed in braces {}) are defined only as plain
symbols: \sqrt alone, for example, displays just the beginning of the
square root (√), and \sqrt{123} triggers an error.
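That behaviour is what you would expect if \sqrt exists only as an entry in the symbol table: the grammar matches the name and then has no rule that consumes the {...} group after it. The missing piece is a rule of the form "command followed by a fixed number of braced groups". A minimal recursive-descent sketch of such a rule, not mathtext's actual grammar (which is built with pyparsing); ARITY, parse and the nested list/tuple output are hypothetical:

```python
import re

# Hypothetical table of commands that take braced arguments, with their arity.
ARITY = {r'\sqrt': 1, r'\frac': 2}

# One token: a backslash command or any single non-space character.
TOKEN = re.compile(r'\\[A-Za-z]+|\S')


def parse(s):
    """Parse s into nested lists/tuples; raise ValueError on malformed input."""
    tokens = TOKEN.findall(s)
    tree, pos = _sequence(tokens, 0)
    if pos != len(tokens):
        raise ValueError('unmatched } at token %d' % pos)
    return tree


def _sequence(tokens, pos):
    """sequence := item*   (stops at '}' or at the end of the input)"""
    items = []
    while pos < len(tokens) and tokens[pos] != '}':
        item, pos = _item(tokens, pos)
        items.append(item)
    return items, pos


def _group(tokens, pos):
    """group := '{' sequence '}'"""
    if pos >= len(tokens) or tokens[pos] != '{':
        raise ValueError('expected { at token %d' % pos)
    items, pos = _sequence(tokens, pos + 1)
    if pos >= len(tokens):
        raise ValueError('missing }')
    return items, pos + 1                    # skip the closing brace


def _item(tokens, pos):
    """item := group | command group{arity} | symbol"""
    tok = tokens[pos]
    if tok == '{':
        return _group(tokens, pos)
    if tok in ARITY:
        pos += 1
        args = []
        for _ in range(ARITY[tok]):          # consume exactly arity groups
            arg, pos = _group(tokens, pos)
            args.append(arg)
        return (tok,) + tuple(args), pos
    return tok, pos + 1                      # plain symbol such as '1' or \alpha
```

In this sketch a bare \sqrt is rejected rather than drawn as a partial radical; whether mathtext should accept it is a separate design decision.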

That's it for now
Thanks in advance,
Edin

Hi Edin,

Edin Salković wrote:

Hi all,

<snip>

Also, if anyone knows some good online resources about parsing etc.,
I would really appreciate it.

Everything that David Mertz wrote about text processing in his excellent "Charming Python" articles:
<http://gnosis.cx/publish/tech_index_cp.html>

and "Building Recursive Descent Parsers with Python":
http://www.onlamp.com/pub/a/python/2006/01/26/pyparsing.html

If you were after a book on the subject, Mertz's book "Text Processing in Python" <http://gnosis.cx/TPiP/> would be an obvious choice, or you could pick up any book about writing compilers.

Gary R.