Incorrectly rendered Unicode (Korean) character in ps (fonttype=3) and pdf backends

_Jae-Joon_Lee · May 27, 2008, 11:27pm

Hello,

I wanted to render some Korean text in my matplotlib figure, and this
is how I did it (with python2.5 and the trunk version of matplotlib).

# -*- coding: utf-8 -*-
from pylab import *

import matplotlib.font_manager as fm
fp=fm.FontProperties(fname="/users/research/lee/.fonts/Eunjin.ttf", size=100)

plot([1,2,3])
text(1.,1.5, u'이', fontproperties=fp)
savefig("test.eps")
show()

It works fine with GtkAgg (and saving to png file also).
But ps (fonttype=3) and pdf backends seem to render the characters incorrectly.
Rendering with ps backends (fonttype=42) IS correct on the other hand.
See attached eps and pdf files. "test_correct.eps" is the correct one
made with ps fonttype=42.
"test_wrong.eps" is from fonttype=3.

Just in case, my rc file contains
text.usetex : False
ps.useafm : False

If I use ps backend with fonttype=3 (the wrong one), the embedded font
in the output eps file for the above character is defined as follows.

/uniC774{917 0 67 -2 833 678 _sc
gsave 0 343 translate
false CharStrings /cho12-1 get exec
grestore false CharStrings /jung21-1 get exec
}_d

It is a composition of two glyphs (/cho12-1 and /jung21-1), and my
guess is that the first glyph is somehow misplaced. If I manually change
the translation of the first glyph to something like (0, -150) instead
of the current value of (0, 343), the output looks okay.

I tried a few other Korean fonts but the results were similar. Some of
the glyphs are misplaced (in the ps (fonttype=3) and pdf backends).
Although I cannot rule out that the fonts I used contain wrong font
information, my inclination is that this could be a bug in
"pprdrv_tt2.cpp" (or a related header, e.g. truetype.h).

The translation of each glyph seems to be handled by the following code
(around line 594 of pprdrv_tt2.cpp).

    if( flags & ARGS_ARE_XY_VALUES )
    {
        if( arg1 != 0 || arg2 != 0 )
            stream.printf("gsave %d %d translate\n", topost(arg1), topost(arg2) );
    }

It would be great if someone who is an expert on this font issue could look
into this.
Just in case, arg1=0, arg2=206, font->HUPM=300, font->unitsPerEm=600
for this particular glyph.
The latter two are used inside "topost".
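
As a rough check of these numbers, here is a minimal Python sketch of the scaling. It assumes that "topost" converts font units to a 1000-units-per-em grid as (x * 1000 + HUPM) / unitsPerEm with C-style integer truncation; that formula is an assumption inferred from the values above, not copied from the source, but it does reproduce the 343 seen in the eps output.

```python
def topost(x, hupm, units_per_em):
    """Scale a font-unit value to 1000-per-em units (assumed formula).

    Truncates toward zero, as C integer division does.
    """
    numerator = x * 1000 + hupm
    quotient = abs(numerator) // units_per_em
    return quotient if numerator >= 0 else -quotient


# Values reported for this glyph: arg1=0, arg2=206, HUPM=300, unitsPerEm=600
dx = topost(0, hupm=300, units_per_em=600)
dy = topost(206, hupm=300, units_per_em=600)
print("gsave %d %d translate" % (dx, dy))  # gsave 0 343 translate
```

So the scaling itself looks consistent with the output; the suspicious part is the input value arg2=206.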
Regards,

-JJ

test_correct.eps.gz (161 KB)

test_wrong.eps.gz (2.97 KB)

test_wrong.pdf (10.5 KB)

Michael_Droettboom · May 28, 2008, 12:44pm

The code that generates the Type 3 fonts for us is fairly old (1995), and certainly predates the widespread adoption of Unicode, so I'm not especially surprised this doesn't work. I'll have a brief look to see if there are any obvious fixes, but we're unlikely to implement a full-fledged Unicode rendering system (like Pango) any time soon. Since I don't read Korean, can you provide an image file of what the resulting compound glyph is supposed to look like?

As a workaround, you could try using the Cairo backend to generate PostScript and PDF. It may do a better job. You could also try other Korean fonts -- they may not use compound glyph composition.
Cheers,
Mike

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
matplotlib-devel List Signup and Options

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

Michael_Droettboom · May 28, 2008, 12:48pm

Correction -- no need to send the image. You said png output was correct, so I'll just compare against that.


Michael_Droettboom · May 28, 2008, 1:38pm

I seem to have found a fix. The key point was this comment in pprdrv_tt2.cpp:

    else /* The tt spec. does not clearly indicate */
    {    /* whether these values are signed or not. */
        arg1 = *(signed char *)(glyph++);
        arg2 = *(signed char *)(glyph++);
    }

By adding the cast to (signed char *), things seem to work. I guess that's what the spec does in practice! This doesn't seem to break Latin compound glyphs that used to work (the only thing that has been extensively tested up until now), so I'm pretty confident that's the correct fix. This has been fixed in SVN. Please try with more Korean characters and let me know if you still see anything strange.
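
To illustrate what the cast changes, here is a small Python sketch. It assumes the offset for this component is stored in a single byte whose unsigned value is the arg2=206 reported earlier in the thread; the variable names are only for illustration.

```python
import struct

# One raw byte from the composite glyph record (assumed: 0xCE == 206 unsigned)
raw = bytes([0xCE])

as_unsigned = struct.unpack("B", raw)[0]  # how the old code effectively read it
as_signed = struct.unpack("b", raw)[0]    # what the (signed char *) cast yields

print(as_unsigned, as_signed)  # 206 -50
```

Read as a signed value, the offset becomes negative, which is the same direction as the manual (0, -150) correction that made the output look okay.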

Cheers,
Mike


_Jae-Joon_Lee · May 29, 2008, 4:25am

Hi Mike,

Yes, it is correct now.
I tried a bunch of Korean characters and fonts and they were all fine.
Thanks a lot!

Unfortunately, there is another problem. Although the glyphs are now
rendered at the correct location, some of the characters are still
incorrect (the pdf backend is fine; only the ps output is wrong).
I'm attaching the pdf one (correct) and the ps one (incorrect). You will
easily notice what's wrong.
When a single glyph consists of overlapping closed paths, the current
ps font definition does not fill the overlapping regions (the source
code has a comment saying that it assumes the paths are detached).
In the output font definition, paths are filled with the "eofill" command.
If I replace "eofill" with a plain "fill" (line 384 of pprdrv_tt.cpp),
the characters are rendered correctly. I tried other non-Korean
characters to see whether the change is harmful, but found no such case
as far as I can see.
As I know little about PostScript and TTF fonts, I'm not sure this is the
correct way (although it works fine for me). So it would be
appreciated if Mike or others could confirm this and make the change.
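
The difference between the two operators can be sketched in a few lines of Python. In PostScript, "eofill" uses the even-odd rule and "fill" uses the nonzero winding number rule. The two overlapping squares below are a made-up stand-in for a glyph whose contours overlap and run in the same direction; they are not taken from any of the fonts I tested.

```python
def crossings_and_winding(point, contours):
    """Cast a ray in +x from point; return (crossing count, winding number)."""
    px, py = point
    crossings = 0
    winding = 0
    for contour in contours:
        n = len(contour)
        for i in range(n):
            (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
            # Skip edges that do not straddle the horizontal line y = py.
            if (y0 <= py) == (y1 <= py):
                continue
            # x coordinate where the edge meets y = py
            x_hit = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_hit > px:
                crossings += 1
                winding += 1 if y1 > y0 else -1
    return crossings, winding


def filled_even_odd(point, contours):
    """Even-odd rule, as used by eofill."""
    return crossings_and_winding(point, contours)[0] % 2 == 1


def filled_nonzero(point, contours):
    """Nonzero winding number rule, as used by fill."""
    return crossings_and_winding(point, contours)[1] != 0


# Two overlapping squares, both counter-clockwise (hypothetical contours)
square_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
square_b = [(1, 1), (3, 1), (3, 3), (1, 3)]
glyph = [square_a, square_b]

overlap = (1.5, 1.5)  # inside both squares
print(filled_even_odd(overlap, glyph), filled_nonzero(overlap, glyph))  # False True
```

A point inside only one square is filled under both rules, so the two operators differ only where contours overlap, which matches what I see in the ps output.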

Thanks very much in advance.
Regards,

-JJ

P.S. Although it is a very simple fix, a diff file is also attached.

test.pdf (11.1 KB)

test.eps.gz (3.17 KB)

pprdrv_tt.diff (534 Bytes)
