PDF with Cyrillic

Hello! I tried to make Cyrillic symbols and words in a PDF searchable, but they still are not. I also cannot highlight them.
Is it possible to fix it somehow?

I have a lot of words in my dendrogram and I need to be able to search for them.

Example:
import math

import matplotlib.pyplot as plt
import numpy as np


def figsize(wcm, hcm):
    # Create a figure whose size is given in centimetres.
    plt.figure(figsize=(wcm / 2.54, hcm / 2.54))


figsize(13, 9)

x = np.linspace(0, 2 * math.pi, 100)
y = np.sin(x)
plt.plot(x, y, '-')
plt.xlabel(u"Ось абсцисс axis y")
plt.savefig('test.pdf')

Thank you for your attention.

You can use the pgf backend:


plt.savefig('test.pdf', backend='pgf')


Thank you, but then I get another error:

RuntimeError: xelatex not found. Install it or change rcParams['pgf.texsystem'] to an available TeX implementation.

So it means that I need to install LaTeX on my local system.

I tried MiKTeX (https://miktex.org) for Windows, but then there is another error:

ValueError: Error measuring '\\sffamily\\fontsize{9.000000}{10.800000}\\bfseries\\selectfont LINDT Шоколад Белый Ваниль к/уп(Lindt&SprungliGmbH):20'
LaTeX Output:
! Misplaced alignment tab character &.

The label contains both Latin and Cyrillic characters, but the problem is the & character.
I tried both XeLaTeX and LuaLaTeX, as recommended on the "Typesetting with XeLaTeX/LuaLaTeX" page of the Matplotlib 3.5.1 documentation.

MiKTeX is up to date.

Hm, I’m afraid I can’t help you here. Did you update your PATH (see the second item on the "Typesetting with XeLaTeX/LuaLaTeX" page of the Matplotlib 3.5.1 documentation)?
On my Windows computer, I have TeX Live installed, xelatex is in the PATH, and rcParams['pgf.texsystem'] is left at its default.
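
A quick way to check the PATH question from Python is sketched below. The helper name find_tex_engine and the UNICODE_ENGINES tuple are made up for this illustration; only shutil.which and the rcParams['pgf.texsystem'] key are real. It only tells you whether the executable is visible to Python, not whether the TeX installation itself works.

```python
import shutil

# Engines that rcParams['pgf.texsystem'] accepts and that read Unicode
# input natively (hypothetical shortlist for this sketch).
UNICODE_ENGINES = ("xelatex", "lualatex")


def find_tex_engine(candidates=UNICODE_ENGINES):
    """Return the first candidate executable found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


engine = find_tex_engine()
if engine is None:
    print("No Unicode-aware TeX engine found on PATH")
else:
    print(f"Found {engine}; it could be selected with "
          f"plt.rcParams['pgf.texsystem'] = {engine!r}")
```

If this prints that nothing was found while the file exists on disk, the PATH seen by the Python process is probably different from the one set in the system dialog (for example because the terminal or IDE was not restarted).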


The PATH entry exists, and the xelatex file is there.

I will try TeX Live, maybe it will help. Thanks for your help.

Seems related to the GitHub issue "common_texification misses & (ampersand)" (Issue #15493, matplotlib/matplotlib).


So I need to either replace the “&” characters myself or wait for a new release, am I right?

Indeed, you need to replace them; there is no fix implemented so far.
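
A minimal sketch of such a replacement, assuming the labels are plain text and the only problem character is "&". The helper name escape_ampersands is made up for this example and is not part of Matplotlib. It skips ampersands that are already escaped, but it is naive: it does not handle other LaTeX special characters.

```python
import re

# Match '&' only when it is not already preceded by a backslash.
_UNESCAPED_AMP = re.compile(r"(?<!\\)&")


def escape_ampersands(text):
    """Replace every unescaped '&' with a backslash-escaped one."""
    return _UNESCAPED_AMP.sub(lambda match: r"\&", text)


label = "LINDT Шоколад Белый Ваниль к/уп(Lindt&SprungliGmbH):20"
print(escape_ampersands(label))
```

The escaped string would then be passed to the labelling call in place of the original one when saving with the pgf backend.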


Thanks for the answer.

Because we pass strings on to an underlying latex implementation, I’m not even sure we should be trying to auto-escape (as we are as likely to get it wrong and escape something we should not as we are to fail to escape something we should).


If you set the pdf compression to 0 you can see the source of the problem:

plt.rcParams['pdf.compression'] = 0
45.2734375 0 Td
[ ( axis x) ] TJ
ET
q 0.01 0 0 0.01 0 0 cm /F1-DejaVuSans-uni041E Do Q
q 0.01 0 0 0.01 7.87109375 0 cm /F1-DejaVuSans-uni0441 Do Q
q 0.01 0 0 0.01 13.369140625 0 cm /F1-DejaVuSans-uni044C Do Q
q 0.01 0 0 0.01 22.44140625 0 cm /F1-DejaVuSans-uni0430 Do Q
q 0.01 0 0 0.01 28.5693359375 0 cm /F1-DejaVuSans-uni0431 Do Q
q 0.01 0 0 0.01 34.736328125 0 cm /F1-DejaVuSans-uni0441 Do Q
q 0.01 0 0 0.01 40.234375 0 cm /F1-DejaVuSans-uni0446 Do Q
q 0.01 0 0 0.01 47.041015625 0 cm /F1-DejaVuSans-uni0438 Do Q
q 0.01 0 0 0.01 53.5400390625 0 cm /F1-DejaVuSans-uni0441 Do Q
q 0.01 0 0 0.01 59.0380859375 1 cm /F1-DejaVuSans-uni0441 Do Q
Q
0.8 w
46.062992 36.980885 m
42.562992 36.980885 l

B

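where ASCII gets encoded as an actual string (the TJ line) and higher code points get drawn one glyph at a time as separate objects (the Do lines), so a PDF viewer has no text to search or select for them.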
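
To make the difference in the dump easier to spot, here is a rough sketch that counts the two kinds of operators in raw, uncompressed PDF bytes. The function name summarize_stream is made up for this example, and the sample is copied from the dump above. It is only a heuristic: Do is also used for images, and on a real file it only works when the streams are uncompressed.

```python
import re

# In an uncompressed content stream, a glyph drawn as a separate object
# shows up as "/<name> Do"; ordinary text is shown with Tj or TJ.
_XOBJECT_DRAW = re.compile(rb"/(\S+)\s+Do\b")
_TEXT_SHOW = re.compile(rb"\bT[jJ]\b")


def summarize_stream(data):
    """Count XObject draws and text-showing operators in raw PDF bytes."""
    return {
        "xobject_draws": len(_XOBJECT_DRAW.findall(data)),
        "text_shows": len(_TEXT_SHOW.findall(data)),
    }


sample = (
    b"[ ( axis x) ] TJ\n"
    b"ET\n"
    b"q 0.01 0 0 0.01 0 0 cm /F1-DejaVuSans-uni041E Do Q\n"
    b"q 0.01 0 0 0.01 7.87109375 0 cm /F1-DejaVuSans-uni0441 Do Q\n"
)
print(summarize_stream(sample))
```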

If you switch the font type to 42

plt.rcParams['pdf.fonttype'] = 42

in Okular the Cyrillic becomes selectable,

and what I think is the axis label appears in the PDF as a single (encoded) string.
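
Putting it together with the example from the first post, a minimal sketch could look like this. The Agg backend, the temporary output directory and the file name test_type42.pdf are choices made for this illustration, not something the setting requires. Whether the text is then searchable still depends on the PDF viewer.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend; savefig still writes a PDF
import matplotlib.pyplot as plt
import numpy as np

# Embed fonts as TrueType (Type 42) instead of the default Type 3.
plt.rcParams["pdf.fonttype"] = 42

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(13 / 2.54, 9 / 2.54))  # 13 cm x 9 cm
ax.plot(x, np.sin(x), "-")
ax.set_xlabel("Ось абсцисс axis y")

# Write into a fresh temporary directory (illustrative location only).
out_path = os.path.join(tempfile.mkdtemp(), "test_type42.pdf")
fig.savefig(out_path)
plt.close(fig)
print("wrote", out_path)
```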

The "Fonts in Matplotlib text engine" page of the Matplotlib 3.5.1 documentation may be a useful read.


I think this is the best solution for Cyrillic: no additional installations, and no workarounds with LaTeX or escaping special characters. Thank you.