Understanding when matplotlib tries to rebuild the font cache

Hi,

I'm writing to discuss a fairly commonly encountered issue involving matplotlib and its font cache:

/cle60up07/python/2.7.14/matplotlib/2.1.0/lib/python2.7/site-packages/matplotlib-2.1.0-py2.7-linux-x86_64.egg/matplotlib/font_manager.py:279: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment. 'Matplotlib is building the font cache using fc-list. '

However, I haven’t found an answer to my specific question, which is: “When does matplotlib need to build the font cache?” Some notes on how and where I am running my plotting routine:

  1. I am running matplotlib/2.1.0.
  2. I am running multiple instances of the routine that invokes matplotlib.pyplot, on multiple nodes of a cluster.
  3. MPLCONFIGDIR is not set explicitly.
  4. I see a fontList.cache file in my ${HOME}/.matplotlib directory.
  5. Of the 36 plot jobs running on 36 independent nodes, only 4 nodes showed the above warning.

The trouble was that these 4 jobs ran on for 18 hours, after which they had to be killed forcefully to release the nodes. They were by no means the first jobs to run on the cluster; in fact, the earlier jobs ran fine without having to build the font cache.

q1. One hypothesis is that multiple matplotlib.pyplot instances running simultaneously are trying to create/delete the font cache, leading to conflicts. Would you be surprised if this were the case?

q2. A workaround proposed is to define the environment variable MPLCONFIGDIR and point it to a /tmp directory local to the nodes (basically to the shared memory). Would you approve of this workaround?

q3. Personally, I suspect that the ${HOME}/.matplotlib directory could have been momentarily inaccessible from those nodes (a transient issue). Could this not lead to the font cache being rebuilt?

So my real question:

q4. In your opinion, what could have forced matplotlib to rebuild the font cache, given that the jobs preceding the ones showing the warning ran fine?

PS: I am unable to reproduce this issue. I have run this code hundreds of times on various data sets over the last few years and never faced this problem, so I am a little perplexed as to why it showed up a few weeks ago on 4 nodes and why I cannot reproduce it now!

Thanks,
wasim

Regarding q3: this is certainly possible (although I guess it depends on what you mean by “temporarily inaccessible”).
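To see why a momentary read failure can end in a full rebuild, consider a generic “load the cache, otherwise rebuild it” pattern. This is a simplified sketch, not matplotlib's actual code (which differs between versions); it assumes a JSON cache file, and `load_or_rebuild` and its `build` callable (standing in for the slow font scan) are hypothetical names. The point it illustrates is that the loader cannot tell a missing cache from a temporarily unreadable one, so both take the slow path.

```python
import json
import os
import tempfile


def load_or_rebuild(cache_path, build):
    """Return the data cached at cache_path, rebuilding it if it can't be read.

    `build` is a hypothetical zero-argument callable standing in for the
    expensive step (for a font cache, scanning every installed font).
    """
    try:
        with open(cache_path) as fh:
            return json.load(fh)
    except (OSError, ValueError):
        # Missing file, unreadable file (e.g. storage briefly unavailable) and
        # corrupt or half-written JSON all land here and trigger a rebuild.
        data = build()
    try:
        # Write to a unique temporary file in the same directory, then rename
        # it into place, so concurrent readers never see a partial file.
        fd, tmp_path = tempfile.mkstemp(
            dir=os.path.dirname(os.path.abspath(cache_path)))
        with os.fdopen(fd, "w") as fh:
            json.dump(data, fh)
        os.replace(tmp_path, cache_path)
    except OSError:
        pass  # Could not save the cache; the next process will rebuild again.
    return data


if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "cache.json")
    print(load_or_rebuild(cache, lambda: ["DejaVuSans.ttf"]))  # built and saved
    print(load_or_rebuild(cache, lambda: []))  # read back from the cache
```

Note that if the save fails too (for example because the directory is still unreachable), every subsequent process pays the rebuild cost again.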

Regarding q2: it would help if you can guarantee that the new location is always accessible; otherwise you’ll likely run into the same issues as with ~/.matplotlib.
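A minimal sketch of the node-local workaround from q2, assuming each job is a Python 3 script that can run this before matplotlib is first imported (matplotlib resolves its config/cache directory when it is imported, so setting the variable afterwards is not reliable). The helper name `use_node_local_mplconfigdir` and the per-user, per-host directory layout are hypothetical choices, not something matplotlib prescribes; exporting `MPLCONFIGDIR` in the batch script achieves the same thing.

```python
import getpass
import os
import socket
import tempfile


def use_node_local_mplconfigdir(base=None):
    """Point MPLCONFIGDIR at a per-user, per-host directory on local disk.

    Call this before matplotlib is first imported in the process.
    """
    try:
        user = getpass.getuser()
    except (KeyError, OSError):
        user = "unknown"  # no login name available (e.g. some batch setups)
    # Hypothetical layout: <base>/mplconfig-<user>-<host>
    path = os.path.join(base or tempfile.gettempdir(),
                        "mplconfig-{}-{}".format(user, socket.gethostname()))
    os.makedirs(path, exist_ok=True)  # fine if an earlier job created it
    os.environ["MPLCONFIGDIR"] = path
    return path


if __name__ == "__main__":
    print(use_node_local_mplconfigdir())
    # import matplotlib.pyplot as plt  # only after MPLCONFIGDIR is set
```

Because the directory lives on node-local storage, each node builds its own cache once and never depends on the shared filesystem for it; the trade-off is one rebuild per node (and again whenever /tmp is wiped).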


The situation may also have improved since mpl 2.1, as I replaced a possibly buggy path-locking mechanism with a hopefully less buggy one in matplotlib/matplotlib pull request #10596 (“Switch to per-file locking.”, by anntzer) on GitHub.
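As a rough illustration of what per-file locking means in general (a generic sketch of the technique, not the code from that pull request): each protected file gets its own sibling lock file, created with an atomic create-if-absent open, and waiters retry for a bounded time instead of hanging. The `lock_file` name, the `.lock` suffix, and the retry/delay defaults are hypothetical.

```python
import contextlib
import os
import tempfile
import time


@contextlib.contextmanager
def lock_file(path, retries=50, delay=0.1):
    """Hold an exclusive lock on `path` via a sibling "<path>.lock" file."""
    lock_path = path + ".lock"
    for _ in range(retries):
        try:
            # O_CREAT | O_EXCL makes creation fail if the lock file exists,
            # so only one process at a time can succeed.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            break
        except FileExistsError:
            time.sleep(delay)  # someone else holds the lock; wait and retry
    else:
        # Bounded wait: fail loudly instead of hanging for hours.
        raise TimeoutError("could not lock {!r}".format(path))
    try:
        yield
    finally:
        os.remove(lock_path)  # release the lock even if the body raised


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "fontcache.json")
    with lock_file(target):
        print("locked:", os.path.exists(target + ".lock"))
    print("released:", not os.path.exists(target + ".lock"))
```

One consequence of this design: if a job is killed while holding the lock, the lock file is left behind, and later jobs wait out the retries and then raise `TimeoutError` rather than blocking indefinitely.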

@anntzer.lee, thanks for your response and for the reference to PR #10596.

  1. By “temporarily inaccessible”, I meant an issue we have on one of the clusters I use, where some OSTs (Object Storage Targets) of the Lustre file system become inaccessible for a few minutes.

  2. However, my fundamental question remains: under what circumstances does matplotlib try to create or re-create the font cache at all? Does the directory get locked irrespective of whether I already have a fontList.cache file or a tex.cache/ directory in my default $HOME/.matplotlib directory? If so, your strategy of using a private lock path rather than locking the whole directory would most likely help.

  3. I will request our admin to upgrade to a more recent version of matplotlib in any case.

wrt 2. I genuinely don’t remember how things were back in mpl 2.1; you should just check the source code to know 🙂
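Taking that advice, a small helper for finding which file a module is actually loaded from can save time on clusters where several matplotlib versions are installed side by side. This is a sketch for Python 3; the name `source_of` is hypothetical. It uses `importlib.util.find_spec` so that, for `"matplotlib.font_manager"`, the font-manager module itself is not executed (importing it would load or build the font cache), although the parent `matplotlib` package is still imported.

```python
import importlib.util


def source_of(module_name):
    """Return the file a module would be loaded from, without executing it.

    For a dotted name such as "matplotlib.font_manager", the parent package
    ("matplotlib") is still imported, but the submodule itself is not run.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        raise ImportError("no module named {!r}".format(module_name))
    return spec.origin


if __name__ == "__main__":
    # On the cluster, replace "json" with "matplotlib.font_manager".
    print(source_of("json"))
```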

Thanks again! I will dig into the code.

(Replying to anntzer.lee's message of 16 Feb 2020, 21:35.)

Just to clarify (as pointed out by one of our supercomputing experts): the Lustre file system per se may not be responsible for the “transient behaviour” related to this particular issue.