We've recently seen an issue where someone was running multiple instances
of a job on our supercomputer, all of which have a matplotlib component
that therefore runs on the compute nodes, rather than as part of any
post-processing on our ancillary services.
Some of these jobs ended up hanging and, in a number of cases, we have
observed that the hanging process is what we believe to be the matplotlib-
Is there anything in the way that matplotlib is written that might
give rise to race conditions around access to the per-user font cache,
or other matplotlib data, while it is being created?
Furthermore, is there a way that our users could define a per-job font
cache directory, by using the job-ID, and thereby explicitly avoid
any inter-job interference resulting from their "massively parallel"
Here's hoping that matplotlib is the cause, and, if so, that there's an
easy solution, when you know how to use matplotlib.
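
To make the per-job idea concrete, here is a minimal sketch of the kind of
thing we have in mind. It relies on matplotlib honouring the MPLCONFIGDIR
environment variable for its config and cache location; whether that is
enough to avoid the hangs is exactly what we are asking. The helper name
use_per_job_mpl_dir, the "mplconfig-" prefix and the list of scheduler
variables (SLURM_JOB_ID, PBS_JOBID, JOB_ID) are our own assumptions and
would need adjusting for a given batch system.

```python
import os
import tempfile

# Assumed scheduler variables that may hold the job ID; adjust for your site.
JOB_ID_VARS = ("SLURM_JOB_ID", "PBS_JOBID", "JOB_ID")


def use_per_job_mpl_dir(base=None, environ=os.environ):
    """Point MPLCONFIGDIR at a directory that is private to this job.

    Hypothetical helper: call it before the first `import matplotlib`,
    since matplotlib reads MPLCONFIGDIR when it is imported.
    """
    # Use the first scheduler variable that is set and non-empty.
    job_id = next((environ[v] for v in JOB_ID_VARS if environ.get(v)), None)
    if job_id is None:
        # Not running under a known scheduler: fall back to the process ID.
        job_id = "pid%d" % os.getpid()

    # Default to the system temp directory, ideally node-local storage.
    base = base or tempfile.gettempdir()
    path = os.path.join(base, "mplconfig-%s" % job_id)
    os.makedirs(path, exist_ok=True)

    environ["MPLCONFIGDIR"] = path
    return path
```

A job script would call use_per_job_mpl_dir() first and only then import
matplotlib. Several processes inside the same job would still share one
directory under this scheme, so it only separates jobs from each other.
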
Matplotlib-users mailing list