Plotting Lists of Strings has high CPU

> Strings are now treated as "categories" rather than cast to floats, and plotted in the order received.

> https://matplotlib.org/gallery/lines_bars_and_markers/categorical_variables.html

> Cheers, Jody
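As a rough illustration of "plotted in the order received", here is a minimal stdlib-only sketch that maps strings to integer axis positions in first-seen order. It models the idea only and is not matplotlib's implementation; `category_positions` is a hypothetical name.

```python
def category_positions(values):
    """Assign each distinct string an integer axis position in first-seen order.

    Hypothetical helper illustrating the idea; not matplotlib's API.
    """
    mapping = {}  # dicts keep insertion order (Python 3.7+)
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # next unused position
    return mapping


labels = ["apple", "banana", "apple", "cherry"]
positions = category_positions(labels)
xs = [positions[v] for v in labels]
print(positions)  # {'apple': 0, 'banana': 1, 'cherry': 2}
print(xs)         # [0, 1, 0, 2]
```

Repeated strings reuse their slot, so in this sketch each category gets one position, in the order it first appeared, rather than being sorted or parsed as a number.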

Thanks for that, Jody. I did just "get lucky".

Some profiling shows that the high CPU usage associated with this operation
is at least partially avoidable.

The majority of the CPU time, according to:
  python3 -m cProfile -s time plotit.py -s|head -n20
is spent in or under StrCategoryFormatter._text, which seems to be called far
more times than I would expect: on the order of the number of categories
squared in my samples, with 40K calls for 100 categories and 4M for 1000 on
mpl 2.2, and 6M on mpl 3.0. That seems high.
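A stdlib-only sketch of reading a single function's call count out of cProfile, using a toy workload in which each of n redraws re-formats all n labels. That toy is an assumption chosen only to show how quadratic growth shows up in the ncalls column; it is not a claim about what matplotlib actually does. `format_label`, `redraw_all`, and `count_calls` are hypothetical names, and `pstats.Stats.get_stats_profile()` requires Python 3.9 or later. The figures above fit the same shape: 40K / 100² = 4 and 4M / 1000² = 4 calls per category squared.

```python
import cProfile
import pstats


def format_label(value):
    # Hypothetical stand-in for a per-label formatter such as _text.
    return str(value)


def redraw_all(categories):
    # Toy workload (assumption): each of n redraws re-formats every one
    # of the n labels, so format_label runs n * n times in total.
    for _ in categories:
        for c in categories:
            format_label(c)


def count_calls(n):
    """Profile the toy workload for n categories; return calls to format_label."""
    cats = [f"cat{i}" for i in range(n)]
    prof = cProfile.Profile()
    prof.enable()
    redraw_all(cats)
    prof.disable()
    # get_stats_profile() (Python 3.9+) keys per-function stats by name;
    # ncalls is a string such as "400" (or "n/m" for recursive calls).
    profile = pstats.Stats(prof).get_stats_profile()
    return int(profile.func_profiles["format_label"].ncalls)


if __name__ == "__main__":
    for n in (10, 100):
        calls = count_calls(n)
        print(n, calls, calls / n**2)  # constant ratio means quadratic growth
```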

Within the _text function in 2.2, the most expensive operation is the
repeated check of the numpy version, whose result never changes. This cost can
be significantly reduced by hoisting that constant expression to module level
with a simple change like:

diff --git a/lib/matplotlib/category.py b/lib/matplotlib/category.py
index b135bff1c..89b1c5bd9 100644
--- a/lib/matplotlib/category.py
+++ b/lib/matplotlib/category.py
@@ -28,6 +28,8 @@ import matplotlib.ticker as ticker
 # np 1.6/1.7 support
 from distutils.version import LooseVersion
 
+NP_PRE_1_7_0 = LooseVersion(np.__version__) < LooseVersion('1.7.0')
+
 VALID_TYPES = tuple(set(six.string_types +
                         (bytes, six.text_type, np.str_, np.bytes_)))
 
@@ -158,7 +160,7 @@ class StrCategoryFormatter(ticker.Formatter):
     def _text(value):
         """Converts text values into `utf-8` or `ascii` strings
         """
-        if LooseVersion(np.__version__) < LooseVersion('1.7.0'):
+        if NP_PRE_1_7_0:
             if (isinstance(value, (six.text_type, np.unicode))):
                 value = value.encode('utf-8', 'ignore').decode('utf-8')
         if isinstance(value, (np.bytes_, six.binary_type)):
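The effect of the patch can be illustrated without numpy or matplotlib. In this sketch, `LIB_VERSION`, `parse_version`, `text_slow`, and `text_fast` are hypothetical stand-ins for `np.__version__`, `LooseVersion`, and the before/after `_text`; `parse_version` handles plain dotted integers only. `distutils` (and with it `LooseVersion`) was removed in Python 3.12, so the sketch compares integer tuples instead. Parsing matters for correctness too: compared as plain strings, "1.15.2" < "1.7.0" is True.

```python
import timeit

# Hypothetical stand-in for np.__version__.
LIB_VERSION = "1.15.2"


def parse_version(v):
    """'X.Y.Z' -> (X, Y, Z) as ints; simplified, no pre-release suffixes."""
    return tuple(int(part) for part in v.split("."))


def text_slow(value):
    # Before the patch: both versions are parsed and compared on every call.
    if parse_version(LIB_VERSION) < parse_version("1.7.0"):
        value = value.encode("utf-8", "ignore").decode("utf-8")
    return str(value)


# After the patch: the comparison runs once, at import time.
LIB_PRE_1_7_0 = parse_version(LIB_VERSION) < parse_version("1.7.0")


def text_fast(value):
    if LIB_PRE_1_7_0:
        value = value.encode("utf-8", "ignore").decode("utf-8")
    return str(value)


if __name__ == "__main__":
    labels = [f"cat{i}" for i in range(1000)]
    assert [text_slow(s) for s in labels] == [text_fast(s) for s in labels]
    for fn in (text_slow, text_fast):
        secs = timeit.timeit(lambda: [fn(s) for s in labels], number=100)
        print(f"{fn.__name__}: {secs:.3f} s for 100 x 1000 labels")
```

Both functions return the same strings; the only difference is that the version comparison happens once rather than on every call, which is the same idea as NP_PRE_1_7_0 in the patch.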
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/matplotlib-users/attachments/20181026/396faed9/attachment.html>

Perhaps it's not surprising that this hasn't been optimized, because most folks don't have that many categories. If you have an actual use case for that many categories, submitting a bug report on GitHub would be great.

Cheers, Jody


On Oct 25, 2018, at 16:47 PM, Douglas Clowes <douglas.clowes at gmail.com> wrote:


_______________________________________________
Matplotlib-users mailing list
Matplotlib-users at python.org
Matplotlib-users Info Page
