mpl-1.2.1: Speed up code by removing .startswith() calls and some for loops

Hi,
  I am drawing some barcharts and scatter plots and the rendering speed is awful once you have
100,000 dots. I ran the Python profiler, which led me to .startswith() calls and some for loops
that append to a list repeatedly. These parts could still be sped up further, I think, but a
first attempt is here:

UNPATCHED 1.2.1

real 23m17.764s
user 13m25.880s
sys 3m37.180s

PATCHED:

real 6m59.831s
user 5m18.000s
sys 1m40.360s
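For reference, the profiler run mentioned above was along these lines; this is a minimal sketch using the standard-library cProfile, where `render_figure` is a hypothetical stand-in for whatever call actually draws the chart (e.g. a savefig of the scatter plot):

```python
# Minimal sketch of profiling a slow rendering call with cProfile.
# `render_figure` is a placeholder workload; in the real run it would
# be the matplotlib draw call that takes minutes.
import cProfile
import io
import pstats

def render_figure():
    # placeholder for the expensive draw; any CPU-bound work will do here
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
render_figure()
profiler.disable()

out = io.StringIO()
# sort by cumulative time and show the ten hottest entries
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```

The cumulative-time sort is what surfaces the .startswith() calls and append loops as hot spots.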

The patches are simple, and because I see list comprehensions elsewhere in the code, I do not see any
problems with backwards compatibility (no new Python language features are required).

Hope this helps,
Martin

artist.py.patch (444 Bytes)

axes.py.patch (1.03 KB)

Hi Martin,

Thanks for this - we are really interested in speeding up the scatter and barchart plotting with large data sets. In fact, we’ve done some work (https://github.com/matplotlib/matplotlib/pull/2156) recently to make the situation better.

I’d really like to review these changes (against matplotlib master), and the best way to do this would be for you to submit a pull request. If the changes you have made are logically separable, then I’d encourage you to make a few PRs, but otherwise, a single PR with all of these changes would be great.

Would you mind turning these patches into PR(s)? (https://github.com/matplotlib/matplotlib/compare/)

Thanks!

Phil

···

On 9 August 2013 12:53, Martin Mokrejs <mmokrejs@…3951…> wrote:

Matplotlib-users mailing list

Matplotlib-users@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Hi Phil,

Phil Elson wrote:

Hi Martin,

Thanks for this - we are really interested in speeding up the scatter and barchart plotting with large data sets. In fact, we've done some work (https://github.com/matplotlib/matplotlib/pull/2156) recently to make the situation better.

I'd really like to review these changes (against matplotlib master), and the best way to do this would be for you to submit a pull request. If the changes you have made are logically separable, then I'd encourage you to make a few PRs, but otherwise, a single PR with all of these changes would be great.

I went through the changes there and they only touch other parts of matplotlib.
My changes are general Python improvements: moving away from str.startswith()
and using generators instead of for loops. Just apply the patches yourself and see.
:wink:

Would you mind turning these patches into PR(s)? (https://github.com/matplotlib/matplotlib/compare/)

Um, I don't know what to do on that page, sorry. I don't see how to upload my patch file or patched file
to be compared with master. :frowning:

Thanks!

I am sorry, but I just don't have time to fiddle with GitHub. It is just awkward. I even failed to download
diffs of the changes from https://github.com/matplotlib/matplotlib/pull/2156/commits.

I'd rather continue studying the runsnake output. :wink:

Martin


A snippet from one of your patches:

     dsu = []

-    for a in self.patches:
-        dsu.append((a.get_zorder(), a, a.draw, [renderer]))
+    [ dsu.append((x.get_zorder(), x, x.draw, [renderer])) for x in
+      self.patches ]

Yes, we certainly should use list comprehensions here, but you are using
them incorrectly. It should be:

dsu = [(x.get_zorder(), x, x.draw, [renderer]) for x in self.patches]

And then, further down, do the following:

dsu.extend((x.get_zorder(), x, x.draw, [renderer]) for x in self.lines)

Note the generator form of the comprehension as opposed to the list
comprehension form. List comprehensions should *always* be assigned to
something, and should only be used to replace the for-append idiom in
Python.
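To make the distinction concrete, here is a stripped-down stand-in (a hypothetical `Thing` class instead of real artists, purely illustrative); all three variants build the same tuples:

```python
# Illustration of the three idioms with plain objects instead of artists.

class Thing:
    """Hypothetical stand-in for an Artist that has a zorder."""
    def __init__(self, zorder):
        self.zorder = zorder
    def get_zorder(self):
        return self.zorder
    def draw(self, renderer):
        pass

patches = [Thing(2), Thing(1)]
lines = [Thing(3)]
renderer = object()

# 1. The original for-append idiom:
dsu_a = []
for a in patches:
    dsu_a.append((a.get_zorder(), a, a.draw, [renderer]))

# 2. A list comprehension assigned to a name -- the correct replacement:
dsu_b = [(x.get_zorder(), x, x.draw, [renderer]) for x in patches]

# 3. Extending an existing list with a generator expression, which
#    never materializes an intermediate list:
dsu_b.extend((x.get_zorder(), x, x.draw, [renderer]) for x in lines)

# the first two variants produce identical entries
assert dsu_a == dsu_b[:len(patches)]
```

Variant 2 avoids the repeated method lookup of `.append` inside the loop, and variant 3 grows an existing list without building a throwaway list first.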

Thank you though for pointing out parts of the code that can benefit from
revisions. I certainly hope you can get this put together as a pull request
on github so we can work to make this patch better!

Ben Root

P.S. - I <3 runsnakerun!

···

On Fri, Aug 9, 2013 at 9:04 AM, Martin Mokrejs <mmokrejs@...3951...>wrote:


Hi Ben,
  thank you for your comments. OK, here are revised patches. I see a hot spot
in artist.py where the getattr() calls are too expensive; specifically, those under
the callable() path.

793 class ArtistInspector:

817     def get_aliases(self):
818         """
819         Get a dict mapping *fullname* -> *alias* for each *alias* in
820         the :class:`~matplotlib.artist.ArtistInspector`.
821
822         Eg., for lines::
823
824           {'markerfacecolor': 'mfc',
825            'linewidth'      : 'lw',
826           }
827
828         """
829         names = [name for name in dir(self.o) if
830                  (name[:4] in ['set_', 'get_'])
831                  and callable(getattr(self.o, name))]
832         aliases = {}
833         for name in names:
834             func = getattr(self.o, name)
835             if not self.is_alias(func):
836                 continue
837             docstring = func.__doc__
838             fullname = docstring[10:]
839             aliases.setdefault(fullname[4:], {})[name[4:]] = None
840         return aliases

Another hot spot is setp() in artist.py, or rather its call to get_aliases() on line 817,
which again leads to getattr() and callable(). The problem is that there are millions of
these calls.
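A rough micro-benchmark of the pattern (the `Obj` class and iteration counts are illustrative, not taken from the actual profile): repeatedly scanning an object with dir()/getattr()/callable() versus doing the scan once and reusing the result.

```python
# Compare re-scanning an object on every call against reusing a
# previously computed name list. Absolute numbers vary by machine.
import timeit

class Obj:
    """Illustrative object with getter/setter-style methods."""
    def get_a(self): pass
    def set_a(self, v): pass
    other = 1

def scan(o):
    # the expensive pattern: dir() + slicing + getattr() + callable()
    return [n for n in dir(o)
            if n[:4] in ("set_", "get_") and callable(getattr(o, n))]

o = Obj()
repeated = timeit.timeit(lambda: scan(o), number=2000)

names = scan(o)  # computed once
cached = timeit.timeit(lambda: list(names), number=2000)

print(f"repeated scan: {repeated:.4f}s, cached reuse: {cached:.4f}s")
```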

So, once again, all my patches are attached (figure.py.patch and artist.py.patch have changed).
Martin


axes.py.patch (1.03 KB)

artist.py.patch (1.16 KB)

figure.py.patch (1.7 KB)


Ah, yes... one of the biggest warts in our codebase. Does get_aliases()
really get called that often? If so, I doubt changes with regard to
startswith() are the best use of our time; I would imagine that a refactor
that caches the possible aliases would help more. Frankly, I would just as
soon see the entire aliases framework ripped out and redone more correctly
in the first place.
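A minimal sketch of what such a cache could look like (hypothetical names throughout; the real fix would live inside ArtistInspector and would have to account for dynamically added methods), keyed on the inspected object's class so the expensive dir()/getattr()/callable() scan runs once per type rather than once per artist:

```python
# Hypothetical per-class memoization of get_aliases().

_alias_cache = {}

def cached_get_aliases(inspector):
    """Return inspector.get_aliases(), memoized on type(inspector.o)."""
    key = type(inspector.o)
    if key not in _alias_cache:
        # first artist of this class pays for the scan...
        _alias_cache[key] = inspector.get_aliases()
    # ...every later artist of the same class reuses the result
    return _alias_cache[key]
```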

As for the other startswith() changes, there are some subtle differences in
the change that have to be considered. First, is there a guarantee that the
string being indexed is not empty? startswith() handles that
correctly, while indexing would throw an exception (and setting up code to
try...except those exceptions would reduce readability and probably reduce
performance back to where we started).
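For reference on that edge case: slicing (as used in `name[:4]`) never raises on short or empty strings in Python; only single-element indexing does. A quick check:

```python
s = ""

# startswith() tolerates the empty string:
assert s.startswith("set_") is False

# so does slicing, which simply returns a shorter (here empty) string:
assert s[:4] == ""
assert (s[:4] in ["set_", "get_"]) is False

# only element indexing actually raises:
raised = False
try:
    s[0]
except IndexError:
    raised = True
assert raised
```

So the `name[:4] in [...]` form is safe on empty strings; the exception concern applies only if the comparison is rewritten with `s[0]`-style indexing.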

I think that the *biggest* improvement we are going to get is from your
patch to figure.py, because it touches some very deep code that is
executed very frequently, and we were doing it in probably the most
inefficient manner.

Again, I really stress the importance of setting up a github account and
submitting a PR. Once you do that, you are all set to go for contributing
to many other great projects (numpy, scipy, ipython, etc.).

Cheers!
Ben Root

···

On Fri, Aug 9, 2013 at 11:15 AM, Martin Mokrejs <mmokrejs@...3951... > wrote:
