Some remarks/questions about perceived slowness of matplotlib

Hi, I am a regular user of matplotlib since I moved from

    > matlab to python/numpy/scipy. Even if I find matplotlib to
    > be a real help during the transition from matlab to python,
    > I must confess I found it the most disappointing compared
    > to the other packages (essentially numpy/scipy/ipython). This is

   Meatloaf: Now don't be sad, cause two out of three ain't bad

If you consider the fact that matplotlib was originally an ipython
patch that was rejected, you can see why we are such a bastard child
of the scientific python world. There is a seed of truth in this;
Numeric, scipy and ipython were all mature packages in widespread use
before the first line of matplotlib code was written. So they are
farther along in terms of maturity, documentation, usability,
etc... than matplotlib is.

But we've achieved a lot in a comparatively short time. When I started
working on matplotlib there were probably two dozen plotting packages
that people used and recommended. Now we are down to 5 or 6, with
matplotlib doing most of what most people need. I've focused on
making something that does most of what people (and I) need rather
than doing it the fastest, so it is too slow for some purposes but
fast enough for most. When we get a well-defined, important test case
that is too slow, we typically try to optimize it, sometimes with
dramatic results (eg 25-fold speedups); more on this below.

A consequence of trying to support most of the needs of most users is
this: we run on all major operating systems and all major GUIs with
all major array packages. Consider the combinatorial problem: 5
graphical user interfaces, each with two or more versions in the wild,
across 3 operating systems, and you will get a feel for the support
problem we have. This is not an academic point. Most of the GUI
maintainers for *a single backend* burn out in short order. Most
graphics packages *solve* this problem by supporting a single output
format (PYX) or GUI (chaco) which is a damned fine and admirable
solution. But the consequence of this is plotting fragmentation:
people who need GTK cannot use Chaco, people who need SVG cannot use
PYX, and so on, and so they'll write their own plotting library for
their own GUI or output format (the situation before matplotlib). You
can certainly get closer to bare metal speed by reducing choices and
focusing on a single target -- part of the performance price we pay is
in our abstraction layers, part is in trying to support features that
may be rarely used but cost something (masked array support, rotated
text with newlines), part is because we need to get to work and
optimize the slow parts.

    > not a rant; I want to know if this slowness is coming from
    > my lack of matplotlib knowledge or not; I apologize in
    > advance if the following hurts anyone's feelings :-)

    Meatloaf: But -- there ain't no way I'm ever gonna love you

OK, I'll stop now.

    > First, I must admit that whereas I took a significant
    > amount of time to study numpy and scipy, I didn't take that
    > same time for matplotlib. So this disappointment may just
    > be a consequence of this laziness.

I suspect this is partly true; see below.

    > My main problem with matplotlib is speed: I find it
    > really annoying to use in an interactive manner. For
    > example, when I need to display some 2d information, such
    > as a spectrogram or correlogram, this takes 1 or 2 seconds
    > for a small signal (~4500 frames of 256 samples). My
    > function correlogram (similar to specgram, but computes
    > correlation instead of log spectrum) uses imshow, and this
    > function takes 20 times more time than imagesc of matlab
    > for the same size. Also, I found changing the size of the

This is where you can help us. Saying specgram is slow is only
marginally more useful than saying matplotlib is slow or python is
slow. What is helpful is to post a complete, free-standing script
that we can run, with some attached performance numbers. For
starters, just run it with the Agg backend so we can isolate
matplotlib from the respective GUIs. Show us how the performance
scales with the specgram parameters (frames and samples). specgram is
divided into two parts: if you look at Axes.specgram you will see
that it calls matplotlib.mlab.specgram to do the computation and
Axes.imshow to visualize it. Which part is slow: the mlab.specgram
computation, the visualization (imshow) part, or both? You can paste
this function into your own python file and start timing different
parts. The most helpful "this is slow" posts come with profiler
output so we can see where the bottlenecks are.
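The kind of profiler report John asks for can be produced in a few lines. A minimal sketch (assuming a modern Python, where cProfile replaces the hotshot module used later in this thread; `slow_part` is a hypothetical placeholder for whatever call you suspect, e.g. mlab.specgram or imshow):

```python
# Sketch only: cProfile stands in for the hotshot profiler used in this
# thread; slow_part() is a placeholder for the code you suspect is slow.
import cProfile
import io
import pstats

def slow_part():
    # placeholder workload; replace with the call you want to measure
    return sum(i * i for i in range(100000))

prof = cProfile.Profile()
prof.enable()
slow_part()
prof.disable()

buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)  # paste this kind of output along with your script
```

Attaching that output, plus how it changes as the input grows, lets the developers see at a glance whether the computation or the rendering dominates.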

Such a post by Fernando Perez on "plot" with markers yielded
performance boosts of 25x for large numbers of points when he showed
we were making about one hundred thousand function calls per plot.

    > matplotlib window really 'annoying to the eye': I compared
    > to matlab, and this may be due to the fact that the whole
    > window is redrawn with matplotlib, including the toolbar,
    > whereas in matlab, the top toolbar is not redrawn.

It would be nice if we exposed the underlying GTK widgets to you so
you could customize the "expand" and "fill" properties of the gtk
toolbar, but this gets us into the multiple GUI, multiple version
problem discussed above. Providing an abstract interface to such
details that works across the mpl backends is a lot of work that takes
us away from our core incompetency -- plotting. What we do is enable
you to write your own widgets and embed mpl in them; see
examples/embedding_in_gtk2.py which shows you how to do this for
GTK/GTKAgg. You can then customize the toolbar to your heart's
content.

    > Finally, plotting many data (using plot(X, Y) with X and Y
    > around 1000/10000 samples) is 'slow' (the '' are because I
    > don't know much about computer graphics, and I understand
    > that slow in the rendering is often just a perception)

This shouldn't be slow -- again a test script with some performance
numbers would help so we can compare what we are getting. One
thought: make sure you are using the numerix layer properly -- i.e., if
you are creating arrays with numpy, make sure you have numerix set to
numpy (I see below that you set numerix to numpy, but
--verbose-helpful will confirm the setting). A good way to start is
to write a demonstration script that you find too slow which makes a
call to savefig, and run it with

  > time myscript.py --verbose-helpful -dAgg

and post the output and script. Then we might be able to help.

    > So, is this a current limitation of matplotlib, is
    > matplotlib optimized for good rendering for publication,
    > and not for interactive use, or I am just misguided in my
    > use of matplotlib ?

Many people use it interactively, but a number of power users find it
slow.

JDH

It may be worth mentioning here this little utility (Linux only, unfortunately):

http://amath.colorado.edu/faculty/fperez/python/profiling/

For profiling more complex codes, it's really a godsend. And note
that the generated cachegrind files are typically small and can be
sent to others for analysis, so you can run it locally (if for example
the run depends on data you can't share) and then send to the list the
generated profile. Anyone with Kcachegrind will then be able to load
your profile info and study it in detail.

Cheers,

f

···

On 12/12/06, John Hunter <jdhunter@...4...> wrote:

--verbose-helpful will confirm the setting). A good way to start is
to write a demonstration script that you find too slow which makes a
call to savefig, and run it with

  > time myscript.py --verbose-helpful -dAgg

John Hunter wrote:

This is where you can help us. [...] The most helpful "this is slow"
posts come with profiler output so we can see where the bottlenecks
are.

(sorry for double posting)

Ok, here we go: I believe that the rendering of the figure returned by imshow is what is slow.

For example, let's say I have a 2-minute signal at 8 kHz sampling rate, with windows of 256 samples and 50% overlap. I have around 64 frames/second, i.e. ~8000 frames of 256 samples each.

So for benchmark purposes, we can just send random data of shape 8000x256 to imshow. In ipython, this takes a long time (around 2 seconds for imshow(data), where data = random(8000, 256)).

Now, on a small script to have a better idea:

import numpy as N
import pylab as P

def generate_data_2d(fr, nwin, hop, len):
    # number of analysis frames for a len-second signal
    nframes = int(1.0 * fr / hop * len)
    return N.random.randn(nframes, nwin)

def bench_imshow(fr, nwin, hop, len, show = True):
    data = generate_data_2d(fr, nwin, hop, len)
    P.imshow(data)
    if show:
        P.show()

if __name__ == '__main__':
    # 2 minutes (120 sec) of sound @ 8 kHz with 256-sample windows and 50 % overlap
    bench_imshow(8000, 256, 128, 120, show = False)

Now, I have a problem, because I don't know how to benchmark with show set to True (I have to close the figure manually).

If I run the above script with time, I get 1.5 seconds with show = False (after several trials to be sure the matplotlib files are in the system cache: this matters because my home dir is on NFS). If I set show = True and close the figure by hand once it is plotted, I get 4.5 sec instead.
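One way around the close-the-figure-by-hand problem is to benchmark under a non-GUI backend, where savefig forces a full render, so its wall time approximates the draw cost without any window. A sketch (assuming a reasonably current matplotlib and numpy; the array is smaller than the 8000x256 case but the idea is the same):

```python
import time
import matplotlib
matplotlib.use("Agg")  # non-GUI backend: rendering only, nothing to close
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(800, 256)  # stand-in for the spectrogram-sized array
t0 = time.perf_counter()
plt.imshow(data)
plt.savefig("bench_imshow.png")   # savefig triggers the actual draw
elapsed = time.perf_counter() - t0
print("imshow + draw: %.3f s" % elapsed)
```

Scaling the array up and re-running gives draw timings that can be compared across machines without anyone having to interact with a window.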

If I run the above script with -dAgg --verbose-helpful (I was looking for this flag to check that numerix is correctly set to numpy :-) ):

with show = False:

matplotlib data path /home/david/local/lib/python2.4/site-packages/matplotlib/mpl-data
$HOME=/home/david
CONFIGDIR=/home/david/.matplotlib
loaded rc file /home/david/.matplotlib/matplotlibrc
matplotlib version 0.87.7
verbose.level helpful
interactive is False
platform is linux2
numerix numpy 1.0.2.dev3484
font search path ['/home/david/local/lib/python2.4/site-packages/matplotlib/mpl-data']
loaded ttfcache file /home/david/.matplotlib/ttffont.cache
backend Agg version v2.2

real 0m1.185s
user 0m0.808s
sys 0m0.224s

with show = True

matplotlib data path /home/david/local/lib/python2.4/site-packages/matplotlib/mpl-data
$HOME=/home/david
CONFIGDIR=/home/david/.matplotlib
loaded rc file /home/david/.matplotlib/matplotlibrc
matplotlib version 0.87.7
verbose.level helpful
interactive is False
platform is linux2
numerix numpy 1.0.2.dev3484
font search path ['/home/david/local/lib/python2.4/site-packages/matplotlib/mpl-data']
loaded ttfcache file /home/david/.matplotlib/ttffont.cache
backend Agg version v2.2

real 0m1.193s
user 0m0.848s
sys 0m0.192s

So the problem is in the rendering, right? (I am not sure I understand exactly what the Agg backend is doing.)

Now, using hotshot (kcachegrind profiles attached to the email), for the noshow case:

   ncalls tottime percall cumtime percall filename:lineno(function)
        1 0.001 0.001 0.839 0.839 slowmatplotlib.py:181(bench_imshow_noshow)
        1 0.000 0.000 0.837 0.837 slowmatplotlib.py:163(bench_imshow)
        1 0.000 0.000 0.586 0.586 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:1894(imshow)
        3 0.000 0.000 0.510 0.170 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:883(gca)
        1 0.000 0.000 0.509 0.509 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:950(ishold)
        4 0.000 0.000 0.409 0.102 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:903(gcf)
        1 0.000 0.000 0.409 0.409 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:818(figure)
        1 0.000 0.000 0.408 0.408 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:36(new_figure_manager)
        1 0.003 0.003 0.400 0.400 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:401(__init__)
        1 0.000 0.000 0.397 0.397 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:25(_get_toolbar)
        1 0.001 0.001 0.397 0.397 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:496(__init__)
        1 0.000 0.000 0.396 0.396 /home/david/local/lib/python2.4/site-packages/matplotlib/backend_bases.py:1112(__init__)
        1 0.000 0.000 0.396 0.396 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:557(_init_toolbar)
        1 0.008 0.008 0.396 0.396 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:595(_init_toolbar2_4)
        1 0.388 0.388 0.388 0.388 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:967(__init__)
        1 0.251 0.251 0.251 0.251 slowmatplotlib.py:155(generate_data_2d)
        3 0.000 0.000 0.101 0.034 /home/david/local/lib/python2.4/site-packages/matplotlib/figure.py:629(gca)
        1 0.000 0.000 0.101 0.101 /home/david/local/lib/python2.4/site-packages/matplotlib/figure.py:449(add_subplot)
        1 0.000 0.000 0.100 0.100 /home/david/local/lib/python2.4/site-packages/matplotlib/axes.py:4523(__init__)
        1 0.000 0.000 0.100 0.100 /home/david/local/lib/python2.4/site-packages/matplotlib/axes.py:337(__init__)

But the show case is more interesting:

   ncalls tottime percall cumtime percall filename:lineno(function)
        1 0.002 0.002 3.886 3.886 slowmatplotlib.py:177(bench_imshow_show)
        1 0.000 0.000 3.884 3.884 slowmatplotlib.py:163(bench_imshow)
        1 0.698 0.698 3.003 3.003 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:70(show)
        2 0.000 0.000 2.266 1.133 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:275(expose_event)
        1 0.009 0.009 2.266 2.266 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:71(_render_figure)
        1 0.000 0.000 2.256 2.256 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_agg.py:385(draw)
        1 0.000 0.000 2.253 2.253 /home/david/local/lib/python2.4/site-packages/matplotlib/figure.py:510(draw)
        1 0.000 0.000 2.251 2.251 /home/david/local/lib/python2.4/site-packages/matplotlib/axes.py:994(draw)
        1 0.005 0.005 1.951 1.951 /home/david/local/lib/python2.4/site-packages/matplotlib/image.py:173(draw)
        1 0.096 0.096 1.946 1.946 /home/david/local/lib/python2.4/site-packages/matplotlib/image.py:109(make_image)
        1 0.002 0.002 1.850 1.850 /home/david/local/lib/python2.4/site-packages/matplotlib/cm.py:50(to_rgba)
        1 0.001 0.001 0.949 0.949 /home/david/local/lib/python2.4/site-packages/matplotlib/colors.py:735(__call__)
        1 0.097 0.097 0.899 0.899 /home/david/local/lib/python2.4/site-packages/matplotlib/colors.py:568(__call__)
      325 0.050 0.000 0.671 0.002 /home/david/local/lib/python2.4/site-packages/numpy/core/ma.py:533(__init__)
        1 0.600 0.600 0.600 0.600 /home/david/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:282(resize)
        1 0.000 0.000 0.596 0.596 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:1894(imshow)
       10 0.570 0.057 0.570 0.057 /home/david/local/lib/python2.4/site-packages/numpy/oldnumeric/functions.py:117(where)
        3 0.000 0.000 0.513 0.171 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:883(gca)
        1 0.000 0.000 0.513 0.513 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:950(ishold)
        4 0.000 0.000 0.408 0.102 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:903(gcf)

For more details, see the .kc files, which are in the tbz2 archive along with the script for generating profiles for kcachegrind.

I will post another email about the other problem (with several subplots).

cheers,

David

slowmatplotlib.tbz2 (22.6 KB)

David Cournapeau wrote:

[quoted profiler output snipped -- see the previous message]

Here is some stuff I tried:

   - first, we can see that in expose_event (one call is expensive, the other negligible, from my understanding), two calls are pretty expensive:
the __call__ at line 735 (for the normalize functor) and the __call__ at line 568 (for the colormap functor).
   - for the normalize functor, one line is expensive: val = ma.array(clip(val.filled(vmax), vmin, vmax), mask=mask). If I add a test for the case where mask is None (which it is in my case), then the function becomes negligible.
   - for the colormap functor, the 3 where calls are expensive. I am not sure I understand in which cases they are useful; if I understand correctly, they keep values inside the range (0, N) by clipping out-of-range values. Isn't there an easier way than using where?

   If I remove the where calls in the colormap functor, I get a 4x speed increase for the to_rgba function. After that, it becomes a bit more tricky to change things for someone like me who has no knowledge of matplotlib internals.

   Cheers,

   David

David,

[David's profiling observations quoted; snipped -- see his message above]

The things you have identified were added by me to support masked array bad values and special colors for regions above or below the mapped range of values. I will be happy to make changes to speed them up.

Regarding the clip line, I think that your test for mask is None is not the right solution because it knocks out the clipping operation, but the clipping is intended regardless of the state of the mask. I had expected it to be a very fast operation, so I am surprised it is a bottleneck; in any case I can take a look to see how it can be sped up, or whether it can be bypassed in some cases. Maybe it is also using "where" internally.

Now I recall very recent discussion explaining why "where" is slow compared to indexing with a boolean, so I know I can speed it up with numpy. Unfortunately Numeric does not support this, so maybe what will be needed is numerix functions that take advantage of numpy when available. This is one of those times when I really wish we could drop Numeric and numarray support *now* and start taking full advantage of numpy.
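The boolean-indexing alternative Eric mentions can be illustrated in a few lines. A sketch of the idea in modern numpy syntax (not the actual matplotlib code; the toy values are illustrative):

```python
import numpy as np

x = np.array([-0.5, 0.2, 0.7, 1.3])
lo, hi = 0.0, 1.0

# where() builds full-size intermediate arrays for every branch:
y_where = np.where(x < lo, lo, np.where(x > hi, hi, x))

# boolean indexing rewrites only the out-of-range entries in place:
y_bool = x.copy()
y_bool[y_bool < lo] = lo
y_bool[y_bool > hi] = hi

assert np.allclose(y_where, y_bool)
```

Both produce the same clipped result, but the boolean-mask form touches far fewer elements when only a handful of values are out of range, which is why it tends to win for large arrays.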

In any case, thanks for pointing out the slowdowns--I will fix them as best I can--and keep at it. I share your interest in speeding up interactive use of matplotlib, along with fixing bugs, filling holes in functionality, and smoothing rough edges. There is a lot to be done. As John noted, though, there will always be tradeoffs among flexibility, code simplicity, generality, and speed.

Eric

Eric Firing wrote:

Regarding the clip line, I think that your test for mask is None is not the right solution because it knocks out the clipping operation, but the clipping is intended regardless of the state of the mask. I had expected it to be a very fast operation,

For what it's worth, a few years ago I wrote a "fast_clip" C extension that did clip without making nearly as many temporary arrays as the Numeric one -- I don't know what numpy does; I haven't needed a fast clip recently. I'd be glad to send the code to anyone interested.

Now I recall very recent discussion explaining why "where" is slow compared to indexing with a boolean, so I know I can speed it up with numpy. Unfortunately Numeric does not support this, so maybe what will be needed is numerix functions that take advantage of numpy when available.

good idea.

This is one of those times when I really wish we could drop Numeric and numarray support *now* and start taking full advantage of numpy.

I'd love that too. Maybe your proposal is a good one, though -- make numerix functions that are optimized for numpy. I think that's a good way to transition.

-Chris

···

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@...259...

Eric Firing wrote:

Regarding the clip line, I think that your test for mask is None is not the right solution because it knocks out the clipping operation, but the clipping is intended regardless of the state of the mask. [...] Maybe it is also using "where" internally.

(Again, sorry for the double posting; I always forget that some mailing lists do not reply automatically to the list.)

My wording was vague at best :-) The clipping operation is *not* removed, and it was not the culprit (it becomes the bottleneck once you get the 4x speedup, though). What I did was:

if self.clip:
    mask = ma.getmaskorNone(val)
    if mask is None:
        val = ma.array(clip(val.filled(vmax), vmin, vmax))
    else:
        val = ma.array(clip(val.filled(vmax), vmin, vmax),
                       mask=mask)

Actually, the problem is in ma.array: with mask set to None, it should not make any difference whether you pass mask = None or no mask argument at all, right? I didn't change ma.array itself, to keep my change as local as possible. Changing only this operation as above takes to_rgba from 1.8 s to ~1.0 s, which means calling show goes from ~2.2 s to ~1.4 s. I also changed

result = (val-vmin)/float(vmax-vmin)

to

invcache = 1.0 / (vmax - vmin)
result = (val-vmin) * invcache

which gives a moderate speedup (around 100 ms for an 8000x256-point array -- still in the 5-10% range of the whole cost, though -- and is not likely to cause any hidden bug). Once you make both those changes, the clip call is by far the most expensive operation in the normalize functor, but the functor is no longer really expensive compared to the rest, so this is not where I looked next.
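The division-to-multiplication change is easy to check for correctness. A small sketch in modern numpy syntax (illustrative values; not the actual colors.py code):

```python
import numpy as np

val = np.linspace(0.0, 10.0, 5)
vmin, vmax = 0.0, 10.0

# original form: one array division per normalize call
direct = (val - vmin) / float(vmax - vmin)

# cached reciprocal: a single scalar division, then a cheaper array multiply
invcache = 1.0 / (vmax - vmin)
cached = (val - vmin) * invcache

assert np.allclose(direct, cached)
```

Array division is generally slower than multiplication, so hoisting the scalar division out of the per-element work is a standard micro-optimization; the results agree to floating-point precision.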

For the where calls in the Colormap functor, I was wondering whether they are necessary in all cases: some of those calls seem redundant, and it may be possible to detect that before calling them. That should be both easier and faster, at least in this case, than having a fast where, shouldn't it?

I understand that supporting multiple array backends and masked arrays has cost consequences. But it looks like it may be possible to speed things up in cases where an array has only meaningful values/no mask.

cheers,

David