Slow imshow when zooming or panning with several synced subplots

I'm plotting several images at once with shared axes, for exploratory
purposes. Each image is the same satellite scene at different dates.
I'm experiencing a slow response from matplotlib when zooming and
panning, and I would like to ask for any tips that could speed up the
process.

What I am doing now is:
    - Load data from several netCDF files.
    - Calculate the maximum value of all the data, for normalization.
    - Create a grid of subplots using ImageGrid. As each subplot is
generated, I delete the array to free some memory (each array is
stored in a list; the "deletion" is just a list.pop()). See the code
below.

It's 15 single-channel images of 4600x3840 pixels each. I've noticed
that the bottleneck is not the RAM (I have 8 GB), but the processor:
Python spikes to 100% usage on one core when zooming or panning (it's
an Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 4 cores, 64-bit).

The code is:

-------------------------------------------
import os
import sys

import numpy as np
import netCDF4 as ncdf
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
from matplotlib.colors import LogNorm

MIN = 0.001 # Hardcoded minimum data value used in normalization

variable = 'conc_chl'
units = r'$mg/m^3$'
data = []
dates = []

# Get a list of only netCDF files
filelist = os.listdir(sys.argv[1])
filelist = [f for f in filelist if os.path.splitext(f)[1] == '.nc']
filelist.sort()
filelist.reverse()

# Load data and extract dates from filenames
for f in filelist:
    dataset = ncdf.Dataset(os.path.join(sys.argv[1],f), 'r')
    data.append(dataset.variables[variable][:])
    dataset.close()
    dates.append((f.split('_')[2][:-3],f.split('_')[1]))

# Get the maximum value of all data. Will be used for normalization
maxc = np.array(data).max()

# Plot the grid of images + dates
fig = plt.figure()
grid = ImageGrid(fig, 111,
        nrows_ncols=(3, 5),
        axes_pad=0.0,
        share_all=True,
        aspect=False,
        cbar_location="right",
        cbar_mode="single",
        cbar_size='2.5%',
        )
for g in grid:
    v = data.pop()
    d = dates.pop()
    im = g.imshow(v, interpolation='none', norm=LogNorm(), vmin=MIN, vmax=maxc)
    g.text(0.01, 0.01, '-'.join(d), transform = g.transAxes) # Date on a corner
cticks = np.logspace(np.log10(MIN), np.log10(maxc), 5)
cbar = grid.cbar_axes[0].colorbar(im)
cbar.ax.set_yticks(cticks)
cbar.ax.set_yticklabels([str(np.round(t, 2)) for t in cticks])
cbar.set_label_text(units)

# Fine-tune figure; make subplots close to each other and hide x ticks for
# all
fig.subplots_adjust(left=0.02, bottom=0.02, right=0.95, top=0.98,
hspace=0, wspace=0)
grid.axes_llc.set_yticklabels([], visible=False)
grid.axes_llc.set_xticklabels([], visible=False)

plt.show()
-------------------------------------------

Any clue about what could be improved to make it more responsive?

PS: This question was posted previously on Stack Overflow, but it
hasn't received any answer:
http://stackoverflow.com/questions/10635901/slow-imshow-when-zooming-or-panning-with-several-synced-subplots

Hello

What is the size of a single image file? If they are very big, it is better to do everything, from processing to plotting, at once for each file.

It's 15 images, single-channel, of 4600x3840 pixels each.

This is a lot of data. 8bit or 16bit ?

for f in filelist:

everything should happen in this loop

     dataset = ncdf.Dataset(os.path.join(sys.argv[1],f), 'r')
     data.append(dataset.variables[variable][:])

instead of creating this big list, use a temporary array (which will be overwritten)
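A minimal sketch of that pattern (with a hypothetical `load_array()` standing in for reading `dataset.variables[variable][:]` from one file): a first pass finds the global maximum while holding only one temporary array, and a second pass would then load and plot each image one at a time.

```python
import numpy as np

# Hypothetical stand-in for loading one file's data; the real script
# would read dataset.variables[variable][:] here.
def load_array(i):
    rng = np.random.default_rng(i)
    return rng.uniform(0.001, 45.0, size=(46, 38))

# First pass: global maximum for the shared normalization, using a
# single temporary array that is overwritten on every iteration.
maxc = -np.inf
for i in range(15):
    tmp = load_array(i)
    maxc = max(maxc, float(tmp.max()))

# Second pass (not shown): reload each array into tmp and imshow it
# with vmax=maxc, so only one full-size image is in memory at a time.
```

This trades a second read of each file for never holding all 15 arrays at once.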


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Hello

What is the size of a single image file? If they are very big, it is
better to do everything from processing to plotting at once for each file.

As stated below, each image is single-channel, 4600x3840 pixels. As
you can see in the code, there is not much processing, just loading
the images and plotting them. What's slow is not the execution of the
code, but the interactive zooming and panning once the plots are on
the screen.

It's 15 images, single-channel, of 4600x3840 pixels each.

This is a lot of data. 8bit or 16bit ?

They are floating-point values (for example, from 0 to 45.xxx). If I
understood correctly, by setting vmin and vmax, matplotlib should
normalize the values to an appropriate number of bits.

for f in filelist:

everything should happen in this loop

     dataset = ncdf.Dataset(os.path.join(sys.argv[1],f), 'r')
     data.append(dataset.variables[variable][:])

instead of creating this big list, use a temporary array (which will be
overwritten)

     dataset.close()
     dates.append((f.split('_')[2][:-3],f.split('_')[1]))

Why? It's true that this way it eats a lot of RAM at the beginning,
but the memory is released after each pop() (and calculating the
maximum of all the data without plotting is needed to use the same
normalization level on all the plots). Anyway, the slowness occurs
during interaction with the plot, not during the execution of the
code.


On Wed, May 23, 2012 at 11:00 AM, Guillaume Gay <guillaume@...4007...> wrote:

Why? It's true that this way at the beginning it eats a lot of RAM,
but then it is released after each pop()

oh I didn't see the pop()...

So now then I don't know...

Do you have to show them full-scale? Maybe you can just use thumbnails of sort?

G.
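A minimal sketch of the thumbnail idea (assuming simple stride-based downsampling is acceptable for exploratory viewing; the factor 8 is arbitrary):

```python
import numpy as np

img = np.zeros((4600, 3840))  # same shape as one of the satellite images
thumb = img[::8, ::8]         # keep every 8th pixel in each direction
print(thumb.shape)            # roughly 64x fewer pixels to redraw
```

A strided slice is just a view, so it costs no extra memory; for nicer-looking thumbnails one could average blocks instead, at the price of a copy.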



I'm not sure what you mean by "normalize the values to an appropriate number of bits", but I don't think setting `vmin` or `vmax` will change the data type of the image. So if you have 64-bit floating point images (100+ MB per image), then that's what you're going to be moving/scaling when you pan and zoom.

-Tony
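This can be checked directly with a quick sketch (headless Agg backend assumed; the array shape matches the images described above):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

img = np.random.uniform(0.001, 45.0, size=(4600, 3840))  # float64 data
im = plt.imshow(img, vmin=0.001, vmax=45.0)

print(im.get_array().dtype)  # float64: vmin/vmax don't change the dtype
print(img.nbytes / 1e6)      # ~141 MB for a single image
```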


I was just guessing that it is part of the process of converting the
actual data (32-bit floats) to images on the screen (24-bit for RGB,
32 with transparency, or 8-bit for grayscale).

I tried converting the data to 8 bit with .astype('uint8'), and it is
still poorly responsive when zooming and panning.


On Wed, May 23, 2012 at 6:27 PM, Tony Yu <tsyu80@...287...> wrote:

I'm not sure what you mean by "normalize the values to an appropriate number
of bits", but I don't think setting `vmin` or `vmax` will change the data
type of the image. So if you have 64-bit floating point images (100+ Mb per
image), then that's what you're going to be moving/scaling when you pan and
zoom.

It seems that setting `interpolation='none'` is significantly slower than
setting it to `'nearest'` (or even `'bilinear'`). On supported backends
(e.g. any Agg backend) the code paths for 'none' and 'nearest' are
different: 'nearest' gets passed to Agg's interpolation routine, whereas
'none' does an unsampled rescale of the image (I'm just reading the code
comments here). Could you check whether changing to
`interpolation='nearest'` fixes this issue?

-Tony

(Note: copied to stackoverflow)

PS: These different approaches do give different qualitative results; for
example, the code snippet below gives a slight moiré pattern, which
doesn't appear when `interpolation='none'`. I think that 'none' is roughly
the same as 'nearest' when zooming in (image pixels larger than screen
pixels) but gives a higher-order interpolation result when zooming out
(image pixels smaller than screen pixels). I think the delay comes from
some extra Matplotlib/Python calculations needed for the rescaling.


#~~~
import matplotlib.pyplot as plt
import numpy as np

img = np.random.uniform(0, 255, size=(2000, 2000)).astype(np.uint8)
plt.imshow(img, interpolation='nearest')
plt.show()
#~~~
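For what it's worth, the difference could be timed with a rough sketch like the one below (headless Agg backend assumed; absolute numbers vary by machine, so take them as indicative only):

```python
import time
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for headless timing
import matplotlib.pyplot as plt
import numpy as np

img = np.random.uniform(0.0, 45.0, size=(1500, 1500))
times = {}
for interp in ('none', 'nearest'):
    fig, ax = plt.subplots()
    ax.imshow(img, interpolation=interp)
    t0 = time.perf_counter()
    fig.canvas.draw()  # force a full render, as each pan/zoom step would
    times[interp] = time.perf_counter() - t0
    plt.close(fig)

print({k: round(v, 3) for k, v in times.items()})
```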

Could you check whether changing to `interpolation='nearest'` fixes this
issue?

Yes, changing it really speeds up the interactivity! The delay is now
just a few ms; you can notice it's not completely smooth, but it's
perfectly usable. I'll check whether any artifacts/distortion appear
when zoomed in.

Thank you!