Hi,

I have lots of data acquired via analogue to digital conversion. The data is consequently represented as integers (often 16 bit resolution). To obtain the correct signal and plot it, these data must of course be multiplied by a floating point scale factor. This seems potentially wasteful of resources (time and memory), especially as I would prefer to keep the original data untouched.

It occurs to me that a more efficient plotting method would be to plot the original data but scale the axes by the appropriate factor. In that way a simple numpy array view could be passed to plot. Does a method for doing this exist? I think I can do it in a rather convoluted way by plotting the original data and then superimposing empty axes at the adjusted scale. However, I haven't yet tested this and I'm a bit skeptical about the overhead of two plots. Another possibility might be the units mechanism, but according to the documentation that is discouraged, and it might be awkward to implement.

If the possibility doesn't exist, I wonder whether it might be feasible - and not too difficult - to add to the axis methods? One could add a scale parameter with a default value of 1 that should not affect existing code.

Boris

Boris Barbour wrote:

> Hi,
>
> I have lots of data acquired via analogue to digital conversion. The data is consequently represented as integers (often 16 bit resolution). To obtain the correct signal and plot it, these data must of course be multiplied by a floating point scale factor. This seems potentially wasteful of resources (time and memory), especially as I would prefer to keep the original data untouched.

I don't understand this last clause; scaling your original integer data prior to plotting does not in any way inhibit your storage and use of that original integer data.

> It occurs to me that a more efficient plotting method would be to plot the original data but scale the axes by the appropriate factor. In that way a simple numpy array view could be passed to plot. Does a method for doing this exist? I think I can do it in a rather convoluted way by plotting the original data and then superimposing empty axes at the adjusted scale. However, I haven't yet tested this and I'm a bit skeptical about the overhead of two plots. Another possibility might be the units mechanism, but according to the documentation that is discouraged, and it might be awkward to implement.

> If the possibility doesn't exist, I wonder whether it might be feasible - and not too difficult - to add to the axis methods? One could add a scale parameter with a default value of 1 that should not affect existing code.

For ordinary plots in matplotlib the data will be converted to double precision anyway, and the time required for you to do your own scaling and conversion is utterly negligible compared to the total plotting time. I don't think it will make any difference in memory usage, either. Matplotlib uses asarray(), so there will not be a copy if the input is already a double precision array.
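Eric's point about asarray() is easy to check directly (an illustrative sketch, not part of the original message): numpy.asarray returns the input array itself when no conversion is needed, and only copies when the dtype must change.

```python
import numpy as np

raw = (np.random.rand(1000) * 4096).astype(np.int16)  # simulated 16-bit ADC counts

as_double = np.asarray(raw, dtype=np.float64)   # integer input: forces a float64 copy
same = np.asarray(as_double, dtype=np.float64)  # float64 input: no copy is made

print(as_double is raw)   # False -- conversion created a new array
print(same is as_double)  # True  -- already double precision, same object
```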

It sounds like you may be thinking about optimizations in the wrong place. Are you actually running up against speed or memory problems?

Eric

The easiest way is to define a custom formatter -- this is responsible for taking your numeric data and converting it to strings for the tick labels and navigation toolbar coordinate reporting. E.g.:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

t = np.arange(1000)*0.01
s = (np.random.rand(1000)*4096).astype(int)

# this controls the formatting of the tick labels
class VoltFormatter(ticker.Formatter):
    """
    take input and convert to +/- 5V: 0 -> -5, 2048 -> 0, 4096 -> 5
    """
    def __call__(self, x, pos=None):
        # 4096 counts span the full 10 V range, centred on 2048
        return '%1.2f' % (10.*(x-2048)/4096.)

formatter = VoltFormatter()

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(t, s)
ax.yaxis.set_major_formatter(formatter)
plt.show()

One problem with this solution is that the tick choices are poor, since the tick locator doesn't know where to put multiples of volts. To solve this, you can write your own locator, e.g. as described in the user's guide, to place ticks on multiples of the integer scale.
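A concrete way to get sensible tick positions (an illustrative sketch, not code from the original thread) is matplotlib's ticker.MultipleLocator: with the +/- 5 V mapping above, one volt spans 4096/10 = 409.6 counts, so ticks placed every 409.6 counts land on whole volts. The output filename is hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

t = np.arange(1000) * 0.01
s = (np.random.rand(1000) * 4096).astype(int)

# 4096 counts span -5 V to +5 V, so one volt is 4096/10 = 409.6 counts
counts_per_volt = 4096 / 10.0

fig, ax = plt.subplots()
ax.plot(t, s)
# place a major tick at every whole volt (every 409.6 counts)...
ax.yaxis.set_major_locator(ticker.MultipleLocator(counts_per_volt))
# ...and label each tick with the corresponding voltage
ax.yaxis.set_major_formatter(
    ticker.FuncFormatter(lambda x, pos: '%1.2f' % (10.0 * (x - 2048) / 4096.0)))
fig.savefig('adc_volts.png')  # hypothetical output filename
```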

But as Eric notes, mpl will be converting your data under the hood to doubles anyway, so you won't be getting any space or CPU savings.


On Sun, Aug 10, 2008 at 8:06 AM, Boris Barbour <barbour@...2125...> wrote:

> [...]

Eric and John,

Thanks for the information. You are right that this probably would have been a premature optimisation, even if it weren't rendered useless by matplotlib using doubles internally (which I hadn't realised). The thought just occurred to me as I was writing the data-scaling part of my script.

The script is intended to be somewhat interactive. Initial tests suggest that plotting or updating several subplots from memory takes a quite noticeable time (e.g. 1.2--1.5 seconds for 3 subplots of 10000 points) that will probably become annoying in routine use. As you indicated, basically all that time is spent within matplotlib. I'm just using standard default calls:

for i in subplots:
    subplot
    plot
    xlabel
    ylabel
    title

Each of these calls seems to take roughly the same time (60--100 ms). If anybody has pointers on speeding things up significantly, I'm all ears. (Predefining data limits? Using lower-level commands? Use of a non-default backend?)

Boris
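One way to measure where the 60--100 ms per call goes (an editorial sketch, not from the thread; `timed` is a hypothetical helper) is to bracket each call with time.perf_counter, which is available in modern Python:

```python
import time

def timed(label, func, *args, **kwargs):
    """Run func and report how long it took, in milliseconds."""
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    dt_ms = (time.perf_counter() - t0) * 1000.0
    print('%-10s %.1f ms' % (label, dt_ms))
    return result

# Usage with the plotting calls above would look like:
#   timed('plot', ax.plot, t, s)
#   timed('xlabel', ax.set_xlabel, 'time (s)')
timed('sleep', time.sleep, 0.01)  # demonstration with a known ~10 ms delay
```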

Boris Barbour wrote:

> Eric and John,
>
> Thanks for the information. You are right that this probably would have been a premature optimisation, even if it weren't rendered useless by matplotlib using doubles internally (which I hadn't realised). The thought just occurred to me as I was writing the data-scaling part of my script.
>
> The script is intended to be somewhat interactive. Initial tests suggest that plotting or updating several subplots from memory takes a quite noticeable time (e.g. 1.2--1.5 seconds for 3 subplots of 10000 points) that will probably become annoying in routine use. As you indicated, basically all that time is spent within matplotlib. I'm just using standard default calls:

> for i in subplots:
>     subplot
>     plot
>     xlabel
>     ylabel
>     title
>
> Each of these calls seems to take roughly the same time (60--100 ms). If

It sounds like you have interactive mode on, in which case each pylab function redraws the figure. The solution is to use the object-oriented interface for almost everything. See the attached example.

> anybody has pointers on speeding things up significantly, I'm all ears. (Predefining data limits? Using lower-level commands? Use of a non-default backend?)

If the suggestion above is not enough, we will need to know more about what your script looks like, the environment in which it is running (e.g., ipython? embedded in wx? straight command line? what operating system? what backend?), your constraints, and what you are trying to accomplish. The best thing would be if you could post a very short self-contained script, typically using fake random data, that shows your present approach and that illustrates the speed problem; then we can try to figure out what the bottlenecks are, and whether there are simple ways to speed up the script or to modify mpl for better speed.

Eric

xy.py (602 Bytes)
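The attachment itself is not preserved in the archive. A minimal sketch of the object-oriented, set_data-style approach Eric describes might look like the following; the structure and names are illustrative, not the actual contents of xy.py:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

plt.ioff()  # ensure interactive mode is off: no implicit redraw per call

fig, axes = plt.subplots(3, 1)
lines = []
for ax in axes:
    # create each line once, with placeholder data
    line, = ax.plot(np.arange(10000), np.zeros(10000))
    ax.set_xlabel('sample')
    ax.set_ylabel('counts')
    lines.append(line)

def update(new_traces):
    """Replace the y-data of each line, then redraw the figure once."""
    for line, y in zip(lines, new_traces):
        line.set_ydata(y)
    fig.canvas.draw()  # a single draw covers all three subplots

update([np.random.rand(10000) * 4096 for _ in range(3)])
```

The point of the design is that the expensive pylab-level figure redraws happen once per update, not once per function call.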

> It sounds like you have interactive mode on, in which case each pylab function redraws the figure.

Yes - it was that simple (and stupid); thanks for your patience. Turning off interactive mode and using the set_data approach leads to an execution time of about 0.05 seconds (~30-fold speed-up), which is _fine_.

Thanks again for your help.

Boris