Scaling axes, not data?

Hi,

I have lots of data acquired via analogue-to-digital conversion. The data
are consequently represented as integers (often at 16-bit resolution). To
obtain the correct signal and plot it, these data must of course be
multiplied by a floating-point scale factor. This seems potentially wasteful
of resources (time and memory), especially as I would prefer to keep the
original data untouched.

It occurs to me that a more efficient plotting method would be to plot the
original data but scale the axes by the appropriate factor. In that way a
simple numpy array view could be passed to plot. Does a method for doing this
exist? I think I can do it in a rather convoluted way by plotting the
original data and then superimposing empty axes at the adjusted scale.
However, I haven't yet tested this and I'm a bit skeptical about the overhead
of two plots. Another possibility might be the units mechanism, but according
to the documentation that is discouraged, and it might be awkward to
implement.
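
By way of illustration, the superimposed-axes idea might look something like
this (untested, and with an invented scale factor):

    import numpy as np
    import matplotlib.pyplot as plt

    counts = (np.random.rand(1000) * 4096).astype(np.int16)   # raw ADC data
    volts_per_count = 10.0 / 4096.0   # hypothetical scale factor

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(counts)                   # plot the untouched integers
    ax.set_ylabel('ADC counts')

    # superimpose a second, data-free y axis showing the scaled units
    ax2 = ax.twinx()
    lo, hi = ax.get_ylim()
    ax2.set_ylim(lo * volts_per_count, hi * volts_per_count)
    ax2.set_ylabel('volts')

    plt.show()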

If the possibility doesn't exist, I wonder whether it might be feasible - and
not too difficult - to add this to the axis methods? One could add a scale
parameter with a default value of 1, which would not affect existing code.

Boris

Boris Barbour wrote:

Hi,

I have lots of data acquired via analogue-to-digital conversion. The data are consequently represented as integers (often at 16-bit resolution). To obtain the correct signal and plot it, these data must of course be multiplied by a floating-point scale factor. This seems potentially wasteful of resources (time and memory), especially as I would prefer to keep the original data untouched.

I don't understand this last clause; scaling your original integer data prior to plotting does not in any way inhibit your storage and use of that original integer data.

It occurs to me that a more efficient plotting method would be to plot the original data but scale the axes by the appropriate factor. In that way a simple numpy array view could be passed to plot. Does a method for doing this exist? I think I can do it in a rather convoluted way by plotting the original data and then superimposing empty axes at the adjusted scale. However, I haven't yet tested this and I'm a bit skeptical about the overhead of two plots. Another possibility might be the units mechanism, but according to the documentation that is discouraged, and it might be awkward to implement.

If the possibility doesn't exist, I wonder whether it might be feasible - and not too difficult - to add this to the axis methods? One could add a scale parameter with a default value of 1, which would not affect existing code.

For ordinary plots in matplotlib the data will be converted to double precision anyway, and the time required for you to do your own scaling and conversion is utterly negligible compared to the total plotting time. I don't think it will make any difference in memory usage, either. Matplotlib uses asarray(), so there will not be a copy if the input is already a double precision array.
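
For instance, a quick sketch of the asarray() behaviour:

    import numpy as np

    raw = np.arange(10, dtype=np.int16)   # raw ADC counts
    scaled = raw * 0.001                  # float64 copy, made once

    # an array that is already double precision passes through untouched:
    print(np.asarray(scaled, dtype=np.float64) is scaled)   # True
    # integer input is converted (copied) instead:
    print(np.asarray(raw, dtype=np.float64) is raw)         # False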

It sounds like you may be thinking about optimizations in the wrong place. Are you actually running up against speed or memory problems?

Eric

The easiest way is to define a custom formatter -- this is responsible
for taking your numeric data and converting it to strings for the tick
labels and for the navigation toolbar's coordinate reporting. E.g.:

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker

    t = np.arange(1000)*0.01
    s = (np.random.rand(1000)*4096).astype(int)

    # this controls the formatting of the tick labels
    class VoltFormatter(ticker.Formatter):
        """
        Convert raw counts to +/-5 V: 0 -> -5, 2048 -> 0, 4096 -> +5.
        """
        def __call__(self, x, pos=None):
            # 4096 counts span the full 10 V range
            return '%1.2f' % (10.*(x - 2048)/4096.)

    formatter = VoltFormatter()
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(t, s)

    ax.yaxis.set_major_formatter(formatter)

    plt.show()

One problem with this solution is that the tick choices are poor,
since the tick locator doesn't know where to put multiples of volts.
To solve this, you can write your own locator, e.g. as described in the
user's guide, to place ticks on multiples of the integer scale.
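
Alternatively, for the +/-5 V mapping above, the stock MultipleLocator
would do the job:

    # one volt spans 4096/10. = 409.6 integer counts, so major ticks at
    # integer multiples of that land exactly on whole volts
    ax.yaxis.set_major_locator(ticker.MultipleLocator(4096/10.))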

But as Eric notes, mpl will be converting your data under the hood to
doubles anyway, so you won't be getting any space or CPU savings.


Eric and John,

Thanks for the information. You are right that this probably would have been a
premature optimisation, even if it weren't rendered useless by matplotlib
using doubles internally (which I hadn't realised). The thought just occurred
to me as I was writing the data-scaling part of my script.

The script is intended to be somewhat interactive. Initial tests suggest that
plotting or updating several subplots from memory takes quite a noticeable
time (e.g. 1.2 -- 1.5 seconds for 3 subplots of 10000 points) that will
probably become annoying in routine use. As you indicated, basically all
that time is spent within matplotlib. I'm just using standard default calls:

for each subplot:
    subplot(...)
    plot(...)
    xlabel(...)
    ylabel(...)
    title(...)

Each of these calls seems to take roughly the same time (60--100 ms). If
anybody has pointers on speeding things up significantly, I'm all ears.
(Predefining data limits? Using lower-level commands? Using a non-default
backend?)

Boris

Boris Barbour wrote:

Eric and John,

Thanks for the information. You are right that this probably would have been a premature optimisation, even if it weren't rendered useless by matplotlib using doubles internally (which I hadn't realised). The thought just occurred to me as I was writing the data-scaling part of my script.

The script is intended to be somewhat interactive. Initial tests suggest that plotting or updating several subplots from memory takes quite a noticeable time (e.g. 1.2 -- 1.5 seconds for 3 subplots of 10000 points) that will probably become annoying in routine use. As you indicated, basically all that time is spent within matplotlib. I'm just using standard default calls:

for each subplot:
    subplot(...)
    plot(...)
    xlabel(...)
    ylabel(...)
    title(...)

Each of these calls seems to take roughly the same time (60--100 ms). If

It sounds like you have interactive mode on, in which case each pylab function redraws the figure. The solution is to use the object-oriented interface for almost everything. See the attached example.

anybody has pointers on speeding things up significantly, I'm all ears. (Predefining data limits? Using lower-level commands? Using a non-default backend?)

If the suggestion above is not enough, we will need to know more about what your script looks like, the environment in which it is running (e.g., ipython? embedded in wx? straight command line? what operating system? what backend?), your constraints, and what you are trying to accomplish. The best thing would be if you could post a very short self-contained script, typically using fake random data, that shows your present approach and that illustrates the speed problem; then we can try to figure out what the bottlenecks are, and whether there are simple ways to speed up the script or to modify mpl for better speed.

Eric

xy.py (602 Bytes)
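
(The attached xy.py isn't reproduced here; a minimal sketch of the
object-oriented, set_data-based approach Eric describes might look like
this: build the figure once with interactive mode off, then update the
existing lines and redraw once per update.)

    import numpy as np
    import matplotlib.pyplot as plt

    plt.ioff()   # interactive mode off: nothing redraws until asked

    npts = 10000
    t = np.arange(npts) * 0.01

    fig = plt.figure()
    lines = []
    for i in range(3):
        ax = fig.add_subplot(3, 1, i + 1)
        line, = ax.plot(t, np.random.rand(npts))
        ax.set_xlabel('time (s)')
        ax.set_ylabel('signal %d' % (i + 1))
        lines.append(line)
    fig.canvas.draw()   # a single draw for the whole figure

    # subsequent updates reuse the existing Line2D objects
    for line in lines:
        line.set_ydata(np.random.rand(npts))
    fig.canvas.draw()

    plt.show()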

It sounds like you have interactive mode on, in which case each pylab
function redraws the figure.

Yes - it was that simple (and stupid); thanks for your patience. Turning off
interactive mode and using the set_data approach leads to an execution time
of about 0.05 seconds (~30-fold speed-up), which is _fine_.

Thanks again for your help.

Boris