Matplotlib PSD bug?

Fago, Matt - AES wrote:

I cannot really compute the example without the pad_to support in svn. Nevertheless, using something similar (nfft=128, noffset=64) gives similarly erroneous results.

Did you add 'pad_to'? If so, thanks!

Good to know. I recently (within the last month) did a bunch of work on psd, based in a large part on work done by one of my colleagues, Sean Arms. I was worried some of this had broken existing code, but it appears more likely that this was already a problem.

After much playing, and reading the Matlab docs, it looks like the difference is that Matlab normalizes by the sampling frequency. Also, if a one-sided psd is returned, it multiplies everything by 2. If I apply these two scaling factors, I get results in line with those produced by Matlab.

Now, this scaling was not performed by the old code, so this is not a new incompatibility (bug?) with Matlab. Also, while I have not reviewed the reference specified in the docstring (Bendat and Piersol 1986), the book I have handy (Stoica and Moses 2005) does not mention scaling by the sampling frequency, nor does the included Matlab code perform any such scaling.

So what should be done here? I would be opposed to making such scaling the default, as IMHO it tries to do too much and I would have to work around it to get the more "raw" numbers. However, I am not opposed to adding (yet another) option to do such scaling.

Other opinions?

Ryan

···

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Ryan May wrote:

Fago, Matt - AES wrote:

I cannot really compute the example without the pad_to support in svn. Nevertheless, using something similar (nfft=128, noffset=64) gives similarly erroneous results.

Did you add 'pad_to'? If so, thanks!

Good to know. I recently (within the last month) did a bunch of work on
psd, based in a large part on work done by one of my colleagues, Sean
Arms. I was worried some of this had broken existing code, but it
appears more likely that this was already a problem.

After much playing, and reading the Matlab docs, it looks like the
difference is that Matlab normalizes by the sampling frequency. Also,
if a one-sided psd is returned, it multiplies everything by 2. If I
apply these two scaling factors, I get results in line with those
produced by Matlab.

Now, this scaling was not performed by the old code, so this is not a
new incompatibility (bug?) with Matlab. Also, while I have not reviewed
the reference specified in the docstring (Bendat and Piersol 1986), the
book I have handy (Stoica and Moses 2005) does not mention scaling by
the sampling frequency, nor does the included Matlab code perform any
such scaling.

So what should be done here? I would be opposed to making such scaling
the default, as IMHO it tries to do too much and I would have to work
around it to get the more "raw" numbers. However, I am not opposed to
adding (yet another) option to do such scaling.

Hi Ryan,

I appreciate the work you're doing on this, and while I don't have any
very strong opinions on the API questions you raise, I would request
that you include in the docstrings information at least at the level of
the above, listing the scaling issue, the sources you've followed and
how and why you've chosen to follow them or deviate, and differences
with the default behavior of other software. I recently helped a
colleague use Matlab's PSD/pwelch functions and had to spend some time
sending in sine waves of various frequencies and amplitudes to figure
out what the results meant. If matplotlib gives different results by
default, I think it should 1) justify why and 2) indicate what option(s)
may be set to get results equivalent to other systems, including Matlab.

Thanks again,
Andrew

My preference is to strive for matlab compatability here. I am pretty
sure it was compatible with matlab in my original tests to several
significant digits, but I no longer have the test code and my memory
is fuzzy because it was over 5 years ago. But I intended this
function to be matlab compatible, and if it is not I would prefer to
see it compatible even if it means breaking some existing code. It is
possible that the original code did not scale with sampling frequency
and was compatible, but over time we've made enhancements that broke
compatibility. Its been a long time since I used and tested against
matlab.

If there is good reason to add support for an alternate scaling, we
can easily support it with kwargs. But since the signature and
origins of this function were matlab compatibility, I think it is
least confusing if the default output has similar scaling.

JDH

···

On Mon, Dec 1, 2008 at 3:40 PM, Andrew Straw <strawman@...106...> wrote:

I appreciate the work you're doing on this, and while I don't have any
very strong opinions on the API questions you raise, I would request
that you include in the docstrings information at least at the level of
the above, listing the scaling issue, the sources you've followed and
how and why you've chosen to follow them or deviate, and differences
with the default behavior of other software. I recently helped a
colleague use Matlab's PSD/pwelch functions and had to spend some time
sending in sine waves of various frequencies and amplitudes to figure
out what the results meant. If matplotlib gives different results by
default, I think it should 1) justify why and 2) indicate what option(s)
may be set to get results equivalent to other systems, including Matlab.

I suppose the issue is: what is correct? Or is it a matter of definition?

I don't have Stoica and Moses, but Bendat and Piersol eqn 11.102:

    One_Sided_PSD = 2/(n_d * N * dt) * Sum(FFT^2)

where there are n_d is the number of averages and N is the number of points in the FFT.
That seems to be scaling by the length? I'm fairly certain that the factor of two (as shown
above) is required for a one-sided PSD, as that comes from 'removing' the negative
frequency range.

Note that Matlab shows such scaling (by 2/L) even when computing the power spectra directly
from a raw (unaveraged) FFT:

   http://www.mathworks.com/support/tech-notes/1700/1702.html

To me, as Matplotlib is striving to be Matlab-compatible, the default behaviour should be to
give results as close to the Matlab implementation as possible. One could always have an
option to turn off the scaling.

Note that the Matplotlib results also seem to have significantly less frequency resolution than
the Matlab results. Is this the case, or am I not using noffset, nfft, and pad_to correctly?

Thanks for your help,
Matt

···

________________________________________
From: Ryan May [rmay31@...287...]
Sent: Monday, December 01, 2008 1:58 PM
To: Fago, Matt - AES
Cc: Matplotlib Users
Subject: Re: [Matplotlib-users] Matplotlib PSD bug?

Fago, Matt - AES wrote:

I cannot really compute the example without the pad_to support in svn. Nevertheless, using something similar (nfft=128, noffset=64) gives similarly erroneous results.

Did you add 'pad_to'? If so, thanks!

Good to know. I recently (within the last month) did a bunch of work on
psd, based in a large part on work done by one of my colleagues, Sean
Arms. I was worried some of this had broken existing code, but it
appears more likely that this was already a problem.

After much playing, and reading the Matlab docs, it looks like the
difference is that Matlab normalizes by the sampling frequency. Also,
if a one-sided psd is returned, it multiplies everything by 2. If I
apply these two scaling factors, I get results in line with those
produced by Matlab.

Now, this scaling was not performed by the old code, so this is not a
new incompatibility (bug?) with Matlab. Also, while I have not reviewed
the reference specified in the docstring (Bendat and Piersol 1986), the
book I have handy (Stoica and Moses 2005) does not mention scaling by
the sampling frequency, nor does the included Matlab code perform any
such scaling.

So what should be done here? I would be opposed to making such scaling
the default, as IMHO it tries to do too much and I would have to work
around it to get the more "raw" numbers. However, I am not opposed to
adding (yet another) option to do such scaling.

Other opinions?

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

This e-mail and any files transmitted with it may be proprietary and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the sender.
Please note that any views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of ITT Corporation. The recipient should check this e-mail and any attachments for the presence of viruses. ITT accepts no liability for any damage caused by any virus transmitted by this e-mail.

Fago, Matt - AES wrote:

I suppose the issue is: what is correct? Or is it a matter of definition?

I don't have Stoica and Moses, but Bendat and Piersol eqn 11.102:

    One_Sided_PSD = 2/(n_d * N * dt) * Sum(FFT^2)

where there are n_d is the number of averages and N is the number of points in the FFT.
That seems to be scaling by the length? I'm fairly certain that the factor of two (as shown
above) is required for a one-sided PSD, as that comes from 'removing' the negative
frequency range.

Note that Matlab shows such scaling (by 2/L) even when computing the power spectra directly
from a raw (unaveraged) FFT:

   http://www.mathworks.com/support/tech-notes/1700/1702.html

To me, as Matplotlib is striving to be Matlab-compatible, the default behaviour should be to
give results as close to the Matlab implementation as possible. One could always have an
option to turn off the scaling.

Yeah, scaling by a factor of two for one-sided is definitely correct now that I think about it. Note, however, that the scaling by the length is not the problem. In fact, the current psd implementation does this correctly when it corrects for the power in the window. The problem stems from matlab scaling by the sampling frequency. Stoica and Moses don't do this, nor does Welch 1967. I'm all for doing things in a Matlab-compatible way--when they're correct. One of the reasons I stopped using Matlab was that it tried to be too smart for me.

Note that the Matplotlib results also seem to have significantly less frequency resolution than
the Matlab results. Is this the case, or am I not using noffset, nfft, and pad_to correctly?

It's not that you're using those incorrectly, but rather that you failed to notice that the default window is the Hanning window, not rectangular. Try adding window=mlab.window_none to the call. (Sorry, I meant to mention that before in my reply.

Ryan

···

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

From: Ryan May [mailto:rmay31@…287…]
Fago, Matt - AES wrote:
> I suppose the issue is: what is correct? Or is it a matter of
definition?

[...]

Yeah, scaling by a factor of two for one-sided is definitely correct now
that I think about it. Note, however, that the scaling by the length is
not the problem. In fact, the current psd implementation does this
correctly when it corrects for the power in the window. The problem
stems from matlab scaling by the sampling frequency. Stoica and Moses
don't do this, nor does Welch 1967. I'm all for doing things in a
Matlab-compatible way--when they're correct. One of the reasons I
stopped using Matlab was that it tried to be too smart for me.

Yes, multiplying by 2/fs does look fairly good. Perhaps this comes about due to the Matlab implementation of the FFT?

> Note that the Matplotlib results also seem to have significantly less
> frequency resolution than
> the Matlab results. Is this the case, or am I not using noffset, nfft,
> and pad_to correctly?

It's not that you're using those incorrectly, but rather that you failed
to notice that the default window is the Hanning window, not
rectangular. Try adding window=mlab.window_none to the call. (Sorry, I
meant to mention that before in my reply.

Ah ... many thanks. Silly me.

- Matt

This e-mail and any files transmitted with it may be proprietary and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the sender.
Please note that any views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of ITT Corporation. The recipient should check this e-mail and any attachments for the presence of viruses. ITT accepts no liability for any damage caused by any virus transmitted by this e-mail.

So what is to be done here? It seems to me that at least the factor of two should be
fixed for one-sided PSDs, and the 1/fs normalization difference with Matlab
documented. Ideally, I'd think this normalization would be on by default, with the
option to turn it off.

Are you planning to submit a patch, or shall I look into it? If I submit a patch, does matplotlib
require a copyright assignment?

Thanks,
Matt

···

________________________________________
From: Ryan May [rmay31@...287...]
Sent: Monday, December 01, 2008 3:31 PM
To: Fago, Matt - AES
Cc: Matplotlib Users
Subject: Re: [Matplotlib-users] Matplotlib PSD bug?

Fago, Matt - AES wrote:

I suppose the issue is: what is correct? Or is it a matter of definition?

I don't have Stoica and Moses, but Bendat and Piersol eqn 11.102:

    One_Sided_PSD = 2/(n_d * N * dt) * Sum(FFT^2)

where there are n_d is the number of averages and N is the number of points in the FFT.
That seems to be scaling by the length? I'm fairly certain that the factor of two (as shown
above) is required for a one-sided PSD, as that comes from 'removing' the negative
frequency range.

Note that Matlab shows such scaling (by 2/L) even when computing the power spectra directly
from a raw (unaveraged) FFT:

   http://www.mathworks.com/support/tech-notes/1700/1702.html

To me, as Matplotlib is striving to be Matlab-compatible, the default behaviour should be to
give results as close to the Matlab implementation as possible. One could always have an
option to turn off the scaling.

Yeah, scaling by a factor of two for one-sided is definitely correct now
that I think about it. Note, however, that the scaling by the length is
not the problem. In fact, the current psd implementation does this
correctly when it corrects for the power in the window. The problem
stems from matlab scaling by the sampling frequency. Stoica and Moses
don't do this, nor does Welch 1967. I'm all for doing things in a
Matlab-compatible way--when they're correct. One of the reasons I
stopped using Matlab was that it tried to be too smart for me.

Note that the Matplotlib results also seem to have significantly less frequency resolution than
the Matlab results. Is this the case, or am I not using noffset, nfft, and pad_to correctly?

It's not that you're using those incorrectly, but rather that you failed
to notice that the default window is the Hanning window, not
rectangular. Try adding window=mlab.window_none to the call. (Sorry, I
meant to mention that before in my reply.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

This e-mail and any files transmitted with it may be proprietary and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the sender.
Please note that any views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of ITT Corporation. The recipient should check this e-mail and any attachments for the presence of viruses. ITT accepts no liability for any damage caused by any virus transmitted by this e-mail.

If you submit a patch, we assume you are agreeing to the mpl licensing
terms, so you do not need to explicitly do anything. As for the
decision about how the defaults and kwargs should behave, I'll defer
to you and Ryan.

JDH

···

On Mon, Dec 8, 2008 at 9:10 AM, Fago, Matt - AES <Matt.Fago@...2409...> wrote:

So what is to be done here? It seems to me that at least the factor of two should be
fixed for one-sided PSDs, and the 1/fs normalization difference with Matlab
documented. Ideally, I'd think this normalization would be on by default, with the
option to turn it off.

Are you planning to submit a patch, or shall I look into it? If I submit a patch, does matplotlib
require a copyright assignment?

John Hunter wrote:

So what is to be done here? It seems to me that at least the factor of two should be
fixed for one-sided PSDs, and the 1/fs normalization difference with Matlab
documented. Ideally, I'd think this normalization would be on by default, with the
option to turn it off.

Are you planning to submit a patch, or shall I look into it? If I submit a patch, does matplotlib
require a copyright assignment?

If you submit a patch, we assume you are agreeing to the mpl licensing
terms, so you do not need to explicitly do anything. As for the
decision about how the defaults and kwargs should behave, I'll defer
to you and Ryan.

I'm fine with coding up a patch to handle the scaling. Since it seems Matlab compatibility is the key desire here, making the Matlab scaling the default is fine by me. I, myself, am more than capable of turning on a keyword argument. I see two independent changes here:

1) Scaling by factor of two for one-sided psd -- Doesn't need a keyword argument, just makes sense.

2) Scaling by 1/Fs. I'm still not sure why Matlab is doing this, other than perhaps to make the PSD dB/Hz instead of dB/(rad/s). Needs a new keyword argument. Suggestions on what exactly this should be called? scale_by_freq?

My only other concern is whether this belongs in 0.98.x. This is a behavior change from 0.98.3, not necessarily a bug fix. I'll defer to John, et al. on whether this should go in 0.98.x or go in a later release.

Ryan

···

On Mon, Dec 8, 2008 at 9:10 AM, Fago, Matt - AES <Matt.Fago@...2409...> wrote:

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

It's a judgement call, but we have always had new features and minor
API changes in the point releases. Except on the 0.91 maintenance
branch, which is only bugfix, we have continuously added new stuff
on every release. When the breakage is likely to be difficult, or the
new feature really significant, we will push the major version. I
don't think these changes are so significant that they require waiting
until a new major version number, but you may want to consider issuing
a warning in addition to the requisite explanation in the docstring
and CHANGELOG.

JDH

···

On Mon, Dec 8, 2008 at 10:56 AM, Ryan May <rmay31@...287...> wrote:

My only other concern is whether this belongs in 0.98.x. This is a behavior
change from 0.98.3, not necessarily a bug fix. I'll defer to John, et al.
on whether this should go in 0.98.x or go in a later release.

John Hunter wrote:

···

On Mon, Dec 8, 2008 at 10:56 AM, Ryan May <rmay31@...287...> wrote:

My only other concern is whether this belongs in 0.98.x. This is a behavior
change from 0.98.3, not necessarily a bug fix. I'll defer to John, et al.
on whether this should go in 0.98.x or go in a later release.

It's a judgement call, but we have always had new features and minor
API changes in the point releases. Except on the 0.91 maintenance
branch, which is only bugfix, we have continuously added new stuff
on every release. When the breakage is likely to be difficult, or the
new feature really significant, we will push the major version. I
don't think these changes are so significant that they require waiting
until a new major version number, but you may want to consider issuing
a warning in addition to the requisite explanation in the docstring
and CHANGELOG.

How about making the default None, and if the default is used set to True and issue the warning for this release. In the future the warning is remove and the default goes to True.

As long as we're going for Matlab compatibility, what about changing Fs from defaulting to 2 to 2*pi? That seems to be the default.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

John Hunter wrote:

···

On Mon, Dec 8, 2008 at 10:56 AM, Ryan May <rmay31@...287...> wrote:

My only other concern is whether this belongs in 0.98.x. This is a behavior
change from 0.98.3, not necessarily a bug fix. I'll defer to John, et al.
on whether this should go in 0.98.x or go in a later release.

It's a judgement call, but we have always had new features and minor
API changes in the point releases. Except on the 0.91 maintenance
branch, which is only bugfix, we have continuously added new stuff
on every release. When the breakage is likely to be difficult, or the
new feature really significant, we will push the major version. I
don't think these changes are so significant that they require waiting
until a new major version number, but you may want to consider issuing
a warning in addition to the requisite explanation in the docstring
and CHANGELOG.

The scaling changes are in, as well as the warning and the corresponding lines in api_changes and CHANGELOG. I also added the converted matlab demo I used to figure this stuff out. Now would probably be the time to see if I did something wrong (especially the warning).

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Question from the peanut gallery: are you guys putting in numerical
tests as part of your unit test suite for this stuff? Rather than in
the future saying 'I think in December 2008 it used to match matlab'
it would be nice to have some automated testing of this. Useful if
someone ever wants to optimize it, cythonize it, etc...

Just a thought...

f

···

On Mon, Dec 8, 2008 at 1:58 PM, Ryan May <rmay31@...287...> wrote:

John Hunter wrote:

The scaling changes are in, as well as the warning and the corresponding lines in
api_changes and CHANGELOG. I also added the converted matlab demo I used to
figure this stuff out. Now would probably be the time to see if I did something
wrong (especially the warning).

From: Ryan May
The scaling changes are in, as well as the warning and the corresponding
lines in
  api_changes and CHANGELOG. I also added the converted matlab demo I
used to
figure this stuff out. Now would probably be the time to see if I did
something
wrong (especially the warning).

The magnitudes do seem better, and the error message works in a reasonable
way (no warning if explicitly set the new arg).

The plot does still look somewhat different from Matlab (e.g., the peak at 150 Hz is much larger in matplotlib), even with window=window_none. Fernando might have a point about unit tests: perhaps we could somehow get a few numerical values out of Matlab to compare to in a test?

Regardless, much better. Thank you!

- Matt

This e-mail and any files transmitted with it may be proprietary and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the sender.
Please note that any views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of ITT Corporation. The recipient should check this e-mail and any attachments for the presence of viruses. ITT accepts no liability for any damage caused by any virus transmitted by this e-mail.