Trouble with Wt attribute of matplotlib.mlab.PCA

Justin_R · June 12, 2012, 6:03am

operating system Windows 7
matplotlib version : 1.1.0
obtained from sourceforge

The class seems to generate the same Wt matrix for every input.
Every element of the weight matrix is either +sqrt(1/2) or -sqrt(1/2).

dat1 = 4*np.random.randn(200,1) + 2
dat2 = dat1*.25 + 1*np.random.randn(200,1)
pcaObj1 = PCA(np.hstack((dat1,dat2)))
print pcaObj1.Wt

dat3 = 2*np.random.randn(200,1) + 2
dat4 = dat3*2 + 3*np.random.randn(200,1)
pcaObj2 = PCA(np.hstack((dat3,dat4)))
print pcaObj2.Wt

The output Y seems to be correct, and the projection function works.
Only the Wt matrix seems to be messed up. Am I using this class
incorrectly, or could this be a bug?
Thanks,
Justin

_Paul_Hobson1 · June 12, 2012, 4:59pm

Justin, could you post a self-contained script that demonstrates the
issue? Where does this PCA function come from?

In [1]: from pylab import *

In [2]: PCA

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
C:\Users\phobson\<ipython-input-2-dcf6991f51c0> in <module>()
----> 1 PCA

NameError: name 'PCA' is not defined

-paul

Goyo · June 13, 2012, 7:01pm

2012/6/12 Paul Hobson <pmhobson@...287...>:

Justin, could you post a self-contained script that demonstrates the
issue? Where does this PCA function come from?

It comes from matplotlib.mlab. Just add these imports before the OP's code:

import numpy as np
from matplotlib.mlab import PCA

But I don't know much about PCA and can't comment on this.

Goyo

Warren_Weckesser · June 14, 2012, 2:24pm

Paul,

In case you never got an answer to this: PCA is in the mlab submodule, so if you do “from pylab import *”, you would use mlab.PCA. (At least that’s the case in matplotlib 1.1.0).

Warren

Aronne_Merrelli · June 16, 2012, 3:38pm

Hi,

I wouldn't call myself a PCA expert - so don't weight my answer too
heavily - but here is what I think is happening:

Looking at the code, the input data array is centered and scaled to
unit variance in each dimension. The attribute .a of the class is a
copy of the array that is actually sent to the SVD; note the
centering/scaling. I don't have a proof of this, but intuitively I
expect that the PCA axes associated with a 2-dimensional centered/scaled
array will always be at 45-degree angles (e.g., [1,1], [-1,1], etc., which
are normalized to [sqrt(1/2), sqrt(1/2)], etc.). I think one way to
describe this is that after centering/scaling there are no degrees of
freedom left if you only started with 2 dimensions. So I don't think
there is a bug, but it is maybe unclear what the PCA class is doing.
If you increase to > 2 dimensions, you can see there is random
fluctuation in Wt:

In [102]: pcaObj = PCA(np.random.randn(200,2))
In [103]: pcaObj.Wt
Out[103]:
array([[-0.70710678, -0.70710678],
       [-0.70710678,  0.70710678]])

In [104]: pcaObj = PCA(np.random.randn(200,3))
In [105]: pcaObj.Wt
Out[105]:
array([[ 0.65456366, -0.24141116, -0.7164266 ],
       [ 0.39843462,  0.91551401,  0.05553329],
       [ 0.64249223, -0.32179924,  0.69544877]])

In [106]: pcaObj = PCA(np.random.randn(200,3))
In [107]: pcaObj.Wt
Out[107]:
array([[-0.29885902, -0.67436982,  0.67521007],
       [-0.95428685,  0.21449891, -0.20815098],
       [-0.00446109, -0.70655189, -0.70764718]])
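
The 2-column case can also be checked with plain NumPy. What follows is only a minimal sketch, not the mlab.PCA source: pca_weights and its standardize flag are hypothetical names made up for this illustration, and the sketch assumes the class does nothing more than center, scale, and take an SVD, as described above. The reasoning behind it: once two columns are scaled to unit variance, a.T.dot(a) is proportional to [[1, r], [r, 1]], and for any nonzero correlation r the eigenvectors of that matrix are [1, 1]/sqrt(2) and [1, -1]/sqrt(2). If the data are only centered, the two diagonal entries differ and the axes depend on the input.

```python
import numpy as np


def pca_weights(data, standardize=True):
    """Hypothetical stand-in for the Wt computation (not mlab.PCA itself).

    Centers each column, optionally scales it to unit variance, and
    returns the rows of Vt from an SVD of the result.
    """
    a = data - data.mean(axis=0)
    if standardize:
        a = a / a.std(axis=0)
    # The rows of vt are the principal axes of the prepared array.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    return vt


# Same construction as the first data set in the original post.
rng = np.random.default_rng(0)
x = 4 * rng.standard_normal((200, 1)) + 2
y = 0.25 * x + rng.standard_normal((200, 1))
data = np.hstack((x, y))

wt_scaled = pca_weights(data, standardize=True)
wt_centered = pca_weights(data, standardize=False)

# Centered and scaled: every entry has magnitude sqrt(1/2).
print(np.abs(wt_scaled))
# Centered only: the axes now depend on the data.
print(wt_centered)
```

So if data-dependent weights are what you are after, one option under these assumptions is to do the centering and SVD yourself and skip the scaling step.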

Hope that helps,
Aronne
