matplotlib.mlab PCA analysis

Hi,

Thanks a lot for your comments. I did try earlier on to remove the bad points but came across some problems when re-ordering my array. I will try out the method sent to me and check the reference.

Regards, Marjolaine.

<kgdunn+nabble@...287...> 02/11/09 4:06 PM >>>

Marjolaine,

I am assuming your masked array entries are missing data. Multivariate analysis with missing data can be handled in several standard ways; however, these methods don't appear in most Python libraries.

Here are some references on the topic that will help you:

[1] P.R.C. Nelson and J.F. MacGregor, 1996, "Missing data methods in PCA and PLS: Score calculations with incomplete observations", Chemometrics and Intelligent Laboratory Systems, v35, p 45-65.

[2] F. Arteaga and A. Ferrer, 2002, "Dealing with missing data in MSPC: several methods, different interpretations, some examples", Journal of Chemometrics, v16, p408-418.

Paper [1] deals with building a model with missing data, while paper [2] looks at applying an existing PCA model to new data that contains missing entries.

Hope these help,
Kevin

Marjolaine Rouault wrote:

Hi,

I am struggling to do a PCA analysis on a masked array. Does anybody have
suggestions on how to deal with masked arrays when doing PCA?

Best regards, Marjolaine.

Quoted from:
http://www.nabble.com/matplotlib.mlab-PCA-analysis-tp21932808p21932808.html

--
This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard.
The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.

This message has been scanned for viruses and dangerous content by MailScanner,
and is believed to be clean. MailScanner thanks Transtec Computers for their support.

Hi,

Thanks a lot for your comments. I did try earlier on to remove the bad points but came across some problems when re-ordering my array. I will try out the method sent to me and check the reference.

Yep, the compacting/reordering method is appropriate for fixed missing values (typically a grid mask) but not appropriate for randomly placed missing values.

I haven't read these references, but a simple approach you can implement in Python (for example, using numpy.linalg.eig on your covariance matrix to compute your EOFs, and then your PCs) is an iterative method that fills in your missing values and lets the EOFs/PCs converge.

  1. first, save your mask once: mask = data.mask.copy()

  2. fill the missing values with the mean: data = data.filled(data.mean())

  3. after your PCA, make a reconstruction of your data (EOF[1:10] . PC[1:10]) using a limited number of modes (let's say 10): datarec

  4. replace your original missing data with your reconstructed field: data = numpy.where(mask, datarec, data)

  5. repeat from 3), either a fixed number of times or until a criterion is met, based for example on the eigenvalues

  6. finally, you can use your EOFs and PCs and, as a bonus, you have filled your data!
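The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions of mine: the data is a 2-D masked array with samples (e.g. time) along rows and at least two variables (e.g. grid points) along columns; the function name iterative_eof_fill and its parameters are hypothetical; numpy.linalg.eigh is swapped in for numpy.linalg.eig because the covariance matrix is symmetric; and the stopping test looks at how much the filled values change rather than at the eigenvalues.

```python
import numpy as np


def iterative_eof_fill(data, n_modes=10, n_iter=50, tol=1e-6):
    """Fill masked entries of a (samples, variables) masked array by
    iterating a truncated EOF/PC reconstruction (hypothetical helper).

    Assumes n_iter >= 1 and at least two variables (columns).
    """
    data = np.ma.asarray(data, dtype=float)
    mask = np.ma.getmaskarray(data).copy()   # step 1: save the mask once
    filled = data.filled(data.mean())        # step 2: first guess = overall mean
    previous = filled[mask].copy()

    for _ in range(n_iter):
        # PCA via the covariance matrix of the column-centred data
        mean = filled.mean(axis=0)
        anomalies = filled - mean
        cov = np.cov(anomalies, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][:n_modes]  # keep the leading modes
        eofs = eigvecs[:, order]                     # shape (variables, modes)
        pcs = anomalies @ eofs                       # shape (samples, modes)

        # step 3: reconstruction from the retained modes only
        datarec = pcs @ eofs.T + mean

        # step 4: overwrite only the originally missing entries
        filled = np.where(mask, datarec, filled)

        # step 5: stop once the filled values no longer change (my assumption)
        current = filled[mask]
        if current.size == 0 or np.max(np.abs(current - previous)) < tol:
            break
        previous = current.copy()

    return filled, eofs, pcs, eigvals[order]
```

Calling filled, eofs, pcs, eigvals = iterative_eof_fill(data, n_modes=10) returns a plain ndarray in which the observed values are untouched and the masked ones are replaced by the reconstruction. The number of retained modes matters: keeping too many tends to reproduce the first-guess fill instead of improving it.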

On Wed, Feb 11, 2009 at 8:00 PM, Marjolaine Rouault <mrouault@…1229…> wrote:

Regards, Marjolaine.

Matplotlib-users mailing list

Matplotlib-users@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/matplotlib-users


Stephane Raynaud