I am implementing a simple Principal Component Analysis (PCA) in Python, but I have run into trouble with the graphical output. I have calculated my scores and my loadings (just matrices with mean-centered, univariate values) and I want to draw them as a scatter plot. To make the graph more useful, I also want to label each dot in the scatter plot and color it. I am using Matplotlib, Pylab, and SciPy.
For example, given a 3x3 matrix of scores called T, I want to do something like this:
from pylab import scatter, grid
T, P, E = PCA_svd(X, standardize=True)
# first two score vectors
t1, t2 = T[:, 0], T[:, 1]
properties = dict(alpha=0.75, c=some_colors)
s1 = scatter(t1, t2, s=50, **properties)
grid(True)
The result should show three dots of different colors, with a legend describing each color and a data label (say, a two-character code such as AA, BB, or CC) next to each data point.
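To make the target concrete, here is a minimal sketch of such a figure, assuming a recent Matplotlib where scatter accepts a label keyword and annotate is available. The score values, codes, and colors are made-up placeholders, and plotting one point per scatter call is only one possible approach, not necessarily the best one for large data sets.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder scores: 3 samples x 2 components (hypothetical values).
T = np.array([[1.2, -0.4],
              [-0.7, 0.9],
              [-0.5, -0.5]])
codes = ["AA", "BB", "CC"]          # hypothetical two-character labels
colors = ["red", "green", "blue"]   # hypothetical colors, one per point

fig, ax = plt.subplots()
for (x, y), code, color in zip(T, codes, colors):
    # One scatter call per point, so each point gets its own legend entry.
    ax.scatter([x], [y], s=50, c=color, alpha=0.75, label=code)
    # Text label drawn next to the dot, offset 5 points right and up.
    ax.annotate(code, xy=(x, y), xytext=(5, 5), textcoords="offset points")

ax.grid(True)
legend = ax.legend(loc="best")
```

Calling fig.savefig with a file name, or plt.show() with an interactive backend, would then display the three colored, labeled dots together with the legend.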
I understand that the objects returned by pylab.scatter cannot be passed directly to the pylab.legend command, and I was wondering whether a patch has been written for this yet. I am using Python 2.5.3.
I have found one work-around for the legend: plot each group in its own color, then fake the legend entries with Rectangle objects, as follows:
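from pylab import scatter, legend, grid
from matplotlib.patches import Rectangle
# first two loading vectors, analogous to t1 and t2
p1, p2 = P[:, 0], P[:, 1]
props = dict(alpha=0.75, faceted=False)
Scores = scatter(t1, t2, c='red', s=50, **props)
Loadings = scatter(p1, p2, c='blue', s=50, **props)
# the rectangles are never drawn; they only supply the legend colors
redp = Rectangle((0, 0), 1, 1, facecolor='red')
bluep = Rectangle((0, 0), 1, 1, facecolor='blue')
legend((redp, bluep), ('Scores', 'Loadings'))
grid(True)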
This works for two groups of points in different colors, but it does not work for single data points (it raises "ValueError: First argument must be a sequence"), and it does not let me label each data point with a two-character code.
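For reference, the Rectangle work-around can be written as a self-contained script roughly as follows. This is a sketch under assumptions: it targets a recent Matplotlib (where edgecolors="none" stands in for the old faceted=False keyword), and the scores and loadings are random placeholders rather than real PCA output.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Rectangle

rng = np.random.default_rng(0)
t1, t2 = rng.normal(size=(2, 10))  # placeholder scores (hypothetical data)
p1, p2 = rng.normal(size=(2, 4))   # placeholder loadings (hypothetical data)

fig, ax = plt.subplots()
ax.scatter(t1, t2, c="red", s=50, alpha=0.75, edgecolors="none")
ax.scatter(p1, p2, c="blue", s=50, alpha=0.75, edgecolors="none")

# Proxy artists: never added to the axes, used only as legend handles.
redp = Rectangle((0, 0), 1, 1, facecolor="red")
bluep = Rectangle((0, 0), 1, 1, facecolor="blue")
legend = ax.legend((redp, bluep), ("Scores", "Loadings"))
ax.grid(True)
```

The rectangles only lend their face color to the legend, so this gives one entry per group but says nothing about individual points.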
Any shoves in the right direction would be very much appreciated. Links to online examples and source-code especially so.