Thursday, February 12, 2015

PCA Terminology in R/prcomp

In R, prcomp() returns an object of class "prcomp" whose main components are the following:

1. sdev, the standard deviations of the principal components (PCs), i.e., the square roots of the eigenvalues of the covariance/correlation matrix. The proportion of variance explained by each PC is sdev^2/sum(sdev^2). A scree plot is simply a bar plot of these eigenvalues, e.g., barplot(sdev^2). To decide how many PCs are "important", look for an "elbow" in the scree plot: the number of components to retain is taken to be the point at which the remaining eigenvalues are relatively small and all about the same size.
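The sketch below illustrates the sdev calculations. It assumes the built-in USArrests data set purely for illustration, with scale. = TRUE so that the PCA is based on the correlation matrix. It checks that sdev^2 matches the eigenvalues from eigen(), computes the variance explained, and draws a scree plot.

```r
# Assumption: USArrests (built into R) is used only as example data;
# scale. = TRUE means the PCA is based on the correlation matrix
pca <- prcomp(USArrests, scale. = TRUE)

# Eigenvalues of the correlation matrix; these should equal pca$sdev^2
eigenvalues <- eigen(cor(USArrests))$values
print(all.equal(pca$sdev^2, eigenvalues))

# Proportion and cumulative proportion of variance explained by each PC
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
print(round(cumsum(var_explained), 3))

# Scree plot: variance (eigenvalue) of each PC; look for the "elbow"
barplot(pca$sdev^2,
        names.arg = colnames(pca$rotation),
        ylab = "Variance (eigenvalue)",
        main = "Scree plot")
```

summary(pca) reports the same proportions, and screeplot(pca) from the stats package draws a similar bar chart.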

2. rotation, the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).
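As a quick check of this definition, the sketch below (again assuming the USArrests data with scale. = TRUE) verifies that the columns of rotation are orthonormal and match the eigenvectors returned by eigen(). Because an eigenvector is only defined up to its sign, the comparison uses absolute values.

```r
pca <- prcomp(USArrests, scale. = TRUE)
eig <- eigen(cor(USArrests))

# Each column of rotation is a unit-length loading vector for one PC
print(round(pca$rotation, 3))

# Columns are orthonormal: t(rotation) %*% rotation is the identity matrix
orthonormal <- isTRUE(all.equal(unname(crossprod(pca$rotation)),
                                diag(ncol(pca$rotation))))

# Eigenvectors are defined only up to sign, so compare absolute values
same_vectors <- isTRUE(all.equal(abs(unname(pca$rotation)),
                                 abs(unname(eig$vectors))))
print(c(orthonormal = orthonormal, same_vectors = same_vectors))
```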

3. x (returned when retx = TRUE, the default), the rotated data, i.e., the centred (and scaled, if requested) data multiplied by the rotation matrix. These values are also called PC scores. Hence, cov(x) is the diagonal matrix diag(sdev^2). PC scores can be used to visualize sample outliers (e.g., plot(x[,1], x[,2])) and in subsequent analyses, such as correcting for hidden structure in linear regression models by including PC scores as covariates.
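The sketch below, still assuming USArrests with scale. = TRUE, rebuilds the scores by hand from the stored center and scale values, confirms that cov(x) is diag(sdev^2), and plots the first two PCs with sample labels to help spot outliers.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Centre and scale the data the same way prcomp() did, then rotate it
z <- scale(USArrests, center = pca$center, scale = pca$scale)
scores <- z %*% pca$rotation
scores_match <- isTRUE(all.equal(unname(scores), unname(pca$x)))

# The scores are uncorrelated, with variances sdev^2
cov_is_diag <- isTRUE(all.equal(unname(cov(pca$x)), diag(pca$sdev^2)))
print(c(scores_match = scores_match, cov_is_diag = cov_is_diag))

# Plot the first two PC scores to look for outlying samples
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), cex = 0.6, pos = 3)
```

For the regression use case, the scores would simply enter the model as extra covariates, e.g. lm(y ~ group + pca$x[, 1:2]), where y and group are hypothetical variables measured on the same samples.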
