Thursday, October 6, 2022

Illustration of PCA using SVD with a Real Example

Let us illustrate PCA in R programming language. As an example, we will use a built-in dataset mtcars (samples in the rows and variables in the columns).

> head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Scale data before PCA.

> X <- scale(mtcars)

Call function prcomp to compute PCA.

> result1 <- prcomp(X, retx = TRUE, center = FALSE, scale. = FALSE)

Object result1 is a list with several components:

sdev

the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix).

rotation

the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).

x

the PCA scores or the rotated data (data multiplied by the rotation matrix), which is the projection of the data on the eigenvectors.

center, scale

the centering and scaling used, or FALSE.


Next, we compute SVD of data X.

> result2 <- svd(X)

Object result2 is a list with several components:

d

  a vector containing the singular values of X.

u

  a matrix whose columns contain the left singular vectors of x.

v

  a matrix whose columns contain the right singular vectors of x.

Comparing the components of prcomp output result1 with components of svd output result2, it is clear that 

(1) eigenvector matrix result1$rotation is equivalent to result2$v

> result1$rotation[, 1:3] #show first 3 columns due to page size limit
            PC1         PC2         PC3
mpg  -0.3625305  0.01612440 -0.22574419
cyl   0.3739160  0.04374371 -0.17531118
disp  0.3681852 -0.04932413 -0.06148414
hp    0.3300569  0.24878402  0.14001476
drat -0.2941514  0.27469408  0.16118879
wt    0.3461033 -0.14303825  0.34181851
qsec -0.2004563 -0.46337482  0.40316904
vs   -0.3065113 -0.23164699  0.42881517
am   -0.2349429  0.42941765 -0.20576657
gear -0.2069162  0.46234863  0.28977993
carb  0.2140177  0.41357106  0.52854459
> s2$v[, 1:3]
            [,1]        [,2]        [,3]
 [1,] -0.3625305  0.01612440 -0.22574419
 [2,]  0.3739160  0.04374371 -0.17531118
 [3,]  0.3681852 -0.04932413 -0.06148414
 [4,]  0.3300569  0.24878402  0.14001476
 [5,] -0.2941514  0.27469408  0.16118879
 [6,]  0.3461033 -0.14303825  0.34181851
 [7,] -0.2004563 -0.46337482  0.40316904
 [8,] -0.3065113 -0.23164699  0.42881517
 [9,] -0.2349429  0.42941765 -0.20576657
[10,] -0.2069162  0.46234863  0.28977993
[11,]  0.2140177  0.41357106  0.52854459

> sapply(1:ncol(result1$rotation), function(i) all.equal(as.vector(result1$rotation[, i]), result2$v[, i]))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

(2) PCA scores matrix result1$x can be derived from the scaled data X by multiplying by right singular vectors result2$v

> scores <- X %*% result2$v

> head(result1$x[, 1:3]) #show first 3 columns due to page size limit
                          PC1        PC2        PC3
Mazda RX4         -0.64686274  1.7081142 -0.5917309
Mazda RX4 Wag     -0.61948315  1.5256219 -0.3763013
Datsun 710        -2.73562427 -0.1441501 -0.2374391
Hornet 4 Drive    -0.30686063 -2.3258038 -0.1336213
Hornet Sportabout  1.94339268 -0.7425211 -1.1165366
Valiant           -0.05525342 -2.7421229  0.1612456

> head(scores[, 1:3])
                         [,1]       [,2]       [,3]
Mazda RX4         -0.64686274  1.7081142 -0.5917309
Mazda RX4 Wag     -0.61948315  1.5256219 -0.3763013
Datsun 710        -2.73562427 -0.1441501 -0.2374391
Hornet 4 Drive    -0.30686063 -2.3258038 -0.1336213
Hornet Sportabout  1.94339268 -0.7425211 -1.1165366
Valiant           -0.05525342 -2.7421229  0.1612456

> sapply(1:ncol(result1$x), function(i) all.equal(result1$x[, i], scores[, i]))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

(3) PCA scores matrix result1$x is also equivalent to left singular vectors result2$u multiplied by singular values result2$d

> scores2 <- result2$u %*% diag(result2$d)

> head(result1$x[, 1:3])
                          PC1        PC2        PC3
Mazda RX4         -0.64686274  1.7081142 -0.5917309
Mazda RX4 Wag     -0.61948315  1.5256219 -0.3763013
Datsun 710        -2.73562427 -0.1441501 -0.2374391
Hornet 4 Drive    -0.30686063 -2.3258038 -0.1336213
Hornet Sportabout  1.94339268 -0.7425211 -1.1165366
Valiant           -0.05525342 -2.7421229  0.1612456
 

> head(scores2[, 1:3])
            [,1]       [,2]       [,3]
[1,] -0.64686274  1.7081142 -0.5917309
[2,] -0.61948315  1.5256219 -0.3763013
[3,] -2.73562427 -0.1441501 -0.2374391
[4,] -0.30686063 -2.3258038 -0.1336213
[5,]  1.94339268 -0.7425211 -1.1165366
[6,] -0.05525342 -2.7421229  0.1612456

> sapply(1:ncol(result1$x), function(i) all.equal(as.vector(result1$x[, i]), scores2[, i]))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

(4) the standard deviations of principal components are equivalent to singular values divided by the square root of n - 1, where n is the number of samples.

> all.equal(result1$sdev, result2$d / sqrt(nrow(X) - 1))
[1] TRUE

 

Friday, September 16, 2022

Ubuntu unable to connect to internet after recovering from hibernation with VPN connected

There could be many difference issues causing the network connection problem after hibernation. One possible issue I wanted to note down here is network mis-configuration by VPN. The VPN client will change the content in the network configuration file /etc/resolv.conf. The system should automatically restore the original configuration file after disconnecting from VPN. However, when putting the system into hibernation before disconnecting from VPN, the original configuration file won't be restored in some instances. As a result, when recovering from hibernation, the system has a mis-configured file /etc/resolv.conf so that it can not connect to or ping any website. To resolve the problem, look for a backup file for /etc/resolv.conf in folder /etc/ (list possible files by command ls /etc/resolv.conf*) and replace the mis-configured file with the correct version.


Wednesday, July 20, 2022

Linux find and remove files in one command

find /dir/to/search/ -type f -name "FILE-TO-FIND-Regex" -exec rm -f {} \;

find . -name "FILE-TO-FIND" -exec rm -rf {} \;

 

Tuesday, May 24, 2022

Revert a Git repository to a previous commit

 Following is from the answer by boulder_ruby

Working on your own and just want it to work? Follow these instructions below, they’ve worked reliably for me and many others for years.

Working with others? Git is complicated. Read the comments below this answer before you do something rash.

Reverting Working Copy to Most Recent Commit

To revert to the previous commit, ignoring any changes:

git reset --hard HEAD

where HEAD is the last commit in your current branch

Reverting The Working Copy to an Older Commit

To revert to a commit that's older than the most recent commit:

# Resets index to former commit; replace '56e05fced' with your commit code
git reset 56e05fced 

# Moves pointer back to previous HEAD
git reset --soft HEAD@{1}

git commit -m "Revert to 56e05fced"

# Updates working copy to reflect the new commit
git reset --hard

Credits go to a similar Stack Overflow question, Revert to a commit by a SHA hash in Git?.

 

Reference

https://stackoverflow.com/questions/4114095/how-do-i-revert-a-git-repository-to-a-previous-commit

 

Thursday, March 31, 2022

Using R in Jupyter Notebook

Here is a short tutorial about running R in Jupyter Notebook on a Windows system.

1. Install R by downloading R binary installer from the Comprehensive R Archive Network [CRAN] https://cran.r-project.org/.

2. Install Python 3.x from Python official website at https://www.python.org/. Add Python 3.x to system PATH after installation.

3. Install PIP and Jupyter Notebook:

python -m pip install --upgrade
pip install jupyter
4. Install R kernel for Jupyter Notebook in R terminal:
install.packages('IRkernel')
5. Add kernel to Jupyter by running the following in R terminal:
IRkernel::installspec(user = FALSE)
6. Run jupyter notebook from the Windows Command Prompt to start the Jupyter Notebook. Now you should be able to create a new Notebook for R.

Monday, January 10, 2022

Onedrive Sync Any Directory on Your PC via Mklink

https://winbuzzer.com/2020/07/24/how-to-onedrive-folder-sync-any-directory-via-mklink-xcxwbt/

https://www.addictivetips.com/windows-tips/create-delete-a-junction-link-on-windows-10/

 

Syntax to create a junction link (run in command prompt)

mklink /J "path to junction link" "path to target folder"

E.g, to create a junction link in the personal OneDrive directory

mklink /j "%UserProfile%\OneDrive\Documents" "C:\Users\Winbuzzer\Documents" 
 

Syntax to delete the junction link

rmdir "path to junction link"

Wednesday, January 5, 2022

Use future-promise to improve R/Shiny user experience

 https://rstudio.github.io/promises/articles/shiny.html

https://rstudio.github.io/promises/articles/overview.html

https://rstudio.github.io/promises/articles/casestudy.html

 https://engineering-shiny.org/optimizing-shiny-code.html#asynchronous-in-shiny