Friday, January 27, 2023

A Singularity definition file with R installation

Bootstrap:docker
From:ubuntu:latest
%post
    apt update
    apt -y install openssh-server
    apt -y install openssh-client
    apt -y install build-essential
    apt -y install less
    apt -y install nano
    TZ="America/New_York"
    ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
    apt -y install tzdata
    apt -y install locales
    locale-gen --purge en_US.UTF-8
    # install R
    apt -y install --no-install-recommends software-properties-common dirmngr
    wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
    add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
    apt -y install --no-install-recommends r-base
%environment
    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8






Friday, January 13, 2023

wsl logon failure: the user has not been granted the requested logon type at this computer.

To resolve this error, run the following command to restart the virtual machine service:

powershell restart-service vmcompute

 

Reference: https://github.com/microsoft/WSL/issues/5401

Thursday, October 6, 2022

Illustration of PCA using SVD with a Real Example

Let us illustrate PCA in R programming language. As an example, we will use a built-in dataset mtcars (samples in the rows and variables in the columns).

> head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Scale data before PCA.

> X <- scale(mtcars)

Call function prcomp to compute PCA.

> result1 <- prcomp(X, retx = TRUE, center = FALSE, scale. = FALSE)

Object result1 is a list with several components:

sdev

the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix).

rotation

the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).

x

the PCA scores or the rotated data (data multiplied by the rotation matrix), which is the projection of the data on the eigenvectors.

center, scale

the centering and scaling used, or FALSE.


Next, we compute SVD of data X.

> result2 <- svd(X)

Object result2 is a list with several components:

d

  a vector containing the singular values of X.

u

  a matrix whose columns contain the left singular vectors of x.

v

  a matrix whose columns contain the right singular vectors of x.

Comparing the components of prcomp output result1 with components of svd output result2, it is clear that 

(1) eigenvector matrix result1$rotation is equivalent to result2$v

> result1$rotation[, 1:3] #show first 3 columns due to page size limit
            PC1         PC2         PC3
mpg  -0.3625305  0.01612440 -0.22574419
cyl   0.3739160  0.04374371 -0.17531118
disp  0.3681852 -0.04932413 -0.06148414
hp    0.3300569  0.24878402  0.14001476
drat -0.2941514  0.27469408  0.16118879
wt    0.3461033 -0.14303825  0.34181851
qsec -0.2004563 -0.46337482  0.40316904
vs   -0.3065113 -0.23164699  0.42881517
am   -0.2349429  0.42941765 -0.20576657
gear -0.2069162  0.46234863  0.28977993
carb  0.2140177  0.41357106  0.52854459
> s2$v[, 1:3]
            [,1]        [,2]        [,3]
 [1,] -0.3625305  0.01612440 -0.22574419
 [2,]  0.3739160  0.04374371 -0.17531118
 [3,]  0.3681852 -0.04932413 -0.06148414
 [4,]  0.3300569  0.24878402  0.14001476
 [5,] -0.2941514  0.27469408  0.16118879
 [6,]  0.3461033 -0.14303825  0.34181851
 [7,] -0.2004563 -0.46337482  0.40316904
 [8,] -0.3065113 -0.23164699  0.42881517
 [9,] -0.2349429  0.42941765 -0.20576657
[10,] -0.2069162  0.46234863  0.28977993
[11,]  0.2140177  0.41357106  0.52854459

> sapply(1:ncol(result1$rotation), function(i) all.equal(as.vector(result1$rotation[, i]), result2$v[, i]))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

(2) PCA scores matrix result1$x can be derived from the scaled data X by multiplying by right singular vectors result2$v

> scores <- X %*% result2$v

> head(result1$x[, 1:3]) #show first 3 columns due to page size limit
                          PC1        PC2        PC3
Mazda RX4         -0.64686274  1.7081142 -0.5917309
Mazda RX4 Wag     -0.61948315  1.5256219 -0.3763013
Datsun 710        -2.73562427 -0.1441501 -0.2374391
Hornet 4 Drive    -0.30686063 -2.3258038 -0.1336213
Hornet Sportabout  1.94339268 -0.7425211 -1.1165366
Valiant           -0.05525342 -2.7421229  0.1612456

> head(scores[, 1:3])
                         [,1]       [,2]       [,3]
Mazda RX4         -0.64686274  1.7081142 -0.5917309
Mazda RX4 Wag     -0.61948315  1.5256219 -0.3763013
Datsun 710        -2.73562427 -0.1441501 -0.2374391
Hornet 4 Drive    -0.30686063 -2.3258038 -0.1336213
Hornet Sportabout  1.94339268 -0.7425211 -1.1165366
Valiant           -0.05525342 -2.7421229  0.1612456

> sapply(1:ncol(result1$x), function(i) all.equal(result1$x[, i], scores[, i]))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

(3) PCA scores matrix result1$x is also equivalent to left singular vectors result2$u multiplied by singular values result2$d

> scores2 <- result2$u %*% diag(result2$d)

> head(result1$x[, 1:3])
                          PC1        PC2        PC3
Mazda RX4         -0.64686274  1.7081142 -0.5917309
Mazda RX4 Wag     -0.61948315  1.5256219 -0.3763013
Datsun 710        -2.73562427 -0.1441501 -0.2374391
Hornet 4 Drive    -0.30686063 -2.3258038 -0.1336213
Hornet Sportabout  1.94339268 -0.7425211 -1.1165366
Valiant           -0.05525342 -2.7421229  0.1612456
 

> head(scores2[, 1:3])
            [,1]       [,2]       [,3]
[1,] -0.64686274  1.7081142 -0.5917309
[2,] -0.61948315  1.5256219 -0.3763013
[3,] -2.73562427 -0.1441501 -0.2374391
[4,] -0.30686063 -2.3258038 -0.1336213
[5,]  1.94339268 -0.7425211 -1.1165366
[6,] -0.05525342 -2.7421229  0.1612456

> sapply(1:ncol(result1$x), function(i) all.equal(as.vector(result1$x[, i]), scores2[, i]))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

(4) the standard deviations of principal components are equivalent to singular values divided by the square root of n - 1, where n is the number of samples.

> all.equal(result1$sdev, result2$d / sqrt(nrow(X) - 1))
[1] TRUE