A weblog sharing great ideas, theory, and implementations in data, sciences, and beyond.
Sunday, July 6, 2014
Thursday, July 3, 2014
What is principal component analysis?
A really good post by Lior Pachter
http://liorpachter.wordpress.com/2014/05/26/what-is-principal-component-analysis/
Print specific column of a text file
To print the second-to-last column of a tab-delimited file:
awk -F '\t' '{print $(NF-1)}' file
NF is a special awk variable that contains the number of fields in the current record.
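A quick way to see $(NF-1) in action is a scratch file whose lines have different numbers of fields. This is a minimal sketch; the temporary file and the NF >= 2 guard are my additions, not part of the original tip. The guard matters because on a one-field line $(NF-1) is $0 (the whole line), and on an empty line it refers to field -1.

```shell
#!/usr/bin/env bash
tsv=$(mktemp)
trap 'rm -f "$tsv"' EXIT
# Lines with 3, 4, 1, 0 and 2 tab-separated fields
printf 'a\tb\tc\n1\t2\t3\t4\nsolo\n\nx\ty\n' > "$tsv"
# NF is recomputed for every line, so $(NF-1) always means "second to last";
# the NF >= 2 guard skips lines where $(NF-1) would be $0 or an invalid field
awk -F '\t' 'NF >= 2 {print $(NF-1)}' "$tsv"
```

With the guard, the command prints b, 3 and x, skipping the one-field and empty lines.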
Merge multiple text files while deleting the first line of all files
tail -q -n +2 file1 file2 file3
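To see the effect, the sketch below builds three small tab-delimited files that share a header line (the file contents are made up for illustration). -n +2 starts output at line 2 of each file, and -q suppresses the "==> file <==" banners tail prints when given several files. The second command is a common variant, not part of the referenced answer, that keeps the header of the first file only.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# Three files with the same header line
printf 'id\tvalue\nA\t1\n' > "$dir/file1"
printf 'id\tvalue\nB\t2\n' > "$dir/file2"
printf 'id\tvalue\nC\t3\n' > "$dir/file3"
# Drop the first line of every file and concatenate the rest
tail -q -n +2 "$dir/file1" "$dir/file2" "$dir/file3"
# Variant: keep the header of the first file, then append all data lines
{ head -n 1 "$dir/file1"; tail -q -n +2 "$dir/file1" "$dir/file2" "$dir/file3"; } > "$dir/merged"
cat "$dir/merged"
```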
Reference
http://stackoverflow.com/questions/10103619/unix-merge-many-files-while-deleting-first-line-of-all-files
Tuesday, June 17, 2014
Convert VCF chromosome notation
VCF files from different sources may use different chromosome notations, either with or without the 'chr' prefix. To make the notation consistent, Vivek provided two awk one-liners that quickly convert VCF chromosome names from one format to the other.
1. Remove 'chr' from the chromosome notation:
awk '{gsub(/^chr/,""); print}' with_chr.vcf > no_chr.vcf
2. Add 'chr' before the chromosome ID:
awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' no_chr.vcf > with_chr.vcf
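Note that neither one-liner touches meta-information lines, so ##contig=<ID=chr1> headers keep their original names. The sketch below runs both commands on a made-up four-line VCF, checks that the round trip restores the original file, and shows one possible extension (my assumption, not from the Biostars thread) that also renames the ##contig IDs.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# Tiny VCF: two meta lines, the column header and one data line
printf '##fileformat=VCFv4.2\n##contig=<ID=chr1>\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\n' \
  > "$dir/with_chr.vcf"

# Original one-liner: only data lines change, since header lines start with "#"
awk '{gsub(/^chr/,""); print}' "$dir/with_chr.vcf" > "$dir/no_chr.vcf"

# Hypothetical extension: also strip "chr" from ##contig IDs
awk '/^##contig=<ID=chr/ {sub(/<ID=chr/, "<ID=")}
     !/^#/               {sub(/^chr/, "")}
     {print}' "$dir/with_chr.vcf" > "$dir/no_chr_contigs.vcf"

# Original add-chr one-liner, used here for a round trip
awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' "$dir/no_chr.vcf" > "$dir/roundtrip.vcf"
cat "$dir/no_chr_contigs.vcf"
```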
Reference:
https://www.biostars.org/p/98582/
Thursday, June 12, 2014
Wednesday, June 11, 2014
Sort every n lines in a file
http://edwards.sdsu.edu/labsite/index.php/robert/399-sorting-fastq-files-by-their-sequence-identifiers
Sorting FASTQ files by their sequence identifiers
In certain cases, you need to sort FASTQ files by their sequence identifiers (e.g. to fix the order of paired-end or mate-pair sequences). There are several ways of sorting FASTQ files, but the simplest way is usually the best. Here is a one-liner to do the job:
cat file.fastq | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > file_sorted.fastq
The cat command will print the file content (to STDOUT).
The paste command will join the four lines of a FASTQ entry into a single line, each original line separated by a tab.
The sort command will sort each line using everything before the first space (which is our sequence identifier).
The tr command will replace the tabs with line breaks, which is basically an undo of the paste command (in a simplified explanation).
The ">" sign will write the sorted output to the file specified after it.
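The same trick generalizes to records of any length n by passing n dashes to paste. Below is a sketch wrapped in a hypothetical helper, sort_records; it assumes the line count is a multiple of n and that no line contains a tab. LC_ALL=C is my addition to make the sort order byte-wise and reproducible across locales.

```shell
#!/usr/bin/env bash
# sort_records N FILE (hypothetical helper): sort FILE in blocks of N lines,
# keyed on the text before the first space in each block's first line.
sort_records() {
  local n=$1 file=$2 dashes="" i
  for ((i = 0; i < n; i++)); do dashes+="- "; done
  # $dashes is unquoted on purpose: it must split into N separate "-" arguments
  # shellcheck disable=SC2086
  paste $dashes < "$file" | LC_ALL=C sort -t ' ' -k1,1 | tr '\t' '\n'
}

# Demo: two FASTQ records stored out of order
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '@r2 x\nCCCC\n+\nIIII\n@r1 y\nAAAA\n+\nIIII\n' > "$tmp/file.fastq"
sort_records 4 "$tmp/file.fastq" > "$tmp/file_sorted.fastq"
cat "$tmp/file_sorted.fastq"
```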
Thursday, June 5, 2014
Install Mac style launcher on Ubuntu
Step 1
Install the Cairo-Dock program:
sudo add-apt-repository ppa:cairo-dock-team/ppa
sudo apt-get update
sudo apt-get install cairo-dock cairo-dock-plug-ins
Step 2
Launch it for the first time by running:
cairo-dock &
Caveat: On first launch, Cairo-Dock asks whether to enable OpenGL. Most video drivers support OpenGL well, but some support it poorly.
Step 3
Make Cairo-Dock start automatically after logging in to Ubuntu:
Run the command 'gnome-session-properties'
or
open System Tools -> Preferences -> Startup Applications.
Click the “Add” button -> name your item “Cairo Dock” and in the command box type “cairo-dock” without the quotes. You can leave the comments field blank. Then click Add.
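Startup Applications stores each entry as a .desktop file in the XDG autostart directory, ~/.config/autostart. As a sketch, assuming a desktop session that honors XDG autostart (GNOME and Unity do), the same entry can be written from the shell; the file name cairo-dock.desktop is arbitrary.

```shell
#!/usr/bin/env bash
# XDG autostart directory ($XDG_CONFIG_HOME defaults to ~/.config)
autostart_dir="${XDG_CONFIG_HOME:-$HOME/.config}/autostart"
mkdir -p "$autostart_dir"
# Minimal desktop entry; Name and Exec mirror the values typed into Startup Applications
cat > "$autostart_dir/cairo-dock.desktop" <<'EOF'
[Desktop Entry]
Type=Application
Name=Cairo Dock
Exec=cairo-dock
EOF
```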
Step 4 (optional)
Hide the system default launcher panel:
Right click on desktop -> choose 'Change Desktop Background' -> in the Appearance setting window, go to Behavior tab -> switch on 'Auto-hide the launcher'
Pressing Alt+F2 brings up the system launcher window.
Reference
https://help.ubuntu.com/community/CairoDock
http://glx-dock.org/ww_page.php?p=First%20Steps&lang=en
Simplify SSH Login
On a Linux machine, logging in to a remote Unix/Linux machine usually means running a command like 'ssh foo@hpc.example.com' and then typing the password when prompted. This is simple but a little annoying, since we have to type the whole lengthy address and the password every time. It would be nice to simplify the login so that we can access the remote server without typing the full address, user ID and password. Here is a solution (note that all of the following steps are done on the local machine).
First of all, edit (or create) the SSH configuration file $HOME/.ssh/config with content like the following:
Host hpc
    HostName hpc.example.com
    Port 21
    User foo
The Port line is only needed when the server does not listen on the default SSH port, 22. Set the file mode so that only the current user can read/write this configuration file:
chmod 600 $HOME/.ssh/config
Here we have set up an alias 'hpc' for the full remote machine address, so 'ssh hpc' starts the login instead of the lengthy 'ssh foo@hpc.example.com'. But we still need to type in the password to get access.
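To check the alias without connecting, OpenSSH 6.8 and later can print the settings it resolves for a host with ssh -G. The sketch below writes the example configuration to a scratch file and points ssh at it with -F, so the real ~/.ssh/config is not touched; with your own config in place, plain 'ssh -G hpc' is enough.

```shell
#!/usr/bin/env bash
# Scratch copy of the example config, so ~/.ssh/config is left alone
cfg=$(mktemp)
trap 'rm -f "$cfg"' EXIT
cat > "$cfg" <<'EOF'
Host hpc
    HostName hpc.example.com
    Port 21
    User foo
EOF
# -F selects the config file; -G prints the settings ssh would use for "hpc"
# and exits without opening a connection
ssh -F "$cfg" -G hpc | grep -E '^(hostname|port|user) '
```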
Let us next set up passwordless ssh login.
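Generate a pair of private and public keys:
ssh-keygen -t rsa
Note that the passphrase should be left empty when prompted. By default, two files, id_rsa.pub (the public key) and id_rsa (the private key), will be generated in the folder ~/.ssh/.
Copy the public key to the remote machine and append its content to the file ~/.ssh/authorized_keys there:
ssh hpc 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys' < ~/.ssh/id_rsa.pub
The quotes matter: they make the remote shell, not the local one, perform the append, while the local file ~/.ssh/id_rsa.pub is fed to it on standard input. Where available, 'ssh-copy-id hpc' does the same job.
Finally, change the file mode of the private key file so that other users cannot meddle with it: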
chmod 600 ~/.ssh/id_rsa
Now everything is set and we should be able to access the remote server without a password:
ssh hpc
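The key generation step can also be run non-interactively. In this sketch, -N "" supplies the empty passphrase up front, -f sets the output path and -q silences the progress output; a scratch directory stands in for ~/.ssh so no existing key gets overwritten. For real use, point -f at ~/.ssh/id_rsa instead.

```shell
#!/usr/bin/env bash
# Scratch directory standing in for ~/.ssh
keydir=$(mktemp -d)
trap 'rm -rf "$keydir"' EXIT
# Empty passphrase, as in the post; writes id_rsa and id_rsa.pub
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"
# Mirrors the post's final step (ssh-keygen already saves the private key as 600)
chmod 600 "$keydir/id_rsa"
ls -l "$keydir"
```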
Wednesday, June 4, 2014
Friday, May 23, 2014
Setup color for ls command
On Linux, GNU ls takes its colors from the LS_COLORS environment variable, which is normally set with dircolors. To change the color settings for the ls output, we can do the following.
Step 1 Add the following to ~/.bashrc so that the changes are made permanent:
if [ "$TERM" != "dumb" ]; then
    if [ -e "$HOME/.dircolors" ]; then
        eval "$(dircolors -b "$HOME/.dircolors")"
    fi
fi
Step 2 Print the current color scheme and save it to the file ~/.dircolors:
dircolors -p >~/.dircolors
Step 3 Edit the color settings for the different file types in ~/.dircolors, e.g., change the color code for directories to "DIR 01;36", where 01 stands for bold font face and 36 for cyan text.
Step 4 Test the changes
source ~/.bashrc
ls -l ~/
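To see what the eval line in ~/.bashrc actually runs, the sketch below feeds dircolors a scratch database holding only the DIR rule from Step 3. dircolors -b turns it into a shell snippet that sets and exports LS_COLORS, in which directories appear under the key di.

```shell
#!/usr/bin/env bash
# Scratch database holding just the directory rule from Step 3
db=$(mktemp)
trap 'rm -f "$db"' EXIT
echo 'DIR 01;36' > "$db"
# Show the generated Bourne-shell code, then run it as ~/.bashrc does
dircolors -b "$db"
eval "$(dircolors -b "$db")"
echo "LS_COLORS=$LS_COLORS"
```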
Monday, April 7, 2014
Resolve broken dependency on Ubuntu
Sometimes a broken package dependency stops apt-get from working properly. For example, I recently got an error when running 'apt-get upgrade' on my Ubuntu 12.04 machine:
The following packages have unmet dependencies.
 libavahi-common3 : Depends: libavahi-common-data but it is not going to be installed
E: Unmet dependencies. Try 'apt-get -f install' with no packages (or specify a solution)
Running 'apt-get -f install' did not help:
dpkg: error processing libavahi-common-data (--configure):
 libavahi-common-data:amd64 0.6.30-5ubuntu2 cannot be configured because libavahi-common-data:i386 is in a different version (0.6.30-5ubuntu2.1)
'apt-get remove libavahi-common-data' did not work either, because of the same stupid dependency problem. After trying various approaches, I finally found the following solution:
sudo dpkg --force-all -P libavahi-common-data:i386 libavahi-common-data
This command force-purges the two broken packages (-P is short for --purge) and hence fixes the broken dependency. Since --force-all overrides all of dpkg's safety checks, use it with care. Now we can run apt-get to install whatever we want.
Monday, March 31, 2014
R install package from source on Windows
To install a package from its source code for both the i386 and x64 architectures:
R CMD INSTALL --compile-both Package_Directory
To install the package and build a binary *.zip archive at the same time:
R CMD INSTALL --build --compile-both Package_Directory
Wednesday, January 29, 2014
Install CPAN modules in Perl
To install modules from CPAN, run one of the following commands as administrator:
perl -MCPAN -e shell
or
cpan
Under the CPAN shell, type in
install Module_Name
For example, to support command-line history in the CPAN shell, we need to install the modules
install Term::ReadLine
install Term::ReadLine::Perl
To customize the installation path, e.g., when installing without root privileges, run the following commands in the CPAN shell (Ref 1):
o conf mbuildpl_arg '--install_base /home/user/.local/perl5'
o conf makepl_arg INSTALL_BASE=/home/user/.local/perl5
o conf commit
Once the installation path is customized, set the environment variable PERL5LIB to tell perl where to look for modules, e.g., add this line to ~/.bash_profile:
export PERL5LIB=/home/user/.local/perl5/lib/perl5:$PERL5LIB
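Set up MANPATH as well (with this installation base, man pages are installed under its man/ subdirectory):
export MANPATH=$MANPATH:/home/user/.local/perl5/man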
Update
We can also use cpanm to simplify installation from CPAN or from a local (*.tar.gz) file. First, install it by typing this command in a terminal:
cpan App::cpanminus
Now, install a module:
cpanm CPAN_Module_Name
cpanm -l /path/to/install perl_package_file #-l option specifies installation location
Reference
1. Setting up customized installation path for cpan:
http://stackoverflow.com/questions/540640/how-can-i-install-a-cpan-module-into-a-local-directory