Tuesday, June 17, 2014

Convert VCF chromosome notation

VCF files from different sources may use different chromosome notations, either with or without the chr prefix (for example, chr1 versus 1). To make the notation consistent, Vivek provided two awk one-liners that quickly convert a VCF file from one naming convention to the other.

1. Remove 'chr' from the chromosome notation:

awk '{gsub(/^chr/,""); print}' with_chr.vcf > no_chr.vcf

2. Add 'chr' before the chromosome ID:

awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' no_chr.vcf > with_chr.vcf
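Both one-liners above only touch the data lines, so any ##contig=<ID=...> lines in the header keep their old names. The following is a minimal sketch, not part of Vivek's original answer, that also renames those header lines. The function names strip_chr and add_chr are made up for illustration, and it assumes the contig ID is the first ID= field on the ##contig line. It does not handle the mitochondrial naming difference (MT versus chrM).

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original post).

# Remove the "chr" prefix from data lines and from ##contig header lines.
strip_chr() {
  awk '{
    if ($0 ~ /^##contig=<ID=chr/) sub(/ID=chr/, "ID=")   # header contig entry
    else if ($0 !~ /^#/) sub(/^chr/, "")                  # data line
    print
  }' "$1"
}

# Add the "chr" prefix to data lines and ##contig header lines that lack it.
add_chr() {
  awk '{
    if ($0 ~ /^##contig=<ID=/ && $0 !~ /^##contig=<ID=chr/) sub(/ID=/, "ID=chr")
    else if ($0 !~ /^#/ && $0 !~ /^chr/) $0 = "chr" $0    # skip lines already prefixed
    print
  }' "$1"
}
```

Usage mirrors the one-liners: strip_chr with_chr.vcf > no_chr.vcf, or add_chr no_chr.vcf > with_chr.vcf. Unlike the second one-liner, add_chr skips lines that already start with chr, so running it twice does not produce chrchr1.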


Reference:
https://www.biostars.org/p/98582/
