Closed
Description
After annotating my VCF with CSQ (bcftools 1.12
), I noticed a variant (I:4464670
) that was annotated as *missense
for sample JU2106 but I couldn't find a previous stop codon in this sample. It seems that a stop codon (I:4463998
) in another sample (ECA2091) is causing this mis-annotation. If I remove ECA2091 from the VCF, JU2106 is now labeled missense
at this position. Similarly, if I leave all samples in but remove the stop codon in ECA2091, JU2106 is again labeled properly as missense
. See code below for a reproducible example.
#### download vcf, ref, and gff
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/eri6.vcf.gz
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/chrI.fa.gz
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/csq.gff3.gz
# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
--gff-annot csq.gff3.gz\
--ncsq 224 \
--phase a eri6.vcf.gz > test.bcsq.vcf.gz
# extract
bcftools view -e 'INFO/BCSQ="."' -Ou test.bcsq.vcf.gz | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > test.bcsq.samples.tsv
# look at annotation
cat test.bcsq.samples.tsv | grep 4464670 # missense variant
cat test.bcsq.samples.tsv | grep 4463998 # stop
#################################################
# remove ECA2091 with stop codon
# make vcf
bcftools view -s JU2106,CB4856,MY10 eri6.vcf.gz > eri6_noECA2091.vcf
# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
--gff-annot csq.gff3.gz\
--ncsq 224 \
--phase a eri6_noECA2091.vcf > noECA2091.bcsq.vcf.gz
# extract
bcftools view -e 'INFO/BCSQ="."' -Ou noECA2091.bcsq.vcf.gz | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > noECA2091.bcsq.samples.tsv
# look at annotation
cat noECA2091.bcsq.samples.tsv | grep 4464670 # missense variant
cat noECA2091.bcsq.samples.tsv | grep 4463998 # stop
###########################################################
# remove stop variant but keep ECA2091
# make vcf
bcftools view -t ^I:4463998 eri6.vcf.gz > eri6_nostop.vcf
# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
--gff-annot csq.gff3.gz\
--ncsq 224 \
--phase a eri6_nostop.vcf > nostop.bcsq.vcf.gz
# extract
bcftools view -e 'INFO/BCSQ="."' -Ou nostop.bcsq.vcf.gz | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > nostop.bcsq.samples.tsv
# look at annotation
cat nostop.bcsq.samples.tsv | grep 4464670 # missense variant
cat nostop.bcsq.samples.tsv | grep 4463998 # stop
Activity