Skip to content

CSQ sample annotation error downstream of stop codon #1578

Closed
@katiesevans

Description

After annotating my VCF with CSQ (bcftools 1.12), I noticed a variant (I:4464670) that was annotated as *missense for sample JU2106 but I couldn't find a previous stop codon in this sample. It seems that a stop codon (I:4463998) in another sample (ECA2091) is causing this mis-annotation. If I remove ECA2091 from the VCF, JU2106 is now labeled missense at this position. Similarly, if I leave all samples in but remove the stop codon in ECA2091, JU2106 is again labeled properly as missense. See code below for a reproducible example.

#### download vcf, ref, and gff
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/eri6.vcf.gz
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/chrI.fa.gz
wget https://github.com/AndersenLab/annotation-nf/raw/main/bcsq_error/csq.gff3.gz

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6.vcf.gz > test.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou test.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > test.bcsq.samples.tsv


# look at annotation
cat test.bcsq.samples.tsv | grep 4464670 # missense variant
cat test.bcsq.samples.tsv | grep 4463998 # stop

#################################################
# remove ECA2091 with stop codon
# make vcf
bcftools view -s JU2106,CB4856,MY10 eri6.vcf.gz > eri6_noECA2091.vcf

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6_noECA2091.vcf > noECA2091.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou noECA2091.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > noECA2091.bcsq.samples.tsv


# look at annotation
cat noECA2091.bcsq.samples.tsv | grep 4464670 # missense variant
cat noECA2091.bcsq.samples.tsv | grep 4463998 # stop

###########################################################
# remove stop variant but keep ECA2091
# make vcf
bcftools view -t ^I:4463998 eri6.vcf.gz > eri6_nostop.vcf

# annotate
bcftools csq -O z --fasta-ref chrI.fa.gz \
             --gff-annot csq.gff3.gz\
             --ncsq 224 \
             --phase a eri6_nostop.vcf > nostop.bcsq.vcf.gz

# extract
bcftools view -e 'INFO/BCSQ="."' -Ou nostop.bcsq.vcf.gz  | \
bcftools query -i 'GT ="alt"' -f'%CHROM\t%POS\t%REF\t%ALT\t[%SAMPLE:%TBCSQ{*}=]\n' > nostop.bcsq.samples.tsv


# look at annotation
cat nostop.bcsq.samples.tsv | grep 4464670 # missense variant
cat nostop.bcsq.samples.tsv | grep 4463998 # stop

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions