Hello,
I obtained a VCF file from two WGS samples by joint calling.
After that, I used bcftools norm to split the multiallelic sites in the VCF file.
I tried several options to split the multiallelic variant, but none of them gave the results I intended.
Here is the original VCF file:
##fileformat=VCFv4.1
chrM 511 . CAGCA CAG,CAA,CCA 3070 PASS CIGAR=3M2D,2M2D1M,1M2D2M;RU=CA,GC,AG;REFREP=5,1,1;IDREP=4,0,0;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 1/1:3070:11:10787:90,8620,548,447:75,5407,518,428:15,3213,30,19:PASS:999,999,0,999,999,999,999,999,999,999 1/1:3070:13:21408:163,17261,827,978:134,10201,753,900:29,7060,74,78:PASS:999,999,0,999,999,999,999,999,999,999
As you can see above, position 511 of chrM is a multiallelic site.
First, I ran "bcftools norm -Ov -m-any", and the results are as follows.
Options: bcftools norm -Ov -m-any
##fileformat=VCFv4.1
##bcftools_normVersion=1.15.1+htslib-1.15.1
##bcftools_normCommand=norm -Ov -m-any test.vcf;
chrM 511 . CAGCA CAG 3070 PASS CIGAR=3M2D;RU=CA;REFREP=5;IDREP=4;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 1/1:3070:11:10787:90,8620,548,447:75,5407,518,428:15,3213,30,19:PASS:999,999,0 1/1:3070:13:21408:163,17261,827,978:134,10201,753,900:29,7060,74,78:PASS:999,999,0
chrM 511 . CAGCA CAA 3070 PASS CIGAR=2M2D1M;RU=GC;REFREP=1;IDREP=0;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 0/0:3070:11:10787:90,8620,548,447:75,5407,518,428:15,3213,30,19:PASS:999,999,999 0/0:3070:13:21408:163,17261,827,978:134,10201,753,900:29,7060,74,78:PASS:999,999,999
chrM 511 . CAGCA CCA 3070 PASS CIGAR=1M2D2M;RU=AG;REFREP=1;IDREP=0;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 0/0:3070:11:10787:90,8620,548,447:75,5407,518,428:15,3213,30,19:PASS:999,999,999 0/0:3070:13:21408:163,17261,827,978:134,10201,753,900:29,7060,74,78:PASS:999,999,999
I don't think the results from this option are correct.
The information in the INFO column should be different for each split record, and I think the genotype information shown does not match the original VCF.
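For reference, the PL values in the split records above are consistent with keeping, for each ALT allele k, only the three likelihoods for 0/0, 0/k and k/k out of the original ten-value array. The sketch below is only an illustrative model under that assumption, using the genotype ordering defined in the VCF specification; the function names are hypothetical and this is not bcftools' own code.

```python
def pl_index(a, b):
    """Position of the unordered diploid genotype a/b in a PL array (VCF ordering)."""
    lo, hi = sorted((a, b))
    return hi * (hi + 1) // 2 + lo


def subset_pl(pl, alt):
    """Keep only the likelihoods for 0/0, 0/alt and alt/alt."""
    return [pl[pl_index(0, 0)], pl[pl_index(0, alt)], pl[pl_index(alt, alt)]]


# PL of either sample in the original multiallelic record (3 ALT alleles, 10 values)
original_pl = [999, 999, 0, 999, 999, 999, 999, 999, 999, 999]

for alt in (1, 2, 3):
    print(alt, subset_pl(original_pl, alt))
```

Under this model, ALT 1 (CAG) keeps 999,999,0 and ALT 2 and 3 keep 999,999,999, which matches the PL fields in the three split records.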
To summarize by position and genotype, I expected the results to look like this:
chrM 511 . CAGCA CAG 1/1 1/1
chrM 511 . CAGCA CAA ./. ./.
chrM 511 . CAGCA CCA ./. ./.
In the original VCF file, both samples have the genotype 1/1, so I think only the record for the CAG ALT allele should be marked 1/1.
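The difference between the observed genotypes (0/0) and the expected ones (./.) can be written down as a small model. The sketch below is an assumption about the splitting behavior, not bcftools' implementation: for each ALT allele it rewrites the genotype so that the matching allele becomes 1 and any other ALT allele becomes a fill value. The function name and the `fill` parameter are hypothetical.

```python
def split_genotype(gt, n_alts, fill="0"):
    """Return one biallelic genotype per ALT allele of a multiallelic record.

    fill is the value given to alleles that point at a different ALT:
    "0" reproduces the 0/0 seen in the output, "." gives the expected ./.
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    per_alt = []
    for k in range(1, n_alts + 1):
        new = []
        for a in alleles:
            if a in (".", "0"):
                new.append(a)      # missing and reference alleles are unchanged
            elif int(a) == k:
                new.append("1")    # this record's ALT allele
            else:
                new.append(fill)   # an ALT allele that belongs to another record
        per_alt.append(sep.join(new))
    return per_alt


print(split_genotype("1/1", 3))            # fill with the reference allele
print(split_genotype("1/1", 3, fill="."))  # fill with the missing allele
```

With `fill="0"` the model gives 1/1, 0/0, 0/0, as in the bcftools output; with `fill="."` it gives 1/1, ./., ./., which is the result expected here.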
Since I did not get the intended result, I next tried a second option, "bcftools norm -Ov -a --atom-overlaps . ", to split the variants by position.
##bcftools_normVersion=1.15.1+htslib-1.15.1
##bcftools_normCommand=norm -Ov -a --atom-overlaps . test.vcf
chrM 511 . CAG C 3070 PASS CIGAR=1M2D2M;RU=AG;REFREP=1;IDREP=0;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL ./.:3070:11:10787:90,8620,548,447:75,5407,518,428:15,3213,30,19:PASS:999,999,999 ./.:3070:13:21408:163,17261,827,978:134,10201,753,900:29,7060,74,78:PASS:999,999,999
chrM 512 . AGC A 3070 PASS CIGAR=2M2D1M;RU=GC;REFREP=1;IDREP=0;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL ./.:3070:11:10787:90,8620,548,447:75,5407,518,428:15,3213,30,19:PASS:999,999,999 ./.:3070:13:21408:163,17261,827,978:134,10201,753,900:29,7060,74,78:PASS:999,999,999
chrM 513 . GCA G 3070 PASS CIGAR=3M2D;RU=CA;REFREP=5;IDREP=4;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 1/1:3070:11:10787:90,8620,548,447:75,5407,518,428:15,3213,30,19:PASS:999,999,0 1/1:3070:13:21408:163,17261,827,978:134,10201,753,900:29,7060,74,78:PASS:999,999,0
I don't understand how this option produced the results above.
Why is only the record at chrM position 513 marked with the genotype 1/1?
I also wonder whether all of the INFO columns are supposed to carry the same information.
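The positions 511, 512 and 513 in the atomized output are consistent with trimming the bases that REF and each ALT allele share, first from the right and then from the left. The sketch below is a simplified model under that assumption, not the algorithm bcftools actually uses; `trim_alleles` is a hypothetical helper.

```python
def trim_alleles(pos, ref, alt):
    """Trim shared bases from a REF/ALT pair, keeping at least one base in each."""
    # Drop the shared suffix
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop the shared prefix, shifting the position to the right
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


for alt in ("CAG", "CAA", "CCA"):
    print(alt, trim_alleles(511, "CAGCA", alt))
```

In this model CAG becomes 513 GCA>G, CAA becomes 512 AGC>A and CCA becomes 511 CAG>C. The record at 513 therefore corresponds to the CAG allele, which is the allele both samples carry as 1/1.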
Which option should I use to get exactly the split variant information I want?
I look forward to your reply.
Thank you,