Skip to content

Unclear criteria for when merge sets genotypes missing vs ref homoz #1891

Closed
@jcm6t

Description

@jcm6t

Thanks for supporting these tools.

bcftools 1.16, file formats are VCF 4.2.

I have merged 1379 gcvf files using the following pipeline:

$ bcftools merge --merge none -f PASS,. --gvcf $REFGENOME -i DP:sum,MQ:avg,DQUAL:min -r chr22  --file-list $gvcflistfile -Ou | \
$ bcftools norm -m -any  -Ou | \
$ bcftools view  -e 'ALT="<NON_REF>"' -Ou | \
$ bcftools annotate -x FORMAT $THREADSOPT -Ob > $outfile

These are the top few variants in the chr22 run and first 2 samples:

POS	REF	ALT	QUAL	FILTER	FORMAT	77900074	77900324
10510105	T	A	8.33	PASS	GT	./.	./.
10510113	C	T	17.93	PASS	GT	./.	./.
10510212	A	T	13.59	PASS	GT	./.	./.
10510258	C	T	41.37	PASS	GT	./.	./.
10510341	C	G	6.8	PASS	GT	./.	0/0
10510352	AT	A	15.8	PASS	GT	./.	./.
10510352	ATAAT	A	10.95	PASS	GT	./.	./.
10510355	AT	A	12.48	PASS	GT	./.	./.
10510356	T	A	46.64	PASS	GT	./.	1/1

if we focus on the 2nd sample 77900324, we see that first 4 GTs are missing ./. while the 5th at 10510341 is called 0/0. I'm trying to rationalize why. One of the key issues in multiplex gvcf merging is to utilize confidence in deciding between ./. and 0/0 for rare variants at positions that were not variable in the individual gvcfs, rather than assuming they are all 0/0 because they weren't variable and therefore are more likely to ref homozyg. Note also that I have chosen this example carefully to remove the multi-allelic issues - the positions I am referring to were unique and were SNV biallelic (but I note the split alleles at 10510352).

The block for the first merged variant at 10510105 in the individual 77900324 gvcf is:

chr22	10510083	.	A	<NON_REF>	.	PASS	END=10510105	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:6,0:6:12:4:0,12,141:0,12,141:6,0

ie, this PASSED in the individual gcvf with a GQ of 12, 0/0. Similarly the next 3.

The first called homozygous REF variant in this sample was at 10510341 corresponding to this block:

chr22	10510341	.	C	<NON_REF>	.	PASS	END=10510341	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:23,1:24:57:24:0,57,933:0,57,255:22,0

This PASSED in the gvcf with a GQ of 57, 0/0.

I would have expected all of the top 5 variants to be called 0/0 in the merged bcf file. Is merge applying hidden quality/depth threshold ?

Thanks for your insights.

(I have reproduced the first lines of the gvcf for the sample below for reference:)

chr22	10510000	.	N	<NON_REF>	.	LowDepth;LowGQ	END=10510000	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	./.:0,1:1:0:1:0,0,0:42,3,0:0,0
chr22	10510001	.	G	<NON_REF>	.	LowDepth	END=10510037	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:1,0:1:3:1:0,3,15:0,3,15:1,0
chr22	10510038	.	T	<NON_REF>	.	LowDepth	END=10510082	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:1,0:1:3:1:0,3,15:0,3,15:3,0
chr22	10510083	.	A	<NON_REF>	.	PASS	END=10510105	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:6,0:6:12:4:0,12,141:0,12,141:6,0
chr22	10510106	.	T	<NON_REF>	.	PASS	END=10510117	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:8,0:8:21:7:0,21,254:0,21,254:9,0
chr22	10510118	.	A	<NON_REF>	.	PASS	END=10510175	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:12,0:12:30:10:0,30,391:0,30,255:13,0
chr22	10510176	.	T	<NON_REF>	.	PASS	END=10510249	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:16,0:16:42:14:0,42,552:0,42,255:19,0
chr22	10510250	.	C	<NON_REF>	.	PASS	END=10510250	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:16,1:17:36:17:0,36,641:0,36,255:14,0
chr22	10510251	.	T	<NON_REF>	.	PASS	END=10510257	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:17,0:17:42:16:0,42,630:0,48,255:15,0
chr22	10510258	.	C	<NON_REF>	.	PASS	END=10510262	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:16,0:16:39:15:0,39,585:0,45,255:13,0
chr22	10510263	.	A	<NON_REF>	.	PASS	END=10510265	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:16,0:16:42:16:0,42,630:0,48,255:14,0
chr22	10510266	.	T	<NON_REF>	.	PASS	END=10510266	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:14,2:16:17:16:0,17,542:0,17,255:12,0
chr22	10510267	.	T	<NON_REF>	.	PASS	END=10510267	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:16,0:16:42:16:0,42,630:0,48,255:14,0
chr22	10510268	.	G	<NON_REF>	.	PASS	END=10510275	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:14,0:14:39:14:0,39,585:0,42,255:13,0
chr22	10510276	.	G	<NON_REF>	.	PASS	END=10510303	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:17,0:17:45:17:0,45,675:0,51,255:19,0
chr22	10510304	.	G	<NON_REF>	.	PASS	END=10510306	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:20,0:20:60:20:0,60,823:0,60,255:20,0
chr22	10510307	.	G	<NON_REF>	.	PASS	END=10510311	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:20,0:20:57:20:0,57,855:0,60,255:19,0
chr22	10510312	.	C	<NON_REF>	.	PASS	END=10510314	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:21,0:21:60:21:0,60,900:0,63,255:20,0
chr22	10510315	.	G	<NON_REF>	.	PASS	END=10510318	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:21,0:21:48:20:0,48,782:0,48,255:19,0
chr22	10510319	.	G	<NON_REF>	.	PASS	END=10510319	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:21,0:21:60:21:0,60,900:0,63,255:20,0
chr22	10510320	.	G	<NON_REF>	.	PASS	END=10510327	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:21,0:21:57:20:0,57,855:0,60,255:19,0
chr22	10510328	.	G	<NON_REF>	.	PASS	END=10510330	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:21,0:21:60:21:0,60,900:0,63,255:20,0
chr22	10510331	.	G	<NON_REF>	.	PASS	END=10510331	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:19,1:20:45:20:0,45,752:0,45,255:19,0
chr22	10510332	.	A	<NON_REF>	.	PASS	END=10510334	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:21,0:21:60:20:0,60,797:0,60,255:22,0
chr22	10510335	.	C	<NON_REF>	.	PASS	END=10510335	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:21,1:22:51:22:0,51,862:0,51,255:21,0
chr22	10510336	.	C	<NON_REF>	.	PASS	END=10510340	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:23,0:23:66:23:0,66,937:0,69,255:24,0
chr22	10510341	.	C	<NON_REF>	.	PASS	END=10510341	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:23,1:24:57:24:0,57,933:0,57,255:22,0
chr22	10510342	.	T	<NON_REF>	.	PASS	END=10510344	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:24,0:24:60:24:0,60,900:0,72,255:20,0
chr22	10510345	.	A	<NON_REF>	.	PASS	END=10510345	GT:AD:DP:GQ:MIN_DP:PL:SPL:ICNT	0/0:23,0:23:57:23:0,57,855:0,69,255:19,0

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions