Closed
Description
I have not looked at the vcfconcat.c code yet, but the following behavior seems puzzling.
Generate the following VCFs:
(echo "##fileformat=VCFv4.2"
echo "##contig=<ID=chr1>"
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE"
echo -e "chr1\t1\t.\tA\tC\t.\t.\t.\tGT\t0|1") | bcftools view -Oz -o A1.vcf.gz
bcftools index -t -f A1.vcf.gz
(echo "##fileformat=VCFv4.2"
echo "##contig=<ID=chr1>"
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE"
echo -e "chr1\t1\t.\tA\tC\t.\t.\t.\tGT\t0|1"
echo -e "chr1\t2\t.\tC\tG\t.\t.\t.\tGT\t0|1") | bcftools view -Oz -o A2.vcf.gz
bcftools index -t -f A2.vcf.gz
(echo "##fileformat=VCFv4.2"
echo "##contig=<ID=chr1>"
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE"
echo -e "chr1\t2\t.\tC\tG\t.\t.\t.\tGT\t1|0"
echo -e "chr1\t3\t.\tG\tT\t.\t.\t.\tGT\t0|1") | bcftools view -Oz -o B.vcf.gz
bcftools index -t -f B.vcf.gz
(echo "##fileformat=VCFv4.2"
echo "##contig=<ID=chr1>"
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE"
echo -e "chr1\t3\t.\tG\tT\t.\t.\t.\tGT\t1|0"
echo -e "chr1\t4\t.\tT\tA\t.\t.\t.\tGT\t0|1") | bcftools view -Oz -o C.vcf.gz
bcftools index -t -f C.vcf.gz
All four variants are output when performing a simple concatenation:
$ bcftools concat --allow-overlaps --rm-dups all A1.vcf.gz B.vcf.gz C.vcf.gz | bcftools view -H
chr1 1 . A C . . . GT 0|1
chr1 2 . C G . . . GT 1|0
chr1 3 . G T . . . GT 0|1
chr1 4 . T A . . . GT 0|1
With --ligate, the variant at chr1:2 goes missing when the overlap between the first two VCFs (A1 and B) is empty:
$ bcftools concat --ligate A1.vcf.gz B.vcf.gz C.vcf.gz | bcftools view -H
chr1 1 . A C . . . GT:PS 0|1:1
chr1 3 . G T . . . GT:PS 0|1:1
chr1 4 . T A . . . GT:PS 1|0:1
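To pin down which site is dropped without eyeballing the listings, the two outputs can be compared on their CHROM and POS columns. The following is only a minimal sketch: `missing_sites` is a hypothetical helper written for this report (it is not part of bcftools), and the file names `plain.txt` and `ligated.txt` are placeholders for the saved `bcftools view -H` output of the two commands above.

```shell
# Hypothetical helper, not part of bcftools.
# Prints "CHROM POS" for every record found in the first listing
# but absent from the second. Both arguments are files holding
# header-less VCF records (e.g. saved from `bcftools view -H`).
missing_sites() {
  # awk reads the second listing ("$2") first and remembers its sites,
  # then prints every site of the first listing ("$1") it has not seen.
  awk 'FILENAME == ARGV[1] { seen[$1 " " $2] = 1; next }
       !(($1 " " $2) in seen) { print $1, $2 }' "$2" "$1"
}

# Intended usage (requires bcftools and the VCFs generated above):
#   bcftools concat --allow-overlaps --rm-dups all A1.vcf.gz B.vcf.gz C.vcf.gz \
#     | bcftools view -H > plain.txt
#   bcftools concat --ligate A1.vcf.gz B.vcf.gz C.vcf.gz \
#     | bcftools view -H > ligated.txt
#   missing_sites plain.txt ligated.txt
```

Run on the two listings shown above, this should report chr1 2 as the only site present in the simple concatenation but absent from the ligated one.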
If the overlap between the first two VCFs is not empty, all variants are retained both with a simple concatenation:
$ bcftools concat --allow-overlaps --rm-dups all A2.vcf.gz B.vcf.gz C.vcf.gz | bcftools view -H
chr1 1 . A C . . . GT 0|1
chr1 2 . C G . . . GT 0|1
chr1 3 . G T . . . GT 0|1
chr1 4 . T A . . . GT 0|1
And with a concatenation using phase ligation:
$ bcftools concat --ligate A2.vcf.gz B.vcf.gz C.vcf.gz | bcftools view -H
chr1 1 . A C . . . GT:PS 0|1:1
chr1 2 . C G . . . GT:PS 0|1:1
chr1 3 . G T . . . GT:PS 1|0:1
chr1 4 . T A . . . GT:PS 0|1:1