bcftools annotate: possible to use ID to match variants or otherwise disambiguate variants with the same POS/ALT? #1461

If I understand the current behavior of `bcftools annotate` correctly, records in the input VCF are matched to records in the annotation file based on POS, REF, and ALT in cases where the annotation file is a VCF, or if it's a tab-delimited file and REF and ALT are specified in `-c`. 

When dealing with VCFs representing structural variants, we sometimes have records that represent different variants but share the same position and alternate allele. This is because we use symbolic ALT alleles, which don't always fully specify the variant. For example, two different tools may each have detected a deletion with the same start position but a different end position. Both records will have `<DEL>` as their ALT allele despite representing different variants. If we've prepared data with which we'd like to annotate each variant, the current matching scheme leaves us unable to do so with `bcftools annotate`. For example, if in the VCF we have these two records:

```
chr1	100000	VID1	A	<DUP>	.	PASS	END=200000
chr1	100000	VID2	A	<DUP>	.	PASS	END=300000
```

and we'd like to annotate VID1 and VID2 with different values, there doesn't seem to be a way to do so with the current matching rules of `bcftools annotate`; i.e., if we have the annotation file:

```
chr1	100000	A	<DUP>	1
chr1	100000	A	<DUP>	2
```

and try to annotate with `bcftools annotate -a annotations.tsv.gz -c CHROM,POS,REF,ALT,INFO/VAL input_vcf.gz`, we get the output VCF:

```
chr1	100000	VID1	A	<DUP>	.	PASS	END=200000;VAL=1
chr1	100000	VID2	A	<DUP>	.	PASS	END=300000;VAL=1
```
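To make the collision concrete, here is a toy Python model of matching keyed on (CHROM, POS, REF, ALT). It is only an illustration of the behavior observed above, not bcftools code; the function name `annotate_first_match` and the tuple layout are made up for this sketch, and "first match wins" is an assumption chosen because it reproduces the output shown.

```python
# Toy model of position/allele-keyed matching; NOT bcftools code.
# Record layout (hypothetical): (CHROM, POS, ID, REF, ALT, INFO)
records = [
    ("chr1", 100000, "VID1", "A", "<DUP>", "END=200000"),
    ("chr1", 100000, "VID2", "A", "<DUP>", "END=300000"),
]
# Annotation layout (hypothetical): (CHROM, POS, REF, ALT, VAL)
annotations = [
    ("chr1", 100000, "A", "<DUP>", "1"),
    ("chr1", 100000, "A", "<DUP>", "2"),
]


def annotate_first_match(records, annotations):
    """Give each record the first annotation sharing its (CHROM, POS, REF, ALT)."""
    out = []
    for chrom, pos, vid, ref, alt, info in records:
        key = (chrom, pos, ref, alt)
        # Assumption: the first annotation with an identical key is used.
        val = next((a[4] for a in annotations if a[:4] == key), None)
        if val is not None:
            info = f"{info};VAL={val}"
        out.append((chrom, pos, vid, ref, alt, info))
    return out


annotated = annotate_first_match(records, annotations)
for rec in annotated:
    print(rec[2], rec[5])
```

Because both records produce the same key, nothing in the key can tell VID1 from VID2, so the second annotation line is never reached.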

If we add ID to the annotation file and include it in the column list, the ID of each record gets overwritten by the ID of the first annotation line that matches on CHROM, POS, REF, and ALT.

I was wondering whether there is some way to accomplish the desired annotation with the current functionality of bcftools and, if not, whether it would be possible to add it as a new feature. I could see the latter being accomplished either by an option that lets the user specify ID as a column in the annotation file to be used for matching records, or by adding a matching rule to `-l` that would do something like match the nth duplicate record at a given position to the nth duplicate annotation value (although I imagine the latter option might get tricky to implement).
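To illustrate the two matching rules I have in mind, here is a rough Python sketch. It is not an existing bcftools option and says nothing about how it would be implemented there; the function names `annotate_by_id` and `annotate_by_occurrence` and the tuple layouts are hypothetical.

```python
from collections import defaultdict

# Record layout (hypothetical): (CHROM, POS, ID, REF, ALT, INFO)
records = [
    ("chr1", 100000, "VID1", "A", "<DUP>", "END=200000"),
    ("chr1", 100000, "VID2", "A", "<DUP>", "END=300000"),
]


def annotate_by_id(records, values_by_id):
    """Rule 1: match on the ID column alone."""
    out = []
    for chrom, pos, vid, ref, alt, info in records:
        val = values_by_id.get(vid)
        if val is not None:
            info = f"{info};VAL={val}"
        out.append((chrom, pos, vid, ref, alt, info))
    return out


def annotate_by_occurrence(records, annotations):
    """Rule 2: the nth record with a given key takes the nth annotation with that key."""
    pools = defaultdict(list)
    for chrom, pos, ref, alt, val in annotations:
        pools[(chrom, pos, ref, alt)].append(val)

    seen = defaultdict(int)  # how many records with each key were already handled
    out = []
    for chrom, pos, vid, ref, alt, info in records:
        key = (chrom, pos, ref, alt)
        n = seen[key]
        seen[key] += 1
        vals = pools.get(key, [])
        if n < len(vals):  # leave the record untouched if annotations run out
            info = f"{info};VAL={vals[n]}"
        out.append((chrom, pos, vid, ref, alt, info))
    return out


by_id = annotate_by_id(records, {"VID1": "1", "VID2": "2"})
by_occurrence = annotate_by_occurrence(
    records,
    [("chr1", 100000, "A", "<DUP>", "1"), ("chr1", 100000, "A", "<DUP>", "2")],
)
for rec in by_id:
    print(rec[2], rec[5])
```

The second rule depends on the records and the annotation lines being in the same relative order, which is part of why I suspect it would be the trickier of the two.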
