
[BUG] Error adding alt from insertion sequence representing a duplication #82

Open
@rickymagner


Hi, I tried running the tool on a VCF which contained this record:

chr4    1614374 Sniffles2.INS.2AS3      G       GGGGGATGCAGAGACGTGAGGGGCATGCAGAGACGTGAGGGGGATGCAGAGACGTGAG      60      PASS    PRECISE;SVTYPE=INS;SUPPORT=4;COVERAGE=6,6,6,6,6;STRAND=+-;AF=0.667;STDEV_LEN=0;STDEV_POS=0;SUPPORT_LONG=0;END=1614431;SVLEN=57  GT:GQ:DR:DV     0/1:8:2:4
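A quick check of this record's fields shows that END - POS equals SVLEN. So END does not just mark the insertion point: it also spans the 57 reference bases that follow POS. The sketch below uses plain string parsing with no VCF library, and the parse_info helper is hypothetical.

```python
# Sanity check of the Sniffles2 record above using plain string parsing.
# Fields are split on any whitespace because the record was pasted with
# spaces rather than tabs.
RECORD = (
    "chr4 1614374 Sniffles2.INS.2AS3 G "
    "GGGGGATGCAGAGACGTGAGGGGCATGCAGAGACGTGAGGGGGATGCAGAGACGTGAG "
    "60 PASS "
    "PRECISE;SVTYPE=INS;SUPPORT=4;COVERAGE=6,6,6,6,6;STRAND=+-;AF=0.667;"
    "STDEV_LEN=0;STDEV_POS=0;SUPPORT_LONG=0;END=1614431;SVLEN=57 "
    "GT:GQ:DR:DV 0/1:8:2:4"
)


def parse_info(info: str) -> dict:
    """Parse 'KEY=VALUE;FLAG;...' into a dict; bare flags map to True."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


fields = RECORD.split()
pos = int(fields[1])                 # POS
ref, alt = fields[3], fields[4]      # REF, ALT
info = parse_info(fields[7])         # INFO
end, svlen = int(info["END"]), int(info["SVLEN"])

print(len(ref), len(alt))            # 1 58
print(len(alt) - len(ref) == svlen)  # True: ALT carries SVLEN extra bases
print(end - pos == svlen)            # True: END also spans SVLEN reference bases
```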

I got a stacktrace:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/opt/paragraph-build/lib/python3/grm/vcf2paragraph/__init__.py", line 286, in run_vcf2paragraph
    alt_paths=params["alt_paths"])
  File "/opt/paragraph-build/lib/python3/grm/vcf2paragraph/__init__.py", line 86, in convert_vcf
    ref, indexed_vcf.name, ins_info_key, chrom, start, end, ref_node_padding, allele_graph)
  File "/opt/paragraph-build/lib/python3/grm/vcfgraph/vcfgraph.py", line 129, in create_from_vcf
    graph.add_record(record, allele_graph, varId, ins_info_key)
  File "/opt/paragraph-build/lib/python3/grm/vcfgraph/vcfgraph.py", line 217, in add_record
    self.add_alt(vcf.pos, vcf.stop, ref_sequence, alt, alt_samples, refSamples)
  File "/opt/paragraph-build/lib/python3/grm/vcfgraph/vcfgraph.py", line 298, in add_alt
    raise Exception("{}:{} missing REF or ALT sequence.".format(start, end))
Exception: 1614374:1614431 missing REF or ALT sequence.

I confirmed that this variant causes the error by running the tool on a VCF containing only this record, which raised the same exception.

After digging through the code, it looks like the VCFGraph.add_alt method does not handle this edge case. This 57 bp insertion actually duplicates the next 57 bases of the reference, and the record's END (1614431) covers those same bases. So when the method trims the bases shared by REF and ALT, it ends up trimming every base away, leaving neither a REF nor an ALT sequence. I'm not sure what the right fix is, but I'd expect an alternate path to be added that repeats this reference block.
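The trimming failure can be reproduced in isolation. The sketch below uses a hypothetical trim_shared_bases helper, which drops the common prefix and then the common suffix; it is not the actual VCFGraph.add_alt logic. It also assumes, as described above, that the reference bases from POS to END are identical to the ALT allele, and it uses no real reference sequence.

```python
from typing import Tuple

# ALT allele copied from the record in this report; REF is the anchor base "G".
ALT = "GGGGGATGCAGAGACGTGAGGGGCATGCAGAGACGTGAGGGGGATGCAGAGACGTGAG"
INSERTED = ALT[1:]  # the 57 inserted bases after the anchor


def trim_shared_bases(ref: str, alt: str) -> Tuple[str, str]:
    """Hypothetical stand-in for REF/ALT trimming (not the vcfgraph code):
    drop the longest common prefix, then the longest common suffix."""
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    ref, alt = ref[i:], alt[i:]
    j = 0
    while j < min(len(ref), len(alt)) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    return ref[:len(ref) - j], alt[:len(alt) - j]


# Assumption: the 57 reference bases after POS equal the inserted bases,
# so a REF read over POS..END is identical to ALT.
ref_pos_to_end = "G" + INSERTED
print(trim_shared_bases(ref_pos_to_end, ALT))  # ('', '') -> nothing left

# For comparison, anchoring the insertion at POS alone keeps all 57 bases,
# which spells the same haplotype as repeating the reference block.
print(trim_shared_bases("G", ALT) == ("", INSERTED))  # True
```

Under that assumption, the second call shows one possible representation. It uses a plain insertion anchored at POS, which is equivalent to a path that repeats the reference block. Whether vcf2paragraph should normalize records this way is an open question.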
