Open
Description
Hi, I tried running the tool on a VCF which contained this record:
chr4 1614374 Sniffles2.INS.2AS3 G GGGGGATGCAGAGACGTGAGGGGCATGCAGAGACGTGAGGGGGATGCAGAGACGTGAG 60 PASS PRECISE;SVTYPE=INS;SUPPORT=4;COVERAGE=6,6,6,6,6;STRAND=+-;AF=0.667;STDEV_LEN=0;STDEV_POS=0;SUPPORT_LONG=0;END=1614431;SVLEN=57 GT:GQ:DR:DV 0/1:8:2:4
I got a stacktrace:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/opt/paragraph-build/lib/python3/grm/vcf2paragraph/__init__.py", line 286, in run_vcf2paragraph
    alt_paths=params["alt_paths"])
  File "/opt/paragraph-build/lib/python3/grm/vcf2paragraph/__init__.py", line 86, in convert_vcf
    ref, indexed_vcf.name, ins_info_key, chrom, start, end, ref_node_padding, allele_graph)
  File "/opt/paragraph-build/lib/python3/grm/vcfgraph/vcfgraph.py", line 129, in create_from_vcf
    graph.add_record(record, allele_graph, varId, ins_info_key)
  File "/opt/paragraph-build/lib/python3/grm/vcfgraph/vcfgraph.py", line 217, in add_record
    self.add_alt(vcf.pos, vcf.stop, ref_sequence, alt, alt_samples, refSamples)
  File "/opt/paragraph-build/lib/python3/grm/vcfgraph/vcfgraph.py", line 298, in add_alt
    raise Exception("{}:{} missing REF or ALT sequence.".format(start, end))
Exception: 1614374:1614431 missing REF or ALT sequence.
I confirmed this variant was causing the error by isolating it and running again to get the same error.
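For anyone who wants to reproduce this, here is a rough sketch of how a single record can be isolated from a VCF using only the standard library. The helper name `isolate_record` and the toy input lines are made up for illustration; this is not part of the tool, and it assumes an uncompressed, tab-separated VCF.

```python
def isolate_record(vcf_lines, record_id):
    """Keep header lines plus any record whose ID column equals record_id.

    Hypothetical helper for illustration; assumes tab-separated VCF text.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            # Header and meta lines are always copied through.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) of a VCF record is the ID.
        if len(fields) > 2 and fields[2] == record_id:
            kept.append(line)
    return kept


# Toy input: one header line and two shortened, made-up records.
example_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr4\t100\tother.record\tA\tAT\t60\tPASS\tSVTYPE=INS\n",
    "chr4\t1614374\tSniffles2.INS.2AS3\tG\tGGGGG\t60\tPASS\tSVTYPE=INS\n",
]
isolated = isolate_record(example_lines, "Sniffles2.INS.2AS3")
```

Writing `isolated` to a new file gives a one-record VCF to rerun the tool on.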
After digging through the code, it looks like the VCFGraph.add_alt method does not handle this edge case properly. This is a large insertion that actually duplicates the next 57 bases of the reference, so when the method does its trimming, it ends up trimming all the bases away. I'm not sure what the right fix is, but I'd imagine a path should be added that repeats this reference block.
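To illustrate what I think is going on, here is a minimal sketch of prefix/suffix trimming. It is not the actual add_alt code: `trim_common_affixes` is a hypothetical helper, and I'm assuming that the REF span is taken from POS to END and that the reference bases there are identical to the ALT string (simulated below by reusing the ALT as the REF span).

```python
def trim_common_affixes(ref, alt):
    """Strip the shared prefix, then the shared suffix, of two allele strings.

    Hypothetical helper for illustration, not the tool's implementation.
    Returns (number of prefix bases trimmed, remaining ref, remaining alt).
    """
    prefix = 0
    while prefix < len(ref) and prefix < len(alt) and ref[prefix] == alt[prefix]:
        prefix += 1
    ref, alt = ref[prefix:], alt[prefix:]

    suffix = 0
    while (suffix < len(ref) and suffix < len(alt)
           and ref[-1 - suffix] == alt[-1 - suffix]):
        suffix += 1
    if suffix:
        ref, alt = ref[:-suffix], alt[:-suffix]
    return prefix, ref, alt


# ALT allele copied from the record in this issue.
ALT = "GGGGGATGCAGAGACGTGAGGGGCATGCAGAGACGTGAGGGGGATGCAGAGACGTGAG"

# Using only the REF column ("G"), trimming leaves the inserted bases intact.
_, ref_left, alt_left = trim_common_affixes("G", ALT)

# ASSUMPTION: the REF span fetched from POS..END equals the ALT string.
# In that case every base matches and nothing is left on either side.
_, ref_dup, alt_dup = trim_common_affixes(ALT, ALT)
both_empty = (ref_dup == "" and alt_dup == "")
```

If that assumption holds, both sequences come out empty, which would match the "missing REF or ALT sequence" exception above.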