Hi, I found a parsing issue when reading GenBank files generated by antiSMASH.
When a long qualifier value such as /domain_id is wrapped across multiple lines, Biopython inserts a space at each line break during parsing, which is not desired in this case.
This behavior is acceptable for many qualifiers, but here the inserted whitespace changes the intended format of the value.
The code and extraction output are below:
from Bio import SeqIO

gbk = "QUAK01000239.1.region001.gbk"
for record in SeqIO.parse(gbk, "genbank"):
    for f in record.features:
        if f.type == "aSDomain":
            domain_id = f.qualifiers["domain_id"]
            print(domain_id)
I would like to know whether this behavior is caused by non-standard GenBank formatting in the antiSMASH output, or whether Biopython has a recommended approach for reliably handling such multi-line qualifier values.
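In the meantime, one possible workaround is to post-process the parsed qualifiers. The following is only a minimal sketch, assuming that values of /domain_id never legitimately contain whitespace and that qualifier values are lists of strings, as in the output of the loop above. The helper name `strip_wrapped_whitespace`, the `NO_SPACE_KEYS` set, and the example value are hypothetical and exist only for illustration:

```python
import re

# Hypothetical set of qualifier keys whose values are assumed to be
# identifiers that should never contain whitespace.
NO_SPACE_KEYS = {"domain_id"}


def strip_wrapped_whitespace(qualifiers, keys=NO_SPACE_KEYS):
    """Return a copy of a qualifiers mapping with all whitespace removed
    from the values of the selected keys; other keys are copied unchanged."""
    cleaned = {}
    for key, values in qualifiers.items():
        if key in keys:
            # Remove every run of whitespace, including the space
            # introduced where the original line was wrapped.
            cleaned[key] = [re.sub(r"\s+", "", value) for value in values]
        else:
            cleaned[key] = list(values)
    return cleaned


# Made-up example value, not taken from a real antiSMASH file.
example = {
    "domain_id": ["nrpspksdomains_example_ PKS_KS.1"],
    "note": ["free text is left alone"],
}
print(strip_wrapped_whitespace(example)["domain_id"])
```

Inside the loop above, this could be applied as `cleaned = strip_wrapped_whitespace(f.qualifiers)` before reading `cleaned["domain_id"]`. It does not address the underlying parsing behavior and would be wrong for any qualifier whose value can contain real spaces.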
Thanks!
