UMLS Entity Linker throws BadZipFile error #534

Open
@markediger

Description

I am trying to run a basic example of the UMLS Entity Linker:

import spacy
import scispacy
from scispacy.umls_linking import UmlsEntityLinker
nlp = spacy.load('en_core_sci_md')
linker = UmlsEntityLinker()

nlp.add_pipe(linker)
doc = nlp("Spinal and bulbar muscular atrophy (SBMA) is an \
           inherited motor neuron disease caused by the expansion \
           of a polyglutamine tract within the androgen receptor (AR). \
           SBMA can be caused by this easily.")

entity = doc.ents[1]
print("Name: ", entity)

for umls_ent in entity._.umls_ents:
    print(linker.umls.cui_to_entity[umls_ent[0]])

I get the following error, which suggests that scispacy cannot load the UMLS linker files:

Traceback (most recent call last):
  File "H:\integrated_evidence\indication_coding\indication-master\src\scispacy_test.py", line 5, in <module>
    linker = UmlsEntityLinker()
             ^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\scispacy\linking.py", line 85, in __init__
    self.candidate_generator = candidate_generator or CandidateGenerator(
                                                      ^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\scispacy\candidate_generation.py", line 222, in __init__
    self.ann_index = ann_index or load_approximate_nearest_neighbours_index(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\scispacy\candidate_generation.py", line 133, in load_approximate_nearest_neighbours_index
    concept_alias_tfidfs = scipy.sparse.load_npz(
                           ^^^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\scipy\sparse\_matrix_io.py", line 134, in load_npz
    with np.load(file, **PICKLE_KWARGS) as loaded:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\numpy\lib\npyio.py", line 444, in load
    ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\numpy\lib\npyio.py", line 190, in __init__
    _zip = zipfile_factory(fid)
           ^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\numpy\lib\npyio.py", line 103, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\python\Lib\zipfile.py", line 1301, in __init__
    self._RealGetContents()
  File "D:\apps\python\Lib\zipfile.py", line 1368, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

I am using scispacy version 0.5.5 and en_core_sci_md version 0.5.4.
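
The traceback ends in scipy.sparse.load_npz, and .npz files are zip archives, so one thing worth checking is whether the file on disk is really a zip (a truncated download or a saved HTML error page would fail the same way). Below is a minimal diagnostic sketch, not part of scispacy: describe_download is a hypothetical helper name, and the cache location mentioned in the comment (~/.scispacy/datasets, or the SCISPACY_CACHE environment variable) is an assumption about the default that should be verified for the installed version.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return (is_zip, size_in_bytes, first_bytes) for a file on disk.

    Hypothetical helper for diagnosing BadZipFile errors: a valid .npz
    file is a zip archive, so is_zip should be True for it.
    """
    p = Path(path)
    size = p.stat().st_size
    with p.open("rb") as fh:
        head = fh.read(16)  # zip archives normally start with b"PK"
    return zipfile.is_zipfile(str(p)), size, head


# Example usage (the path is a placeholder, not a real file name):
# cached = Path.home() / ".scispacy" / "datasets" / "<cached file name>"
# print(describe_download(cached))
```

If the file turns out not to be a zip (for example, it is very small or begins with HTML), deleting it from the cache so that it is downloaded again may be worth trying.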
