
Explore data augmentation for NER robustness #336

Open
@vskmd

Description

Hi, I am working on a COVID-19 antiviral and was spot-checking antivirals in scispacy. I was surprised that remdesivir is not tagged as a chemical in any of the 1,338 PubMed abstracts that contain it. I'm using en_ner_bc5cdr_md to extract CHEMICAL and DISEASE entities (spacy: '3.0.4', scispacy: '0.4.0').
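A spot check like this can be made repeatable with a small helper that counts how many mentions of a term fall inside entity spans of a given label. The sketch below is plain Python and assumes spans are supplied as `(start_char, end_char, label)` tuples; with spaCy they could be built as `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`. `mention_coverage` and `corpus_coverage` are hypothetical helpers, not part of scispacy, and a mention only counts as tagged if it lies entirely inside a span with the requested label.

```python
import re
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label), e.g. from spaCy's doc.ents


def mention_coverage(text: str, term: str, ents: Iterable[Span],
                     label: str = "CHEMICAL") -> Tuple[int, int]:
    """Return (mentions, tagged) for one text.

    mentions: case-insensitive whole-word occurrences of `term` in `text`.
    tagged:   how many of those lie fully inside an entity span with `label`.
    """
    spans = [(start, end) for start, end, ent_label in ents if ent_label == label]
    mentions = tagged = 0
    for match in re.finditer(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
        mentions += 1
        if any(start <= match.start() and match.end() <= end for start, end in spans):
            tagged += 1
    return mentions, tagged


def corpus_coverage(docs: Iterable[Tuple[str, Iterable[Span]]], term: str,
                    label: str = "CHEMICAL") -> Tuple[int, int]:
    """Sum mention_coverage over (text, ents) pairs, e.g. one pair per abstract."""
    total_mentions = total_tagged = 0
    for text, ents in docs:
        mentions, tagged = mention_coverage(text, term, ents, label)
        total_mentions += mentions
        total_tagged += tagged
    return total_mentions, total_tagged


# Toy example: only the first of two mentions is inside a CHEMICAL span.
text = "The drug remdesivir was given; Remdesivir was well tolerated."
first = text.index("remdesivir")
ents = [(first, first + len("remdesivir"), "CHEMICAL")]
print(mention_coverage(text, "remdesivir", ents))  # (2, 1)
```

Running `corpus_coverage` over all abstracts gives a single "tagged / total mentions" figure, which makes it easy to compare model versions or the notebook against the demo.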

As you can see below, remdesivir is not tagged as a CHEMICAL when I run en_ner_bc5cdr_md in Jupyter Lab.

[Screenshot: en_ner_bc5cdr_md output in Jupyter Lab]

However, when I put the same text into your demo, remdesivir is found.

[Screenshot: demo output for the same text]

Questions

  • Is the version running on the demo the same as the one I used in my notebook (spacy: '3.0.4', scispacy: '0.4.0')?
  • Could remdesivir be missed because it wasn't present in the earlier training sets?
  • Can we expect new chemicals (e.g., ones published for the first time) to be recognized?
  • It's especially surprising that remdesivir wasn't detected as a CHEMICAL even in the following sentence from my example text, where it is explicitly called a 'drug':

Though the drug remdesivir (RDV) is not approved by the FDA, still the "Emergency Use Authorization" (EUA) for compassionate use in severe cases is endorsed.

  • In the demo, remdesivir is detected only once, even though it is mentioned several times in that passage. Is that expected?
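On the question of names that never appeared in training: one common NER data-augmentation technique, mention replacement, rewrites annotated training sentences by swapping each entity mention for another name of the same type, so the model sees more varied surface forms. The sketch below assumes BIO-tagged `(token, tag)` sentences and a caller-supplied list of replacement names. `replace_mentions` is a hypothetical helper; this is not how en_ner_bc5cdr_md was trained, and whether it would help the model pick up remdesivir is untested here.

```python
import random
from typing import List, Sequence, Tuple

Tagged = List[Tuple[str, str]]  # BIO-tagged sentence: [(token, tag), ...]


def replace_mentions(sentence: Tagged, label: str, names: Sequence[str],
                     rng: random.Random) -> Tagged:
    """Swap every `label` mention for a random entry of `names`.

    A mention is a B-<label> token plus any following I-<label> tokens.
    Replacement names are whitespace-tokenized and re-tagged B-/I-;
    all other tokens and tags are copied unchanged.
    """
    begin, inside = f"B-{label}", f"I-{label}"
    out: Tagged = []
    i = 0
    while i < len(sentence):
        token, tag = sentence[i]
        if tag != begin:
            out.append((token, tag))
            i += 1
            continue
        # Skip the B- token and the original mention's I- continuation tokens.
        i += 1
        while i < len(sentence) and sentence[i][1] == inside:
            i += 1
        new_tokens = rng.choice(names).split()
        out.append((new_tokens[0], begin))
        out.extend((t, inside) for t in new_tokens[1:])
    return out


sentence = [("Patients", "O"), ("received", "O"), ("hydroxychloroquine", "B-CHEMICAL"),
            ("for", "O"), ("COVID-19", "B-DISEASE")]
# Hypothetical replacement list; a real one might come from a curated drug vocabulary.
names = ["remdesivir", "acetylsalicylic acid"]
print(replace_mentions(sentence, "CHEMICAL", names, random.Random(0)))
```

Augmented sentences would typically be added alongside the originals rather than replacing them. Tags for other types (here DISEASE) are copied through untouched, so existing annotations are preserved.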

Thanks,
vikram
