Description
Hi, I am working on a COVID-19 antiviral and, while spot-checking antivirals in scispacy, was surprised that remdesivir is not tagged as a chemical in any of the 1,338 PubMed abstracts containing it. I'm using en_ner_bc5cdr_md to extract CHEMICAL and DISEASE entities (spacy: '3.0.4', scispacy: '0.4.0').
As you see below, remdesivir is not tagged as a CHEMICAL when I run en_ner_bc5cdr_md in Jupyter Lab.
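For reference, here is a minimal sketch of the kind of check I mean, assuming en_ner_bc5cdr_md is installed in the same environment. The helper names (`entities_with_labels`, `is_tagged_as`, `run_check`) are made up for this illustration and are not part of spaCy or scispacy; the sample sentence is the one quoted further down.

```python
import importlib.metadata as md

# Sentence quoted later in this issue.
TEXT = (
    "Though the drug remdesivir (RDV) is not approved by the FDA, still the "
    '"Emergency Use Authorization" (EUA) for compassionate use in severe '
    "cases is endorsed."
)


def entities_with_labels(doc):
    """Return (text, label) pairs for every entity the pipeline tagged."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def is_tagged_as(doc, term, label="CHEMICAL"):
    """True if any entity with the given label contains the term (case-insensitive)."""
    term = term.lower()
    return any(
        ent.label_ == label and term in ent.text.lower() for ent in doc.ents
    )


def run_check(model_name="en_ner_bc5cdr_md", text=TEXT, term="remdesivir"):
    """Load the model, print the installed versions, and report whether term is tagged."""
    import spacy  # imported here so the helpers above work without spaCy installed

    print("spacy:", spacy.__version__)
    print("scispacy:", md.version("scispacy"))
    nlp = spacy.load(model_name)
    print("model:", nlp.meta.get("name"), nlp.meta.get("version"))

    doc = nlp(text)
    print(entities_with_labels(doc))
    print(f"{term} tagged as CHEMICAL:", is_tagged_as(doc, term))
```

Calling `run_check()` in a notebook cell prints the spaCy, scispacy, and model versions next to the tagged entities, which makes it easier to compare against whatever the demo is running.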
However, when I put the same text into your demo, I was surprised that remdesivir is found.
Questions
- Is the version running on the demo the same one I used in my notebook (spacy: '3.0.4', scispacy: '0.4.0')?
- Is remdesivir perhaps missed because it wasn't present in the earlier training sets?
- Can we expect new chemicals to be recognized (e.g., ones published for the first time)?
- It's especially surprising that remdesivir wasn't detected as a CHEMICAL even in the following line from the text used in my example, where it is explicitly called a 'drug':
Though the drug remdesivir (RDV) is not approved by the FDA, still the "Emergency Use Authorization" (EUA) for compassionate use in severe cases is endorsed.
- In the demo, remdesivir is detected only once, although it is mentioned several times in that passage. Is that expected?
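
To put a number on the last point, one option is to compare how often the word appears in the raw text with how often it is tagged. This is only a sketch: `count_mentions` and `count_tagged` are hypothetical helpers, and `doc` is assumed to be the result of running the en_ner_bc5cdr_md pipeline on the passage.

```python
import re


def count_mentions(text, term):
    """Count case-insensitive whole-word occurrences of term in the raw text."""
    pattern = rf"\b{re.escape(term)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def count_tagged(doc, term, label="CHEMICAL"):
    """Count entities with the given label whose text contains term."""
    term = term.lower()
    return sum(
        1 for ent in doc.ents
        if ent.label_ == label and term in ent.text.lower()
    )
```

If `count_mentions(passage, "remdesivir")` is larger than `count_tagged(doc, "remdesivir")`, the difference is the number of mentions the model skipped.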
Thanks,
vikram