Description
Last week @UnniKohonen uploaded the updated YSO projects to our Hugging Face Hub repository NatLibFi/FintoAI-data-YSO and noticed that an alert is shown for the file yso-bonsai-fi.zip on the main page of the repository.
The alert is shown only for the zip file of the Finnish Bonsai project, presumably because the scanning fails with an error for the zips of the English and Swedish projects.
This is the full alert (shown by clicking the pickle button on the file page):
Detected Pickle imports (10)
"annif.analyzer.snowball.SnowballAnalyzer", "numpy.float64", "joblib.numpy_pickle.NumpyArrayWrapper", "sklearn.feature_extraction.text.TfidfVectorizer", "numpy.dtype", "builtins.getattr", "numpy.ndarray", "sklearn.feature_extraction.text.TfidfTransformer", "nltk.stem.snowball.FinnishStemmer", "nltk.stem.snowball.SnowballStemmer"
The reason for the alert is that the TF-IDF vectorizer is saved with joblib.dump (lines 91 to 93 in 7942838):

    annif.util.atomic_save(
        self.vectorizer, self.datadir, self.VECTORIZER_FILE, method=joblib.dump
    )
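Because joblib.dump writes a pickle stream, the classes the vectorizer depends on are recorded in that stream as import references, and these are what the alert lists. As a rough illustration of how such references can be listed without unpickling anything, here is a minimal sketch using the standard library `pickletools` module. The function name `list_pickle_imports` is hypothetical, and this is not necessarily how the Hub's scanner works. It handles only the simple case where the module and attribute names are pushed as string constants right before `STACK_GLOBAL`, and it is demonstrated on an in-memory pickle rather than on a joblib file inside a zip.

```python
import pickle
import pickletools
from collections import OrderedDict

# Opcodes that push a string constant onto the pickle stack.
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def list_pickle_imports(data: bytes) -> set:
    """Return 'module.name' references found in a pickle stream.

    Hypothetical helper: it only walks the opcodes and never unpickles.
    It ignores memoized strings, so it can miss or misreport references
    in pickles that reuse a name via the memo.
    """
    imports = set()
    recent_strings = []  # string constants seen so far, in order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: argument is "module name" separated by a space.
            module, name = arg.split(" ", 1)
            imports.add(f"{module}.{name}")
        elif opcode.name in STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and name are the two preceding strings.
            imports.add(f"{recent_strings[-2]}.{recent_strings[-1]}")
    return imports


if __name__ == "__main__":
    payload = pickle.dumps(OrderedDict(), protocol=4)
    print(sorted(list_pickle_imports(payload)))
```

Walking the opcodes instead of calling `pickle.load` means the listing itself cannot trigger any of the imports it reports.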
The alert is justified, because downloading pickle files from the internet and unpickling them is risky. However, when the pickle file is uploaded by a trusted party and has not been tampered with, there is no risk.
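One way a downloader could check the "not tampered with" condition is to compare the file's SHA-256 digest against a value published by the trusted party before unpacking or unpickling anything. The following is a minimal sketch under that assumption; the helper names `sha256_of_file` and `verify_download` are hypothetical, and how the expected digest is published and obtained is left open here.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256: str) -> bool:
    """Return True only if the file matches the expected digest.

    Hypothetical helper: expected_sha256 must come from a trusted
    channel, otherwise the check proves nothing.
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A caller would unzip and load the project only when `verify_download` returns True, and refuse to unpickle otherwise.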