Pickle scanner alerts on Hugging Face Hub #897

@juhoinkinen

Last week @UnniKohonen uploaded the updated YSO projects to our Hugging Face Hub repository NatLibFi/FintoAI-data-YSO and noticed that an alert is shown for the file yso-bonsai-fi.zip on the main page of the repository:

[Image: screenshot of the alert on the repository main page]

The alert is shown only for the zip file of the Finnish Bonsai project, presumably because the scan errors out for the zips of the English and Swedish projects.

This is the full alert (shown by clicking the pickle button on the file page):

Detected Pickle imports (10)

"annif.analyzer.snowball.SnowballAnalyzer",
"numpy.float64",
"joblib.numpy_pickle.NumpyArrayWrapper",
"sklearn.feature_extraction.text.TfidfVectorizer",
"numpy.dtype",
"builtins.getattr",
"numpy.ndarray",
"sklearn.feature_extraction.text.TfidfTransformer",
"nltk.stem.snowball.FinnishStemmer",
"nltk.stem.snowball.SnowballStemmer" 

The reason for the alert is that the TF-IDF vectorizer is saved with joblib.dump, which uses pickle under the hood:

annif.util.atomic_save(
    self.vectorizer, self.datadir, self.VECTORIZER_FILE, method=joblib.dump
)
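
For illustration, an import list like the one in the alert can be derived from a pickle stream without unpickling it. The following is a minimal sketch using only the standard library's pickletools module. It is not the scanner Hugging Face actually runs, the function name list_pickle_imports is hypothetical, and the handling of STACK_GLOBAL is a simple heuristic that assumes the module and attribute names are the two most recently pushed strings.

```python
import pickle
import pickletools
from collections import OrderedDict


def list_pickle_imports(data: bytes) -> list[str]:
    """Return the sorted 'module.name' globals referenced by a pickle stream.

    The stream is only disassembled, never executed.
    """
    imports = set()
    strings = []  # string arguments seen so far, consulted for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports the argument as "module name"
            module, name = arg.split(" ", 1)
            imports.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic (assumption): the last two pushed strings are the
            # module and the qualified name. Streams that fetch them from
            # the memo instead are not resolved by this sketch.
            if len(strings) >= 2:
                imports.add(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return sorted(imports)


if __name__ == "__main__":
    # Demo on a harmless in-memory pickle rather than a downloaded file.
    payload = pickle.dumps(OrderedDict(a=1), protocol=4)
    print(list_pickle_imports(payload))
```

Anything unexpected in such a list (for example os.system or builtins.eval) would be a reason not to load the file at all.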

The alert is justified, because downloading pickle files from the internet and unpickling them is risky. However, when the pickle file has been uploaded by a trusted party and has not been tampered with, there is no risk.
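
One way a downloader could check the "not tampered with" condition is to compare the file's SHA-256 digest with a value published by the uploader. The sketch below assumes such a digest is available through a trusted channel. The function names and this workflow are hypothetical and are not part of Annif or the Hugging Face Hub tooling.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the expected digest.

    expected_sha256 is assumed to come from a trusted source, such as a
    value published by the uploader.
    """
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

A caller would run verify_download on the downloaded zip before extracting it and loading the vectorizer, and refuse to unpickle anything if the check fails. A matching digest only shows that the file is the one the uploader published; it says nothing about whether the uploader is trustworthy.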
