This is the implementation of the following paper:
@InProceedings{recsys_eval19,
author = {Amir H. Jadidinejad and Craig Macdonald and Iadh Ounis},
title = {How Sensitive is Recommendation Systems' Offline Evaluation to Popularity?},
booktitle = {Workshop on Offline Evaluation for Recommender Systems (REVEAL2019) at the 13th ACM Conference on Recommender Systems},
year = {2019},
}
Requirements:
- pytorch (1.0.1)
- spotlight (0.1.5)
- pytrec-eval (0.3)
The following plot summarizes the results of popularity-stratified sampling:
Setting the popularity threshold P to its maximum makes the evaluation equivalent to the standard offline evaluation of recommender systems:
See the paper or our poster for more details.
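As a rough illustration of how a popularity threshold P could restrict the evaluation set, here is a minimal sketch. It is not taken from the notebooks: the function name `filter_by_popularity`, the definition of popularity as an item's number of training interactions, and the rule "keep test items with popularity at most P" are all assumptions made for this example. See the paper for the actual sampling procedure.

```python
from collections import Counter


def filter_by_popularity(train, test, p_threshold):
    """Keep test interactions whose item has at most `p_threshold`
    training interactions.

    Hypothetical helper: popularity is assumed to be the item's
    interaction count in the training set.
    """
    popularity = Counter(item for _, item in train)
    return [(user, item) for user, item in test
            if popularity[item] <= p_threshold]


# Toy (user, item) interactions; item "a" is the most popular.
train = [(0, "a"), (1, "a"), (2, "a"), (0, "b"), (1, "b"), (2, "c")]
test = [(3, "a"), (3, "b"), (4, "c")]

max_popularity = max(Counter(item for _, item in train).values())

# A low threshold keeps only interactions with rarely seen items.
low = filter_by_popularity(train, test, p_threshold=1)

# The maximum threshold keeps the whole test set.
full = filter_by_popularity(train, test, p_threshold=max_popularity)
```

Under these assumptions, the maximum threshold leaves the test set unchanged, which matches the statement that the maximum P corresponds to the usual offline evaluation.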
Use the corresponding Jupyter notebook to reproduce the results for each dataset (MovieLens, Amazon) at a specific popularity threshold P: