Movie Recommendation System: CSN-382 Project
CSN-382 Project
Submitted By:
Abhishek Jaisingh, 14114002
Tirth Patel, 14114036
Sahil Garg, 14114046
Sumit Kumar Singh, 14114063
Recommendation System
A recommendation system produces a ranked list of items that a user might
be interested in, given the item the user has currently chosen.
f(movie) → {movies}
The goal of the recommendation engine is to predict the blanks in a utility matrix.
Utility Matrix
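As a small illustration, a utility matrix can be held as a mapping from users to the items they have rated, with missing keys standing for the blanks. The user names, movie names, and ratings below are made up for the sketch and are not taken from the project's data.

```python
# Hypothetical ratings on a 1-5 scale: user -> {movie: rating}.
# A movie missing from a user's dict is a blank in the utility matrix.
ratings = {
    "u1": {"m1": 5, "m2": 3, "m4": 1},
    "u2": {"m1": 4, "m4": 1},
    "u3": {"m2": 1, "m3": 5, "m4": 4},
}

# All movies that appear anywhere in the data (the matrix columns).
movies = sorted({m for user_ratings in ratings.values() for m in user_ratings})

# The blanks are the (user, movie) entries the recommender has to predict.
blanks = [(u, m) for u in ratings for m in movies if m not in ratings[u]]

# Fraction of the matrix that is empty.
sparsity = len(blanks) / (len(ratings) * len(movies))
```

Here 4 of the 12 entries are blank, so a third of this toy matrix would need to be predicted.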
Similarity Measures
Pearson Correlation Similarity Measure: measures the similarity of users or items
using the rows and columns of the Utility Matrix.
Advantages:
where $\bar{r}_x$ denotes the average rating given by user x over all items. To calculate
$\bar{r}_x$, we consider only the items that user x has actually rated.
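The Pearson formula itself is not reproduced in this text, so the following is a minimal sketch of one common form: each user's ratings are centred on that user's own mean (taken over the items they rated), and the sums run over the items both users have rated. The function names and the choice to return 0.0 when no comparison is possible are assumptions made for illustration.

```python
from math import sqrt


def mean_rating(user_ratings):
    """Average over the items this user has actually rated."""
    return sum(user_ratings.values()) / len(user_ratings)


def pearson_similarity(rx, ry):
    """Pearson correlation between two users' rating dicts (item -> rating)."""
    common = set(rx) & set(ry)  # items rated by both users
    if not common:
        return 0.0  # assumption: no overlap is treated as no similarity
    mx, my = mean_rating(rx), mean_rating(ry)
    num = sum((rx[i] - mx) * (ry[i] - my) for i in common)
    den = sqrt(sum((rx[i] - mx) ** 2 for i in common)) * sqrt(
        sum((ry[i] - my) ** 2 for i in common)
    )
    return num / den if den else 0.0


# Two made-up users whose ratings rise together.
same_taste = pearson_similarity({"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 4, "c": 6})
```

Users whose ratings move together score close to 1, and users with opposite tastes score close to -1.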
Prediction
● One way to predict a utility matrix entry (the estimated rating of
a given user u for item i) is to average the ratings given to i by the top_n users most similar to u.
● Another approach is to first normalize the utility matrix.
● That is, for each of the n most similar users who have rated i, subtract that user's average rating over
all items from their rating of i. Take the average of these
differences, then add this average difference
to the average rating that u gives to all items.
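The normalized prediction described above can be sketched as follows. This is an illustrative outline, not the project's code: the function names are hypothetical, the similarity measure is passed in as a callable, and falling back to u's own mean when no neighbour has rated the item is an assumption.

```python
def mean_rating(user_ratings):
    """Average of the ratings a user has actually given."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict_rating(ratings, similarity, u, item, top_n=150):
    """Estimate u's rating of `item` from mean-centred neighbour ratings.

    ratings:    dict user -> {item: rating}
    similarity: callable taking two rating dicts and returning a score
    """
    others = [v for v in ratings if v != u]
    # Keep the top_n users most similar to u.
    neighbours = sorted(
        others, key=lambda v: similarity(ratings[u], ratings[v]), reverse=True
    )[:top_n]
    # Deviation of each neighbour's rating of `item` from that neighbour's mean.
    diffs = [
        ratings[v][item] - mean_rating(ratings[v])
        for v in neighbours
        if item in ratings[v]
    ]
    base = mean_rating(ratings[u])
    if not diffs:
        return base  # assumption: fall back to u's mean if nobody rated the item
    return base + sum(diffs) / len(diffs)


# Toy data (made up). A constant similarity stands in for a real measure.
ratings = {
    "u": {"a": 4, "b": 2},
    "v": {"a": 4, "b": 3, "c": 5},
    "w": {"a": 1, "b": 3, "c": 2},
}
pred = predict_rating(ratings, lambda x, y: 1.0, "u", "c", top_n=2)
```

In this toy case u's mean is 3, the neighbours' deviations for item c are +1 and 0, and the estimate is 3.5.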
Results
● We achieved a Mean Squared Error of 1.076 for the prediction of user ratings
with top_n = 150 (neighborhood size).
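For reference, the Mean Squared Error is the average of the squared differences between actual and predicted ratings. The sketch below uses made-up numbers purely to show the arithmetic; it does not reproduce the reported result.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between true and predicted ratings."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out ratings and the corresponding predictions.
mse = mean_squared_error([4, 3, 5, 1], [3, 3, 4, 3])
```

The squared errors here are 1, 0, 1 and 4, giving an MSE of 1.5.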
Disadvantages
1. Cold Start: There need to be enough other users already in the system to find
a match.
2. Sparsity: Most users do not rate most items, so the user-item matrix is
typically very sparse. It is hard to find users who have rated the same items.
3. First Rater: It is not possible to recommend an item that has not been
previously rated. This problem mostly affects new items.
❖ Content-based filtering uses only item data, maintaining a profile for each item. Each user is
assumed to operate independently, so no data on other users is needed.
❖ If we take the content of a movie to be its director, writer, cast, etc., then each
of these attributes can be treated as a feature.
Similarity
We recommend the items that are most similar to the item the user has rated.
Here, A1i, A2i, ..., Ani are the features of item i.
The function f(A1i, A1j) represents the distance (similarity) between the 1st feature of
items i and j.
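The similarity formula itself did not survive in this text. A form consistent with the symbols defined here, and with the weights ω1, ..., ωn and the score S(Oi, Oj) used in the following slides, would be a weighted linear combination of the per-feature similarities. This is a reconstruction under that assumption, not a quotation of the original slide:

```latex
S(O_i, O_j) = \omega_1 f(A_{1i}, A_{1j}) + \omega_2 f(A_{2i}, A_{2j}) + \cdots + \omega_n f(A_{ni}, A_{nj})
```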
Features and Distance Measures
Solving the above regression equations provides estimates for the values of ω1,
ω2, · · · , ωn. If there are l movies under consideration, there can be up to C(l, 2) = l(l − 1)/2
regression equations of the above form.
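One way to estimate the weights is ordinary least squares over one equation per movie pair. The sketch below assumes that a target similarity value is available for every pair (where those values come from is not stated in this text) and fits a model with no intercept term. The function name and the toy numbers are hypothetical.

```python
from itertools import combinations

import numpy as np


def fit_feature_weights(feature_sims, target_sims):
    """Least-squares estimate of the weights from one equation per movie pair.

    feature_sims: dict (i, j) -> list of n per-feature similarities
    target_sims:  dict (i, j) -> overall similarity for that pair (assumed known)
    """
    pairs = sorted(feature_sims)
    X = np.array([feature_sims[p] for p in pairs], dtype=float)
    y = np.array([target_sims[p] for p in pairs], dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


# Toy example: l = 4 movies give C(4, 2) = 6 pairs, each with n = 2 features.
pairs = list(combinations(range(4), 2))
rows = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0], [0.2, 0.8], [0.0, 0.0]]
feature_sims = dict(zip(pairs, rows))
# Made-up targets generated from weights (0.7, 0.3) so the fit can be checked.
target_sims = {p: 0.7 * f[0] + 0.3 * f[1] for p, f in feature_sims.items()}
weights = fit_feature_weights(feature_sims, target_sims)
```

Because the toy targets were generated from known weights without noise, the fit recovers them. With real data the equations would generally be inconsistent, and least squares returns the best compromise.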
Prediction
● Using regression, we solve for the weight vector W.
● The user inputs the movie for which they want recommendations (say Oi).
● We compute the similarity S(Oi, Oj) of the given movie with every other movie
(Oj).
● Each movie’s similarity score is dot_product(S, W).
● We recommend the movies with the highest similarity
scores.
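The steps above can be sketched as a small ranking function. Here S is read as the vector of per-feature similarities between the query movie and a candidate, which is an interpretation rather than something stated explicitly. The two features, the similarity functions (exact match for director, Jaccard overlap for genres), the weights, and the movie records are all hypothetical.

```python
def feature_similarity(a, b):
    """Per-feature similarities between two movie records (illustrative choices)."""
    same_director = 1.0 if a["director"] == b["director"] else 0.0
    union = a["genres"] | b["genres"]
    genre_overlap = len(a["genres"] & b["genres"]) / len(union) if union else 0.0
    return [same_director, genre_overlap]


def recommend(query, movies, weights, k=5):
    """Rank all other movies by the weighted similarity to `query`."""
    scores = {}
    for other, record in movies.items():
        if other == query:
            continue
        s = feature_similarity(movies[query], record)
        # Dot product of the per-feature similarities and the weight vector.
        scores[other] = sum(w * f for w, f in zip(weights, s))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up catalogue and weights.
movies = {
    "A": {"director": "d1", "genres": {"action", "scifi"}},
    "B": {"director": "d1", "genres": {"action"}},
    "C": {"director": "d2", "genres": {"action", "scifi"}},
    "D": {"director": "d3", "genres": {"drama"}},
}
top = recommend("A", movies, weights=[0.6, 0.4], k=2)
```

For query A, movie B scores 0.8 (same director, half the genres), C scores 0.4 (same genres only), and D scores 0, so B and C are recommended in that order.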
Future Work
● In collaborative filtering, data sparsity is a problem: very
few users actually rate the same movie.
● We can use clustering algorithms such as K-Means to cluster items,
users, or both based on their attributes.
● In the hybrid approach, we can use more features to get better
predictions (currently we use only 9 features).
References
1. https://grouplens.org/datasets/movielens/100k/ - MovieLens Dataset.
2. https://pdfs.semanticscholar.org/1356/f4eda338b58b2840c5f643a988a1008806f0.pdf - Machine Learning Based Hybrid Recommendation System
Thanks