This project implements sentiment analysis on movie reviews using machine learning. The goal is to predict whether a movie review is positive or negative based on its text. The project uses the Movie Reviews dataset from the NLTK library, and the model is a logistic regression classifier.
Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment expressed in a piece of text. In this project, we apply sentiment analysis to movie reviews to classify them as positive or negative.
The movie reviews dataset is loaded from the NLTK library. The text data is preprocessed by converting it to lowercase, tokenizing the words, and removing stopwords and non-alphanumeric characters.
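The preprocessing described above might look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the function names `preprocess` and `load_reviews` are hypothetical, and a simple regular expression stands in for whichever tokenizer the project uses. `load_reviews` assumes the NLTK `movie_reviews` and `stopwords` corpora have already been downloaded with `nltk.download(...)`.

```python
import re


def preprocess(text, stopwords):
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords.

    The regex tokenizer is an assumption made for this sketch; it also
    discards punctuation and other non-alphanumeric characters.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in stopwords]


def load_reviews():
    """Return (texts, labels) built from NLTK's movie_reviews corpus.

    Hypothetical helper. Needs a one-time nltk.download("movie_reviews")
    and nltk.download("stopwords") before it is called.
    """
    from nltk.corpus import movie_reviews, stopwords

    stop = set(stopwords.words("english"))
    texts, labels = [], []
    for category in movie_reviews.categories():  # "neg" and "pos"
        for fileid in movie_reviews.fileids(category):
            cleaned = preprocess(movie_reviews.raw(fileid), stop)
            texts.append(" ".join(cleaned))
            labels.append(category)
    return texts, labels


# Small demonstration with a hand-written stopword set.
print(preprocess("This movie was NOT good!", {"this", "was"}))
```

Passing the stopword set in as an argument keeps `preprocess` usable without any downloaded corpora.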
The text data is vectorized using the TF-IDF (Term Frequency-Inverse Document Frequency) technique. TF-IDF assigns a weight to each word in the document based on its frequency in the document and its rarity across all documents.
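To make the weighting concrete, here is a small pure-Python sketch of the textbook form, where a term's weight is its relative frequency in the document multiplied by `log(N / df)`. This is an illustration only: the function name `tf_idf` and the toy documents are made up, and scikit-learn's `TfidfVectorizer` uses a smoothed IDF with L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute textbook TF-IDF weights.

    docs: list of token lists.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["great", "movie"],
    ["boring", "movie"],
    ["great", "great", "movie", "acting"],
]
weights = tf_idf(docs)
for doc_weights in weights:
    print(doc_weights)
```

In this toy corpus "movie" appears in every document, so its weight is zero, while "boring" appears in only one document and gets the largest weight in that document.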
We train a logistic regression classifier on the vectorized text data to predict the sentiment of movie reviews.
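A minimal training sketch with scikit-learn could look like the following. The tiny hand-written corpus is a stand-in for the preprocessed NLTK reviews, and chaining the vectorizer and classifier with `make_pipeline` is one convenient arrangement, not necessarily how the project wires them together. The `max_iter` value is likewise an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data; the real project trains on the NLTK movie reviews.
train_texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "wonderful cast, great direction",
    "a dull and boring film",
    "terrible acting and a boring story",
    "dull plot, terrible direction",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Vectorize with TF-IDF, then fit logistic regression on the vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

predictions = model.predict(["a wonderful story", "a boring plot"])
print(list(predictions))
```

Keeping both steps in one pipeline means new text passed to `predict` is vectorized with the vocabulary learned during training.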
The model's performance is evaluated using accuracy, which measures the proportion of correctly classified movie reviews in the test set.
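Accuracy is simply the number of correct predictions divided by the total number of predictions. The sketch below spells that out in plain Python; the function name `accuracy` and the sample labels are illustrative. In practice `sklearn.metrics.accuracy_score` computes the same quantity.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: three of the four predictions are correct.
y_true = ["pos", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]
print(accuracy(y_true, y_pred))
```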
To use the sentiment analysis model, follow these steps:
- Input a movie name in the provided text box.
- Click the "Predict Sentiment" button.
- The model will predict whether the movie review is positive or negative.
- The sentiment will be displayed using an emoji and a bar-like representation.
- If available, the IMDb rating of the movie will be shown.
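As a rough idea of what the emoji and bar display could look like, the sketch below renders a positive-class probability as a text bar. It is purely illustrative: `render_sentiment` is a hypothetical helper, the characters and emoji are arbitrary choices, and the project's actual widget output may differ. The probability would typically come from the classifier's `predict_proba` method.

```python
def render_sentiment(prob_positive, width=20):
    """Render a probability in [0, 1] as an emoji, a label, and a text bar."""
    if not 0.0 <= prob_positive <= 1.0:
        raise ValueError("prob_positive must be between 0 and 1")
    filled = round(prob_positive * width)  # number of filled bar cells
    is_positive = prob_positive >= 0.5
    emoji = "😀" if is_positive else "😞"
    label = "positive" if is_positive else "negative"
    bar = "#" * filled + "-" * (width - filled)
    return f"{emoji} {label} [{bar}] {prob_positive:.0%}"


print(render_sentiment(0.8))
print(render_sentiment(0.1))
```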
Future updates:
- Build a user-friendly, deployable website.
- Improve accuracy with more powerful models such as BERT.
The following libraries are required to run the code:
- nltk
- scikit-learn
- pandas
- requests
- ipywidgets
Install the dependencies using the following command:
pip install nltk scikit-learn pandas requests ipywidgets