This project performs a controlled comparative analysis of recurrent neural architectures (RNN, LSTM, Bidirectional LSTM) for binary sentiment classification on the IMDb movie reviews dataset.
The experiments systematically vary activation function, optimizer, sequence length, and gradient clipping, while reporting Accuracy, F1 (macro), and training time per epoch under CPU-only constraints.
```text
project_root/
├── report.pdf
├── README.md
├── requirements.txt
├── data/
│   ├── imdb_seq_len25.npz
│   ├── imdb_seq_len50.npz
│   ├── imdb_seq_len100.npz
│   ├── imdb_stats.json
│   ├── imdb_vocab.pkl
│   └── raw/
│       └── IMDB_Dataset.csv
├── results/
│   ├── metrics.csv
│   ├── summary_table.csv
│   ├── losses/
│   └── plots/
│       ├── acc_vs_seq_length.png
│       ├── f1_vs_seq_length.png
│       ├── loss_curve_best.png
│       └── loss_curve_worst.png
└── src/
    ├── preprocess.py
    ├── models.py
    ├── train.py
    ├── run_experiments.py
    ├── evaluate.py
    ├── plot_losses.py
    ├── plot_metrics.py
    └── utils.py
```
Tested with Python 3.12.
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Key packages: torch, numpy, pandas, scikit-learn, nltk, tqdm, matplotlib.
The project uses the IMDb dataset (50,000 reviews, 25k train / 25k test, balanced classes).
Place the CSV in data/raw/IMDB_Dataset.csv. The preprocessing step lowercases text, removes punctuation, tokenizes, caps vocabulary at 10,000 words, and produces padded datasets for sequence lengths 25, 50, 100.
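The cleaning and padding steps above can be sketched as follows. This is a minimal illustration, not the actual `preprocess.py` API; the function names and the unknown/pad id convention (0 = pad, 1 = unknown) are assumptions.

```python
import re
from collections import Counter

def clean(text):
    """Lowercase, strip punctuation, and split on whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()

def build_vocab(token_lists, max_size=10_000):
    """Keep the max_size most frequent tokens; ids 0/1 reserved for pad/unknown."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    most_common = [w for w, _ in counts.most_common(max_size - 2)]
    return {w: i + 2 for i, w in enumerate(most_common)}

def encode(tokens, vocab, seq_len):
    """Map tokens to ids, then truncate or pad to a fixed sequence length."""
    ids = [vocab.get(t, 1) for t in tokens[:seq_len]]
    return ids + [0] * (seq_len - len(ids))

reviews = ["A great movie!", "Terrible, boring plot."]
token_lists = [clean(r) for r in reviews]
vocab = build_vocab(token_lists)
padded = [encode(t, vocab, 25) for t in token_lists]
```

Running this for each target sequence length (25, 50, 100) yields the three padded datasets saved as `.npz` files.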
```bash
python src/preprocess.py
```

All models share the same base configuration for fairness:
- Embedding dimension: 100
- Hidden size: 64
- Layers: 2
- Dropout: 0.3
- Batch size: 32
- Loss function: Binary Cross-Entropy
- Activations tested: Sigmoid, Tanh, ReLU
- Optimizers tested: Adam, RMSprop, SGD
- Sequence lengths: 25, 50, 100
- Gradient clipping: Enabled / Disabled
- Epochs: 8
- Seed: 42
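A minimal PyTorch sketch of the shared LSTM configuration and one training step with gradient clipping. Class and argument names are illustrative (the real definitions live in `src/models.py` and `src/train.py`), and the placement of the tested activation functions inside the network is omitted here.

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=100, hidden=64,
                 layers=2, dropout=0.3, bidirectional=False):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=layers,
                            dropout=dropout, batch_first=True,
                            bidirectional=bidirectional)
        out_dim = hidden * (2 if bidirectional else 1)
        self.head = nn.Linear(out_dim, 1)  # single logit for BCE

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out[:, -1, :]).squeeze(-1)  # last time step

model = SentimentLSTM()
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()

x = torch.randint(0, 10_000, (32, 50))      # batch of 32, seq_len 50
y = torch.randint(0, 2, (32,)).float()

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clipping stage
optimizer.step()
```

Setting `bidirectional=True` turns the same class into the BiLSTM variant; swapping `nn.LSTM` for `nn.RNN` gives the plain RNN baseline.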
Each experiment varies one factor at a time while fixing others to ensure valid comparison.
| Stage | Factor Varied | Fixed Configuration | Runs |
|---|---|---|---|
| A | Architecture (RNN, LSTM, BiLSTM) | ReLU + Adam + seq=50 + no clip | 3 |
| B | Activation (Sigmoid, ReLU, Tanh) | LSTM + Adam + seq=50 + no clip | 3 |
| C | Optimizer (Adam, SGD, RMSprop) | LSTM + ReLU + seq=50 + no clip | 3 |
| D | Sequence Length (25, 50, 100) | LSTM + ReLU + Adam + no clip | 3 |
| E | Gradient Clipping (On vs Off) | LSTM + ReLU + Adam + seq=50 | 2 |
Note: the baseline configuration (LSTM + ReLU + Adam + seq=50 + no clip) recurs across stages; it was run only once, and duplicate runs were avoided.
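The staged design above can be enumerated programmatically. A sketch of how configurations might be generated and deduplicated (the actual internals of `run_experiments.py` may differ):

```python
BASE = dict(arch="LSTM", act="ReLU", opt="Adam", seq=50, clip=False)

STAGES = {
    "A": ("arch", ["RNN", "LSTM", "BiLSTM"]),
    "B": ("act",  ["Sigmoid", "ReLU", "Tanh"]),
    "C": ("opt",  ["Adam", "SGD", "RMSprop"]),
    "D": ("seq",  [25, 50, 100]),
    "E": ("clip", [True, False]),
}

def configs():
    """Vary one factor per stage; skip configurations already scheduled."""
    seen, runs = set(), []
    for stage, (factor, values) in STAGES.items():
        for v in values:
            cfg = {**BASE, factor: v}
            key = tuple(sorted(cfg.items()))
            if key not in seen:          # the baseline recurs in every stage
                seen.add(key)
                runs.append((stage, cfg))
    return runs
```

With the table's 3 + 3 + 3 + 3 + 2 = 14 nominal runs, deduplicating the shared baseline leaves 10 distinct training runs.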
Run all experiments with:
```bash
python src/run_experiments.py
```

After training, generate plots and summary tables:
```bash
python src/evaluate.py
```

Outputs include accuracy/F1 plots and loss curves under results/plots/, plus summary CSVs under results/.
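The reported metrics can be computed with scikit-learn; a minimal sketch (the exact calls inside `evaluate.py` may differ, and the sample labels below are illustrative):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]   # model sigmoid outputs
y_pred = [int(p >= 0.5) for p in y_prob]   # 0.5 decision threshold

acc = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")
```

Macro F1 averages the per-class F1 scores, so it treats the positive and negative classes equally; on this balanced dataset it tracks accuracy closely, as the results below show.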
- Best configuration: LSTM (ReLU, Adam, seq=100, no clip) - Accuracy: 0.815, F1: 0.815, Time/Epoch: 17.46s
- Worst configuration: LSTM (ReLU, SGD, seq=50, no clip) - Accuracy: 0.500, F1: 0.498
Longer sequences improved performance, and Adam provided the best balance of speed and stability. Gradient clipping slightly stabilized training without significant accuracy gain.
All experiments are reproducible. Random seeds were fixed across PyTorch, NumPy, and Python, and deterministic algorithms were enabled for consistent results.
All models were trained in a CPU-only environment with 8 GB RAM using the IMDb dataset’s standard split. With identical configurations and deterministic settings, reruns should reproduce the same metrics.
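Seed fixing along these lines is used before every run (an illustrative sketch; the helper name and exact calls in `src/utils.py` are assumptions):

```python
import random

import numpy as np
import torch

def set_seed(seed=42):
    """Fix Python, NumPy, and PyTorch RNGs and request deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)

# Two runs from the same seed produce identical draws.
set_seed(42)
a = torch.randn(3)
set_seed(42)
b = torch.randn(3)
```

On CPU, `torch.use_deterministic_algorithms(True)` makes PyTorch raise an error if any operation without a deterministic implementation is used, so silent nondeterminism cannot slip in.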
- Preprocessed data in `data/`
- Metrics in `results/metrics.csv`
- Plots in `results/plots/`
- Final report: `report.pdf`
For academic and educational use only. Please refer to the IMDb dataset license on Kaggle for redistribution terms.