Rossmann Retail Sales Forecasting

About the Project

Rossmann operates over 3,000 drug stores across 7 European countries. Store managers are tasked with predicting daily sales up to six weeks in advance. Sales are influenced by factors including promotions, competition, school and state holidays, seasonality, and store locality.

This project builds a regression-based machine learning model to forecast sales for 1,115 Rossmann stores using historical data, helping the business make data-driven decisions on budgets, hiring, incentives, and growth plans.

Dataset

Stores: 1,115 Rossmann stores across Europe
Features: Store type, assortment, promotions, competition distance, school/state holidays, day of week, and more
Target: Daily sales revenue
Download Links:

Approach

Exploratory Data Analysis (EDA) — Analyzed sales trends, seasonality, promotional impact, and store-level patterns
Feature Engineering — Extracted and transformed features from promotions, competition, holidays, and temporal attributes
Model Benchmarking — Trained and compared 5 regression models to identify the best performer
Evaluation — Used MAE, MAPE, and RMSE as evaluation metrics
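The feature engineering step above could look roughly like the sketch below. It is an illustration only, not the notebook's actual code: the column names (`Date`, `Promo`, `CompetitionDistance`) are assumed to follow the public Rossmann dataset schema, and the helper `add_features` and the specific derived columns are hypothetical.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive temporal and competition features (hypothetical helper)."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])

    # Temporal attributes extracted from the date
    out["Year"] = out["Date"].dt.year
    out["Month"] = out["Date"].dt.month
    out["WeekOfYear"] = out["Date"].dt.isocalendar().week.astype(int)
    out["DayOfWeek"] = out["Date"].dt.dayofweek + 1  # 1 = Monday, 7 = Sunday
    out["IsWeekend"] = (out["DayOfWeek"] >= 6).astype(int)

    # Competition distance: fill gaps with the median (an assumed strategy),
    # then log-transform to reduce skew
    dist = out["CompetitionDistance"].fillna(out["CompetitionDistance"].median())
    out["LogCompetitionDistance"] = np.log1p(dist)
    return out


# Tiny made-up sample, only to show the call
sample = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["2015-07-31", "2015-08-01", "2015-08-02"],
    "Promo": [1, 0, 0],
    "CompetitionDistance": [1270.0, 1270.0, None],
})
features = add_features(sample)
print(features[["Date", "Month", "WeekOfYear", "DayOfWeek", "IsWeekend"]])
```

Tree-based models such as Random Forest can use integer-coded calendar fields like these directly; linear models would typically need them one-hot encoded.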

Model Comparison

Model                      MAE       MAPE (%)   RMSE
Random Forest Regressor    383.06    5.46       577.59
SARIMA                     365.87    12.66      434.03
XGBoost Regressor          509.39    7.27       739.63
Linear Regression          1045.57   15.05      1458.30
LR Lasso                   1107.31   15.66      1582.54

Random Forest Regressor achieved the lowest MAPE (5.46%), meaning its predictions deviate from actual sales by about 5.5% on average. SARIMA scored lower on MAE and RMSE, but its MAPE of 12.66% is more than double that of Random Forest.
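For reference, the three metrics can be computed as in the minimal sketch below. The sales figures are made up for illustration, and skipping zero-sales days in the MAPE is an assumption (division by zero is otherwise undefined), not necessarily what the notebook does.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as sales."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, ignoring days with zero actual sales."""
    mask = y_true != 0
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only: every prediction is off by exactly 5%
y_true = np.array([5000.0, 6000.0, 8000.0, 10000.0])
y_pred = np.array([5250.0, 5700.0, 8400.0, 9500.0])

print(f"MAE:  {mae(y_true, y_pred):.2f}")
print(f"MAPE: {mape(y_true, y_pred):.2f}%")
print(f"RMSE: {rmse(y_true, y_pred):.2f}")
```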

Best Model Configuration

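from sklearn.ensemble import RandomForestRegressor

# criterion='gini' and class_weight apply only to RandomForestClassifier,
# so they are replaced or dropped here; the regressor defaults are shown.
RandomForestRegressor(
    n_estimators=30,
    random_state=42,
    criterion='squared_error',  # regressor default (named 'mse' before scikit-learn 1.0)
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    max_features=1.0,           # all features; equivalent to the removed 'auto' option
    bootstrap=True,
    oob_score=False,
)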
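As a rough sketch of how a regressor with the listed `n_estimators=30` and `random_state=42` might be trained and scored: the data below is synthetic, the two features (`promo`, `day_of_week`) are hypothetical stand-ins, and the random train/test split is an assumption made for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered feature matrix (not the real dataset)
rng = np.random.default_rng(42)
n = 500
promo = rng.integers(0, 2, size=n)        # 0/1 promotion flag
day_of_week = rng.integers(1, 8, size=n)  # 1..7
X = np.column_stack([promo, day_of_week])
y = 5000 + 1500 * promo - 200 * day_of_week + rng.normal(0, 100, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=30, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Hold-out MAPE on the synthetic data
mape = float(np.mean(np.abs((y_test - pred) / y_test)) * 100)
print(f"Hold-out MAPE on synthetic data: {mape:.2f}%")
```

On the real data, a chronological split (training on earlier dates, testing on later ones) is usually a safer choice for forecasting than a random split.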

Tech Stack

Language: Python
Libraries: Pandas, NumPy, Scikit-learn, XGBoost, Statsmodels (SARIMA), Matplotlib, Seaborn
Models: Random Forest, SARIMA, XGBoost, Linear Regression, Lasso Regression

How to Run

Clone the repository:

git clone https://github.com/varshil009/Rossmann-Regression.git
cd Rossmann-Regression

Install dependencies:

pip install pandas numpy scikit-learn xgboost statsmodels matplotlib seaborn

Download the dataset using the links above and place the files in the project directory.

Run the notebook:

jupyter notebook "Rossman Regression.ipynb"

Project Structure

Rossmann-Regression/
├── ML_process.ipynb            # ML pipeline and model training
├── Rossman Regression.ipynb    # EDA and data preprocessing
└── README.md                   # Project documentation

Key Takeaways

Random Forest outperformed all other models on MAPE (5.46%), the most business-relevant metric for sales forecasting
SARIMA achieved the lowest MAE (365.87) and RMSE (434.03) but had a much higher MAPE (12.66%), indicating inconsistent percentage-wise accuracy across stores
Feature engineering on temporal and promotional features was critical to improving model performance
Forecasting at store level enables targeted business decisions for budgets, staffing, and inventory management
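The way a model can have a lower MAE yet a higher MAPE can be illustrated with a small example. The numbers below are invented for illustration and are not taken from the project's results: model A makes smaller absolute errors overall but misses badly on low-volume days, while model B's errors scale with sales.

```python
import numpy as np

# Two low-volume days and two high-volume days (made-up sales figures)
actual = np.array([2000.0, 2000.0, 10000.0, 10000.0])

# Hypothetical predictions from two models
model_a = actual + np.array([400.0, -400.0, 300.0, -300.0])
model_b = actual + np.array([100.0, -100.0, 700.0, -700.0])


def summarize(pred):
    """Return (MAE, MAPE in percent) against the actual values."""
    abs_err = np.abs(actual - pred)
    return float(abs_err.mean()), float((abs_err / actual).mean() * 100)


a_mae, a_mape = summarize(model_a)
b_mae, b_mape = summarize(model_b)
print(f"Model A: MAE={a_mae:.1f}, MAPE={a_mape:.1f}%")
print(f"Model B: MAE={b_mae:.1f}, MAPE={b_mape:.1f}%")
```

Model A wins on MAE but loses on MAPE because a 400-unit miss is a far larger share of a 2,000-unit day than a 300-unit miss is of a 10,000-unit day.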
