Diabetes Risk Prediction Project

A full end-to-end machine learning pipeline and Flask web application that predicts diabetes risk and visualises explainability insights for individual or batch predictions. Built with Python, scikit-learn, pandas, SHAP, and Flask, the project covers the whole path from raw data ingestion to interactive model deployment.

⚙️ Two integrated components:

End-to-End Machine Learning Pipeline — data processing → training → evaluation → explainability.

Interactive Flask Dashboard — real-time single & batch prediction app powered by the trained model.

Highlights

Automated ML Pipeline: Modular scripts for loading, preprocessing, EDA, training, evaluation, and model explainability.
Interactive Web Dashboard: Built with Flask, Bootstrap, and Chart.js so that clinicians and analysts can explore model predictions interactively.
Explainability: Integrated SHAP/LIME interpretability tools, visualising local and global feature contributions.
Production-ready structure: Logging, testing, CI, and pre-commit hooks aligned with professional ML engineering standards.
Collaborative foundation: Code documented with Javadoc-style docstrings, covered by pytest, and organised in a clean file architecture.

Project Architecture

```
Diabetes-Risk-Prediction-Project/
├── data/                          # raw dataset (diabetes.csv)
├── models/                        # saved ML models (.joblib)
├── reports/                       # generated plots, explainability & metrics
│   ├── explain/                   # SHAP/LIME visual outputs
│   ├── models/                    # trained model artifacts
│   └── figures/                   # EDA & evaluation plots
├── src/
│   ├── data_loading.py
│   ├── data_processing.py
│   ├── data_exploration.py
│   ├── data_visualisation.py
│   ├── statistical_analysis.py
│   ├── model_training.py
│   ├── model_evaluation.py
│   └── dashboard/                 # Flask dashboard app
│       ├── app.py
│       ├── predict.py
│       ├── routes.py
│       ├── templates/
│       │   └── index.html
│       └── static/
├── tests/                         # unit, integration, and dashboard tests
├── main.py                        # unified pipeline runner
├── requirements.txt
├── pyproject.toml
└── README.md
```

Part 1 — End-to-End Machine Learning Pipeline

This component implements a complete data science workflow — from ingestion to explainability — using the Pima Indians Diabetes dataset.

⚙️ Workflow Overview

Data Loading: Load data/diabetes.csv into pandas and validate its structure.
```
python src/data_loading.py --data ./data/diabetes.csv
```
Data Preprocessing: Handle missing values, normalize numerical features, and encode categorical variables.
```
python src/data_processing.py --data ./data/diabetes.csv --out reports
```
Exploratory Data Analysis (EDA): Generate descriptive statistics, correlations, and visualizations (BMI, glucose, etc.).
```
python src/data_exploration.py --data ./data/diabetes.csv --out reports
```
Statistical Analysis: Run hypothesis tests and feature significance analysis.
```
python src/statistical_analysis.py --data ./data/diabetes.csv --out reports
```
Model Training: Train Logistic Regression, Random Forest, Gradient Boosting, or XGBoost models. The best model is saved to reports/models/.
```
python src/model_training.py --data ./data/diabetes.csv --model rf --out_dir reports
```
Model Evaluation: Compute accuracy, ROC AUC, and the confusion matrix, and save the plots.
```
python src/model_evaluation.py --model reports/models/rf_best.joblib --out reports
```
Explainability & Feature Importance: Generate SHAP plots and local explanations stored under reports/explain/.
Run Entire Pipeline Automatically:
```
python main.py
```
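The steps above can be condensed into a single load, preprocess, train, evaluate, and save flow. The sketch below is not the project's `main.py`; it assumes the standard Pima column names (for example `Glucose`, `BMI`, and the target `Outcome`), treats zeros in five clinical columns as missing values (a known quirk of this dataset), and uses a Random Forest as a stand-in for whichever model `--model` selects. The names `validate` and `run_pipeline` are hypothetical.

```python
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumed Pima column names; adjust if data/diabetes.csv differs.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"
# In the Pima data a 0 in these columns means "not measured", not a real zero.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Fail early if any expected column is absent."""
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


def run_pipeline(df: pd.DataFrame, out_dir: str = "reports/models") -> tuple[float, Path]:
    """Train, evaluate (ROC AUC on a held-out split) and save a model."""
    df = validate(df).copy()
    df[ZERO_AS_MISSING] = df[ZERO_AS_MISSING].replace(0, np.nan)
    X, y = df[FEATURES], df[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    # Imputation and scaling sit inside the Pipeline, so the saved artifact
    # applies identical preprocessing when the dashboard calls predict_proba.
    model = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ])
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    path = Path(out_dir) / "rf_best.joblib"
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)
    return auc, path
```

Usage would look like `auc, path = run_pipeline(pd.read_csv("data/diabetes.csv"))`. Bundling preprocessing and classifier in one `Pipeline` is a design choice that avoids training/serving skew; the scaler is harmless but unnecessary for tree models and matters more for Logistic Regression.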

Part 2 — Flask Web Application (Dashboard)

An interactive dashboard that loads the trained model from reports/models/ and enables both single and batch predictions.

Quickstart (Local Run)

```
# 1. From the repo root, install the dependencies
python -m pip install -r requirements.txt

# 2. Run the Flask app
PYTHONPATH=src python src/dashboard/app.py

# 3. Visit http://127.0.0.1:5000 in your browser
```

Or explicitly specify a model path:

```
PYTHONPATH=src python src/dashboard/app.py --model reports/models/rf_best.joblib
```
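The `--model` flag suggests a small command-line layer in `app.py`. One way to write it with the standard library's `argparse` is sketched below; the default model path and the `--host`/`--port` options are assumptions, not the project's confirmed interface.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence

# Assumed default; the README only says the model is loaded from reports/models/.
DEFAULT_MODEL = Path("reports/models/rf_best.joblib")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse dashboard options; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Diabetes risk dashboard")
    parser.add_argument(
        "--model", type=Path, default=DEFAULT_MODEL,
        help="path to a trained .joblib model (default: %(default)s)",
    )
    # Hypothetical extras mirroring Flask's dev-server defaults.
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=5000)
    return parser.parse_args(argv)
```

With the parsed values, `app.run(host=args.host, port=args.port)` would start Flask's development server on the chosen address.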

Features

| Panel | Description |
|---|---|
| Quick Single Prediction | Enter medical features manually → get the predicted probability and a SHAP explanation. |
| Batch CSV Upload | Upload a `.csv` file with multiple patients → get a batch summary, a histogram visualisation, and downloadable explainability artifacts. |
| Notes & Guidance | Practical interpretation guide for clinicians and data scientists. |
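The batch panel implies three steps: check the uploaded CSV, score each row, and summarise the scores. The standard-library sketch below illustrates that flow under stated assumptions: the required column names, the 0.5 high-risk threshold, and the 10-bin histogram are illustrative, and `predict_proba` is any callable returning a probability in [0, 1] (a stub here, the trained model in practice).

```python
import csv
import io
from typing import Callable

# Assumed feature columns (Pima naming); extra columns such as an ID are ignored.
REQUIRED = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]


def read_batch(text: str) -> list[dict[str, float]]:
    """Parse CSV text into numeric rows, rejecting files with missing columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    return [{c: float(row[c]) for c in REQUIRED} for row in reader]


def summarise(probs: list[float], threshold: float = 0.5, bins: int = 10) -> dict:
    """Count, mean risk, number at or above threshold, and a histogram."""
    counts = [0] * bins
    for p in probs:
        # Assumes 0 <= p <= 1; p == 1.0 is clamped into the last bin.
        counts[min(int(p * bins), bins - 1)] += 1
    return {
        "n": len(probs),
        "mean_risk": sum(probs) / len(probs) if probs else 0.0,
        "n_high_risk": sum(p >= threshold for p in probs),
        "histogram": counts,
    }


def score_batch(text: str, predict_proba: Callable[[dict[str, float]], float]) -> dict:
    rows = read_batch(text)
    return summarise([predict_proba(r) for r in rows])
```

The `histogram` list maps directly onto a Chart.js bar chart's data array, which is one plausible way the dashboard could render it.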

All predictions, explanations, and generated files are timestamped and stored in reports/explain/.
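Timestamped storage can be as simple as embedding the creation time in each file name. This is a hypothetical helper; the `kind_YYYYMMDDTHHMMSS.ext` naming scheme and the use of UTC are assumptions, not the project's confirmed format.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def timestamped_path(kind: str, ext: str, base_dir: str = "reports/explain",
                     now: Optional[datetime] = None) -> Path:
    """Build e.g. reports/explain/shap_20240102T030405.png and create its folder."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S")
    path = Path(base_dir) / f"{kind}_{stamp}.{ext.lstrip('.')}"
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Passing `now` explicitly keeps the function deterministic in tests. Two artifacts of the same kind written within the same second would share a name, so a real implementation might append a counter or a short random suffix.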

Screenshots

Single Prediction No Data

Single Prediction With Data

Batch Prediction No Data

Batch Prediction With Data

Testing Strategy

Run tests locally before pushing:

```
PYTHONPATH=src pytest -q
```

The repository includes:

Unit tests: For ModelWrapper, preprocessing, and data loaders.
API tests: Flask routes and endpoints (/predict, /predict_batch).
Integration tests: Pipeline execution to ensure end-to-end consistency.

CI/CD integration (via GitHub Actions) ensures tests run automatically on every push.

Technologies Used

Language: Python 3.11+
Core Libraries: pandas, numpy, scikit-learn, joblib, shap, matplotlib, seaborn
Web Framework: Flask + Bootstrap + Chart.js
Testing & Code Quality: pytest, pre-commit, black, isort, ruff
Tools: VS Code, GitHub Actions CI, pre-commit hooks, reportlab for PDF export

Outputs

| Folder | Description |
|---|---|
| `reports/models/` | Trained model artifacts (`.joblib`) |
| `reports/explain/` | SHAP local & global explanations |
| `reports/` | EDA visuals, evaluation plots, and logs |
| `data/` | Input dataset (`diabetes.csv`) |
| `tests/` | Pytest suite |

Deployment Notes

For production:

Load app.secret_key from an environment variable instead of hard-coding it.
Serve via Gunicorn or Waitress instead of Flask’s development server.
Serve static files via Nginx.
Optionally containerize using Docker with health checks.
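For the first deployment note, a minimal sketch of reading the secret key from the environment and failing fast when it is absent. The variable name `FLASK_SECRET_KEY` and the helper `load_secret_key` are assumptions; use whatever name your deployment platform injects.

```python
import os
from typing import Mapping, Optional


def load_secret_key(env: Optional[Mapping[str, str]] = None,
                    var: str = "FLASK_SECRET_KEY") -> str:
    """Return the secret key from env (defaults to os.environ) or raise."""
    env = os.environ if env is None else env
    key = env.get(var, "")
    if not key:
        # Refuse to start rather than run with a missing or guessable key.
        raise RuntimeError(f"{var} is not set")
    return key
```

In `app.py` this would become `app.secret_key = load_secret_key()`. A suitable value can be generated once with `python -c "import secrets; print(secrets.token_hex(32))"`. For the WSGI server, something like `gunicorn --pythonpath src "dashboard.app:app"` should work, assuming `app.py` exposes a module-level `app` object.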

Example Use Case

This dashboard enables clinicians or data scientists to:

Instantly assess diabetes risk for new patients.
Interpret which medical features contribute most to the prediction.
Batch-evaluate risk profiles for large datasets.
Export explainability artifacts (HTML/PNG) for audit and reporting.
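To see which features drive a single prediction, the per-feature SHAP values can be ranked by magnitude. This sketch assumes the SHAP values for the positive (diabetic) class have already been computed by the explainability step; `top_contributors` is a hypothetical helper and the numbers in the usage line are invented for illustration.

```python
def top_contributors(feature_names, shap_values, k=3):
    """Return the k (feature, value) pairs with the largest |SHAP value|.

    Positive values push the prediction towards the diabetic class,
    negative values push it away.
    """
    if len(feature_names) != len(shap_values):
        raise ValueError("feature_names and shap_values must be the same length")
    pairs = zip(feature_names, (float(v) for v in shap_values))
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:k]


# Illustrative values only:
print(top_contributors(["Glucose", "BMI", "Age", "Insulin"], [0.31, -0.05, 0.12, -0.20]))
# → [('Glucose', 0.31), ('Insulin', -0.2), ('Age', 0.12)]
```

Ranking by absolute value keeps strong protective factors (negative contributions) visible alongside risk factors, which matters when explaining a prediction to a clinician.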

Acknowledgements

Special thanks to:

The National Institute of Diabetes and Digestive and Kidney Diseases — for the original dataset.
OpenAI’s ChatGPT (GPT-5) — for advanced assistance in refactoring, debugging, and structuring production-ready code, documentation, and CI integration.
The open-source community for continuous innovation in Python, Flask, and ML tooling.

Author

Adrian Adewunmi

GitHub: AAdewunmi/Diabetes-Risk-Prediction-Project

📄 License

MIT License
