This project aims to develop a model to predict employee churn based on past professional, demographic, and attrition data. The goal is to help leadership allocate resources efficiently for employee retention, forecast headcount gaps, and develop succession plans.
- Introduction to Problem
- Feature Engineering
- Model Overview
- Model Accuracy
- Business Implications
- Implementation Plan
Leadership wants to know an efficient way to allocate resources for employee retention. Teams need to forecast headcount gaps and develop succession plans. Management wants to implement proactive measures to reduce turnover costs.
Develop a model based on past employee professional, demographic, and attrition data to forecast churn in a given year.
The dataset includes the following features:
- Education Level
- Joining Year
- City
- Payment Tier
- Age
- Gender
- Benched
- Experience in Current Field
- Leave or Not (the target variable)
- No correlation strong enough to suggest problematic collinearity.
- Moderate correlation between office city and education level.
- Imperfect correlation among the one-hot encoded columns of variables with more than two categories.
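The last observation can be illustrated with a small sketch. It assumes pandas is available and uses made-up rows and hypothetical column names (`Education`, `City`, `Benched`, `Age`), not the actual contents of data/Employee.csv. A two-level variable produces two dummy columns that are perfect mirrors of each other, while a variable with three or more levels produces dummies that are negatively but not perfectly correlated.

```python
import pandas as pd

# Hypothetical toy rows; column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "Education": ["Bachelors", "Masters", "PhD", "Bachelors", "Masters", "PhD"],
    "City": ["A", "B", "C", "B", "C", "A"],
    "Age": [25, 31, 40, 28, 35, 45],
    "Benched": ["No", "Yes", "No", "No", "Yes", "No"],
})

# One-hot encode the categorical columns; dtype=float keeps the correlation math simple.
encoded = pd.get_dummies(df, columns=["Education", "City", "Benched"], dtype=float)

# Pearson correlation matrix over all encoded columns.
corr = encoded.corr()

# Two-level variable: the dummy columns are exact complements of each other.
two_level = corr.loc["Benched_No", "Benched_Yes"]

# Three-level variable: any pair of dummies is correlated, but not perfectly.
three_level = corr.loc["Education_Bachelors", "Education_Masters"]

print(f"Benched_No vs Benched_Yes: {two_level:.2f}")
print(f"Education_Bachelors vs Education_Masters: {three_level:.2f}")
```

For linear models, dropping one dummy per variable (`drop_first=True` in `pd.get_dummies`) removes the redundancy; tree-based models such as those used here are generally less sensitive to it.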
Four models were developed and compared:
- Decision Tree Classifier
- AdaBoost Classifier
- Random Forest
- Tuned Random Forest
Decision Tree Classifier
- Criterion: gini
- Max Depth: 3
AdaBoost Classifier
- Number of Estimators: 100
Random Forest
- Criterion: gini
- Number of Estimators: 100
- Max Depth: 5
- Max Features: sqrt
- Min Samples Leaf: 2
- Min Samples Split: 10
Tuned Random Forest
- Criterion: gini
- Number of Estimators: 250
- Max Depth: 7
- Max Features: sqrt
- Min Samples Leaf: 2
- Min Samples Split: 5
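The hyperparameters listed above map directly onto scikit-learn estimator arguments. The following is a minimal sketch, assuming the models were built with scikit-learn (the parameter names suggest this, but the notebook is the authority). The synthetic data from `make_classification`, the 80/20 split, and `random_state=0` are assumptions added only so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 9 numeric features and a binary target.
X, y = make_classification(n_samples=500, n_features=9, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four models with the hyperparameters listed above.
models = {
    "Decision Tree": DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "Random Forest": RandomForestClassifier(
        criterion="gini", n_estimators=100, max_depth=5, max_features="sqrt",
        min_samples_leaf=2, min_samples_split=10, random_state=0,
    ),
    "Tuned Random Forest": RandomForestClassifier(
        criterion="gini", n_estimators=250, max_depth=7, max_features="sqrt",
        min_samples_leaf=2, min_samples_split=5, random_state=0,
    ),
}

# Fit each model and record plain accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

The scores printed here describe the synthetic data only and say nothing about performance on the employee dataset.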
Metric | Class 0 | Class 1 | Macro Avg |
---|---|---|---|
Precision | 0.84 | 0.94 | 0.89 |
Recall | 0.98 | 0.65 | 0.81 |
F1-Score | 0.91 | 0.77 | 0.84 |
Support | 610 | 321 | 931 (total) |
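The averaged column in the table above appears to be an unweighted (macro) mean of the two per-class values rather than a support-weighted one. The sketch below recomputes both from the rounded figures in the table so the difference is visible; the helper names `macro_avg` and `weighted_avg` are hypothetical.

```python
# Per-class values copied from the table above (class 0, class 1).
report = {
    "precision": (0.84, 0.94),
    "recall": (0.98, 0.65),
    "f1": (0.91, 0.77),
}
support = (610, 321)
total = sum(support)


def macro_avg(values):
    """Unweighted mean across classes."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean across classes, weighted by the number of rows in each class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


macro = {metric: macro_avg(vals) for metric, vals in report.items()}
weighted = {metric: weighted_avg(vals, support) for metric, vals in report.items()}

for metric in report:
    print(f"{metric}: macro={macro[metric]:.3f}, weighted={weighted[metric]:.3f}")
```

Because class 0 has almost twice the support of class 1, the weighted averages sit closer to the class 0 values. Support-weighted recall is also equal to overall accuracy, which works out to roughly 0.87 from these rounded inputs. The recall of 0.65 for class 1 is the figure to watch, since it measures how many actual leavers the model catches.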
- Insights into variables leading to a drop in Gini impurity.
- Dependent on RandomForest model.
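Impurity-based importance of this kind is exposed by scikit-learn's `RandomForestClassifier` through the `feature_importances_` attribute. The sketch below is an illustration on synthetic data, not the project's analysis: the feature names (`tenure_years`, `benched`, `payment_tier`, `noise`) and the rule used to generate the target are assumptions chosen so that the expected ranking is obvious.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical features; names and distributions are assumptions for illustration.
feature_names = ["tenure_years", "benched", "payment_tier", "noise"]
X = np.column_stack([
    rng.integers(0, 8, n),   # tenure in years, 0 to 7
    rng.integers(0, 2, n),   # benched flag, 0 or 1
    rng.integers(1, 4, n),   # payment tier, 1 to 3
    rng.normal(size=n),      # pure noise, unrelated to the target
])

# Synthetic target: "leaves" if tenure is short or the employee was benched.
y = ((X[:, 0] < 2) | (X[:, 1] == 1)).astype(int)

forest = RandomForestClassifier(
    criterion="gini", n_estimators=250, max_depth=7, max_features="sqrt",
    min_samples_leaf=2, min_samples_split=5, random_state=0,
).fit(X, y)

# Mean decrease in Gini impurity per feature, normalized to sum to 1.
importances = dict(zip(feature_names, forest.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

Impurity-based importances are specific to the fitted forest and can favor features with many distinct values, so a permutation-based check (`sklearn.inspection.permutation_importance`) is a reasonable complement.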
- Low-tenure employees predicted to leave are likely to leave.
- PhD holders predicted to leave are likely to leave.
- Employees who are benched are likely to leave.
- Invest in professional development and favorable pay packages for the low-tenure group.
- Engage benched employees with teams.
- Recognize that PhD holders have options and may require special attention.
- Add quantitative variables such as hours per week worked and years since last promotion to improve model performance.
- Implement proactive measures based on model insights.
- Monitor and adjust resource allocation strategies.
- Develop a comprehensive succession plan based on forecasted headcount gaps.
- data/Employee.csv: The dataset used for model development.
- notebooks/Employee_Prediction.ipynb: Jupyter notebook with model development and analysis.
- scripts/Employee_Prediction.R: R script for data processing and feature engineering.
- Clone the repository:
git clone https://github.com/yourusername/Employee-Prediction.git
cd Employee-Prediction