NTI_Final_Project

Home Credit Default Risk

📊 Predict Credit Default Risk using Machine Learning

Welcome to the Home Credit Default Risk project! This repository contains a complete end-to-end pipeline for predicting credit default risk from the structured data provided by Home Credit. The goal is to identify likely loan defaulters so that financial institutions can minimize risk while maximizing customer satisfaction.

🚀 Features

Data Loading & Preprocessing:
- Loads the multiple Home Credit datasets.
- Preprocessing: handles missing values, encodes categorical features, and scales numeric features.
Feature Aggregation & Engineering:
- Aggregates POS, credit card, and installment payment records.
- Merges the aggregated datasets into a single feature table.
Machine Learning Pipeline:
- Supports multiple models: XGBoost, Random Forest, Logistic Regression.
- Configurable hyperparameters for model tuning.
- Evaluation with metrics such as ROC-AUC and the confusion matrix.
Visualization:
- Feature importance.
- Correlation matrices.
- ROC and precision-recall curves.
Streamlit Application:
- Interactive UI for exploring the pipeline.
- Upload datasets, preprocess data, train models, and evaluate performance.
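
The aggregation and merging features above can be illustrated with a small pandas sketch. This is not the code in src/join.py; it is a minimal example under stated assumptions. The helper name aggregate_child is hypothetical, the toy tables are made up, and the column names (SK_ID_CURR, AMT_INSTALMENT, AMT_PAYMENT) are assumed to follow the public Kaggle dataset.

```python
import pandas as pd

# Toy stand-ins for the application table and one child table.
# Column names are assumptions modeled on the public Kaggle dataset.
applications = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3],
    "AMT_CREDIT": [100000.0, 250000.0, 80000.0],
})
installments = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 2, 2],
    "AMT_INSTALMENT": [500.0, 500.0, 900.0, 900.0, 900.0],
    "AMT_PAYMENT": [500.0, 450.0, 900.0, 900.0, 0.0],
})


def aggregate_child(df, key, prefix):
    """Collapse a one-to-many child table to one row per key (hypothetical helper)."""
    grouped = df.groupby(key)
    agg = grouped.agg(["mean", "max", "sum"])
    # Flatten the (column, statistic) MultiIndex into names like INST_AMT_PAYMENT_SUM.
    agg.columns = [f"{prefix}_{col}_{stat.upper()}" for col, stat in agg.columns]
    # Number of child records per applicant.
    agg[f"{prefix}_COUNT"] = grouped.size()
    return agg.reset_index()


inst_features = aggregate_child(installments, "SK_ID_CURR", "INST")

# A left join keeps every applicant, including those with no installment history.
merged = applications.merge(inst_features, on="SK_ID_CURR", how="left")
print(merged)
```

Applicants without any child records end up with missing values in the aggregated columns, which is one reason a missing-value step is needed after merging.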

📂 Project Structure

├── data/                   # Raw and processed datasets
├── src/                    # Core scripts for pipeline
│   ├── load_data.py        # Data loading
│   ├── join.py             # Aggregation and merging
│   ├── preprocessing.py    # Data preprocessing
│   ├── train_model.py      # Model training
│   ├── evaluate_model.py   # Model evaluation
│   └── visualize.py        # Visualizations
├── main.py                 # Pipeline orchestration
├── streamlit_app.py        # Streamlit interactive app
├── README.md               # Project documentation
└── requirements.txt        # Python dependencies

📈 Data Workflow

Load Data: Load multiple datasets like application, credit card, and POS balance data.
Join & Aggregate: Combine datasets and engineer new features.
Preprocess: Handle missing values, encode features, and scale data.
Train Models: Experiment with XGBoost, Random Forest, and Logistic Regression.
Evaluate & Visualize: Assess model performance and visualize insights.
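
The preprocess, train, and evaluate steps above could look roughly like the following scikit-learn sketch. It is an illustration, not the repository's implementation: the data is synthetic, the column names (income, credit, contract_type) are invented for the example, and Logistic Regression stands in for whichever of the supported models is selected.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data; the column names are made up for illustration.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "income": rng.uniform(20_000, 100_000, n),
    "credit": rng.uniform(50_000, 500_000, n),
    "contract_type": rng.choice(["cash", "revolving"], n),
})
# Default probability rises with the credit-to-income ratio.
ratio = X["credit"] / X["income"]
p_default = 1.0 / (1.0 + np.exp(-0.5 * (ratio - 5.0)))
y = (rng.random(n) < p_default).astype(int)
# Knock out some income values so the imputer has work to do.
X.loc[rng.choice(n, 40, replace=False), "income"] = np.nan

# Numeric columns: impute missing values, then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: one-hot encode, ignoring categories unseen in training.
preprocess = ColumnTransformer([
    ("num", numeric, ["income", "credit"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract_type"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)

# Evaluate with the two metrics named in the feature list.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
cm = confusion_matrix(y_test, model.predict(X_test))
print(f"ROC-AUC: {auc:.3f}")
print(cm)
```

Wrapping preprocessing and the classifier in one Pipeline means the imputer, scaler, and encoder are fitted on the training split only, which avoids leaking test-set statistics into training.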

🔧 How to Use

1. Run Locally

# Clone the repository
git clone https://github.com/Kareem-Ayman-salama/Home_Credit_Default_Risk.git
cd Home_Credit_Default_Risk

# Install dependencies
pip install -r requirements.txt

# Run the pipeline
python main.py

2. Launch Streamlit App

# Run Streamlit app
streamlit run streamlit_app.py

🛠️ Built With

Python
Pandas and NumPy: Data manipulation.
Scikit-learn: Preprocessing and evaluation.
XGBoost: Gradient-boosted tree models.
Matplotlib and Seaborn: Data visualization.
Streamlit: Interactive app.

💡 Future Enhancements

Incorporate advanced models (e.g., CatBoost, LightGBM).
Add automated hyperparameter tuning.
Extend visualization capabilities.
Include time-series analysis for sequential datasets.
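
As one possible direction for the automated hyperparameter tuning item, a grid search with cross-validation could be sketched as follows. This is an assumption about how it might be done, not a planned design: the data is synthetic, and the parameter grid and the choice of GridSearchCV with a Random Forest are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced binary data standing in for the real feature table.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Illustrative grid; real ranges would depend on the dataset.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 6]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # same metric the project uses for evaluation
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```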

📄 License

This project is licensed under the MIT License.

🤝 Contributing

Contributions are welcome! If you’d like to improve the project or fix a bug:

1. Fork the repository.
2. Create your feature branch: git checkout -b feature-name
3. Commit your changes: git commit -m 'Add feature-name'
4. Push to the branch: git push origin feature-name
5. Open a pull request.

📬 Contact

For any questions, feel free to reach out by email or create an issue on the repository.
