Car-Price-Prediction-App

Introduction

The Car-Price-Prediction-App is a machine learning-based web application that predicts car prices based on user input. This project focuses on:

Understanding the dataset through Exploratory Data Analysis (EDA).
Enhancing performance with Feature Engineering.
Employing modern DevOps practices for deployment using Docker and AWS ECS.
Streamlining development and deployment using CI/CD pipelines via GitHub Actions.

Technologies Used

Framework: Flask
Version Control: Git
Data Tracking: DVC
Experiment Tracking: MLflow
Containerization: Docker
Cloud Deployment: AWS ECS with Fargate
CI/CD: GitHub Actions

Project Workflow

EDA: Analyze and visualize the dataset to uncover trends and insights.
Feature Engineering: Transform raw data into meaningful features.
Model Development: Train and log models using MLflow.
Dockerization: Containerize the application.
Deployment: Deploy the Dockerized app to AWS ECS and configure CI/CD.

Exploratory Data Analysis (EDA)

Objective: Understand the data distribution and identify trends affecting car prices.
Techniques Used:
- Correlation analysis.
- Visualizations: scatter plots, histograms, heatmaps.
- Outlier detection and treatment.
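
The correlation and outlier steps above can be sketched as follows. This is a minimal illustration, not the project's notebook code: the IQR (interquartile range) rule is an assumed choice of outlier test, the toy numbers are invented, and only the standard library is used so the sketch runs without the project's dependencies.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical toy data, NOT taken from the project's dataset.
engine_size = [1.0, 1.3, 1.6, 1.8, 2.0, 2.5, 3.0, 1.5]
price = [9000, 11000, 14000, 16000, 18000, 24000, 30000, 95000]

r = pearson_r(engine_size, price)
low, high = iqr_bounds(price)
outliers = [p for p in price if p < low or p > high]

print(f"engine size vs. price correlation: {r:.2f}")
print(f"prices outside [{low:.0f}, {high:.0f}]: {outliers}")
```

How flagged rows are treated (dropped, capped, or kept) depends on the dataset; the sketch only identifies them.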

Key Findings:

Engine size and car brand significantly influence car prices.
Certain features required transformations for better model accuracy.

A significant share of the project time went into Exploratory Data Analysis (EDA), to understand the dataset before model training. Key visualizations from the EDA process:

1. Price Distribution by Category

2. Price Distribution by Model

3. Feature Correlation

4. Outlier Analysis

Feature Engineering

Transformations: Applied log transformations for skewed features.
Encoding: Used one-hot encoding for categorical variables.
Scaling: Standardized numerical features.
Feature Selection: Retained only the most impactful features for prediction.
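
As a rough illustration of the transformation, encoding, and scaling steps listed above, the sketch below applies each one to a few invented rows. The column names (`mileage`, `brand`) and values are hypothetical, and the pure-Python helpers stand in for whatever library the project actually uses.

```python
import math
import statistics


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative feature."""
    return [math.log1p(v) for v in values]


def one_hot(values):
    """Return the sorted categories and one 0/1 column per category for each row."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical rows, NOT taken from the project's dataset.
mileage = [12000, 45000, 80000, 150000]
brand = ["Toyota", "BMW", "Toyota", "Audi"]

mileage_scaled = standardize(log_transform(mileage))
brand_columns, brand_encoded = one_hot(brand)

print(brand_columns)
print(brand_encoded)
```

In practice the scaling statistics (mean, standard deviation) and the category list would be fitted on the training split only and reused at prediction time.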

Machine Learning Model

Algorithm: Random Forest Regressor.
Tools:
- DVC: To track and version raw and processed datasets.
- MLflow: For tracking model metrics, hyperparameters, and outputs.
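
A minimal sketch of training a Random Forest Regressor with scikit-learn is shown below. It assumes scikit-learn and NumPy are installed, uses synthetic data in place of the DVC-tracked dataset, and the feature names and hyperparameter values are illustrative guesses rather than the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real project would load its DVC-tracked dataset.
rng = np.random.default_rng(42)
engine_size = rng.uniform(1.0, 4.0, size=300)
age_years = rng.integers(0, 15, size=300)
price = 5000 + 8000 * engine_size - 600 * age_years + rng.normal(0, 500, size=300)

X = np.column_stack([engine_size, age_years])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

# Assumed hyperparameters, chosen only for illustration.
params = {"n_estimators": 100, "max_depth": 8, "random_state": 42}
model = RandomForestRegressor(**params).fit(X_train, y_train)

predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(f"MAE on held-out data: {mae:.0f}")
```

With MLflow installed, the same `params` dictionary and `mae` value could be recorded inside an `mlflow.start_run()` block using `mlflow.log_params(params)` and `mlflow.log_metric("mae", mae)`.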

Dockerization

The application is containerized using Docker for consistent deployment across environments:

Built a Docker image using the Dockerfile.
Tagged and pushed the image to Amazon ECR.
Configured the container to serve predictions via Flask.

Deployment

AWS ECS Deployment

Docker Image: Hosted on Amazon ECR.
Orchestration: Managed with AWS ECS Fargate.
Networking: Configured security groups and a load balancer for external access.
Monitoring: Logs and metrics tracked via AWS CloudWatch.

CI/CD Pipeline

GitHub Actions Workflow:
- Builds the Docker image.
- Runs health check tests.
- Deploys the image to ECS once all tests pass.
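
The health check step could look roughly like the sketch below. It is not the repository's `test_health.py`: the `/health` path, the `is_healthy` helper, and the default timeout are assumptions made for illustration, and only the standard library is used.

```python
import urllib.request


def is_healthy(base_url, path="/health", timeout=5.0):
    """Return True if GET base_url + path answers with HTTP 200, else False."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError subclass OSError, so this covers refused
        # connections, timeouts, and 4xx/5xx responses.
        return False
```

A pipeline step could call `is_healthy("http://localhost:5000")` against the freshly built container and fail the job when it returns `False`; the host and port here are placeholders.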

How to Run Locally

Clone the repository:

```
git clone https://github.com/your-repo/car-price-prediction-app.git
cd car-price-prediction-app
```

Install dependencies:
```
pip install -r requirements.txt
```
Run the Flask app:
```
python app.py
```

API Endpoints

The following screenshots show example API calls for predicting car prices and for checking that the app is running.

1. API Call Screenshot for Price Prediction

2. API Call Screenshot for Health Check Endpoint
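
For readers without the screenshots, a prediction call might be built as in the sketch below. The `/predict` path, the feature names, and the `http://localhost:5000` address (Flask's default port) are assumptions; check `app.py` for the actual route and expected payload.

```python
import json
import urllib.request


def build_predict_request(base_url, features, path="/predict"):
    """Build (but do not send) a JSON POST request for the prediction endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature names and values; the real payload schema may differ.
features = {"brand": "Toyota", "engine_size": 1.8, "mileage": 45000}
request = build_predict_request("http://localhost:5000", features)

# To send it against a running app:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without a running server.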

Future Work

Model Enhancements: Experiment with advanced algorithms like Gradient Boosting (XGBoost, LightGBM) or Neural Networks to improve prediction accuracy.
Scalability:
- Implement auto-scaling in AWS ECS to handle varying traffic loads dynamically.
- Explore serverless options like AWS Lambda for specific components to optimize costs.
User Interface Improvements:
- Create an intuitive dashboard for predictions and EDA visualizations using tools like Dash or Streamlit.
- Add interactive elements for custom data input and insights.
Data Pipeline Automation: Automate data ingestion, preprocessing, and model retraining using AWS Step Functions or Apache Airflow.
Monitoring and Alerts: Integrate a robust monitoring system with tools like Prometheus and Grafana to monitor app performance and receive alerts for failures or anomalies.
MLOps Integration:
- Implement continuous training pipelines to keep the model updated with new data.
- Explore feature stores for better feature management and sharing.

Acknowledgements

Dataset Source:
The dataset used for this project is publicly available at https://www.kaggle.com/datasets/mohidabdulrehman/ultimate-car-price-prediction-dataset.
Tools and Platforms:
- Flask for building the web application.
- DVC for data and model versioning.
- MLflow for experiment tracking and model management.
- Docker for containerizing the application.
- AWS ECS and AWS Fargate for deployment and orchestration.
- GitHub Actions for CI/CD pipeline integration.
