- Introduction
- Technologies Used
- Project Workflow
- Exploratory Data Analysis (EDA)
- Feature Engineering
- Machine Learning Model
- Dockerization
- Deployment
- How to Run Locally
- API Endpoints
- Future Work
- Acknowledgements
The Car-Price-Prediction-App is a machine learning-based web application that predicts car prices based on user input. This project focuses on:
- Understanding the dataset through Exploratory Data Analysis (EDA).
- Enhancing performance with Feature Engineering.
- Employing modern DevOps practices for deployment using Docker and AWS ECS.
- Streamlining development and deployment using CI/CD pipelines via GitHub Actions.
- Framework: Flask
- Version Control: Git
- Data Tracking: DVC
- Experiment Tracking: MLflow
- Containerization: Docker
- Cloud Deployment: AWS ECS with Fargate
- CI/CD: GitHub Actions
- EDA: Analyze and visualize the dataset to uncover trends and insights.
- Feature Engineering: Transform raw data into meaningful features.
- Model Development: Train and log models using MLflow.
- Dockerization: Containerize the application.
- Deployment: Deploy the Dockerized app to AWS ECS and configure CI/CD.
- Objective: Understand the data distribution and identify trends affecting car prices.
- Techniques Used:
- Correlation analysis.
- Visualizations: scatter plots, histograms, heatmaps.
- Outlier detection and treatment.
Key Findings:
- Engine size and car brand significantly influence car prices.
- Certain features required transformations for better model accuracy.
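The correlation analysis and outlier detection listed above can be sketched in a few lines of standard-library Python. This is only an illustration: the source does not say which outlier rule was used, so the 1.5 × IQR rule here is an assumption, and the sample values are made up rather than taken from the dataset.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up illustrative values, not taken from the project's dataset.
engine_size = [1.0, 1.2, 1.6, 2.0, 2.5, 3.0]
price = [5000, 6000, 8000, 10000, 12500, 15000]
mileage = [10, 12, 11, 13, 12, 11, 95]

correlation = pearson(engine_size, price)
suspicious = iqr_outliers(mileage)
```

In practice a pandas `DataFrame.corr()` call and a heatmap would cover the same ground for all numeric columns at once; the plain-Python version is shown only to make the computation explicit.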
In this project, a significant amount of time was spent on Exploratory Data Analysis (EDA) to understand the dataset before proceeding to model training. Below are some key visualizations from the EDA process:
- Transformations: Applied log transformations for skewed features.
- Encoding: Used one-hot encoding for categorical variables.
- Scaling: Standardized numerical features.
- Feature Selection: Retained only the most impactful features for prediction.
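The transformation, encoding, and scaling steps above can be sketched as follows. This is a dependency-free illustration rather than the project's actual pipeline: the function names are hypothetical, and `log1p` (log of 1 + x) is assumed as the log transform so that zero values stay defined.

```python
import math
import statistics


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative feature."""
    return [math.log1p(v) for v in values]


def one_hot(categories):
    """One-hot encode a list of labels; columns follow sorted label order."""
    labels = sorted(set(categories))
    rows = [[1 if c == label else 0 for label in labels] for c in categories]
    return labels, rows


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up illustrative rows, not taken from the project's dataset.
brands = ["bmw", "audi", "bmw"]
mileage = [0, 50_000, 200_000]

labels, encoded = one_hot(brands)
scaled_mileage = standardize(log_transform(mileage))
```

Whatever implementation is used, the scaling statistics and the category list should be computed on the training split only and then reused at prediction time, so the Flask app applies the same transformation the model was trained with.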
- Algorithm: Random Forest Regressor.
- Tools:
- DVC: To track and version raw and processed datasets.
- MLflow: For tracking model metrics, hyperparameters, and outputs.
The application is containerized using Docker for consistent deployment across environments:
- Built a Docker image using the `Dockerfile`.
- Tagged and pushed the image to Amazon ECR.
- Configured the container to serve predictions via Flask.
- Docker Image: Hosted on Amazon ECR.
- Orchestration: Managed with AWS ECS Fargate.
- Networking: Configured security groups and load balancer for external access.
- Monitoring: Logs and metrics tracked via AWS CloudWatch.
- GitHub Actions Workflow:
- Builds the Docker image.
- Runs health check tests.
- Deploys the image to ECS upon passing all tests.
- Clone the repository:
git clone https://github.com/your-repo/car-price-prediction-app.git
cd car-price-prediction-app
- Install dependencies:
pip install -r requirements.txt
- Run Flask app:
python app.py
The following screenshots show the API call examples for predicting car prices and testing the app.
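For readers without the screenshots, a prediction call might look roughly like the sketch below. The `/predict` path, the JSON field names, and the response shape are all assumptions, since the source does not list the actual routes; port 5000 is simply Flask's default for a locally run app.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://127.0.0.1:5000"):
    """Build a JSON POST request for a hypothetical /predict endpoint."""
    payload = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",  # assumed route, not confirmed by the source
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://127.0.0.1:5000"):
    """Send the request and decode the JSON response (requires a running app)."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical input fields; adjust to whatever the app actually expects.
sample = {"brand": "toyota", "engine_size": 1.8, "year": 2018}
request = build_predict_request(sample)
```

Calling `predict(sample)` only works once the Flask app is running locally; `build_predict_request` can be inspected on its own to check the URL and payload first.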
- Model Enhancements: Experiment with advanced algorithms like Gradient Boosting (XGBoost, LightGBM) or Neural Networks to improve prediction accuracy.
- Scalability:
- Implement auto-scaling in AWS ECS to handle varying traffic loads dynamically.
- Explore serverless options like AWS Lambda for specific components to optimize costs.
- User Interface Improvements:
- Create an intuitive dashboard for predictions and EDA visualizations using tools like Dash or Streamlit.
- Add interactive elements for custom data input and insights.
- Data Pipeline Automation: Automate data ingestion, preprocessing, and model retraining using AWS Step Functions or Apache Airflow.
- Monitoring and Alerts: Integrate a robust monitoring system with tools like Prometheus and Grafana to monitor app performance and receive alerts for failures or anomalies.
- MLOps Integration:
- Implement continuous training pipelines to keep the model updated with new data.
- Explore feature stores for better feature management and sharing.
- Dataset Source: The dataset used for this project is publicly available at https://www.kaggle.com/datasets/mohidabdulrehman/ultimate-car-price-prediction-dataset.
- Tools and Platforms:
- Flask for building the web application.
- DVC for data and model versioning.
- MLflow for experiment tracking and model management.
- Docker for containerizing the application.
- AWS ECS and AWS Fargate for deployment and orchestration.
- GitHub Actions for CI/CD pipeline integration.