The challenge was to build a fully automated end-to-end Machine Learning infrastructure: reading data from a feature store, automated model creation with hyperparameter tuning and automatic selection of the most efficient model, storing models in a model registry, building a CI/CD pipeline for deploying the registered model into a production environment, serving the model over HTTP API or streaming interfaces, implementing monitoring, writing tests and linters, and creating regular automated reporting.
Note: this is a personal project to demonstrate an automated ML infrastructure. Security was not in the scope of this project. Before deploying this infrastructure in a production environment, please ensure proper credentials are set for each service and that the infrastructure does not expose unwanted endpoints to the public internet.
There is a Prefect orchestrator running 2 flows on a schedule. The Hyperparameter optimization deployment flow is executed by a Prefect Agent: it pulls training data from the AWS S3 feature store, runs hyperparameter-optimized ML model builds and saves each model (around 50 models per run) in the MLFlow model registry. On each run it also finds the most efficient model and registers it in MLFlow, ready for deployment.
An engineer must decide when and which models should be deployed. First, copy the `RUN_ID` of the selected deployment model from MLFlow and update the `RUN_ID` field in `.env.local` (or `.env.cloud` for cloud deployment).
Once the `RUN_ID` is updated, GitHub Actions triggers a new pipeline run which runs the tests and restarts the 2 servers (the http-api and kinesis-streams servers). On startup, the servers automatically load the new model from MLFlow.
The Business Simulation using AWS Kinesis Streams simulates a business regularly (every 60 sec) sending events to a Kinesis stream for prediction. The ML Model Serving Kinesis Stream service is an ML serving server using Kinesis streams as input and output for running predictions in realtime.
The Business Simulation using HTTP API simulates a business regularly (every 60 sec) sending HTTP requests. The ML Model Serving Flask HTTP API service is an ML serving server using HTTP APIs for running predictions in realtime. On each prediction request, the input and the prediction are saved in MongoDB for later processing and also sent to Evidently for monitoring.
Evidently calculates data drift to understand if the running predictions are degrading over time. Prometheus stores the monitoring data and Grafana provides a dashboard UI to monitor the prediction data drift in realtime.
The Batch reporting flow runs regularly (every 3 hours) on the MongoDB data to generate data drift reports. These reports are saved as HTML and the File server (Nginx) gives access to the report files.
Data: Credit Card Churn Prediction
Description: A business manager of a consumer credit card bank is facing the problem of customer attrition. They want to analyze the data to find out the reason behind this and leverage the same to predict customers who are likely to drop off.
Source: https://www.kaggle.com/datasets/anwarsan/credit-card-bank-churn
- cleanup data
- exploratory data analysis
- train model
- ml pipeline for hyperparameter tuning
- model registry
- ML-serve API server
- ML-serve Stream server (optional)
- tests (partial)
- linters
- Makefile and CI/CD
- deploy to cloud
- logging and monitoring
- batch reporting
- docker and docker-compose everything
- reporting server
To do reminders:
- deploy on file change only, maybe with a version as tag.
- Removes rows with "Unknown" records, removes irrelevant columns, lowercases column names, lowercases categorical values.
- Categories that can be ordered hierarchically are converted into ints, like "income" or "education level".
- Checks correlations on numerical columns.
- Checks counts on categorical columns.
- Prepares a model using XGBoost (see the sketch after this list).
- Input data is split into 66%, 33%, 33% for training, validation and test data.
- Measures MAE, MSE, RMSE.
- Measures the % of deviated predictions based on a month threshold.
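As a rough illustration of the steps above, the sketch below cleans the data, splits it, trains an XGBoost model and computes the error metrics. This is a minimal sketch only: the file name, column names (e.g. `churn`, `education_level`) and the interpretation of the 66%/33%/33% split as two consecutive splits are assumptions, and the real notebooks/scripts may differ.

```python
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Load and clean the raw data (file and column names are assumptions).
df = pd.read_csv("credit_card_churn.csv")
df = df[~df.isin(["Unknown"]).any(axis=1)]   # drop rows containing "Unknown"
df.columns = df.columns.str.lower()          # lowercase column names

# Convert hierarchically ordered categories to ints, one-hot encode the rest.
education_order = {"uneducated": 0, "high school": 1, "college": 2,
                   "graduate": 3, "post-graduate": 4, "doctorate": 5}
df["education_level"] = df["education_level"].str.lower().map(education_order)
df = pd.get_dummies(df)

# Split into train / validation / test sets ("churn" assumed to be a numeric target).
X, y = df.drop(columns=["churn"]), df["churn"]
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.33, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

# Train an XGBoost model and measure MAE, MSE, RMSE on the validation set.
model = xgb.XGBRegressor(n_estimators=100, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
mae = mean_absolute_error(y_val, y_pred)
mse = mean_squared_error(y_val, y_pred)
rmse = np.sqrt(mse)
print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}")
```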
- dockerized MLFlow: Dockerfile-mlflow
- MLFlow UI: http://localhost:5051
The system uses Prefect to orchestrate DAGs. Every few hours, the Prefect Agent starts, reads the training data from S3 and builds models using XGBoost by running hyperparameter tuning over the configurations, generating 50 models and calculating the error (RMSE) for each of them. All 50 models are stored in the MLFlow experiments. At the end of each run, the best model is registered in MLFlow as ready for deployment. A sketch of this flow follows the list below.
- model training Prefect flow: model_train_flow.py
- dockerized Prefect Server: Dockerfile-prefect
- dockerized Prefect Agent: Dockerfile-prefect-agent
- Prefect UI: http://localhost:4200
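A simplified sketch of what such a training flow can look like (not the actual model_train_flow.py: the search space, the S3 path, the experiment and model names and the tracking address are assumptions for illustration):

```python
import mlflow
import pandas as pd
import xgboost as xgb
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from prefect import flow, task
from sklearn.model_selection import train_test_split

EXPERIMENT = "churn-prediction"                   # assumed experiment name
mlflow.set_tracking_uri("http://localhost:5051")  # MLFlow server address from this setup
mlflow.set_experiment(EXPERIMENT)

@task
def load_training_data():
    """Read the training dataset from the S3 feature store (bucket/key are assumptions)."""
    df = pd.read_parquet("s3://feature-store/churn/training.parquet")
    y = df.pop("churn")
    X_train, X_val, y_train, y_val = train_test_split(df, y, test_size=0.33, random_state=42)
    return xgb.DMatrix(X_train, label=y_train), xgb.DMatrix(X_val, label=y_val)

@task
def train_models(train, valid):
    """Hyperparameter search: every candidate model (~50 per run) is logged to MLFlow."""
    def objective(params):
        with mlflow.start_run():
            mlflow.log_params(params)
            booster = xgb.train(params, train, num_boost_round=100,
                                evals=[(valid, "validation")], early_stopping_rounds=10)
            rmse = booster.best_score
            mlflow.log_metric("rmse", rmse)
            mlflow.xgboost.log_model(booster, artifact_path="model")
        return {"loss": rmse, "status": STATUS_OK}

    search_space = {
        "objective": "reg:squarederror",
        "max_depth": hp.choice("max_depth", [4, 6, 8, 10]),
        "learning_rate": hp.loguniform("learning_rate", -3, 0),
    }
    fmin(fn=objective, space=search_space, algo=tpe.suggest, max_evals=50, trials=Trials())

@task
def register_best_model():
    """Find the run with the lowest RMSE and register it, ready for deployment."""
    client = mlflow.tracking.MlflowClient()
    experiment = client.get_experiment_by_name(EXPERIMENT)
    best_run = client.search_runs([experiment.experiment_id],
                                  order_by=["metrics.rmse ASC"], max_results=1)[0]
    mlflow.register_model(f"runs:/{best_run.info.run_id}/model", "churn-model")

@flow
def model_tuning_and_uploading():
    train, valid = load_training_data()
    train_models(train, valid)
    register_best_model()
```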
There are 2 ML serving servers. One serves predictions using an HTTP API built in Python with Flask. The second serves predictions using AWS Kinesis streams, both consuming events and publishing results back. A sketch of the HTTP serving endpoint follows the list below.
- model serving using Python Flask HTTP API: predict-api-server.py
- model serving using Python and AWS Kinesis Streams: serve_kinesis.py
- dockerized Flask API server: Dockerfile-serve-api
- dockerized AWS Kinesis server: Dockerfile-serve-kinesis
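For orientation, a stripped-down version of the HTTP serving endpoint could look like this (a minimal sketch: the MongoDB address variable and collection names are assumptions, and predict-api-server.py itself also pushes monitoring data to Evidently):

```python
import os

import mlflow
import pandas as pd
from flask import Flask, jsonify, request
from pymongo import MongoClient

# Model selected for deployment, loaded from MLFlow by RUN_ID at startup.
RUN_ID = os.environ["RUN_ID"]
model = mlflow.pyfunc.load_model(f"runs:/{RUN_ID}/model")

# MongoDB collection where every request/prediction pair is stored for reporting.
mongo = MongoClient(os.getenv("MONGODB_ADDRESS", "mongodb://mongo:27017"))
collection = mongo["prediction_service"]["data"]

app = Flask("churn-prediction")

@app.route("/predict", methods=["POST"])
def predict():
    record = request.get_json()
    prediction = float(model.predict(pd.DataFrame([record]))[0])
    # Persist the input and the prediction for later drift reporting.
    collection.insert_one({**record, "prediction": prediction, "model_run_id": RUN_ID})
    return jsonify({"churn chance": prediction, "model_run_id": RUN_ID})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9696)
```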
There are 2 Python scripts to simulate a business requesting predictions from the ML servers. One requests data from the HTTP API server and another one sends events to the `predictions` Kinesis stream and receives results from the `results` Kinesis stream (a sketch of the Kinesis side follows the list below).
- sending data for prediction using HTTP API: send_data-api.py
- sending data for prediction using AWS Kinesis Streams: serve_kinesis.py
- dockerized sending data to HTTP API: Dockerfile-send-data-api
- dockerized sending data to AWS Kinesis Streams: Dockerfile-send-data-kinesis
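The Kinesis side of the simulation essentially puts a JSON event on the `predictions` stream and polls the `results` stream, roughly as in the sketch below (an illustration against the localstack endpoint; the stream names match the docs, the payload handling is an assumption):

```python
import json
import time

import boto3

# Point boto3 at the localstack Kinesis endpoint used by this project.
kinesis = boto3.client("kinesis", endpoint_url="http://localhost:4566")

event = {"customer_age": 50, "gender": "M", "dependent_count": 2,
         "education_level": 3, "marital_status": "married", "income_category": 2,
         "card_category": "blue", "months_on_book": 4,
         "total_relationship_count": 3, "credit_limit": 4000, "total_revolving_bal": 2511}

# Publish one prediction request to the "predictions" stream.
kinesis.put_record(StreamName="predictions", Data=json.dumps(event).encode(), PartitionKey="1")

# Poll the "results" stream for the prediction produced by the serving container.
shard_it = kinesis.get_shard_iterator(StreamName="results", ShardId="shardId-000000000000",
                                      ShardIteratorType="TRIM_HORIZON")["ShardIterator"]
while True:
    response = kinesis.get_records(ShardIterator=shard_it, Limit=10)
    for record in response["Records"]:
        print(json.loads(record["Data"]))
    shard_it = response["NextShardIterator"]
    time.sleep(1)
```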
There are 3 services for monitoring the model predictions in realtime:
- Evidently AI for calculating data drift. Evidently UI:
- Prometheus for collecting monitoring data. Prometheus UI:
- Grafana for Dashboards UI. Grafana UI: http://localhost:3000 (default user/pass: admin, admin)
There is a Prefect flow to generate reporting using Evidently: create_report.py. It generates reports every few hours, saves them in MongoDB and also generates static HTML pages with all the data charts.
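Conceptually, the reporting flow reads the recent predictions from MongoDB, compares them against a reference dataset and renders an HTML drift report. A hedged sketch using Evidently's Report / DataDriftPreset API (the project may use a different Evidently version and API; the reference file, database/collection names and output path are assumptions):

```python
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report
from prefect import flow
from pymongo import MongoClient

@flow
def evidently_data_reporting():
    # Reference data the drift is measured against (file name is an assumption).
    reference = pd.read_parquet("reference_data.parquet")

    # Recent predictions stored by the serving API (database/collection names assumed).
    client = MongoClient("mongodb://mongo:27017")
    current = pd.DataFrame(list(client["prediction_service"]["data"].find({}, {"_id": 0})))

    # Build the data drift report and save it as a static HTML page served by Nginx.
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    report.save_html("/reports/data_drift_report.html")
```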
Report file example: ...
There is also an Nginx server to expose these HTML reports.
- Nginx server: nginx
- Nginx address: http://localhost:8888/
All containers are put together in docker compose files for easy deployment of the entire infrastructure. Docker Compose is perfect for this project; for a more advanced production environment where each service is deployed on a different VM, I recommend using more advanced tools.
- Deployment: model training: docker-compose-model-registry.yml
- Deployment: model serving: docker-compose-serve.yml
All deployment commands are grouped using the Makefile for simplicity of use.
The environment variables should be in a `.env` file. The Makefile will use one of these: `.env.local` or `.env.cloud`.
$> make help
Commands:
run: make run_tests to run tests locally
run: make reset_all to delete all containers and cleanup volumes
run: make setup-model-registry env=local (or env=cloud) to start model registry and training containers
run: make init_aws to setup and initialize AWS services (uses localstack container)
run: make apply-model-train-flow to apply the automated model training DAG
run: make setup-model-serve env=local (or env=cloud) to start the model serving containers
run: make apply-prediction-reporting to apply the automated prediction reporting DAG
run: make stop-serve to stop the model servers (http api and Stream)
run: make start-serve env=local to start the model servers (http api and Stream)
The continuous deployment is done using GitHub Actions. Once a change is made to the repo, the deployment pipeline is triggered. This will restart the model servers to load the new model from the MLFlow model registry. The deployed model is always specified in the `.env.cloud` file under the `RUN_ID` environment variable.
The pipeline will:
- run tests
- ssh into the cloud virtual machine
- restart the model-server-api and model-server-streams containers

- Github Actions runs: https://github.com/razorcd/mlops-project/actions
- Github Actions configs: https://github.com/razorcd/mlops-project/blob/main/.github/workflows/github-actions-deployment.yml
To deploy in the cloud, the steps are similar except: use your cloud VM domain instead of localhost to access the UIs and replace `env=local` with `env=cloud`.
- install `docker`, `docker compose`, `make`
- run `make reset_all` to ensure any existing containers are removed
- run `make setup-model-registry env=local` to start the model training infrastructure
- open http://localhost:5051 to see the MLFlow UI
- run `make init_aws` to set up the training data and streams in AWS
- run `make apply-model-train-flow` to apply the model training script to the orchestrator. This will run trainings regularly.
- open http://localhost:4200/#deployments, it will show the `model_tuning_and_uploading` deployment scheduled. Start a `Quick Run` to not wait for the scheduler. This will run the model training pipeline, upload a bunch of models to the MLFlow server and register the best model.
- from the MLFlow UI decide which model you want to deploy. Get the Run Id of the model and update `RUN_ID` in the `.env.local` file (or `.env.cloud` for cloud deployment)
- run `make setup-model-serve env=local` to start the prediction servers
- request a prediction using the HTTP API:

curl -X POST -H 'Content-Type: application/json' http://127.0.0.1:9696/predict -d '{"customer_age":50,"gender":"M","dependent_count":2,"education_level":3,"marital_status":"married","income_category":2,"card_category":"blue","months_on_book":4,"total_relationship_count":3,"credit_limit":4000,"total_revolving_bal":2511}'

{"churn chance":0.5,"model_run_id":"70cc813fa2d64c598e3f5cd93ad674af"}

- run `make apply-prediction-reporting` to apply the reporting script to the orchestrator. This will generate reports regularly.
- open the `evidently_data_reporting` deployment. This runs every 3 hours. The system needs to collect 3+ hours of prediction data first before generating any report. Running the reporting manually at this time will not generate reports yet.
- open http://localhost:8888/ to see the generated reports after 3+ hours.
- open http://localhost:8085/metrics to see the Prometheus data.
- open http://localhost:3000/dashboards to see the Grafana realtime monitoring dashboard of data drift. (default user/pass: admin, admin)
Optionally:
- publish to Kinesis. `data` is the request JSON payload base64 encoded
aws kinesis put-record \
--stream-name predictions --endpoint-url=http://localhost:4566 \
--partition-key 1 \
--data "ewogICAgICAiY3VzdG9tZXJfYWdlIjogMTAwLAogICAgICAiZ2VuZGVyIjogIkYiLAogICAgICAiZGVwZW5kZW50X2NvdW50IjogMiwKICAgICAgImVkdWNhdGlvbl9sZXZlbCI6IDIsCiAgICAgICJtYXJpdGFsX3N0YXR1cyI6ICJtYXJyaWVkIiwKICAgICAgImluY29tZV9jYXRlZ29yeSI6IDIsCiAgICAgICJjYXJkX2NhdGVnb3J5IjogImJsdWUiLAogICAgICAibW9udGhzX29uX2Jvb2siOiA2LAogICAgICAidG90YWxfcmVsYXRpb25zaGlwX2NvdW50IjogMywKICAgICAgImNyZWRpdF9saW1pdCI6IDQwMDAsCiAgICAgICJ0b3RhbF9yZXZvbHZpbmdfYmFsIjogMjUwMAogICAgfQ=="
- consume from Kinesis
aws kinesis get-shard-iterator --shard-id shardId-000000000000 --endpoint-url=http://localhost:4566 --shard-iterator-type TRIM_HORIZON --stream-name results --query 'ShardIterator'
# copy shard iterator without quotes
aws kinesis get-records --endpoint-url=http://localhost:4566 --shard-iterator {SHARD_ITERATOR_HERE}
# decode the base64-encoded Data field
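The `Data` field of each returned record is base64 encoded; a quick way to decode it in Python (the value below is just an illustrative placeholder):

```python
import base64
import json

# Paste the "Data" field of a record returned by get-records.
data_field = "eyJjaHVybiBjaGFuY2UiOiAwLjV9"  # placeholder: decodes to {"churn chance": 0.5}
print(json.loads(base64.b64decode(data_field)))
```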
- run `docker logs -t {container}` to see the logs for now
The entire infrastructure was deployed in the cloud using a virtual machine provided by DigitalOcean.
Links:
- ML Flow model registry: http://188.166.115.79:5051/
- Prefect orchestrator: http://188.166.115.79:4200/
- predict using API:
curl -X POST -H 'Content-Type: application/json' http://188.166.115.79:9696/predict -d '{"customer_age":50,"gender":"M","dependent_count":2,"education_level":3,"marital_status":"married","income_category":2,"card_category":"blue","months_on_book":4,"total_relationship_count":3,"credit_limit":4000,"total_revolving_bal":2511}'
{"churn chance":0.5,"model_run_id":"8a19dc8026dc4e4cb972ad84194940fd"}
- publish to Kinesis. `data` is the request JSON payload base64 encoded
aws kinesis put-record \
--stream-name predictions --endpoint-url=http://188.166.115.79:4566 \
--partition-key 1 \
--data "ewogICAgICAiY3VzdG9tZXJfYWdlIjogMTAwLAogICAgICAiZ2VuZGVyIjogIkYiLAogICAgICAiZGVwZW5kZW50X2NvdW50IjogMiwKICAgICAgImVkdWNhdGlvbl9sZXZlbCI6IDIsCiAgICAgICJtYXJpdGFsX3N0YXR1cyI6ICJtYXJyaWVkIiwKICAgICAgImluY29tZV9jYXRlZ29yeSI6IDIsCiAgICAgICJjYXJkX2NhdGVnb3J5IjogImJsdWUiLAogICAgICAibW9udGhzX29uX2Jvb2siOiA2LAogICAgICAidG90YWxfcmVsYXRpb25zaGlwX2NvdW50IjogMywKICAgICAgImNyZWRpdF9saW1pdCI6IDQwMDAsCiAgICAgICJ0b3RhbF9yZXZvbHZpbmdfYmFsIjogMjUwMAogICAgfQ=="
- consume from Kinesis
aws kinesis get-shard-iterator --shard-id shardId-000000000000 --endpoint-url=http://188.166.115.79:4566 --shard-iterator-type TRIM_HORIZON --stream-name results --query 'ShardIterator'
# copy shard iterator without quotes
aws kinesis get-records --endpoint-url=http://188.166.115.79:4566 --shard-iterator {SHARD_ITERATOR_HERE}
# decode the base64-encoded Data field
- Prometheus UI: http://188.166.115.79:8085/metrics
- Grafana Dashboard UI: http://188.166.115.79:3000/dashboards (user/pass: admin, admin)
- Reports folder: http://188.166.115.79:8888/ (only after the first 3 hours after deployment)
- Github Actions pipeline: https://github.com/razorcd/mlops-project/actions
- Github Actions: add ssh keys from server: https://zellwk.com/blog/github-actions-deploy/