This machine learning pipeline tool mainly uses the TensorFlow Extended (TFX) library to train machine learning models on data from various data stores.

Notebook files are stored in the `/notebooks` folder.

| File | Description |
|---|---|
| IEEE-CIS-Fraud-Detection-preprocessor.ipynb | PySpark preprocessing notebook. |
| IEEE-CIS-Fraud-Detection-Train-TF.ipynb | TensorFlow Extended model training and publishing code. |
| IEEE-CIS-Fraud-Detection-Score-Spark.ipynb | PySpark scoring notebook. |


The stack is deployed with Docker and docker-compose, both of which must be installed beforehand. Start it with:

    docker-compose -f sml.yml up -d

This pipeline uses the IEEE-CIS Fraud Detection data from Kaggle. The first iteration achieved an acceptable score.
- Features were selected using the backward elimination technique.
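
For readers unfamiliar with backward elimination, the following is a minimal, library-free sketch of the greedy variant: start with every feature and repeatedly drop the one whose removal hurts the score least. It is not the code used in the notebooks. The function `backward_eliminate`, the `toy_score` function, and the `noise_*` columns are hypothetical and exist only for illustration; in practice `score_fn` would train and validate a model on the given feature subset.

```python
from typing import Callable, List, Sequence


def backward_eliminate(
    features: Sequence[str],
    score_fn: Callable[[List[str]], float],
    min_features: int = 1,
    tolerance: float = 0.0,
) -> List[str]:
    """Greedy backward elimination over a list of feature names.

    score_fn is assumed to return a "higher is better" validation score
    for a model trained on the given subset of features.
    """
    selected = list(features)
    best_score = score_fn(selected)

    while len(selected) > min_features:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (score_fn([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        candidate_score, drop = max(candidates, key=lambda c: c[0])

        # Stop when even the least harmful removal costs more than the tolerance.
        if candidate_score < best_score - tolerance:
            break

        selected.remove(drop)
        # Track the best score seen so far so the tolerance does not accumulate.
        best_score = max(best_score, candidate_score)

    return selected


# Hypothetical scoring function: informative features add to the score,
# every other feature slightly reduces it.
INFORMATIVE = {"TransactionAmt", "card1"}


def toy_score(subset: List[str]) -> float:
    return sum(1.0 if f in INFORMATIVE else -0.1 for f in subset)


selected = backward_eliminate(
    ["TransactionAmt", "card1", "noise_a", "noise_b"], toy_score
)
print(selected)  # ['TransactionAmt', 'card1']
```

Passing the scoring logic in as a callable keeps the selection loop independent of the model, so the same loop could wrap a Spark or TensorFlow evaluation step. Each round re-scores every remaining feature, so the cost grows quadratically with the number of features.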
Model improvement is beyond the scope of this repository.

