An example project showcasing a DVC pipeline that uses the SageMaker SDK for data preparation and model training

iterative/sagemaker-pipeline

sagemaker-pipeline

This repo takes the data processing and model training from https://github.com/aws-samples/amazon-sagemaker-immersion-day/blob/master/processing_xgboost.ipynb and converts them into a DVC pipeline. The code is minimally modified from the original notebook: it is split into individual scripts, and the S3 paths and training hyperparameters are parametrized. To run it, set the bucket and prefix paths in params.yaml, then use dvc repro or dvc exp run to execute the pipeline in SageMaker.
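
As a rough illustration of what parametrizing the S3 paths can look like, the sketch below builds S3 URIs from a bucket and prefix and assembles the DVC command to run. The key names bucket and prefix, the example values, and both helper functions are assumptions made for illustration; check params.yaml in the repo for the actual keys.

```python
from typing import List

# Hypothetical values standing in for the bucket and prefix entries in params.yaml.
params = {"bucket": "my-example-bucket", "prefix": "sagemaker/dvc-demo"}


def s3_uri(bucket: str, prefix: str, *parts: str) -> str:
    """Join a bucket, a prefix, and optional sub-paths into an s3:// URI."""
    segments = [prefix, *parts]
    # Drop empty segments and stray slashes so the key has no double slashes.
    key = "/".join(s.strip("/") for s in segments if s.strip("/"))
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


def dvc_command(experiment: bool = False) -> List[str]:
    """Return the DVC CLI invocation: `dvc exp run` or `dvc repro`."""
    return ["dvc", "exp", "run"] if experiment else ["dvc", "repro"]


train_uri = s3_uri(params["bucket"], params["prefix"], "train")
print(train_uri)
print(" ".join(dvc_command()))
```

Keeping the bucket and prefix in one place means every stage derives its input and output locations from the same two values, so pointing the pipeline at another bucket is a one-file change.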

The pipeline has three stages:

  1. Prepare data from S3
  2. Run a preprocessing job using the scikit-learn processor
  3. Run a model training job using XGBoost
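
The three stages above form a linear chain, and DVC works out the execution order from each stage's dependencies and outputs. The sketch below only illustrates that ordering idea with the Python standard library; the stage names prepare, preprocess, and train are hypothetical and may not match the names used in the repo's dvc.yaml.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage names; each stage maps to the set of stages it depends on.
stages = {
    "prepare": set(),              # 1. prepare data from S3
    "preprocess": {"prepare"},     # 2. preprocessing job
    "train": {"preprocess"},       # 3. XGBoost training job
}

# A topological sort yields an order in which every stage runs after its dependencies.
order = list(TopologicalSorter(stages).static_order())
print(order)
```

Because each stage depends only on the one before it, changing a training hyperparameter invalidates just the last stage, while changing the input data invalidates all three.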
