There have been some questions as to what "SLSA for ML" looks like. This issue attempts to give a short synopsis so that we can hopefully agree and turn that into durable documentation.
First, Machine Learning (ML) models fit the SLSA Model at a high level:
Any transformation process is a "build", such as data cleaning or model training.
Training data and input models are "dependencies".
This is not obvious to most readers, so we should document it.
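To make the mapping concrete, here is a minimal sketch of what SLSA-style provenance for a training run might look like, treating training as the "build" and the dataset and base model as "dependencies". The statement layout follows the in-toto Statement / SLSA provenance v1 schema; the `buildType` URI, builder ID, and parameter names are hypothetical placeholders, not part of any spec.

```python
import json

def training_provenance(model_digest, dataset_digest, base_model_digest):
    """Sketch: SLSA v1 provenance treating model training as the build.

    Training data and the input (base) model appear as resolvedDependencies,
    exactly as conventional build inputs would.
    """
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": "trained-model", "digest": {"sha256": model_digest}},
        ],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                # Hypothetical buildType URI for an ML training job.
                "buildType": "https://example.com/MLTrainingJob/v1",
                "externalParameters": {"epochs": 10, "learningRate": 1e-3},
                "resolvedDependencies": [
                    {"name": "training-data", "digest": {"sha256": dataset_digest}},
                    {"name": "base-model", "digest": {"sha256": base_model_digest}},
                ],
            },
            # Hypothetical builder identity (the trusted training service).
            "runDetails": {"builder": {"id": "https://example.com/trainer"}},
        },
    }

stmt = training_provenance("a" * 64, "b" * 64, "c" * 64)
print(json.dumps(stmt, indent=2))
```

A verifier could then check the trained model's provenance the same way it checks a binary's: confirm the builder identity and walk the `resolvedDependencies` to the dataset and base-model digests.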
Second, ML highlights some gaps or challenges in SLSA that are not really specific to ML but may be a higher priority or more painful for ML. They include:
ML training processes often use specialized ML hardware, highly distributed training jobs, and/or highly iterative notebooks like Colab. These may be more work to adapt to a verifiable build architecture where there is a trusted control plane that the tenant cannot influence. Alternatively, a reproducible build architecture may be challenging due to non-determinism.
Training data (considered "dependencies") is critical to the ML training process, yet:
Standards for identifying and labeling datasets are less mature than they are for conventional software.
SLSA does not yet have a transitive concept to describe properties of the entire ML supply chain, beyond a single build step. This is important for conventional software but critical for ML, since ML supply chains are very deep.
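On the dataset-identification gap: even without a mature labeling standard, a dataset can at least be named by a stable content digest so it can appear as a dependency in provenance. This is a minimal sketch, assuming the dataset is a collection of (path, bytes) records; a real implementation would stream files from disk and likely follow an emerging dataset-identifier convention instead of this ad-hoc scheme.

```python
import hashlib

def dataset_digest(records):
    """Compute an order-independent sha256 digest over (path, bytes) records.

    Sorting by path first means the same files always yield the same digest,
    regardless of enumeration order.
    """
    h = hashlib.sha256()
    for path, data in sorted(records):
        h.update(hashlib.sha256(path.encode("utf-8")).digest())
        h.update(hashlib.sha256(data).digest())
    return h.hexdigest()

d1 = dataset_digest([("a.csv", b"1,2"), ("b.csv", b"3,4")])
d2 = dataset_digest([("b.csv", b"3,4"), ("a.csv", b"1,2")])
assert d1 == d2  # same content, same digest, regardless of file order
```

With a digest like this, the transitive-chain problem reduces to the same shape as conventional software: each build step's provenance names its inputs by digest, and a policy engine can walk the chain from trained model back to raw data.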
All of these are surmountable, but they're worth documenting.
Any thoughts in agreement or disagreement? I'll try to update this top post with the consensus. If you have other challenges, I can add them as well.
Thanks for the initial thoughts on this. This seems worth documenting indeed. I agree strongly with the assertions that any transformation process maps to a "build" and that data inputs map well to "dependencies".