There have been some questions as to what "SLSA for ML" looks like. This issue attempts to give a short synopsis so that we can hopefully agree and turn that into durable documentation.
First, Machine Learning (ML) models fit the SLSA Model at a high level:
Any transformation process is a "build", such as data cleaning or model training.
Training data and input models are "dependencies".
This is not obvious to most readers, so we should document it.
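To make the mapping concrete, here is a minimal sketch of what SLSA-style provenance for a training run might look like, treating training as the "build" and the dataset and base model as "dependencies". The statement layout follows the in-toto Statement / SLSA provenance v1 schema; the `buildType` URI, builder ID, and parameter names are hypothetical placeholders, not part of any spec.

```python
import json

def training_provenance(model_digest, dataset_digest, base_model_digest):
    """Sketch: SLSA v1 provenance treating model training as the build.

    Training data and the input (base) model appear as resolvedDependencies,
    exactly as conventional build inputs would.
    """
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": "trained-model", "digest": {"sha256": model_digest}},
        ],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                # Hypothetical buildType URI for an ML training job.
                "buildType": "https://example.com/MLTrainingJob/v1",
                "externalParameters": {"epochs": 10, "learningRate": 1e-3},
                "resolvedDependencies": [
                    {"name": "training-data", "digest": {"sha256": dataset_digest}},
                    {"name": "base-model", "digest": {"sha256": base_model_digest}},
                ],
            },
            # Hypothetical builder identity (the trusted training service).
            "runDetails": {"builder": {"id": "https://example.com/trainer"}},
        },
    }

stmt = training_provenance("a" * 64, "b" * 64, "c" * 64)
print(json.dumps(stmt, indent=2))
```

A verifier could then check the trained model's provenance the same way it checks a binary's: confirm the builder identity and walk the `resolvedDependencies` to the dataset and base-model digests.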
Second, ML highlights some gaps or challenges in SLSA that are not really specific to ML but may be a higher priority or more painful for ML. They include:
ML training processes often use specialized ML hardware, highly distributed training jobs, and/or highly iterative notebooks like Colab. These may be more work to adapt to a verifiable build architecture where there is a trusted control plane that the tenant cannot influence. Alternatively, a reproducible build architecture may be challenging due to non-determinism.
Training data (considered "dependencies") is critical to the ML training process, yet:
Standards for identifying and labeling datasets are less mature than they are for conventional software.
SLSA does not yet have a transitive concept to describe properties of the entire ML supply chain, beyond a single build step. This is important for conventional software but critical for ML, since ML supply chains are very deep.
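On the dataset-identification gap: even without a mature labeling standard, a dataset can at least be named by a stable content digest so it can appear as a dependency in provenance. This is a minimal sketch, assuming the dataset is a collection of (path, bytes) records; a real implementation would stream files from disk and likely follow an emerging dataset-identifier convention instead of this ad-hoc scheme.

```python
import hashlib

def dataset_digest(records):
    """Compute an order-independent sha256 digest over (path, bytes) records.

    Sorting by path first means the same files always yield the same digest,
    regardless of enumeration order.
    """
    h = hashlib.sha256()
    for path, data in sorted(records):
        h.update(hashlib.sha256(path.encode("utf-8")).digest())
        h.update(hashlib.sha256(data).digest())
    return h.hexdigest()

d1 = dataset_digest([("a.csv", b"1,2"), ("b.csv", b"3,4")])
d2 = dataset_digest([("b.csv", b"3,4"), ("a.csv", b"1,2")])
assert d1 == d2  # same content, same digest, regardless of file order
```

With a digest like this, the transitive-chain problem reduces to the same shape as conventional software: each build step's provenance names its inputs by digest, and a policy engine can walk the chain from trained model back to raw data.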
All of these are surmountable, but they're worth documenting.
Any thoughts in agreement or disagreement? I'll try to update this top post with the consensus. If you have other challenges, I can add them as well.
Thanks for the initial thoughts on this. This seems worth documenting indeed. I agree strongly with the assertions that any transformation process maps to a "build" and that data inputs map well to "dependencies".