### Description

When executing or compiling a pipeline using the 2.10 KFP SDK with the following configuration:

```python
task.set_accelerator_type("nvidia.com/gpu").set_accelerator_limit("1")
```

the pipeline server ignores the GPU option, and the task is scheduled without the GPU in its resource configuration. This appears to be a breaking change introduced in 2.10.
### Environment

- How do you deploy Kubeflow Pipelines (KFP)? Red Hat OpenShift AI
- KFP version: -
- KFP SDK version:

  ```
  $ pip list | grep kfp
  kfp 2.10.0
  kfp-pipeline-spec 0.4.0
  kfp-server-api 2.3.0
  ```
### Steps to reproduce

1. Create a Python virtual environment:

   ```shell
   python -m venv venv
   source venv/bin/activate
   ```

2. Install kfp 2.10:

   ```shell
   pip install kfp==2.10
   ```

3. Create the following pipeline with the file name `acc-test.py`:

   ```python
   from kfp import dsl, compiler

   @dsl.component()
   def empty_component():
       pass

   @dsl.pipeline(name='pipeline-accel')
   def pipeline_accel():
       task = empty_component()
       task.set_accelerator_type("nvidia.com/gpu").set_accelerator_limit("1")

   if __name__ == "__main__":
       compiler.Compiler().compile(pipeline_accel, 'pipeline.yaml')
   ```

4. Compile the pipeline:

   ```shell
   python acc-test.py
   ```

5. Upload the pipeline and trigger an execution.

The pods created for the step will not include `nvidia.com/gpu` in the pod spec resources, and the pod will be scheduled on a non-GPU node.
### Expected result

The pod should include the resources definition for the GPUs, and the pod should be scheduled on a GPU node:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1
```
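One way to confirm the regression on a live cluster is to check the pod's `resources` section (e.g. extracted from `kubectl get pod -o json`) for the GPU key. A minimal sketch, assuming the resources section has already been loaded into a dict; `has_gpu` is an illustrative helper, not part of the KFP SDK:

```python
# Illustrative check: given a container's "resources" section as a dict,
# verify the GPU key is present in both limits and requests.
def has_gpu(resources, gpu_key="nvidia.com/gpu"):
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    return gpu_key in limits and gpu_key in requests

# Pod spec as expected (GPU requested in limits and requests):
expected = {"limits": {"nvidia.com/gpu": 1}, "requests": {"nvidia.com/gpu": 1}}
# Pod spec as observed with the 2.10 SDK (GPU silently dropped):
observed = {"limits": {}, "requests": {}}

print(has_gpu(expected))  # True
print(has_gpu(observed))  # False
```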
### Materials and reference

It looks like the bug was likely introduced in #11097.

When compiling the pipeline with 2.10, it renders the following:

```yaml
resources:
  accelerator:
    resourceCount: '1'
    resourceType: nvidia.com/gpu
```

With an older version such as 2.9, it renders the following:

```yaml
resources:
  accelerator:
    count: '1'
    type: nvidia.com/gpu
```
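The difference between the two compiled specs is a key rename inside the accelerator block: `type`/`count` became `resourceType`/`resourceCount`. A minimal sketch contrasting the two shapes, for illustration only (`read_accelerator` is a hypothetical helper that accepts either set of keys, not code from KFP):

```python
def read_accelerator(accel):
    """Return (type, count) from a compiled accelerator block, accepting
    both the 2.9-style keys (type/count) and the 2.10-style keys
    (resourceType/resourceCount)."""
    gpu_type = accel.get("type") or accel.get("resourceType")
    count = accel.get("count") or accel.get("resourceCount")
    return gpu_type, count

# Accelerator block as compiled by the 2.10 SDK:
new_style = {"resourceCount": "1", "resourceType": "nvidia.com/gpu"}
# Accelerator block as compiled by the 2.9 SDK:
old_style = {"count": "1", "type": "nvidia.com/gpu"}

print(read_accelerator(new_style))  # ('nvidia.com/gpu', '1')
print(read_accelerator(old_style))  # ('nvidia.com/gpu', '1')
```

A server that only reads the old `type`/`count` keys would see neither field in the 2.10 output, which is consistent with the GPU request being dropped from the pod spec.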
Fix in progress:
Impacted by this bug? Give it a 👍.