
[bug] Pipelines generated from kfp 2.10 ignore accelerator #11374

@strangiato

Description


When compiling or running a pipeline with the KFP SDK 2.10 and the following configuration:

task.set_accelerator_type("nvidia.com/gpu").set_accelerator_limit("1")

the pipeline server ignores the GPU setting and schedules the step without a GPU in its resource configuration.

This appears to be a breaking change introduced in 2.10.

Environment

  • How do you deploy Kubeflow Pipelines (KFP)?
    Red Hat OpenShift AI

  • KFP version:

  • KFP SDK version:

$ pip list | grep kfp  
kfp                      2.10.0
kfp-pipeline-spec        0.4.0
kfp-server-api           2.3.0

Steps to reproduce

  1. Create a Python virtual environment:
python -m venv venv
source venv/bin/activate
  2. Install kfp 2.10:
pip install kfp==2.10
  3. Create the following pipeline in a file named acc-test.py:
from kfp import dsl, compiler

@dsl.component()
def empty_component():
    pass

@dsl.pipeline(name='pipeline-accel')
def pipeline_accel():
    task = empty_component()
    # Request one NVIDIA GPU for this step
    task.set_accelerator_type("nvidia.com/gpu").set_accelerator_limit("1")

if __name__ == "__main__":
    # Writes the compiled pipeline spec to pipeline.yaml
    compiler.Compiler().compile(pipeline_accel, 'pipeline.yaml')
  4. Compile the pipeline:
python acc-test.py
  5. Upload the pipeline and trigger a run.

The pod created for the step does not include nvidia.com/gpu in its resource spec, and it gets scheduled on a non-GPU node.

Expected result

The pod should include the GPU resource definition and be scheduled on a GPU node:

resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1
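One way to confirm the symptom on a cluster is to dump the step's pod with `kubectl get pod <pod-name> -o json` and look for the GPU limit. Below is a minimal sketch, assuming the pod manifest has been saved as JSON; `gpu_limits_by_container` is a hypothetical helper, not part of kfp. It reports per container because a step pod usually runs more than one container, and only the one running the component code is expected to ask for the GPU.

```python
from typing import Any, Dict


def gpu_limits_by_container(pod: Dict[str, Any], resource: str = "nvidia.com/gpu") -> Dict[str, Any]:
    """Map container name -> limit for every container in a Pod manifest that sets `resource`.

    Hypothetical helper: an empty result means no container asked for the GPU.
    """
    found: Dict[str, Any] = {}
    spec = pod.get("spec") or {}
    for container in spec.get("containers") or []:
        # `resources` and `limits` may be absent or null in a real manifest
        limits = (container.get("resources") or {}).get("limits") or {}
        if resource in limits:
            found[container.get("name", "<unnamed>")] = limits[resource]
    return found


# Illustrative pod: only the "main" container requests a GPU
example_pod = {
    "spec": {
        "containers": [
            {"name": "wait"},
            {
                "name": "main",
                "resources": {
                    "limits": {"nvidia.com/gpu": "1"},
                    "requests": {"nvidia.com/gpu": "1"},
                },
            },
        ]
    }
}

print(gpu_limits_by_container(example_pod))  # {'main': '1'}
```

For a saved manifest, `gpu_limits_by_container(json.load(open("pod.json")))` returning `{}` corresponds to the behaviour described above.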

Materials and reference

The bug was likely introduced in:
#11097

Compiling the pipeline with 2.10 renders the following:

        resources:
          accelerator:
            resourceCount: '1'
            resourceType: nvidia.com/gpu

Older versions, such as 2.9, render the following:

        resources:
          accelerator:
            count: '1'
            type: nvidia.com/gpu
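To check which field names a given SDK version emits, the compiled pipeline.yaml can be scanned for accelerator blocks. Below is a minimal sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML). The helpers are hypothetical and walk the whole document rather than assuming a fixed path to the `resources` section; the executor name in the example is illustrative only.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Accelerator field names as they appear in the compiled YAML in this report
LEGACY_FIELDS = ("type", "count")               # kfp 2.9
NEW_FIELDS = ("resourceType", "resourceCount")  # kfp 2.10


def iter_accelerators(node: Any, path: str = "") -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (path, mapping) for every `accelerator` mapping found under a `resources` key."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if key == "resources" and isinstance(value, dict):
                accel = value.get("accelerator")
                if isinstance(accel, dict):
                    yield f"{child}.accelerator", accel
            yield from iter_accelerators(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_accelerators(item, f"{path}[{index}]")


def accelerator_style(accel: Dict[str, Any]) -> str:
    """Classify one accelerator mapping by which pair of field names it uses."""
    legacy = all(field in accel for field in LEGACY_FIELDS)
    new = all(field in accel for field in NEW_FIELDS)
    if legacy and new:
        return "both"
    if legacy:
        return "legacy"
    if new:
        return "new"
    return "unknown"


def accelerator_report(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (path, style) for every accelerator block in a parsed pipeline spec."""
    return [(path, accelerator_style(accel)) for path, accel in iter_accelerators(spec)]


# Illustrative spec shaped like the 2.10 output quoted above
spec_210 = {
    "deploymentSpec": {
        "executors": {
            "exec-empty-component": {
                "container": {
                    "resources": {
                        "accelerator": {"resourceCount": "1", "resourceType": "nvidia.com/gpu"}
                    }
                }
            }
        }
    }
}

print(accelerator_report(spec_210))
```

Against a real compile, `accelerator_report(yaml.safe_load(open("pipeline.yaml")))` should report `new` for a 2.10 build and `legacy` for a 2.9 build, matching the two snippets above.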

Fix in progress:

#11373
