Kubernetes AI Toolchain Operator (Kaito)

Kaito is an operator that automates AI/ML inference model deployment in a Kubernetes cluster. The target models are popular large open-source inference models such as Falcon and Llama 2. Compared to most mainstream model deployment methodologies built on top of virtual machine infrastructure, Kaito has the following key differentiators:

Manage large model files using container images. An HTTP server is provided to perform inference calls using the model library.
Avoid tuning deployment parameters to fit GPU hardware by providing preset configurations.
Auto-provision GPU nodes based on model requirements.
Host large model images in the public Microsoft Container Registry (MCR) if the license allows.

Kaito greatly simplifies the workflow of onboarding large AI inference models in Kubernetes.

Architecture

Kaito follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. The user manages a workspace custom resource that describes the GPU requirements and the inference specification. Kaito controllers automate the deployment by reconciling the workspace custom resource.

The figure above presents an overview of the Kaito architecture. Its major components are:

Workspace controller: It reconciles the workspace custom resource, creates machine (explained below) custom resources to trigger node auto-provisioning, and creates the inference workload (deployment or statefulset) based on the model preset configurations.
Node provisioner controller: The controller is named gpu-provisioner in the Kaito Helm chart. It uses the machine CRD originating from Karpenter to interact with the workspace controller, and it integrates with Azure Kubernetes Service (AKS) APIs to add new GPU nodes to the AKS cluster. Note that gpu-provisioner is not an open-source component. It can be replaced by other controllers that support the Karpenter-core APIs.

Installation

The following guidance assumes that Azure Kubernetes Service (AKS) is used to host the Kubernetes cluster.

Enable Workload Identity and OIDC Issuer features

The gpu-provisioner controller requires the workload identity feature to acquire an access token for the AKS cluster.

export RESOURCE_GROUP="myResourceGroup"
export MY_CLUSTER="myCluster"
az aks update -g $RESOURCE_GROUP -n $MY_CLUSTER --enable-oidc-issuer --enable-workload-identity --enable-managed-identity
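The commands in this guide assume that RESOURCE_GROUP, MY_CLUSTER and, later, SUBSCRIPTION are exported. A minimal sketch of a guard that stops early when one of them is missing is shown here; the function name require_env is hypothetical and not part of Kaito.

```shell
# Hypothetical helper (not part of Kaito): print an error and return non-zero
# for every named environment variable that is unset or empty.
# printenv only sees exported variables, which matches the `export` lines
# used throughout this guide.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "error: environment variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running `require_env RESOURCE_GROUP MY_CLUSTER` before `az aks update` reports any variable that was forgotten instead of passing an empty value to the CLI.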

Create an identity and assign permissions

The identity kaitoprovisioner is created for the gpu-provisioner controller. It is assigned the Contributor role on the managed cluster resource so that it can modify $MY_CLUSTER (e.g., provision new nodes in it).

export SUBSCRIPTION="mySubscription"
az identity create --name kaitoprovisioner -g $RESOURCE_GROUP
export IDENTITY_PRINCIPAL_ID=$(az identity show --name kaitoprovisioner -g $RESOURCE_GROUP --subscription $SUBSCRIPTION --query 'principalId' | tr -d '"')
export IDENTITY_CLIENT_ID=$(az identity show --name kaitoprovisioner -g $RESOURCE_GROUP --subscription $SUBSCRIPTION --query 'clientId' | tr -d '"')
az role assignment create --assignee $IDENTITY_PRINCIPAL_ID --scope /subscriptions/$SUBSCRIPTION/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.ContainerService/managedClusters/$MY_CLUSTER  --role "Contributor"
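The --scope argument above is the Azure Resource Manager ID of the AKS cluster. A small sketch that composes it from its parts makes the format explicit and avoids typos in the long path; the helper name aks_cluster_scope is hypothetical.

```shell
# Hypothetical helper (not part of Kaito): print the ARM resource ID of an AKS
# managed cluster, i.e. the value passed to `az role assignment create --scope`.
aks_cluster_scope() {
  local subscription="$1" resource_group="$2" cluster="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerService/managedClusters/%s\n' \
    "$subscription" "$resource_group" "$cluster"
}
```

It can then be used as `--scope "$(aks_cluster_scope "$SUBSCRIPTION" "$RESOURCE_GROUP" "$MY_CLUSTER")"` in the role assignment command.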

Install helm charts

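Two charts are installed in $MY_CLUSTER: the gpu-provisioner chart and the workspace chart. The workspace chart is installed as is; the gpu-provisioner chart's values.yaml is first populated with the cluster's subscription, location, resource groups, cluster name, and workload identity settings.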

helm install workspace ./charts/kaito/workspace

export NODE_RESOURCE_GROUP=$(az aks show -n $MY_CLUSTER -g $RESOURCE_GROUP --query nodeResourceGroup | tr -d '"')
export LOCATION=$(az aks show -n $MY_CLUSTER -g $RESOURCE_GROUP --query location | tr -d '"')
export TENANT_ID=$(az account show | jq -r ".tenantId")
yq -i '(.controller.env[] | select(.name=="ARM_SUBSCRIPTION_ID"))       .value = env(SUBSCRIPTION)'        ./charts/kaito/gpu-provisioner/values.yaml
yq -i '(.controller.env[] | select(.name=="LOCATION"))                  .value = env(LOCATION)'            ./charts/kaito/gpu-provisioner/values.yaml
yq -i '(.controller.env[] | select(.name=="ARM_RESOURCE_GROUP"))        .value = env(RESOURCE_GROUP)'      ./charts/kaito/gpu-provisioner/values.yaml
yq -i '(.controller.env[] | select(.name=="AZURE_NODE_RESOURCE_GROUP")) .value = env(NODE_RESOURCE_GROUP)' ./charts/kaito/gpu-provisioner/values.yaml
yq -i '(.controller.env[] | select(.name=="AZURE_CLUSTER_NAME"))        .value = env(MY_CLUSTER)'          ./charts/kaito/gpu-provisioner/values.yaml
yq -i '(.settings.azure.clusterName)                                           = env(MY_CLUSTER)'          ./charts/kaito/gpu-provisioner/values.yaml
yq -i '(.workloadIdentity.clientId)                                            = env(IDENTITY_CLIENT_ID)'  ./charts/kaito/gpu-provisioner/values.yaml
yq -i '(.workloadIdentity.tenantId)                                            = env(TENANT_ID)'           ./charts/kaito/gpu-provisioner/values.yaml
helm install gpu-provisioner ./charts/kaito/gpu-provisioner
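The five controller.env updates above differ only in the entry name and the source variable. The sketch below drives the same yq expression from a list of name:variable pairs; set_controller_env and configure_gpu_provisioner_env are hypothetical names, and the values file path is the one used above. The settings.azure.clusterName and workloadIdentity fields still need their own yq commands.

```shell
# Hypothetical helpers (not part of Kaito). They reuse the yq expression from
# the commands above, so they require yq and the exported variables.
VALUES_FILE="./charts/kaito/gpu-provisioner/values.yaml"

# Set .value of the controller env entry named $1 to the exported variable $2.
set_controller_env() {
  local entry="$1" var="$2" file="${3:-$VALUES_FILE}"
  yq -i "(.controller.env[] | select(.name==\"$entry\")).value = env($var)" "$file"
}

# Apply every entry:variable pair used in this guide.
configure_gpu_provisioner_env() {
  local pair
  for pair in ARM_SUBSCRIPTION_ID:SUBSCRIPTION \
              LOCATION:LOCATION \
              ARM_RESOURCE_GROUP:RESOURCE_GROUP \
              AZURE_NODE_RESOURCE_GROUP:NODE_RESOURCE_GROUP \
              AZURE_CLUSTER_NAME:MY_CLUSTER; do
    # ${pair%%:*} is the entry name, ${pair#*:} is the variable name.
    set_controller_env "${pair%%:*}" "${pair#*:}" || return 1
  done
}
```

Keeping the pairs in one list makes it easier to see at a glance which values.yaml entry is fed by which exported variable.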

Create the federated credential

Create the federated identity credential between the managed identity kaitoprovisioner and the service account used by the gpu-provisioner controller.

export AKS_OIDC_ISSUER=$(az aks show -n $MY_CLUSTER -g $RESOURCE_GROUP --subscription $SUBSCRIPTION --query "oidcIssuerProfile.issuerUrl" | tr -d '"')
az identity federated-credential create --name kaito-federatedcredential --identity-name kaitoprovisioner -g $RESOURCE_GROUP --issuer $AKS_OIDC_ISSUER --subject system:serviceaccount:"gpu-provisioner:gpu-provisioner" --audience api://AzureADTokenExchange --subscription $SUBSCRIPTION

Once the credential exists, the gpu-provisioner can access the managed cluster using a trust token with the same permissions as the kaitoprovisioner identity. Note that until this step is completed, the gpu-provisioner controller pod will keep failing with the following message in its log:

panic: Configure azure client fails. Please ensure federatedcredential has been created for identity XXXX.

The pod will reach running state once the federated credential is created.
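The --subject value passed to az identity federated-credential create follows the standard Kubernetes service account token subject format, system:serviceaccount:<namespace>:<service-account-name>; in the command above both the namespace and the service account are gpu-provisioner. A sketch of a helper that builds it (the name sa_subject is hypothetical) keeps the two parts from being mixed up if either name is changed.

```shell
# Hypothetical helper (not part of Kaito): build the token subject that
# Kubernetes uses for a service account, in the form
# system:serviceaccount:<namespace>:<service-account-name>.
sa_subject() {
  local namespace="$1" service_account="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}
```

For example, `--subject "$(sa_subject gpu-provisioner gpu-provisioner)"` produces the same value as the command above. The credentials attached to the identity can be listed with `az identity federated-credential list --identity-name kaitoprovisioner -g $RESOURCE_GROUP`.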

Clean up

helm uninstall gpu-provisioner
helm uninstall workspace

Quick start

After installing Kaito, one can try the following commands to start a falcon-7b inference service.

$ cat examples/kaito_workspace_falcon_7b.yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"

$ kubectl apply -f examples/kaito_workspace_falcon_7b.yaml

The workspace status can be tracked by running the following command. When the WORKSPACEREADY column becomes True, the model has been deployed successfully.

$ kubectl get workspace workspace-falcon-7b
NAME                  INSTANCE            RESOURCEREADY   INFERENCEREADY   WORKSPACEREADY   AGE
workspace-falcon-7b   Standard_NC12s_v3   True            True             True             10m
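Instead of re-running the command by hand, a polling loop can wait for the WORKSPACEREADY column to become True. This is a sketch under assumptions: the helper names are hypothetical, and it parses the default table output of `kubectl get workspace` (locating the column by its header) rather than relying on any particular status field of the Workspace resource.

```shell
# Hypothetical helper (not part of Kaito): read `kubectl get workspace` table
# output on stdin and print the named column (default WORKSPACEREADY) for each
# row. The column is found by its header, so column order does not matter.
table_column() {
  awk -v col="${1:-WORKSPACEREADY}" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { print $idx }
  '
}

# Hypothetical helper: poll every 10s until the workspace reports
# WORKSPACEREADY=True; give up after $2 attempts (default 60).
wait_workspace_ready() {
  local name="$1" attempts="${2:-60}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(kubectl get workspace "$name" | table_column WORKSPACEREADY)" = "True" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  return 1
}
```

For example, `wait_workspace_ready workspace-falcon-7b && echo "workspace is ready"` blocks for up to about ten minutes.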

Next, find the inference service's cluster IP and use a temporary curl pod to test the service endpoint in the cluster.

$ kubectl get svc workspace-falcon-7b
NAME                  TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)            AGE
workspace-falcon-7b   ClusterIP   <CLUSTERIP>   <none>        80/TCP,29500/TCP   10m

$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl sh
~ $ curl -X POST http://<CLUSTERIP>/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"YOUR QUESTION HERE\"}"
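The cluster IP can also be read directly with a kubectl JSONPath query, and the request body can be built with jq (already used during installation) so that quotes in the prompt are escaped correctly. A minimal sketch with hypothetical helper names, assuming the /chat endpoint and the prompt field shown above:

```shell
# Hypothetical helpers (not part of Kaito).

# Print the ClusterIP of the named service using kubectl JSONPath output.
svc_cluster_ip() {
  kubectl get svc "$1" -o jsonpath='{.spec.clusterIP}'
}

# Build the JSON body {"prompt": "..."} with jq, which escapes quotes,
# backslashes and newlines in the prompt text.
chat_payload() {
  jq -nc --arg prompt "$1" '{prompt: $prompt}'
}
```

Both helpers run on the workstation, not inside the curl pod. Assuming the curlimages/curl image passes container arguments to curl, the request could then be sent non-interactively with `kubectl run curl --rm -i --restart=Never --image=curlimages/curl -- -s -X POST "http://$(svc_cluster_ip workspace-falcon-7b)/chat" -H "Content-Type: application/json" -d "$(chat_payload 'YOUR QUESTION HERE')"`.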

Contributing

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

License

See LICENSE.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Contact

"Kaito devs" [email protected]
