This article describes how to deploy AI models on AKS Arc with the AI toolchain operator (KAITO).
AKS Arc, or Azure Kubernetes Service (AKS) on Azure Arc, is a service that extends Azure management and governance capabilities to Kubernetes clusters running anywhere, whether on-premises, at the edge, or in other cloud environments. It allows you to manage and secure your Kubernetes clusters consistently across different environments using Azure tools and services.
Follow the workflow below to enable this feature.
Currently, KAITO supports models such as Falcon, Phi2, Phi3, Llama2, Llama2Chat, and Mistral. Refer to KAITO's readme file for the latest list of supported models.
- Before you begin, make sure you have the following from your infrastructure administrator:
- An AKS Arc cluster that's up and running.
- We recommend using a Linux machine for this feature.
- Your local kubectl environment configured to point to your AKS cluster.
- Run the following command to download the kubeconfig file:
az aksarc get-credentials --resource-group <ResourceGroupName> --name <ClusterName> --admin
- Make sure GPU is enabled on your Azure Local cluster; you can ask your infrastructure administrator to set it up for you. You also need to identify the right GPU VM SKUs for your AKS cluster before creating the node pool. Instructions can be found in use GPU for compute-intensive workloads.
- Make sure Helm and kubectl are installed on your local machine. You can verify both installations, and your cluster access, with the commands shown after this list.
- If you need to install or upgrade Helm, see Install Helm.
- If you need to install kubectl, see Install kubectl.
The available GPU VM SKUs can be found here.
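To verify your tools and cluster access, you can run the following commands as a quick sanity check (the exact version output varies by environment):
helm version
kubectl version --client
kubectl get nodes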
Create a GPU node pool using the Azure CLI
az aksarc nodepool add --name "samplenodepool" --cluster-name "samplecluster" --resource-group "sample-rg" --node-vm-size "samplenodepoolsize" --os-type "Linux"
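For example, assuming a GPU VM size of Standard_NC16_A16 is available in your environment (this SKU name is only illustrative; pick one from the GPU SKU list linked above that your hardware supports), the command might look like this:
az aksarc nodepool add --name "gpunodepool" --cluster-name "samplecluster" --resource-group "sample-rg" --node-vm-size "Standard_NC16_A16" --os-type "Linux"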
About the node name and node pool label
The node created will have a random name in the format moc-<random>, and it will carry the following label referencing its node pool: msft.microsoft/nodepool-name = <clusterName>-<random>-<nodepoolName>. The command in the next step finds the nodes whose label matches this pattern and prints their names.
- After the node pool creation command succeeds, you can confirm whether the GPU node is provisioned using the following command. Replace <clusterName> and <nodepoolName> with real values.
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.labels["msft.microsoft/nodepool-name"] != null and (.metadata.labels["msft.microsoft/nodepool-name"] | tostring | test("^<clusterName>-.*-<nodepoolName>$"))) | .metadata.name'
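For example, with the sample names used earlier (samplecluster and samplenodepool), the command becomes:
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.labels["msft.microsoft/nodepool-name"] != null and (.metadata.labels["msft.microsoft/nodepool-name"] | tostring | test("^samplecluster-.*-samplenodepool$"))) | .metadata.name'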
- Also make sure the node has allocatable GPU cores using the following command:
kubectl get node moc-l1i9uh0ksne -o yaml | grep -A 10 "allocatable:"
allocatable:
  cpu: 31580m
  ephemeral-storage: "95026644016"
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 121761176Ki
  nvidia.com/gpu: "2"
  pods: "110"
capacity:
  cpu: "32"
  ephemeral-storage: 103110508Ki
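If you want a quick summary of allocatable GPUs across all nodes instead of inspecting one node at a time, a one-liner such as the following can help (the custom-columns expression is only a convenience sketch; the escaping of the nvidia.com/gpu key may need adjustment for your kubectl version):
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'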
You do not need to install the gpu-provisioner, because the node is provisioned manually.
- Git clone the KAITO repo to your local machine
- Install the KAITO operator using the following command:
helm install workspace ./charts/kaito/workspace --namespace kaito-workspace --create-namespace
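Before moving on, you can confirm that the operator installed successfully, assuming the default release name and namespace from the command above:
helm list -n kaito-workspace
kubectl get pods -n kaito-workspace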
About the workspace YAML file
If you run KAITO in an on-premises Kubernetes cluster where node pool auto-provisioning is unavailable, the GPU nodes can be pre-configured. You need to add the node names to the preferredNodes field in the resource spec. As a result, the KAITO controller skips the GPU node provisioning steps and uses the prepared nodes to run the inference workload.
Using the same method, you can try all KAITO functionalities; examples can be found in the /examples folder.
- Create a YAML file from the following template, and make sure to replace the placeholders in curly braces with your own information. Note that the PresetName can be found in the supported models file in KAITO's GitHub repo.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: { YourDeploymentName }
resource:
  labelSelector:
    matchLabels:
      apps: { YourNodeLabel }
  preferredNodes:
  - { YourNodeName }
inference:
  preset:
    name: { PresetName }
A sample YAML file looks like the following:
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  labelSelector:
    matchLabels:
      apps: falcon-7b
  preferredNodes:
  - moc-lmkq7webq9z
inference:
  preset:
    name: falcon-7b-instruct
- You need to label your GPU node first:
kubectl label node samplenode apps=YourNodeLabel
and then apply the YAML file:
kubectl apply -f sampleyamlfile.yaml
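To confirm the label was applied, you can list nodes by that label (shown here with the falcon-7b sample label; substitute your own):
kubectl get nodes -l apps=falcon-7b --show-labels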
- Validate the workspace using the command kubectl get workspace. Also make sure that both the ResourceReady and InferenceReady fields are True before testing with the sample prompt.
NAME INSTANCE RESOURCEREADY INFERENCEREADY JOBSTARTED WORKSPACESUCCEEDED AGE
workspace-falcon-7b Standard_NC6s_v3 True True True 4d
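If either condition stays False, inspecting the workspace and recent cluster events usually reveals what is blocking it (the workspace name below comes from the sample; substitute your own):
kubectl describe workspace workspace-falcon-7b
kubectl get events --sort-by=.metadata.creationTimestamp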
- You may test the model with a sample prompt:
export CLUSTERIP=$(kubectl get svc workspace-falcon-7b -o jsonpath="{.spec.clusterIPs[0]}")
kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"<sample_prompt>\"}"
A sample output looks like the following:
$ kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"write a poem\"}"
If you don't see a command prompt, try pressing enter.
{"Result":"write a poem about the first day of school, the last day of school, or a day in school.\nThe first day of school\nI wake at dawn\nand think of the new day\nas the sun rises\nI am excited\nthe first day of school\nI wake at dawn\nand think of the new day\nas the sun rises\nI am excited\nthe last day of school\nI wake at dawn\nand think of the last day\nas the sun rises\nI am sad\nthe last day of school\nI wake at dawn\nand think of the last day\nas the sun rises\nI am sad\na day in school\nthe first day of school\nI wake at dawn\nand think of the new day\nas the sun rises\nI am excited\nthe last day of school\nI wake at dawn\nand think of the last day\nas the sun rises\nI am sad\na day in school\nI walk down the stairs"}pod "curl" deleted