Automate the provisioning of a bare-metal, multi-node Kubernetes cluster with Ansible, using industry-standard tools for an enterprise-grade setup. This guide also gives an overview of how the cluster can serve embedding workloads on GPU nodes, with event-driven autoscaling and performance monitoring.
- Ubuntu: Tested with Ubuntu 22.04
- Ansible: an open source IT automation engine.
- containerd: an industry-standard container runtime.
- Kubernetes: an open-source system for automating deployment, scaling, and management of containerized applications.
- Calico: an open source networking and network security solution for containers (CNI).
- MetalLB: a bare metal load-balancer for Kubernetes.
- Nginx: an Ingress controller.
- Cert-Manager: adds certificates and certificate issuers as resource types in a Kubernetes cluster.
- A Linux machine with superuser privileges and Ansible pre-installed. You can prepare the Ansible user with the provided script:
$ ./setup-user-ansible.sh
- Ubuntu machines that are intended to become part of the new Kubernetes cluster. Make sure your SSH key is installed on each machine: generate a key pair if you do not have one, then copy the public key to every node:
$ ssh-keygen
$ ssh-copy-id <remote-username>@<remote-machine-ipv4-address>
- Install Ansible on the machine you will run the playbooks from (Ansible is agentless, so the managed nodes only need SSH access and Python):
$ sudo apt-get update
$ sudo apt-get install -y ansible
- Clone this Git repository to your local workstation:
$ git clone https://github.com/vijyantg/kubeadm-cluster-with-ansible.git
- Change directory to the root directory of the project:
$ cd kubeadm-cluster-with-ansible
- Edit the values of the default variables to your requirements:
$ vi defaults/main.yaml
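The exact variable names are defined by this playbook, so check the file itself. As a hedged illustration, a kubeadm/Calico/MetalLB setup like this one typically exposes values along these lines; every name below is hypothetical:

  # Hypothetical names -- the real ones are in defaults/main.yaml
  kubernetes_version: "1.28"
  pod_network_cidr: "192.168.0.0/16"         # must match the Calico configuration
  metallb_ip_range: "10.0.0.240-10.0.0.250"  # free LAN IPs for LoadBalancer services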
- Edit the Ansible inventory file to your requirements:
$ vi inventory/hosts.ini
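Ansible inventories use INI-style groups. The group names below are only illustrative; keep whatever groups the playbooks in this repository actually reference:

  [control_plane]
  k8s-master-1 ansible_host=10.0.0.10

  [workers]
  k8s-worker-1 ansible_host=10.0.0.11
  k8s-worker-2 ansible_host=10.0.0.12

  [all:vars]
  ansible_user=ansible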
- Run the Ansible Playbook:
$ ansible-playbook -i inventory/hosts.ini -K playbooks/cluster.yaml
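When the playbook completes, you can confirm from a machine holding the cluster's kubeconfig that every node has joined and that the core components are healthy:

$ kubectl get nodes -o wide
$ kubectl get pods -A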
- Preparing for GPU Workload Deployment: Once you add GPU nodes to the running cluster (with prerequisites such as Helm and Git installed), you need to set the nodes up to recognize and utilize GPU resources before deploying GPU workloads. You can run nvidia-smi on a node to check whether the device is visible. The steps sketched below install the GPU drivers and necessary tools using the NVIDIA GPU Operator, ensuring your Kubernetes cluster is prepared for GPU-based workloads.
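As a sketch of that installation: the GPU Operator is normally installed with Helm, following NVIDIA's documented pattern (check the operator's documentation for the current chart version and options):

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
$ helm repo update
$ helm install --wait gpu-operator nvidia/gpu-operator --namespace gpu-operator --create-namespace

Once the operator's pods are running, a throwaway pod that requests one GPU and runs nvidia-smi confirms the device is visible from inside the cluster (the CUDA image tag is an assumption; any CUDA base image will do):

  # gpu-smoke-test.yaml
  apiVersion: v1
  kind: Pod
  metadata:
    name: gpu-smoke-test
  spec:
    restartPolicy: Never
    containers:
    - name: nvidia-smi
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # schedules the pod onto a GPU node

$ kubectl apply -f gpu-smoke-test.yaml
$ kubectl logs gpu-smoke-test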
- Scaling Considerations: Use the Kubernetes Horizontal Pod Autoscaler to scale the deployment based on CPU/GPU utilization. Once the app is deployed on the allocated node, you can use the KEDA autoscaler to define ScaledObjects and ScaledJobs for your embedding processes, specifying the triggers (e.g., API requests or queue length) and the target deployment to scale, as sketched below. Ensure that the cluster has enough GPU nodes to handle scaling efficiently. You can also configure KEDA to use custom metrics such as GPU utilization, which allows more granular control over scaling GPU workloads.
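As a hedged sketch, assuming KEDA is installed (e.g., via its Helm chart) and a Deployment named embedding-api serves the embedding workload, a ScaledObject driven by GPU utilization from dcgm-exporter could look like this; the deployment name, Prometheus address, query, and threshold are all placeholders to adapt:

  apiVersion: keda.sh/v1alpha1
  kind: ScaledObject
  metadata:
    name: embedding-api-scaler
  spec:
    scaleTargetRef:
      name: embedding-api          # hypothetical Deployment to scale
    minReplicaCount: 1
    maxReplicaCount: 8             # keep within the number of available GPUs
    triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # assumed Prometheus service
        query: avg(DCGM_FI_DEV_GPU_UTIL)                      # GPU utilization from dcgm-exporter
        threshold: "70"                                       # scale out above ~70% average utilization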
- Performance Optimization: Optimize containers for GPU usage by fine-tuning the code and selecting libraries that take advantage of GPU acceleration. Consider deploying a GPU monitoring stack (e.g., Prometheus and Grafana) to visualize and optimize GPU performance in real time; a sketch follows.
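A minimal sketch of such a stack, assuming Helm: the kube-prometheus-stack chart provides Prometheus and Grafana, and NVIDIA's dcgm-exporter chart exposes per-GPU metrics for them to scrape (release names and the namespace are arbitrary, and the GPU Operator can alternatively deploy the DCGM exporter for you):

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
$ helm repo update
$ helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
$ helm install dcgm-exporter gpu-helm-charts/dcgm-exporter --namespace monitoring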
References
- GPU-Devices