awslabs/data-on-eks

Data on EKS

(Pronounced: "Do.eks")

💡 Optimized Solutions for Data and AI on EKS

Build, Scale, and Optimize Data & AI/ML Platforms on Amazon EKS 🚀

Welcome to Data on EKS, your gateway to scaling Data and AI workloads on Amazon EKS. Unlock the potential of Gen AI with a rich collection of Terraform Blueprints featuring best practices for deploying robust solutions with advanced logging and observability.

Explore practical examples and patterns for running Data workloads on EKS using advanced frameworks such as Apache Spark for distributed data processing, Apache Flink for real-time stream processing, and Apache Kafka for high-throughput distributed messaging. Automate and orchestrate complex workflows with Apache Airflow and leverage the robust capabilities of Amazon EMR on EKS to build resilient clusters, seamlessly integrating Kubernetes with big data solutions for enhanced scalability and performance.

On the AI/ML front, explore practical patterns for running AI/ML workloads on EKS, leveraging the Ray ecosystem for distributed computing. Use serving solutions such as NVIDIA Triton Server and vLLM for efficient, scalable model inference, and TensorRT-LLM for optimizing deep learning models.

Take advantage of high-performance NVIDIA GPUs for intensive computational tasks and leverage AWS’s specialized hardware, including AWS Trainium for efficient model training and AWS Inferentia for cost-effective model inference at scale.

Note: DoEKS is in active development. For upcoming features and enhancements, check out the issues section.

🏗️ Architecture

The diagram below showcases the wide array of open-source data tools, Kubernetes operators, and frameworks used by DoEKS. It also highlights the seamless integration of AWS Data Analytics managed services with the powerful capabilities of DoEKS open-source tools.

image

🌟 Features

The Data on EKS (DoEKS) solution is organized into the following focus areas:

🎯 Data Analytics on EKS

🎯 AI/ML on EKS

🎯 Streaming Platforms on EKS

🎯 Scheduler Workflow Platforms on EKS

🎯 Distributed Databases & Query Engine on EKS

🏃‍♀️ Getting Started

In this repository, you'll find a variety of deployment blueprints for creating Data/ML platforms with Amazon EKS clusters. These examples are just a small selection of the available blueprints - visit the DoEKS website for the complete list of options.

🧠 AI

🚀 Trainium-Inferentia on EKS 👈 This blueprint is used for running Gen AI models on AWS Neuron accelerators.

🚀 JARK-Stack on EKS 👈 This blueprint deploys the JARK stack for AI workloads with NVIDIA GPUs.

🚀 JupyterHub on EKS 👈 This blueprint deploys a self-managed JupyterHub on EKS with Amazon Cognito authentication.

🚀 Generative AI on EKS 👈 A collection of deployment patterns for Generative AI training and LLM inference.

📊 Data

🚀 EMR-on-EKS with Karpenter 👈 Start here if you are new to EMR on EKS. This blueprint deploys an EMR on EKS cluster and uses Karpenter to scale Spark jobs.

🚀 Spark Operator with Apache YuniKorn on EKS 👈 This blueprint deploys an EKS cluster and uses Spark Operator and Apache YuniKorn for running self-managed Spark jobs.

🚀 Self-managed Airflow on EKS 👈 This blueprint sets up a self-managed Apache Airflow on an Amazon EKS cluster, following best practices.

🚀 Argo Workflows on EKS 👈 This blueprint sets up self-managed Argo Workflows on an Amazon EKS cluster, following best practices.

🚀 Kafka on EKS 👈 This blueprint deploys a self-managed Kafka on EKS using the popular Strimzi Kafka operator.

📚 Documentation

For instructions on how to deploy Data on EKS patterns and run sample tests, visit the DoEKS website.
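As a rough orientation before reading those instructions, the sketch below shows the generic Terraform workflow (`init`, `plan`, `apply`) that any Terraform-based blueprint ultimately relies on. It is only an illustration, not the project's documented procedure: the directory `path/to/blueprint` and the helper names `terraform_commands` and `deploy` are hypothetical, and individual blueprints may ship their own install steps that take precedence. By default the script only prints the commands and does not run them.

```python
import shlex
import subprocess


def terraform_commands(blueprint_dir: str) -> list[list[str]]:
    """Build the standard Terraform init/plan/apply sequence for one directory.

    `blueprint_dir` is a placeholder; substitute the directory named in the
    DoEKS website instructions for the blueprint you want to deploy.
    """
    # -chdir is a global Terraform CLI option that switches the working
    # directory before the subcommand runs, so "tfplan" is resolved there.
    chdir = f"-chdir={blueprint_dir}"
    return [
        ["terraform", chdir, "init"],
        ["terraform", chdir, "plan", "-out=tfplan"],
        ["terraform", chdir, "apply", "tfplan"],
    ]


def deploy(blueprint_dir: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in terraform_commands(blueprint_dir):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path, printed only (dry run).
    deploy("path/to/blueprint")
```

Saving the plan to a file and applying that file means the changes that were reviewed are exactly the changes that get applied. Running `deploy(..., dry_run=False)` requires the Terraform CLI and valid AWS credentials, and it creates billable resources.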

🏆 Motivation

Kubernetes is a widely adopted system for orchestrating containerized software at scale. As more users migrate their data and machine learning workloads to Kubernetes, they often face the complexity of managing the Kubernetes ecosystem and selecting the right tools and configurations for their specific needs.

At AWS, we understand the challenges users encounter when deploying and scaling data workloads on Kubernetes. To simplify the process and enable users to quickly run proofs of concept and build production-ready clusters, we have developed Data on EKS (DoEKS). DoEKS offers opinionated open-source blueprints that provide end-to-end logging and observability, making it easier for users to deploy and manage Spark on EKS, Kubeflow, MLflow, Airflow, Presto, Kafka, Cassandra, and other data workloads. With DoEKS, users can confidently leverage the power of Kubernetes for their data and machine learning needs without getting overwhelmed by its complexity.

🤝 Support & Feedback

DoEKS is maintained by AWS Solutions Architects and is not an AWS service. Support is provided on a best-effort basis by the Data on EKS Blueprints community. If you have feedback or feature ideas, or wish to report bugs, please use the Issues section of this GitHub repository.

🔐 Security

See CONTRIBUTING for more information.

💼 License

This library is licensed under the Apache 2.0 License.

🙌 Community

We welcome all individuals who are enthusiastic about data on Kubernetes to become a part of this open source community. Your contributions and participation are invaluable to the success of this project.

Built with ❤️ at AWS.