New
Add-Ons
Troubleshooting
👁️ Overview
Introducing Workflows in Komodor, a powerful new feature designed to enhance visibility and troubleshooting for AI/ML workloads on Kubernetes. Accessible under the Kubernetes AddOns section, Workflows streamline monitoring and improve troubleshooting efficiency across various engines like ArgoWF and Airflow.
⭐️ Key Features
- Out-of-the-Box Monitoring: Automatically tracks workflows from ArgoWF and Airflow without additional configuration.
- Support for Custom Engines: Identify workflows from engines like MLflow or Kubeflow by adding specific labels.
- Workflow Pod Monitor: Runs on each cluster by default, automatically detecting and tracking workflow-related pods.
- Organized Workflows Page: Navigate workflows via engine-specific tabs, with aggregation by DAG/Template showing the latest run status.
- Detailed Run Information: Gain insights into pod phases, issues, and correlated infrastructure events like node terminations.
Please note: Workflow data is retained for 3 days, with options to view previous runs via a dropdown for each template.
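To illustrate the custom-engine support mentioned above, here is a minimal sketch of adding workflow-identifying labels to a pod manifest before it is submitted. The label keys (`example.com/workflow-engine`, `example.com/workflow-dag`, `example.com/workflow-run-id`) and the helper `label_workflow_pod` are hypothetical placeholders, not Komodor's actual keys; the documentation lists the exact labels Komodor expects.

```python
import copy

# Hypothetical label keys: placeholders only, NOT Komodor's documented keys.
ENGINE_LABEL = "example.com/workflow-engine"
DAG_LABEL = "example.com/workflow-dag"
RUN_LABEL = "example.com/workflow-run-id"


def label_workflow_pod(manifest, engine, dag, run_id):
    """Return a copy of a pod manifest with workflow-identifying labels added."""
    labelled = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    labels = labelled.setdefault("metadata", {}).setdefault("labels", {})
    labels.update({ENGINE_LABEL: engine, DAG_LABEL: dag, RUN_LABEL: run_id})
    return labelled


# Example pod launched by a custom engine (names and image are illustrative).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-step-1", "labels": {"app": "trainer"}},
    "spec": {"containers": [{"name": "train", "image": "python:3.12"}]},
}

labelled_pod = label_workflow_pod(
    pod, engine="mlflow", dag="nightly-training", run_id="run-001"
)
print(labelled_pod["metadata"]["labels"])
```

The same labels could equally be set in a pod template or by the engine's own pod-spec hooks; the point is only that every pod belonging to a run carries consistent engine, DAG/template, and run identifiers.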
🗣️ Motivation
Managing AI/ML workflows on Kubernetes can be challenging, particularly in complex environments where orphan pods are quickly deleted, leading to the loss of logs and data. The Workflows feature addresses these pain points by:
- Providing full visibility into workflows, including orphaned pods.
- Simplifying troubleshooting with automated issue detection and infrastructure correlation.
- Streamlining AI/ML workload operations for data teams.
🚀 Getting Started
Explore the new Workflows feature under the Kubernetes AddOns section in your Komodor dashboard. Monitor and troubleshoot workflow issues effortlessly.
For more details, check out our documentation.