Komodor changelog
komodor.io

Workflows

 

New, Add-Ons, Troubleshooting

  

👁️ Overview

Introducing Workflows in Komodor, a new feature that improves visibility and troubleshooting for AI/ML workloads on Kubernetes. Available under the Kubernetes AddOns section, Workflows streamlines monitoring and speeds up troubleshooting across engines such as ArgoWF and Airflow.

[Screenshot: Workflows list (Airflow)]

⭐️ Key Features

  • Out-of-the-Box Monitoring: Automatically tracks workflows from ArgoWF and Airflow without additional configuration.

  • Support for Custom Engines: Identify workflows from engines like MLflow or Kubeflow by adding specific labels.

  • Workflow Pod Monitor: Runs on each cluster by default, automatically detecting and tracking workflow-related pods.

  • Organized Workflows Page: Navigate workflows via engine-specific tabs, with aggregation by DAG/Template showing the latest run status.

  • Detailed Run Information: Gain insights into pod phases, issues, and correlated infrastructure events like node terminations.

Please note: Workflow data is retained for 3 days. Previous runs can be viewed from a dropdown on each template.
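To make the "aggregation by DAG/Template showing the latest run status" idea concrete, here is a minimal Python sketch. It is not Komodor's implementation or API: the `Run` record, its fields, and the `latest_run_per_template` helper are hypothetical names chosen for illustration, and the sample data is made up. The only detail taken from this page is the 3-day retention window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=3)  # retention window stated in the note above


@dataclass(frozen=True)
class Run:
    # Hypothetical record shape, for illustration only.
    engine: str           # e.g. "airflow" or "argo"
    template: str         # DAG or template name
    started_at: datetime  # when the run started
    status: str           # e.g. "Succeeded" or "Failed"


def latest_run_per_template(runs, now):
    """Group runs by (engine, template) and keep the newest run within retention."""
    latest = {}
    for run in runs:
        if now - run.started_at > RETENTION:
            continue  # older than the retention window, so it is skipped
        key = (run.engine, run.template)
        if key not in latest or run.started_at > latest[key].started_at:
            latest[key] = run
    return latest


# Made-up sample data with a fixed "now" so the result is reproducible.
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
runs = [
    Run("airflow", "train_model", now - timedelta(hours=30), "Failed"),
    Run("airflow", "train_model", now - timedelta(hours=2), "Succeeded"),
    Run("argo", "etl", now - timedelta(days=5), "Succeeded"),  # past retention
]
summary = latest_run_per_template(runs, now)
for (engine, template), run in sorted(summary.items()):
    print(f"{engine}/{template}: {run.status}")
```

In this sketch, the two `train_model` runs collapse into one row carrying the most recent status, and the five-day-old `etl` run drops out because it falls outside the retention window.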

[Screenshot: workflow with a correlated node termination]

🗣️ Motivation

Managing AI/ML workflows on Kubernetes can be challenging, particularly in complex environments where orphaned pods are quickly deleted, taking their logs and data with them. The Workflows feature addresses these pain points by:

  • Providing full visibility into workflows, including orphaned pods.
  • Simplifying troubleshooting with automated issue detection and infrastructure correlation.
  • Streamlining AI/ML workload operations for data teams.

🚀 Getting Started

Explore the new Workflows feature under the Kubernetes AddOns section in your Komodor dashboard to monitor and troubleshoot workflow issues.

For more details, check out our documentation.