kai-scheduler by nvidia

Description: KAI Scheduler is an open-source, Kubernetes-native scheduler for large-scale AI workloads.

View nvidia/kai-scheduler on GitHub ↗

Summary Information

Updated 1 hour ago
Added to GitGenius on April 1st, 2025
Created on February 26th, 2025
Open Issues/Pull Requests: 109 (+3)
Number of forks: 154
Total Stargazers: 1,143 (+0)
Total Subscribers: 17 (+0)

Detailed Description

KAI Scheduler is an open-source Kubernetes scheduler developed by NVIDIA to optimize resource allocation for accelerated computing workloads, particularly those that use GPUs. It addresses limitations of the default Kubernetes scheduler in handling the complex hardware topologies and diverse workload requirements common in AI, machine learning, and high-performance computing (HPC) environments. Rather than replacing the default scheduler, KAI Scheduler runs as a pluggable, priority-based scheduler alongside it: KAI Scheduler handles specific workload types while the default scheduler manages the rest.
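To illustrate how two schedulers can coexist, Kubernetes lets each pod name the scheduler that should place it through the `spec.schedulerName` field. The sketch below builds such a pod manifest in Python. The scheduler name `kai-scheduler`, the pod name, and the image are assumptions made for illustration and should be checked against the actual deployment.

```python
import json


def build_pod_manifest(name, image, scheduler_name="kai-scheduler"):
    """Return a pod manifest that asks a specific scheduler to place the pod.

    `scheduler_name` defaults to "kai-scheduler", which is an assumed value;
    pods that omit spec.schedulerName go to the default scheduler.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": scheduler_name,
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # nvidia.com/gpu is the resource name exposed by the
                    # NVIDIA device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


# Hypothetical pod name and image, used only for illustration.
manifest = build_pod_manifest("gpu-job", "example.com/trainer:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`. Pods without a `schedulerName` continue to be handled by the default scheduler.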

The core problem KAI Scheduler solves is efficient GPU resource allocation. The default Kubernetes scheduler often struggles to pack GPU workloads effectively because it does not account for factors such as GPU memory, interconnect bandwidth (for example, NVLink), and GPU affinity. KAI Scheduler uses a scheduling algorithm that understands these hardware characteristics and makes placement decisions that maximize GPU utilization and minimize communication overhead. It does this through a "filter and score" framework similar to the default scheduler's, but with custom filters and scoring functions tailored to accelerated workloads. Filters eliminate nodes that cannot run the pod, and scoring functions rank the remaining nodes by their suitability for it.
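The filter-and-score flow described above can be sketched roughly as follows. This is a minimal illustration of the general pattern, not KAI Scheduler's actual code: the `Node` and `PodRequest` types, their fields, and the score weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_gpus: int
    free_gpu_mem_gib: int  # free memory on each free GPU (hypothetical field)
    has_nvlink: bool


@dataclass
class PodRequest:
    gpus: int
    gpu_mem_gib: int  # memory needed per GPU


def filter_nodes(pod, nodes):
    """Filter phase: drop nodes that cannot fit the request at all."""
    return [
        n for n in nodes
        if n.free_gpus >= pod.gpus and n.free_gpu_mem_gib >= pod.gpu_mem_gib
    ]


def score_node(pod, node):
    """Score phase: prefer tight packing, with a bonus for NVLink on multi-GPU pods."""
    leftover = node.free_gpus - pod.gpus
    score = 100 - 10 * leftover  # fewer idle GPUs left behind scores higher
    if pod.gpus > 1 and node.has_nvlink:
        score += 25  # arbitrary illustrative weight
    return score


def schedule(pod, nodes):
    """Return the name of the best-scoring feasible node, or None if none fits."""
    candidates = filter_nodes(pod, nodes)
    if not candidates:
        return None
    return max(candidates, key=lambda n: score_node(pod, n)).name


nodes = [
    Node("a", free_gpus=8, free_gpu_mem_gib=80, has_nvlink=True),
    Node("b", free_gpus=2, free_gpu_mem_gib=80, has_nvlink=False),
    Node("c", free_gpus=1, free_gpu_mem_gib=80, has_nvlink=True),
    Node("d", free_gpus=4, free_gpu_mem_gib=40, has_nvlink=True),
]
print(schedule(PodRequest(gpus=2, gpu_mem_gib=80), nodes))
```

In this sketch, nodes `c` and `d` are filtered out (too few GPUs and too little memory, respectively), and `b` outscores `a` because it leaves no GPUs idle. A real scheduler would weigh many more signals, but the two-phase structure is the same.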

Key features of KAI Scheduler include support for GPU sharing technologies such as Multi-Instance GPU (MIG) and virtual GPUs, which allow finer-grained resource allocation and higher utilization.
