KAI Scheduler Integration Guide


KAI Scheduler is an open-source, Kubernetes-native scheduler for AI workloads at large scale. To use KAI Scheduler for NVCF workloads, apply the following configuration after KAI Scheduler is installed in the cluster and Optimized AI Workload Scheduling is enabled on the cluster. Once this cluster configuration is in place, deployed NVCF workloads are bin-packed automatically.

KAI Scheduler Installation

Upgrading to the latest KAI Scheduler release is recommended to get the latest fixes and security patches.

Create a values.yaml file with the following queue attributes:

kai-scheduler-queues.yaml
defaultQueue:
  createDefaultQueue: true
  parentName: default-parent-queue
  childName: default-queue
  parentResources:
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1
  childResources:
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1
$ helm install kai-scheduler oci://ghcr.io/kai-scheduler/kai-scheduler/kai-scheduler -f values.yaml -n kai-scheduler --create-namespace --version v0.12.6
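
After the install command returns, it can be useful to confirm that the release and its pods are healthy before deploying NVCF workloads. The following is a minimal sketch, not part of the official procedure. The helper name verify_kai_install is hypothetical, and the final step assumes the chart registers a Queue custom resource that kubectl can list as "queues"; adjust or drop that step if your cluster differs.

```shell
# Hypothetical helper: checks the Helm release and the scheduler pods.
# Defaults match the release name and namespace used in the install command above.
verify_kai_install() {
  local release="${1:-kai-scheduler}"
  local namespace="${2:-kai-scheduler}"

  # Show the status of the Helm release; fails if the release does not exist.
  helm status "$release" -n "$namespace" || return 1

  # List the pods in the scheduler namespace so their state can be inspected.
  kubectl get pods -n "$namespace" || return 1

  # Assumption: queues are exposed as a cluster-scoped "queues" resource.
  # The default parent and child queues from values.yaml should appear here.
  kubectl get queues || return 1
}
```

Run `verify_kai_install` with no arguments to check the default release, or pass a release name and namespace if you installed under different names.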