NVIDIA KAI Scheduler Integration Guide#

KAI Scheduler is an open-source, Kubernetes-native scheduler for AI workloads at large scale. To use the KAI Scheduler for NVCF workloads, apply the following configuration after installing the KAI Scheduler in the cluster and enabling Optimized AI Workload Scheduling on the cluster. Once these cluster configuration changes are in place, deployed NVCF workloads are automatically bin-packed.

KAI Scheduler Installation

Note

Upgrading to the latest KAI Scheduler release is recommended to pick up the latest fixes and security patches.

Installation#
helm upgrade -i kai-scheduler oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler -n kai-scheduler --create-namespace --version v0.12.6
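
To confirm the release installed cleanly, you can check the Helm release and the scheduler pods in the kai-scheduler namespace. This is a minimal verification sketch; the exact pod names vary with the chart version:

helm list -n kai-scheduler
kubectl get pods -n kai-scheduler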

KAI Scheduler Configuration

nvcf-kai-scheduler-config.yaml#
# Parent queue at the top of the queue hierarchy
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
    name: default-parent-queue
spec:
    resources:
        cpu:
            quota: -1
            limit: -1
            overQuotaWeight: 1
        gpu:
            quota: -1
            limit: -1
            overQuotaWeight: 1
        memory:
            quota: -1
            limit: -1
            overQuotaWeight: 1
---
# Child queue attached to the parent queue above
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
    name: default-queue
spec:
    parentQueue: default-parent-queue
    resources:
        cpu:
            quota: -1
            limit: -1
            overQuotaWeight: 1
        gpu:
            quota: -1
            limit: -1
            overQuotaWeight: 1
        memory:
            quota: -1
            limit: -1
            overQuotaWeight: 1

Save the configuration template to nvcf-kai-scheduler-config.yaml and apply it with:

kubectl create -f nvcf-kai-scheduler-config.yaml
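
To verify that both queues were created, list the Queue resources by their full CRD name. This is a quick sanity check and assumes the scheduling.run.ai CRDs were installed along with the KAI Scheduler chart:

kubectl get queues.scheduling.run.ai

Once this configuration is in place, deployed NVCF workloads are bin-packed automatically, so no per-workload changes should be required. For reference only, a pod scheduled directly through KAI Scheduler would typically set schedulerName: kai-scheduler and a queue label on the pod; the kai.scheduler/queue label key below is taken from the KAI Scheduler quickstart and may differ between releases, so treat this manifest as an illustrative sketch rather than a required step:

apiVersion: v1
kind: Pod
metadata:
    name: example-workload                     # illustrative name only
    labels:
        kai.scheduler/queue: default-queue     # assumed label key; confirm against your KAI Scheduler release
spec:
    schedulerName: kai-scheduler               # hand scheduling of this pod to KAI Scheduler
    containers:
        - name: main
          image: nginx
          resources:
              requests:
                  cpu: "1"
                  memory: 1Gi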