NVIDIA Run:ai Installation

NVIDIA Run:ai is installed with the cm-kubernetes-setup tool included with BCM 11.

Additional details can be found in the official NVIDIA Run:ai documentation: <https://run-ai-docs.nvidia.com/self-hosted/2.22>.
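The setup assistant is launched from the BCM head node. A minimal sketch follows; the guard only checks that the tool is present, since cm-kubernetes-setup can only actually run on a BCM head node:

```shell
#!/bin/sh
# Hedged sketch: launch the BCM Kubernetes setup wizard, which walks through
# the NVIDIA Run:ai deployment steps. Run on the BCM 11 head node as root.
if command -v cm-kubernetes-setup >/dev/null 2>&1; then
    # Starts the interactive setup assistant.
    cm-kubernetes-setup
else
    echo "cm-kubernetes-setup not found; run this on a BCM 11 head node"
fi
```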

NVIDIA Run:ai Deployment

Node Categories

In NVIDIA Base Command Manager (BCM), a node category is a way to group nodes that share the same hardware profile and intended role. Defining node categories allows the system to assign the appropriate software image and configuration to each group during provisioning.

Before installing NVIDIA Run:ai, make sure BCM node categories are created for:

- Kubernetes system nodes (for example, k8s-system-user)
- NVIDIA Run:ai GPU worker nodes (for example, dgx-gb200-k8s)
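The categories can be created ahead of time with cmsh, BCM's management shell. A hedged sketch, using the example category names above and cloning from the stock default category (adjust the base category and names to your cluster):

```shell
#!/bin/sh
# Hedged sketch: create the two node categories with cmsh on the BCM head node.
# Category names are the examples from this guide; cloning "default" is an
# assumption -- clone whichever existing category best matches your hardware.
if command -v cmsh >/dev/null 2>&1; then
    for cat in k8s-system-user dgx-gb200-k8s; do
        cmsh -c "category; clone default $cat; commit"
    done
else
    echo "cmsh not found; run this on the BCM head node"
fi
```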

These categories are used when setting up NVIDIA Run:ai for the first time via the BCM setup assistant. More details and full instructions are available in the Run:ai BCM Install Getting Started Guide.

NVIDIA Run:ai Validation

To validate the NVIDIA Run:ai deployment, refer to the Run:ai usage guides for deploying single-GPU training jobs, multi-node training jobs, single-GPU inference jobs, and multi-GPU inference jobs:

- Run Your First Run:ai Distributed Training
- Run Your First Run:ai Custom Inference Workload
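As a quick smoke test before working through the full guides, a single-GPU training job can be submitted with the runai CLI. A minimal sketch; the project name, quickstart image, and exact submit syntax vary by Run:ai CLI version, so treat these flags as assumptions to check against your installed CLI:

```shell
#!/bin/sh
# Hedged sketch: submit and inspect a single-GPU test job with the runai CLI.
# The project name "team-a" and the quickstart image are illustrative
# assumptions; substitute a project and image that exist in your cluster.
if command -v runai >/dev/null 2>&1; then
    runai config project team-a                                  # select a project
    runai submit quickstart -i gcr.io/run-ai-demo/quickstart -g 1 # request 1 GPU
    runai describe job quickstart                                # confirm it reaches Running
else
    echo "runai CLI not found; install it from the Run:ai control plane"
fi
```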