NVIDIA Run:ai Installation#
NVIDIA Run:ai is installed through the cm-kubernetes-setup tool included with BCM 11.
Detailed instructions can be found in the official Run:ai documentation.
NVIDIA Run:ai Deployment#
NIM Operator#
The NVIDIA NIM Operator automates the deployment and management of NVIDIA NIM microservices.
Installation#
You can install the operator directly through the Cluster Manager setup wizard:
1. Run the cm-kubernetes-setup wizard.
2. When you reach the Helm chart selection prompt, select NIM Operator.
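After the wizard completes, a quick sanity check is to confirm that the operator's pods and CRDs are present. This is a sketch: the namespace name below is an assumption, so substitute whatever namespace was selected during setup.

```shell
# Verify the NIM Operator pods are running
# ("nim-operator" namespace is an assumption; adjust to your setup choice).
kubectl get pods -n nim-operator

# Confirm the NIM Operator custom resource definitions were registered.
kubectl get crds | grep -i nim
```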
For detailed configuration and usage instructions, please refer to the official documentation.
NVIDIA Dynamo#
Run:ai 2.23 introduces support for distributed inference via NVIDIA Dynamo. This enables optimized, multi-node scheduling for large-scale LLM inference.
See also
For a practical example of this integration, refer to the guide on Smart Multi-Node Scheduling with Run:ai and Dynamo.
Installation#
1. Download the Chart: Download the Dynamo Helm chart from NVIDIA NGC. This provides the .tgz file required for the installation command.
2. Run Helm Install: Run the following command to install version 0.8.1:

helm install dynamo-platform dynamo-platform-0.8.1.tgz \
  --namespace dynamo \
  --create-namespace \
  --set "grove.enabled=true" \
  --set "kai-scheduler.enabled=true"
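Once the install command returns, you can verify that the release deployed and that its pods are coming up. This is a minimal check, not part of the official procedure:

```shell
# Confirm the Helm release was deployed.
helm status dynamo-platform --namespace dynamo

# List the platform pods; all should eventually reach Running or Completed.
kubectl get pods --namespace dynamo
```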
For further details on deployment and usage, please see the Dynamo Documentation.
Node Categories#
In the NVIDIA Base Command Manager (BCM), a node category is a way to group nodes that share the same hardware profile and intended role. Defining node categories allows the system to assign the appropriate software image and configurations to each group during provisioning.
Before installing NVIDIA Run:ai, make sure BCM node categories are created for:
Kubernetes nodes (for example, k8s-system-user)
NVIDIA Run:ai GPU worker nodes (for example, dgx-gb300-k8s)
These categories are used when setting up Run:ai for the first time via the BCM setup wizard. More details on node category configuration for the Run:ai installation, along with full instructions, are available in the Run:ai BCM Install Getting Started Guide.
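As a rough illustration of what creating these categories can look like in cmsh (the BCM shell), the sketch below clones an existing category and assigns nodes to it. The source category name ("default") and the node names are assumptions; adjust them to your site, and prefer the procedure in the Getting Started Guide for production setups.

```shell
# Sketch: clone an existing category to create the GPU worker category
# ("default" as the clone source is an assumption).
cmsh -c "category; clone default dgx-gb300-k8s; commit"

# Assign nodes to the new category (node names are assumptions).
cmsh -c "device; foreach -n node001..node004 (set category dgx-gb300-k8s); commit"
```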
NVIDIA Run:ai Validation#
To validate Run:ai, refer to the Run:ai usage guides for deploying training and inference jobs:
The distributed inference tutorials can also be used to validate multi-node inference: