Set Up the Prerequisites#

Supported Platforms#

The following NVIDIA GPUs are supported using the default Helm chart:

  • 8 x H100 (80 GB)

  • 8 x A100 (80 GB)

  • 8 x L40S (48 GB)

Prerequisites#

  • 256+ GB system memory

  • Ubuntu 22.04

  • NVIDIA driver 535.161.08 (Recommended minimum version)

  • CUDA 12.2+ (CUDA driver installed with NVIDIA driver)

  • Kubernetes v1.31.2

  • NVIDIA GPU Operator v23.9

  • Helm v3.x

  • NGC API Key

Note

GPU Operator-related errors have been observed with microk8s on certain newer NVIDIA driver versions.
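The host-level items in the list above (system memory and OS release) can be spot-checked from a shell. The following is a minimal sketch, not an official validation tool; `kb_to_gib` is a hypothetical helper name introduced here for illustration. It only reports values and does not enforce the 256 GB threshold, because the kernel typically reports slightly less memory than is physically installed.

```shell
# Sketch: report host memory and OS release for comparison with the
# prerequisites list. kb_to_gib is a hypothetical helper, not NVIDIA tooling.

# Convert a kB count (the unit used by /proc/meminfo) to whole GiB, rounding down.
kb_to_gib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Report total system memory. A machine with 256 GB installed usually
# shows a somewhat smaller number here, so treat the value as a guide.
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "System memory: $(kb_to_gib "$mem_kb") GiB"
fi

# Report the distribution name and version (the list above expects Ubuntu 22.04).
# Sourced in a subshell so the variables do not leak into the current shell.
if [ -r /etc/os-release ]; then
  ( . /etc/os-release; echo "OS: ${NAME:-unknown} ${VERSION_ID:-unknown}" )
fi
```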

Install the NVIDIA Driver#

  1. Download NVIDIA driver 535.161.08 from the NVIDIA Unix drivers page at:

    https://www.nvidia.com/Download/driverResults.aspx/222416/en-us/

  2. Run the following commands to make the installer executable and install the driver:

    chmod 755 NVIDIA-Linux-x86_64-535.161.08.run
    sudo ./NVIDIA-Linux-x86_64-535.161.08.run --no-cc-version-check
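After the installer finishes, it can be useful to confirm which driver version is actually loaded. The sketch below compares the version reported by `nvidia-smi` against the recommended minimum from the prerequisites. `version_ge` and `MIN_DRIVER` are hypothetical names introduced for this example, and the comparison assumes GNU `sort -V` is available (it is part of coreutils on Ubuntu 22.04). Note that passing this check does not rule out the microk8s issues mentioned above for certain newer drivers.

```shell
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Hypothetical helper; relies on GNU sort -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Recommended minimum driver version from the prerequisites list.
MIN_DRIVER="535.161.08"

if command -v nvidia-smi >/dev/null 2>&1; then
  # Query the driver version reported for the first GPU.
  installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if version_ge "$installed" "$MIN_DRIVER"; then
    echo "Driver $installed meets the recommended minimum $MIN_DRIVER"
  else
    echo "Driver $installed is older than the recommended minimum $MIN_DRIVER"
  fi
else
  echo "nvidia-smi not found; the driver may not be installed"
fi
```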
    

Install a Kubernetes Cluster#

Use the following commands to install a microk8s cluster on a single 8 x H100 node running Ubuntu 22.04:

# Install microk8s
sudo snap install microk8s --classic
# Enable nvidia and hostpath-storage add-ons
sudo microk8s enable nvidia
sudo microk8s enable hostpath-storage
# Install kubectl
sudo snap install kubectl --classic
# Verify microk8s is installed correctly
sudo microk8s kubectl get pod -A

Note

For information on joining the microk8s group for admin access (so that sudo is not required) and on other aspects of microk8s setup and usage, see https://microk8s.io/docs/getting-started. Make sure that sudo microk8s kubectl get pod -A shows all pods in Running or Completed status. This may take some time.
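Because the pods can take a while to settle, the status check described in the note can be polled instead of rerun by hand. The sketch below is one possible approach: `count_not_ready` is a hypothetical helper that counts pods whose STATUS column is neither Running nor Completed, and the 10-second interval and 60-attempt limit are arbitrary assumptions.

```shell
# count_not_ready: read `kubectl get pod -A --no-headers` output on stdin and
# print how many pods have a STATUS (4th column) other than Running or Completed.
# Hypothetical helper introduced for this example.
count_not_ready() {
  awk '$4 != "Running" && $4 != "Completed" { n++ } END { print n + 0 }'
}

if command -v microk8s >/dev/null 2>&1; then
  attempt=0
  # Poll every 10 seconds, up to 60 attempts (roughly 10 minutes).
  while [ "$attempt" -lt 60 ]; do
    pending=$(sudo microk8s kubectl get pod -A --no-headers | count_not_ready)
    if [ "$pending" -eq 0 ]; then
      echo "All listed pods are Running or Completed"
      break
    fi
    echo "$pending pod(s) not ready yet; waiting..."
    attempt=$((attempt + 1))
    sleep 10
  done
fi
```

An empty pod list also counts as zero pending pods, so it is worth looking at the full `get pod -A` output once before relying on the loop.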

For a two-node setup, run the following command on the control plane node:

sudo microk8s add-node

Run the following commands on the second (worker) node, using the join string printed by the command above:

# Install microk8s
sudo snap install microk8s --classic
# Enable nvidia and hostpath-storage add-ons
sudo microk8s enable nvidia
sudo microk8s enable hostpath-storage
sudo microk8s join <JOIN_STRING>  # This may take a few seconds to complete.
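Once the join command returns, the control plane node can be used to confirm that both nodes have registered and are Ready. The sketch below assumes it is run on the control plane node; `count_ready_nodes` and `EXPECTED_NODES` are hypothetical names introduced for this example.

```shell
# count_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print how many nodes have a STATUS (2nd column) of exactly "Ready".
# Hypothetical helper introduced for this example.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Number of nodes expected in the two-node setup described above.
EXPECTED_NODES=2

if command -v microk8s >/dev/null 2>&1; then
  ready=$(sudo microk8s kubectl get nodes --no-headers | count_ready_nodes)
  echo "$ready of $EXPECTED_NODES nodes are Ready"
fi
```

If the count stays below the expected value, `sudo microk8s kubectl get nodes` shows the per-node status directly.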

Obtain NGC API Key#

To obtain the required NGC API Key, follow the steps in:

Next, deploy the VSS Blueprint.