BYOC Requirements

Learn about the requirements for bringing your own compute to DGX Cloud Lepton.

To bring your own compute to DGX Cloud Lepton, ensure your machines meet the following requirements.

Software Requirements

Because BYOC in DGX Cloud Lepton follows the standards of NVIDIA DGX OS 6 and NVIDIA DGX OS 7, your machines must meet the following software requirements.

Operating System

  • Your machine must run Ubuntu 22.04 LTS or a newer LTS release, with a kernel version supported by that release. For example, Ubuntu 22.04 LTS can run either kernel 5.15 or kernel 6.8, because both are supported on 22.04 LTS. Refer to the official Ubuntu kernel release cycle for details.
  • You must have root access to your machine.
  • Swap memory must be disabled.
  • Package auto-updates must be disabled.
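The operating-system checks above can be scripted before you add a machine. The following is a minimal sketch, not an official DGX Cloud Lepton tool: it assumes you pass in the contents of /etc/os-release, /proc/swaps, and Ubuntu's /etc/apt/apt.conf.d/20auto-upgrades, and it only inspects the APT unattended-upgrades setting (other update mechanisms are not covered). The function names are hypothetical.

```python
import os


def parse_os_release(text: str) -> dict:
    """Parse /etc/os-release style KEY=VALUE lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_supported_ubuntu_lts(info: dict) -> bool:
    """True for Ubuntu LTS releases (even year, .04) at 22.04 or newer."""
    if info.get("ID") != "ubuntu":
        return False
    year, _, month = info.get("VERSION_ID", "").partition(".")
    if not (year.isdigit() and month == "04"):
        return False
    return int(year) >= 22 and int(year) % 2 == 0


def swap_is_active(proc_swaps: str) -> bool:
    """/proc/swaps always has a header line; any further line is an active swap area."""
    return len([l for l in proc_swaps.splitlines() if l.strip()]) > 1


def auto_upgrades_enabled(apt_conf: str) -> bool:
    """Check APT::Periodic::Unattended-Upgrade in the 20auto-upgrades file contents."""
    for line in apt_conf.splitlines():
        if line.strip().startswith("APT::Periodic::Unattended-Upgrade"):
            return '"1"' in line
    return False


def running_as_root() -> bool:
    """True if the current process has root privileges."""
    return os.geteuid() == 0
```

For example, `is_supported_ubuntu_lts(parse_os_release(open("/etc/os-release").read()))` should return True on Ubuntu 22.04 LTS or 24.04 LTS.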

CUDA Toolkit

  • We recommend CUDA Toolkit 12.4.1 or later, with NVCC installed.
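A quick way to confirm the toolkit version is to read the version.json file that recent CUDA Toolkit releases place in the toolkit root. This is a sketch that assumes the default /usr/local/cuda install path and a version.json layout with a "cuda" → "version" entry; adjust both if your installation differs. The helper names are hypothetical.

```python
import json
import shutil
from pathlib import Path

# Assumption: the toolkit is installed at the default /usr/local/cuda path.
CUDA_ROOT = Path("/usr/local/cuda")
MINIMUM = (12, 4, 1)


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '12.4.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def toolkit_version(version_json: str) -> tuple:
    """Read the toolkit version from the contents of version.json."""
    data = json.loads(version_json)
    return parse_version(data["cuda"]["version"])


def cuda_ok(version_json: str, nvcc_path) -> bool:
    """True if the toolkit is 12.4.1 or later and an nvcc binary was found."""
    return toolkit_version(version_json) >= MINIMUM and nvcc_path is not None


def installed_cuda_ok(root: Path = CUDA_ROOT) -> bool:
    """Check the local install; raises FileNotFoundError if version.json is missing."""
    return cuda_ok((root / "version.json").read_text(), shutil.which("nvcc"))
```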

NVIDIA Driver and NVIDIA Fabric Manager

Standard Boot Option

If you select the Standard boot option for your machine (on Azure, this setting is called Security Option), DGX Cloud Lepton can automatically install NVIDIA driver 550.144.03 and NVIDIA Fabric Manager for you.

If the NVIDIA driver and NVIDIA Fabric Manager are already installed on your machine, DGX Cloud Lepton does not install them again.

Trusted Launch Boot Option

If you want to use the Trusted Launch boot option, you need to install the NVIDIA driver and NVIDIA Fabric Manager manually.

  • The 550 release family is recommended, ideally version 550.144.03 or newer.
  • You can also use the 535 release family, version 535.230.02 or newer.
  • On systems with NVSwitch, NVIDIA Fabric Manager must be installed, and its version must match the driver version exactly.
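The version rules above can be encoded as a small check. In this sketch, only the 550 and 535 families are accepted, because those are the only families listed; a driver from another family returns False here rather than being assumed compatible. The nvidia-smi query uses its standard `--query-gpu=driver_version --format=csv,noheader` options; how you obtain the Fabric Manager version to compare against is left to you. All function names are hypothetical.

```python
import subprocess

# Minimum versions per release family, as listed above.
MINIMUM_BY_FAMILY = {550: (550, 144, 3), 535: (535, 230, 2)}


def parse_driver_version(text: str) -> tuple:
    """Turn '550.144.03' into (550, 144, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_ok(version: str) -> bool:
    """True if the driver belongs to a listed family and meets its minimum.

    Families not listed (for example, newer releases) return False because
    the requirements above do not cover them.
    """
    parsed = parse_driver_version(version)
    minimum = MINIMUM_BY_FAMILY.get(parsed[0])
    return minimum is not None and parsed >= minimum


def fabric_manager_matches(driver_version: str, fm_version: str) -> bool:
    """On NVSwitch systems, Fabric Manager must match the driver version exactly."""
    return parse_driver_version(driver_version) == parse_driver_version(fm_version)


def installed_driver_versions() -> list:
    """Query each GPU's driver version with nvidia-smi (one line per GPU)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

For example, `all(driver_ok(v) for v in installed_driver_versions())` checks every GPU on the machine.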

System Libraries

  • glibc 2.32 or later (GLIBC_2.32)
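Ubuntu 22.04 LTS ships glibc 2.35, so a machine that meets the operating-system requirement normally meets this one too. If you want to confirm it, the sketch below uses Python's `platform.libc_ver()`, which reports the C library the interpreter was linked against; the helper names are hypothetical.

```python
import platform

MINIMUM_GLIBC = (2, 32)


def glibc_ok(version: str) -> bool:
    """Compare a dotted glibc version such as '2.35' against 2.32."""
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= MINIMUM_GLIBC


def detected_glibc():
    """Return the glibc version Python is linked against, or None if it is not glibc."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None
```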

Software Packages

DGX Cloud Lepton will install GPUd, containerd, kubelet, and tailscale packages during the BYOC installation process.

If your own versions of these packages are already installed on the system, the BYOC installation process can fail.

NVIDIA recommends removing these packages from the system prior to adding the system to DGX Cloud Lepton.
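Before adding a machine, you can look for existing copies of these packages. This sketch only searches PATH, and the executable names (gpud, containerd, kubelet, tailscale, tailscaled) are assumptions about what each package installs; packages installed outside PATH are not detected. The function name is hypothetical.

```python
import shutil

# Assumed executable names for the packages that BYOC installs.
MANAGED_BINARIES = ["gpud", "containerd", "kubelet", "tailscale", "tailscaled"]


def preinstalled(lookup=shutil.which) -> dict:
    """Map each managed binary found on PATH to its location."""
    found = {}
    for name in MANAGED_BINARIES:
        path = lookup(name)
        if path:
            found[name] = path
    return found
```

An empty result from `preinstalled()` suggests none of these binaries is on PATH; anything it reports is a candidate for removal before installation.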

Hardware Requirements

Storage Configuration

For best performance and stability, we recommend the following storage configuration (local to each GPU node):

  • Root: NVMe SSD in RAID-1 configuration (post-RAID capacity of at least 1 TB).
  • Data: At least 20 TB of NVMe SSD (can be multiple disks).

The minimum storage requirements depend on which of the following three configurations you use:

  1. Single OS Disk Setup
    • OS disk: Minimum 640 GB per GPU
  2. OS Disk + Data Volume Setup
    • OS disk: Minimum 128 GB
    • Data volume: Minimum 640 GB per GPU (for example, using LVM)
  3. OS Disk + Multiple Data Disks Setup
    • OS disk: Minimum 128 GB
    • Total data disks: Minimum 640 GB per GPU
    • Note: DGX Cloud Lepton automatically combines all data disks into a single volume.

For training clusters, we recommend more storage than these minimums.
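The per-GPU minimums scale with the number of GPUs in a node. The sketch below computes them for the three configurations; the layout labels "single", "data_volume", and "multi_disk" are hypothetical names for options 1 to 3 above.

```python
PER_GPU_GB = 640
OS_DISK_GB = 128


def minimum_storage_gb(num_gpus: int, layout: str) -> dict:
    """Minimum capacities in GB for the three configurations.

    layout is one of the hypothetical labels "single", "data_volume", "multi_disk".
    """
    data = PER_GPU_GB * num_gpus
    if layout == "single":
        return {"os": data}
    if layout in ("data_volume", "multi_disk"):
        return {"os": OS_DISK_GB, "data": data}
    raise ValueError(f"unknown layout: {layout}")


def multi_disk_ok(num_gpus: int, data_disks_gb: list) -> bool:
    """Option 3: the data disks are combined, so only their total matters."""
    return sum(data_disks_gb) >= PER_GPU_GB * num_gpus
```

For an 8-GPU node, this works out to 8 × 640 GB = 5,120 GB of data storage at minimum, well below the recommended 20 TB data configuration.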

CPU Requirements

We recommend at least eight physical CPU cores per GPU. Suitable CPUs include the AMD EPYC 9004 Series and 4th or 5th Gen Intel Xeon processors.
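Logical CPU counts (for example, from `os.cpu_count()`) include hyperthreads, so they overstate physical cores. This sketch counts distinct (physical id, core id) pairs in /proc/cpuinfo instead. It assumes the x86 /proc/cpuinfo format; on platforms or VMs that omit these fields, the count will be wrong. The function names are hypothetical.

```python
def physical_core_count(cpuinfo: str) -> int:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text."""
    cores = set()
    physical_id = core_id = None
    for line in cpuinfo.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical-CPU block.
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


def enough_cores(cpuinfo: str, num_gpus: int, per_gpu: int = 8) -> bool:
    """True if there are at least per_gpu physical cores for every GPU."""
    return physical_core_count(cpuinfo) >= per_gpu * num_gpus
```

An 8-GPU node therefore needs at least 64 physical cores.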

Memory

  • Minimum 256 GB RAM per GPU
  • ECC (Error-Correcting Code) support is recommended.
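To check installed memory against the per-GPU minimum, you can read MemTotal from /proc/meminfo, which the kernel reports in KiB. MemTotal is slightly lower than the installed capacity because the kernel reserves some memory, so this sketch allows a small shortfall. Both the 0.95 tolerance and the reading of "256 GB" as 256 GiB are assumptions, not documented values.

```python
MIN_GIB_PER_GPU = 256  # assumption: "256 GB" interpreted as 256 GiB


def mem_total_gib(meminfo: str) -> float:
    """Read MemTotal from /proc/meminfo text; the kernel's "kB" unit is KiB."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def enough_memory(meminfo: str, num_gpus: int, tolerance: float = 0.95) -> bool:
    """MemTotal excludes kernel-reserved memory, so allow a small shortfall.

    The 0.95 tolerance is a hypothetical margin, not a documented value.
    """
    return mem_total_gib(meminfo) >= MIN_GIB_PER_GPU * num_gpus * tolerance
```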

Network Configuration

Network Requirements

  • All outbound traffic must be allowed
  • DNS server must be properly configured
  • A dedicated public IP address per machine is preferred

To use Dev Pod on DGX Cloud Lepton, you must open ports 40000 to 65535 on your machines.
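The DNS and outbound-traffic requirements can be spot-checked from the machine itself. This sketch uses only the standard `socket` module; which hosts you test against is up to you. It cannot confirm that the Dev Pod port range is reachable from outside, which depends on your firewall or security-group rules and must be tested from another host. The names are hypothetical.

```python
import socket

DEV_POD_PORTS = range(40000, 65536)  # ports 40000 to 65535 inclusive


def resolves(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """True if DNS resolution returns at least one address for hostname."""
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        return False


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```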

RDMA Configuration

For multi-node training workloads, follow your cloud provider's guidelines to set up RDMA for east/west traffic.
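After following your provider's RDMA setup, you can list the RDMA devices the kernel has registered under /sys/class/infiniband and check which ports report ACTIVE. This is a sketch that assumes the standard Linux sysfs layout (`<device>/ports/<n>/state` containing text such as "4: ACTIVE"); it does not verify east/west connectivity between nodes. The function names are hypothetical.

```python
from pathlib import Path


def rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list:
    """List registered RDMA devices (for example, mlx5_0), sorted by name."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


def active_ports(sysfs_root: str = "/sys/class/infiniband") -> list:
    """Return (device, port) pairs whose state file reports ACTIVE."""
    result = []
    for dev in rdma_devices(sysfs_root):
        ports = Path(sysfs_root) / dev / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            state = (port / "state").read_text().strip()
            if state.endswith("ACTIVE"):
                result.append((dev, port.name))
    return result
```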

IP Address Restrictions

Avoid using or relying on IP addresses from the following CIDR ranges:

  • 10.50.0.0/16
  • 172.20.0.0/16
  • 100.64.0.0/10
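To find conflicts, compare the addresses and subnets your machines use against these ranges. This sketch uses Python's standard `ipaddress` module; how you collect the input list (interface addresses, VPC subnets, routes) is up to you. The function name is hypothetical.

```python
import ipaddress

RESERVED = [
    ipaddress.ip_network(c)
    for c in ("10.50.0.0/16", "172.20.0.0/16", "100.64.0.0/10")
]


def conflicts(cidrs) -> list:
    """Return the inputs (addresses or CIDRs) that overlap any reserved range."""
    hits = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        # Only IPv4 inputs can overlap the IPv4 ranges above.
        if any(net.version == r.version and net.overlaps(r) for r in RESERVED):
            hits.append(cidr)
    return hits
```

For example, `conflicts(["10.50.3.0/24", "192.168.1.0/24"])` flags only the first subnet.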

Performance Verification

Your machines must pass the NCCL test provided by NVIDIA.
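If the NCCL test you run is NVIDIA's open-source nccl-tests suite (for example, all_reduce_perf), its output ends with an "Avg bus bandwidth" summary line. This sketch extracts that value. The pass threshold is not stated here, so `threshold_gbps` is a hypothetical parameter you must supply, and the parsing assumes the nccl-tests summary format.

```python
import re

# Matches the summary line printed by nccl-tests binaries such as all_reduce_perf.
AVG_BUSBW = re.compile(r"#\s*Avg bus bandwidth\s*:\s*([0-9.]+)")


def average_bus_bandwidth(output: str):
    """Return the reported average bus bandwidth in GB/s, or None if absent."""
    match = AVG_BUSBW.search(output)
    return float(match.group(1)) if match else None


def passes(output: str, threshold_gbps: float) -> bool:
    """threshold_gbps is hypothetical: the pass criterion is not stated here."""
    bw = average_bus_bandwidth(output)
    return bw is not None and bw >= threshold_gbps
```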

After ensuring your machines meet these requirements, refer to this guide to add them to DGX Cloud Lepton.

Copyright © 2025, NVIDIA Corporation.