BYOC Requirements

To bring your own compute to DGX Cloud Lepton, ensure your machines meet the following requirements.

Software Requirements

BYOC in DGX Cloud Lepton follows the NVIDIA DGX OS 6 and NVIDIA DGX OS 7 standards, so your machines must meet the following software requirements.

Operating System

  • Your machine must run Ubuntu 22.04 LTS or a newer LTS version, with a kernel version supported by that release. For example, Ubuntu 22.04 LTS can run either kernel 5.15 or kernel 6.8, since both are compatible with 22.04 LTS. Refer to the official Ubuntu kernel release cycle for more details.
  • You must have root access to your machine.
  • Swap memory must be disabled.
  • Package auto-updates must be disabled.
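As a rough pre-flight check, the operating system and swap requirements above could be verified with a short script. The sketch below is illustrative and not part of DGX Cloud Lepton: the function names are hypothetical, and it assumes the input text comes from `/etc/os-release` and `/proc/meminfo`. It also assumes that an Ubuntu LTS release can be recognized by an even year and a `.04` version, which holds for current releases.

```python
import re


def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_supported_ubuntu(os_release_text, minimum=(22, 4)):
    """True for Ubuntu LTS releases at or above `minimum` (default 22.04)."""
    info = parse_os_release(os_release_text)
    if info.get("ID") != "ubuntu":
        return False
    match = re.match(r"(\d+)\.(\d+)", info.get("VERSION_ID", ""))
    if not match:
        return False
    version = (int(match.group(1)), int(match.group(2)))
    # Assumption: LTS releases are the even-year ".04" releases.
    is_lts = version[0] % 2 == 0 and version[1] == 4
    return is_lts and version >= minimum


def swap_is_disabled(meminfo_text):
    """True when /proc/meminfo reports a SwapTotal of 0 kB."""
    match = re.search(r"^SwapTotal:\s+(\d+)\s*kB", meminfo_text, re.MULTILINE)
    return match is not None and int(match.group(1)) == 0
```

On a real machine, you would read the two files and pass their contents to these functions. Root access and the package auto-update setting are not covered here and still need to be checked separately.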

CUDA Toolkit

  • We recommend CUDA Toolkit version 12.4.1 or newer, with NVCC installed.
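One way to confirm that NVCC is installed and reasonably recent is to inspect the output of `nvcc --version`. The sketch below uses hypothetical function names and assumes the output contains a line of the form `Cuda compilation tools, release 12.4, ...`. That line carries only the major and minor version, so this check cannot distinguish 12.4.0 from 12.4.1. Treat it as a coarse filter only.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' text of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_recommended_cuda(nvcc_output, minimum=(12, 4)):
    """True when the reported release is at least `minimum` (major, minor)."""
    release = parse_nvcc_release(nvcc_output)
    return release is not None and release >= minimum
```

The output text could be captured with `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`. A missing `nvcc` binary would then raise `FileNotFoundError`, which itself indicates that NVCC is not installed or not on the `PATH`.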

NVIDIA Driver and NVIDIA Fabric Manager

Standard Boot Option

If you select the Standard boot option for your machine (on Azure, this is called Security Option), DGX Cloud Lepton can automatically install the 550.144.03 NVIDIA driver with NVIDIA Fabric Manager for you.

Note

If the NVIDIA driver and NVIDIA Fabric Manager are already installed on your machine, DGX Cloud Lepton will not install them again.

Trusted Launch Boot Option

If you want to use the Trusted Launch boot option, you need to install the NVIDIA driver and NVIDIA Fabric Manager manually.

  • The release 550 family is recommended, especially version 550.144.03 or newer.
  • You can also use the release 535 family, with version ≥ 535.230.02.
  • If your system uses NVSwitch, you must install NVIDIA Fabric Manager at the same version as the driver.
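The version rules above can be expressed as a small comparison helper. This is a minimal sketch with hypothetical names, not an official validator. It assumes version strings in the dotted form reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. It also treats 550.144.03 as the floor for the 550 family and reports any branch not listed above as unverified, which is one reading of the guidance rather than a stated rule.

```python
# Minimum versions per driver branch, taken from the list above.
MINIMUM_BY_BRANCH = {
    550: (550, 144, 3),
    535: (535, 230, 2),
}


def parse_driver_version(version):
    """Turn '550.144.03' into (550, 144, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_listed(version):
    """True when the version is in a listed branch and meets that branch's minimum."""
    parsed = parse_driver_version(version)
    minimum = MINIMUM_BY_BRANCH.get(parsed[0])
    return minimum is not None and parsed >= minimum


def fabric_manager_matches(driver_version, fabric_manager_version):
    """NVSwitch systems need Fabric Manager at exactly the driver's version."""
    return parse_driver_version(driver_version) == parse_driver_version(
        fabric_manager_version
    )
```

Comparing integer tuples instead of raw strings avoids ordering mistakes, for example `"550.90.07"` sorting after `"550.144.03"` as text.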

System Libraries

  • glibc version ≥ GLIBC_2.32
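The glibc requirement can be checked from Python's standard library. In this sketch, `glibc_meets_minimum` and `detect_glibc_version` are hypothetical helper names. `platform.libc_ver()` inspects the running Python executable, so the result reflects the libc that interpreter is linked against, which is assumed here to be the system glibc.

```python
import platform


def glibc_meets_minimum(version, minimum=(2, 32)):
    """Compare a glibc version string such as '2.35' against (major, minor)."""
    parts = tuple(int(piece) for piece in version.split(".")[:2])
    return parts >= minimum


def detect_glibc_version():
    """Return the glibc version string for this interpreter, or None if unknown."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None
```

The comparison is numeric per component, so `"2.4"` is correctly treated as older than `"2.32"`.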

Hardware Requirements

Storage Configuration

For best performance and stability, we recommend the following storage configuration (local to each GPU node):

  • Root: NVMe SSD in RAID-1 configuration (post-RAID capacity of at least 1 TB).
  • Data: At least 20 TB of NVMe SSD (can be multiple disks).

At a minimum, your storage must satisfy one of the following three options:

  1. Single OS Disk Setup

    • Minimum 640 GB storage per GPU on the OS disk

  2. OS Disk + Data Volume Setup

    • OS disk: Minimum 128 GB
    • Data volume: Minimum 640 GB per GPU (e.g., using LVM)

  3. OS Disk + Multiple Data Disks Setup

    • OS disk: Minimum 128 GB
    • Total data disks: Minimum 640 GB per GPU
    • Note: Lepton will automatically combine all data disks into a single volume

Note

We recommend larger storage capacity for training clusters.
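To make the minimum storage options concrete, the sketch below checks a node's disks against them. The function name and parameters are hypothetical. It treats options 2 and 3 the same way, because both require a 128 GB OS disk plus 640 GB per GPU across the data volume or data disks.

```python
OS_DISK_MIN_GB = 128  # OS disk minimum for options 2 and 3
PER_GPU_MIN_GB = 640  # per-GPU minimum in all three options


def storage_meets_minimum(gpu_count, os_disk_gb, data_disks_gb=()):
    """Return True if the node satisfies at least one minimum storage option."""
    required = PER_GPU_MIN_GB * gpu_count
    # Option 1: the OS disk alone provides 640 GB per GPU.
    single_disk_ok = os_disk_gb >= required
    # Options 2 and 3: a 128 GB OS disk plus 640 GB per GPU of data capacity.
    split_ok = os_disk_gb >= OS_DISK_MIN_GB and sum(data_disks_gb) >= required
    return single_disk_ok or split_ok
```

For example, an 8-GPU node needs 8 × 640 GB = 5120 GB, so `storage_meets_minimum(8, 256, [1920, 1920, 1920])` passes with 5760 GB of data disks, while a single 4000 GB OS disk does not.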

CPU Requirements

We recommend at least eight physical CPU cores per GPU, using CPUs such as AMD EPYC 9004 Series or 4th or 5th Gen Intel Xeon.

Memory

  • Minimum 256 GB RAM per GPU
  • ECC (Error-Correcting Code) support is recommended.
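The per-GPU CPU and memory figures scale linearly with the number of GPUs in a node. The sketch below works through that arithmetic with a hypothetical helper, `node_sizing`. The core count is a recommendation and the RAM figure is a minimum, so the result keeps the two apart.

```python
RECOMMENDED_CORES_PER_GPU = 8  # physical cores, recommended
MIN_RAM_GB_PER_GPU = 256       # RAM, minimum


def node_sizing(gpu_count):
    """Return the per-node CPU recommendation and RAM minimum for a GPU count."""
    return {
        "recommended_physical_cores": RECOMMENDED_CORES_PER_GPU * gpu_count,
        "min_ram_gb": MIN_RAM_GB_PER_GPU * gpu_count,
    }
```

For an 8-GPU node this gives 64 physical cores and 2048 GB of RAM.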

Network Configuration

Network Requirements

  • All outbound traffic must be allowed.
  • DNS server must be properly configured.
  • A dedicated public IP address per machine is preferred.

Note

If you want to use Dev Pod on DGX Cloud Lepton, you must open ports 40000 through 65535 on your machines.

RDMA Configuration

For multi-node training workloads, follow your cloud provider's guidelines to set up RDMA for east-west traffic.

IP Address Restrictions

Avoid using or relying on IP addresses from the following CIDR ranges:

  • 10.50.0.0/16
  • 172.20.0.0/16
  • 100.64.0.0/10
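To check whether an address or subnet in your environment collides with these ranges, Python's standard `ipaddress` module is sufficient. The helper name `conflicting_ranges` below is hypothetical, and the sketch only handles IPv4 inputs like those listed above.

```python
import ipaddress

# The restricted CIDR ranges listed above.
RESTRICTED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.50.0.0/16", "172.20.0.0/16", "100.64.0.0/10")
]


def conflicting_ranges(address_or_cidr):
    """Return the restricted ranges that overlap a single address or a subnet."""
    # strict=False lets a host address such as 10.50.3.7 be treated as a /32.
    network = ipaddress.ip_network(address_or_cidr, strict=False)
    return [str(r) for r in RESTRICTED_RANGES if network.overlaps(r)]
```

An empty list means no conflict. A broad subnet such as `10.0.0.0/8` is reported as conflicting because it contains `10.50.0.0/16`, even though most of its addresses fall outside the restricted range.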

Performance Verification

Your machines must pass the NCCL test provided by NVIDIA.

Once you've ensured your machines meet the requirements, you can refer to this guide to add them to DGX Cloud Lepton.

Copyright @ 2025, NVIDIA Corporation.