BYOC Requirements

To bring your own compute to DGX Cloud Lepton, make sure your machines meet the following requirements.

Software Requirements

BYOC in DGX Cloud Lepton follows the NVIDIA DGX OS 6 and NVIDIA DGX OS 7 standards, so your machines need to meet the software requirements below.

Operating System

  • Your machine needs to run Ubuntu 22.04 LTS or a newer LTS version with a corresponding kernel version. For example, Ubuntu 22.04 LTS can run either kernel 5.15 or kernel 6.8, since both are supported on 22.04 LTS. Refer to the official Ubuntu kernel release cycle for more details.
  • You need to have root access to your machine.
  • Swap memory needs to be disabled.
  • Package auto-updates need to be disabled.
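Some of the operating system requirements above can be checked with a short script. The following is a minimal sketch, not an official DGX Cloud Lepton tool: the function names are hypothetical, and it assumes that "a newer LTS version" means an even-year `.04` Ubuntu release. It reads the standard `/etc/os-release` and `/proc/swaps` files and does not cover the package auto-update setting.

```python
import os
from pathlib import Path


def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def is_supported_ubuntu(fields: dict[str, str]) -> bool:
    """True for Ubuntu 22.04 or a newer LTS (assumed: even-year .04 releases)."""
    if fields.get("ID") != "ubuntu":
        return False
    try:
        major, minor = (int(p) for p in fields.get("VERSION_ID", "").split("."))
    except ValueError:
        return False
    return (major, minor) >= (22, 4) and major % 2 == 0 and minor == 4


def swap_is_disabled(proc_swaps_text: str) -> bool:
    """/proc/swaps holds a header line, then one line per active swap device."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    return len(lines) <= 1


def check_local_machine() -> dict[str, bool]:
    """Run the checks against this machine (Linux only)."""
    fields = parse_os_release(Path("/etc/os-release").read_text())
    return {
        "ubuntu_lts": is_supported_ubuntu(fields),
        "root": os.geteuid() == 0,
        "swap_disabled": swap_is_disabled(Path("/proc/swaps").read_text()),
    }
```

Calling `check_local_machine()` on the target machine returns one boolean per check. The parsing helpers take plain text, so they can also be run against files copied from another host.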

CUDA Toolkit

  • For the CUDA toolkit, we recommend version 12.4.1 or later, with NVCC installed.
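A coarse check of the installed toolkit can be sketched by parsing the output of `nvcc --version`. This is an illustrative sketch with hypothetical function names. The `release` field that NVCC prints carries only the major and minor version, so the check below can confirm 12.4 or later but cannot tell 12.4.0 from 12.4.1; confirm the patch level separately.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output: str):
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_release_ok(release, minimum=(12, 4)) -> bool:
    """True if the parsed release is at least the minimum major.minor."""
    return release is not None and release >= minimum


def local_nvcc_release():
    """Return the local nvcc release, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)
```

`cuda_release_ok(local_nvcc_release())` returns `False` both when NVCC is missing and when the release is older than 12.4, which matches the recommendation that NVCC be installed.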

NVIDIA Driver and NVIDIA Fabric Manager

Standard Boot Option

If you select the Standard boot option for your machine (on Azure, this is called Security Option), DGX Cloud Lepton can automatically install the 550.144.03 NVIDIA driver with NVIDIA Fabric Manager for you.

Note

If the NVIDIA driver and NVIDIA Fabric Manager are already installed on your machine, DGX Cloud Lepton will not install them again.

Trusted Launch Boot Option

If you want to use the Trusted Launch boot option, you need to install the NVIDIA driver and NVIDIA Fabric Manager manually.

  • The release 550 family is recommended, specifically version 550.144.03 or newer.
  • You can also use the release 535 family, with version 535.230.02 or newer.
  • If you are using a system with NVSwitch, you need to have NVIDIA Fabric Manager installed with the same version as the driver.
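The driver rules above amount to a per-family minimum version plus an exact match between the driver and Fabric Manager. A minimal sketch of that comparison follows; the function names are hypothetical, and treating release families other than 550 and 535 as unsupported is an assumption made here because this page does not name them.

```python
# Minimum driver version per release family, taken from the list above.
MINIMUM_BY_FAMILY = {
    550: (550, 144, 3),
    535: (535, 230, 2),
}


def parse_driver_version(version: str) -> tuple[int, ...]:
    """Turn a string such as '550.144.03' into (550, 144, 3)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(version: str) -> bool:
    """True if the version is in a listed family and at or above its minimum.

    Assumption: families not listed on this page return False.
    """
    parsed = parse_driver_version(version)
    minimum = MINIMUM_BY_FAMILY.get(parsed[0])
    return minimum is not None and parsed >= minimum


def fabric_manager_matches(driver_version: str, fabric_manager_version: str) -> bool:
    """On NVSwitch systems, Fabric Manager must match the driver version."""
    return parse_driver_version(driver_version) == parse_driver_version(
        fabric_manager_version
    )
```

The driver version string can be obtained on the machine with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. Comparing parsed integer tuples rather than raw strings avoids ordering mistakes such as ranking `550.90.07` above `550.144.03`.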

System Libraries

  • glibc version ≥ GLIBC_2.32
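The glibc requirement can be checked from Python's standard library. The sketch below uses `platform.libc_ver()`, which reports the C library that the running interpreter is linked against; the helper names are hypothetical, and the result reflects that interpreter rather than every binary on the system.

```python
import platform


def parse_glibc_version(version: str) -> tuple[int, int]:
    """Turn '2.35' into (2, 35) so versions compare numerically."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def glibc_ok(version: str, minimum: tuple[int, int] = (2, 32)) -> bool:
    """True if the glibc version is at least the minimum."""
    return parse_glibc_version(version) >= minimum


def local_glibc_version():
    """Return the glibc version string, or None when not running on glibc."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None
```

Comparing integer tuples matters here: as strings, `"2.4"` would sort above `"2.32"`, while numerically it is older.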

Hardware Requirements

Storage Configuration

For best performance and stability, we recommend the following storage configuration, local to each GPU node:

  • Root: NVMe SSD in RAID-1 configuration (post-RAID capacity of at least 1 TB).
  • Data: At least 20 TB of NVMe SSD (can be multiple disks).

At a minimum, storage must satisfy one of the following three options:

  1. Single OS Disk Setup

    • Minimum 640 GB storage per GPU on the OS disk

  2. OS Disk + Data Volume Setup

    • OS disk: Minimum 128 GB
    • Data volume: Minimum 640 GB per GPU (e.g., using LVM)

  3. OS Disk + Multiple Data Disks Setup

    • OS disk: Minimum 128 GB
    • Total data disks: Minimum 640 GB per GPU
    • Note: DGX Cloud Lepton will automatically combine all data disks into a single volume

Note

We recommend more storage than these minimums for training clusters.
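Because the storage minimums scale with the number of GPUs, it can help to compute them per node. The following sketch applies the three options above to a given GPU count; the function and key names are hypothetical.

```python
MIN_GB_PER_GPU = 640   # per-GPU minimum from the options above
MIN_OS_DISK_GB = 128   # OS disk minimum when data lives on separate storage


def minimum_storage_gb(gpu_count: int) -> dict[str, dict[str, int]]:
    """Return the minimum capacity in GB for each of the three storage options."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    per_gpu_total = MIN_GB_PER_GPU * gpu_count
    return {
        # Option 1: everything on the OS disk.
        "single_os_disk": {"os_disk": per_gpu_total},
        # Option 2: small OS disk plus one data volume.
        "os_disk_plus_data_volume": {
            "os_disk": MIN_OS_DISK_GB,
            "data_volume": per_gpu_total,
        },
        # Option 3: small OS disk plus several data disks, summed together.
        "os_disk_plus_data_disks": {
            "os_disk": MIN_OS_DISK_GB,
            "data_disks_total": per_gpu_total,
        },
    }
```

For an 8-GPU node, `minimum_storage_gb(8)` gives 5120 GB on the OS disk for option 1, or a 128 GB OS disk plus 5120 GB of data capacity for options 2 and 3.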

CPU Requirements

We recommend at least 8 physical CPU cores per GPU, for example with AMD EPYC 9004 Series or 4th or 5th Gen Intel Xeon CPUs.

Memory

  • Minimum 256 GB RAM per GPU
  • ECC (Error-Correcting Code) support is recommended.
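The CPU and memory figures are also stated per GPU. As a rough sketch with hypothetical names, the function below reports which per-node figures fall short for a given GPU count, keeping the recommended CPU figure separate from the required memory figure.

```python
CORES_PER_GPU = 8      # recommended physical cores per GPU
RAM_GB_PER_GPU = 256   # minimum RAM per GPU


def node_shortfalls(gpu_count: int, physical_cores: int, ram_gb: int) -> list[str]:
    """List the per-node CPU and memory figures that a node does not meet."""
    shortfalls = []
    if physical_cores < CORES_PER_GPU * gpu_count:
        shortfalls.append(
            f"recommended: {CORES_PER_GPU * gpu_count} physical cores, "
            f"found {physical_cores}"
        )
    if ram_gb < RAM_GB_PER_GPU * gpu_count:
        shortfalls.append(
            f"required: {RAM_GB_PER_GPU * gpu_count} GB RAM, found {ram_gb} GB"
        )
    return shortfalls
```

An 8-GPU node therefore targets 64 physical cores and needs 2048 GB of RAM; an empty list means both figures are met.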

Network Configuration

Network Requirements

  • All outbound traffic must be allowed
  • DNS server must be properly configured
  • A dedicated public IP address per machine is preferred

Note

If you want to use Dev Pod on DGX Cloud Lepton, you need to open ports 40000 to 65535 on your machines.

RDMA Configuration

For multi-node training workloads, follow your cloud provider's guidelines to set up RDMA for east-west traffic.

IP Address Restrictions

Avoid using or relying on IP addresses from the following CIDR ranges:

  • 10.50.0.0/16
  • 172.20.0.0/16
  • 100.64.0.0/10
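A machine address or subnet can be checked against these ranges with Python's standard `ipaddress` module. This is a minimal sketch; the function name is hypothetical, and it flags any overlap, including a larger subnet that merely contains one of the listed ranges.

```python
import ipaddress

# CIDR ranges listed above.
RESTRICTED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.50.0.0/16", "172.20.0.0/16", "100.64.0.0/10")
]


def conflicting_ranges(address_or_cidr: str) -> list[str]:
    """Return the restricted ranges that overlap an address or subnet."""
    # strict=False accepts a bare host address or a CIDR with host bits set.
    network = ipaddress.ip_network(address_or_cidr, strict=False)
    return [str(r) for r in RESTRICTED_RANGES if network.overlaps(r)]
```

For example, `conflicting_ranges("10.50.3.7")` reports `10.50.0.0/16`, while `conflicting_ranges("192.168.1.10")` returns an empty list. A wide subnet such as `10.0.0.0/8` is also flagged because it contains `10.50.0.0/16`.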

Performance Verification

Your machines need to pass the NCCL test provided by NVIDIA.

Once you've made sure your machines meet the requirements, you can refer to this guide to bring them to DGX Cloud Lepton.

Copyright © 2025, NVIDIA Corporation.