DGX OS 6 / Ubuntu 22.04
The NVIDIA DGX OS 6 User Guide is also available as a PDF.
- About DGX OS 6
- Release Guidance
- Release Notes
- Initial Setup
- Reimaging the System
- Installing DGX Software on Ubuntu
- Prerequisites
- Installation Considerations
- Installing Ubuntu
- Installing the DGX Software Stack
- Installing DGX System Configurations and Tools
- Configuring Data Drives
- Installing the GPU Driver
- Installing the Mellanox OpenFabrics Enterprise Distribution (MLNX_OFED)
- Installing Docker and the NVIDIA Container Toolkit
- Installing the NVIDIA System Management (NVSM) Tool [Recommended]
- Additional Software Installed By DGX OS
- Next Steps and Additional Information
- Upgrading the OS
- System Configurations
- Network Configuration
- Configuring ConnectX from InfiniBand to Ethernet
- Docker Configuration
- Managing CPU Mitigations
- Managing the DGX Crash Dump Feature
- Connecting to Serial Over LAN
- Filesystem Quotas
- Running Workloads on Systems with Mixed Types of GPUs
- Using Multi-Instance GPUs
- Updating the containerd Override File for MIG configurations
- Data Storage Configuration
- Running NGC Containers
- Installing ConnectX-7 Firmware
- Managing Self-Encrypting Drives
- Overview
- Installing the Software
- Configuring Trusted Computing
- Initializing the System for Drive Encryption
- Enabling Drive Locking
- Initialization Examples
- Disabling Drive Locking
- Enabling Drive Locking
- Exporting the Vault
- Erasing Your Data
- Clearing the TPM
- Changing Disk Passwords, Adding Disks, or Replacing Disks
- Recovering From Lost Keys
- Managing and Upgrading Software
- Known Issues
- DGX System Device ID Not Found in /usr/share/misc/pci.ids
- Virtualization Not Supported
- Excessive Growth of OpenSM Log Causing DGX Systems to Become Inoperable
- Incorrect DCGM Version After Upgrade from 5.X to 6.2.0
- Errors Occur When Loading Mirrored Repositories on Air-Gapped Systems
- Reduced Network Communication Speeds on DGX H100 System
- NVSM Raises Alerts for Missing Devices on DGX H100 System
- DGX A800 Station/Server: mig-parted config
- Erroneous Insufficient Power Error May Occur for PCIe Slots
- Applications that call the cuCtxCreate API Might Experience a Performance Drop
- Incorrect nvidia-container-toolkit version after upgrade from 5.X to 6.0
- UBSAN error and mstconfig stack dump in kernel logs at boot
- The BMC Redfish interface is not active on first boot after installation
- DGX OS Connectivity Requirements
- DGX Software Stack
- PXE Boot Setup
- Prerequisites
- Overview of the PXE Server
- Mount the BaseOS 6.0.0 ISO
- Configure the TFTP directory
- Parameters unique to the Base OS installer
- Configure DHCP
- Optional: Configure CX-4/5/6/7 cards to PXE boot
- Query UEFI PXE ROM state
- MOFED Instructions
- Optional: Configure the DGX-Server to PXE boot automatically
- Configure network boot priorities
- Make the DGX-Server PXE boot
- Other IPMI boot options
- Autoinstall Customizations
- NVIDIA-Specific Autoinstall Variables
- Common Customizations
- Network Configuration
- Creating a User
- Air-Gapped Installations
- Cloud-init Configuration File
- Installing Docker Containers