DGX Software Stack
NVIDIA DGX Software Packages
The following tables list the packages installed as part of the DGX Software Stack, broken out by metapackage name and platform.
DGX A100 and DGX A800 | DGX Station A100 and DGX Station A800 | DGX H100/H200 and DGX H800 |
---|---|---|
dgx-a100-system-configurations, dgx-a800-system-configurations | dgxstation-a100-system-configurations, dgxstation-a800-system-configurations | dgx-h100-system-configurations, dgx-h200-system-configurations, dgx-h800-system-configurations |
dgx-release | dgx-release | dgx-release |
nv-cpu-governor | nv-cpu-governor | nv-cpu-governor |
nv-hugepage | nv-hugepage | nv-hugepage |
nv-iommu-pt | nv-iommu-pt | nv-iommu-pt |
nv-ipmi-devintf | nv-ipmi-devintf | nv-ipmi-devintf |
nv-limits | nv-limits | nv-limits |
nv-update-disable | nv-update-disable | nv-update-disable |
nvgpu-services-list | nvgpu-services-list | nvgpu-services-list |
nvidia-acs-disable | nvidia-acs-disable | |
nvidia-crashdump | nvidia-crashdump | nvidia-crashdump |
nvidia-disable-opensm | | |
nvidia-esm-hook-epilogue | nvidia-esm-hook-epilogue | nvidia-esm-hook-epilogue |
nvidia-fs-loader | nvidia-fs-loader | nvidia-fs-loader |
nvidia-kbd-udev | | |
nvidia-kernel-defaults | nvidia-kernel-defaults | nvidia-kernel-defaults |
nvidia-mlnx-ofed-netdev-rename | | |
nvidia-nvme-smartd | nvidia-nvme-smartd | nvidia-nvme-smartd |
nvidia-pci-bridge-power | nvidia-pci-bridge-power | nvidia-pci-bridge-power |
nvidia-pci-no-realloc | | |
nvidia-redfish-config | nvidia-redfish-config | nvidia-redfish-config |
nvidia-relaxed-ordering-gpu | nvidia-relaxed-ordering-gpu | |
nvidia-relaxed-ordering-nvme | nvidia-relaxed-ordering-nvme | nvidia-relaxed-ordering-nvme |
dgx-a100-system-tools, dgx-a800-system-tools | dgxstation-a100-system-tools, dgxstation-a800-system-tools | dgx-h100-system-tools, dgx-h200-system-tools, dgx-h800-system-tools |
dgx-release | dgx-release | dgx-release |
ipmitool | ipmitool | ipmitool |
nv-common-apis | nv-common-apis | nv-common-apis |
nv-env-paths | nv-env-paths | nv-env-paths |
nvdebug | | |
nvidia-mig-manager | nvidia-mig-manager | nvidia-mig-manager |
nvidia-raid-config | nvidia-raid-config | nvidia-raid-config |
nvme-cli | nvme-cli | nvme-cli |
tpm2-tools | tpm2-tools | tpm2-tools |
dgx-a100-system-tools-extra, dgx-a800-system-tools-extra | dgxstation-a100-system-tools-extra, dgxstation-a800-system-tools-extra | dgx-h100-system-tools-extra, dgx-h200-system-tools-extra, dgx-h800-system-tools-extra |
msecli | msecli | msecli |
DGX-1 | DGX-2 | DGX Station |
---|---|---|
dgx1-system-configurations | dgx2-system-configurations | dgxstation-system-configurations |
dgx-release | dgx-release | dgx-release |
nv-ast-modeset | | |
nv-cpu-governor | nv-cpu-governor | |
nv-enable-nvme-hot-plug | | |
nv-hugepage | nv-hugepage | nv-hugepage |
nv-ipmi-devintf | nv-ipmi-devintf | |
nv-limits | nv-limits | nv-limits |
nv-update-disable | nv-update-disable | nv-update-disable |
nvgpu-services-list | nvgpu-services-list | nvgpu-services-list |
nvidia-crashdump | nvidia-crashdump | nvidia-crashdump |
nvidia-esm-hook-epilogue | nvidia-esm-hook-epilogue | nvidia-esm-hook-epilogue |
nvidia-fs-loader | nvidia-fs-loader | nvidia-fs-loader |
nvidia-kernel-defaults | nvidia-kernel-defaults | nvidia-kernel-defaults |
nvidia-nvme-smartd | | |
nvidia-pci-bridge-power | nvidia-pci-bridge-power | |
dgx1-system-tools | dgx2-system-tools | dgxstation-system-tools |
dgx-release | dgx-release | dgx-release |
ipmitool | ipmitool | |
nv-common-apis | nv-common-apis | nv-common-apis |
nv-env-paths | nv-env-paths | nv-env-paths |
nvidia-raid-config | nvidia-raid-config | |
nvme-cli | | |
tpm-tools | | |
dgx1-system-tools-extra | dgx2-system-tools-extra | dgxstation-system-tools-extra |
msecli | | |
nvidia-raid-config | | |
storcli | | |
The following packages are installed by the nvidia-mlnx-ofed-misc metapackage:
mlnx-fw-updater
mlnx-pxe-setup
nvidia-mlnx-config
nvidia-peermem-loader
The following additional packages are part of the DGX Software Stack:
nv-docker-options
nvidia-logrotate
nvidia-motd
nvidia-ipmisol
The following table describes these packages in more detail and shows, for each platform, whether a package is required or optional:
Package | Description | 1 | 2 | A | H |
---|---|---|---|---|---|
dgx-release | Release information. | R | R | R | R |
nv-ast-modeset | Disable the Aspeed display driver, which can cause issues with connected monitors. The AST2xxx is the BMC used in our servers. | R | | | |
nv-cpu-governor | Set the CPU governor mode to performance using a systemd script. | R | R | R | R |
nv-docker-options | Increase SHMEM and other resources. | R | R | R | R |
nv-enable-nvme-hot-plug | Configure kernel parameters for NVMe hot plug (see also the DGX Kernel Parameters section below). | R | | | |
nv-hugepage | Set the “transparent_hugepage=madvise” kernel parameter. | R | R | R | R |
nv-iommu-pt | Set iommu=pt for AMD Rome platforms. | R | R | | |
nv-ipmi-devintf | Add the ipmi_devintf module for accessing the BMC using ipmitool. | R | R | R | R |
nv-limits | Increase the process resource limits for users (ulimit nofile 50000). | R | R | R | R |
nv-update-disable | Disable automatic system upgrades. Users need to explicitly upgrade their systems using apt. | R | R | R | R |
nvgpu-services-list | List of GPU-consuming services, such as DCGM or NVSM, in JSON format. Required by the firmware update mechanism. | R | R | R | R |
nvidia-acs-disable | Disable the PCIe ACS capability to allow for better GPU-direct performance in bare-metal use cases on DGX A100 and DGX H100/H200. | R | R | | |
nvidia-crashdump | Tools to manage kernel crash dumps. Crash dumps are disabled by default. | R | R | R | R |
nvidia-disable-opensm | Disable the opensm service by default. | R | | | |
nvidia-esm-hook-epilogue | Add text after the ESM apt upgrade message to clarify the DGX contract and the availability of Extended Security Maintenance updates for additional packages in the Ubuntu Universe repository. | R | R | R | R |
nvidia-fs-loader | Create a configuration file in /etc/modules-load.d to load the nvidia-fs module. | R | R | R | R |
nvidia-ipmisol | Enable serial output through the BMC using Serial Over LAN (SOL). | O | O | O | O |
nvidia-kbd-udev | Create a udev rule in /etc/udev/rules.d to send a signal to the BMC when the Caps Lock key is toggled. This tells the BMC to turn the “CAPS” indicator on the virtual console on or off. | R | | | |
nvidia-kernel-defaults | Restrict ARP behavior for improved security (net.ipv4.conf arp_announce and arp_ignore settings). | R | R | R | R |
nvidia-logrotate | Modify the logrotate configuration. | O | O | O | O |
nvidia-mlnx-ofed-netdev-rename | Reset mlnx enp* devices back to the original names. | R | | | |
nvidia-motd | Modify the message of the day (MOTD) to display NVSM health monitoring alerts and release information. | O | O | O | O |
nvidia-nvme-smartd | Enable SMART monitoring on NVMe devices. By default, smartd skips NVMe devices. | R | R | R | |
nvidia-pci-bridge-power | Set the bridge power control setting to “on” for all PCI bridges. | R | R | R | R |
nvidia-pci-no-realloc | Configure the system to disable PCI resource reallocation. | R | | | |
nvidia-redfish-config | Configure the Redfish interface with an interface name and IP address. The interface name is “bmc_redfish0”, while the IP address is read from DMI type 42. | R | R | | |
nvidia-relaxed-ordering-gpu | Set a reg-key to enable PCIe relaxed ordering in the GPUs. | R | | | |
nvidia-relaxed-ordering-nvme | Install a script that users can call to enable relaxed ordering in NVMe devices. | R | R | | |
Legend:
- 1: DGX-1
- 2: DGX-2
- A: DGX A100, DGX A800
- H: DGX H100/H200, DGX H800
- R: Required package
- O: Optional package
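To make the required/optional distinction concrete, the following Python sketch checks whether a set of packages is installed on a Debian or Ubuntu based system. It is an illustration only, not an NVIDIA tool: the function names and the `REQUIRED_EVERYWHERE` list are hypothetical, and the list is just a small subset of the packages marked R in every column of the table above. It relies on the standard `dpkg-query -W -f` interface.

```python
import subprocess

# Illustrative subset only: packages marked R in every column of the table above.
REQUIRED_EVERYWHERE = ["dgx-release", "nv-cpu-governor", "nv-hugepage", "nv-limits"]


def parse_dpkg_status(output):
    """Map package name -> True when dpkg reports 'install ok installed'."""
    status = {}
    for line in output.splitlines():
        name, sep, state = line.partition("\t")
        if sep:  # skip lines that are not in the name<TAB>status format
            status[name.strip()] = state.strip() == "install ok installed"
    return status


def missing_packages(required, dpkg_output):
    """Return the required packages that are absent or not fully installed."""
    status = parse_dpkg_status(dpkg_output)
    return [pkg for pkg in required if not status.get(pkg, False)]


def query_dpkg(packages):
    """Run dpkg-query on a Debian/Ubuntu host and return its stdout.

    Packages unknown to dpkg are reported on stderr and do not appear
    in stdout, so missing_packages() treats them as missing.
    """
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Package}\t${Status}\n", *packages],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

On a host with dpkg available, `missing_packages(REQUIRED_EVERYWHERE, query_dpkg(REQUIRED_EVERYWHERE))` returns the names that still need to be installed. Keeping the parsing separate from the subprocess call lets the logic be exercised with captured output on any machine.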
DGX Kernel Parameters
Kernel Parameter | Description | Package | Location |
---|---|---|---|
ast.modeset=0 | Disable the Aspeed display driver. The AST2xxx is the BMC used in the DGX-1 and DGX-2 servers. | nv-ast-modeset | /etc/default/grub.d/nomodeset.cfg |
pci=realloc=on | Allow the kernel to reallocate PCI resources if the allocations done by the BIOS are insufficient. This and pcie_ports=native are both required for NVMe hot-plug on DGX-2. | nv-enable-nvme-hot-plug | /etc/default/grub.d/enable-nvme-hot-plug.cfg |
pcie_ports=native | Use Linux native services for PME, AER, DPC, and PCIe hotplug instead of firmware-first handling. This and pci=realloc=on are both required for NVMe hot-plug on DGX-2. | nv-enable-nvme-hot-plug | /etc/default/grub.d/enable-nvme-hot-plug.cfg |
transparent_hugepage=madvise | Disable transparent huge pages system-wide and only enable them inside MADV_HUGEPAGE madvise regions to prevent applications from allocating more memory resources than necessary. | nv-hugepage | /etc/default/grub.d/hugepage.cfg |
iommu=pt | Enable pass-through mode only and disable DMA translations. This enables optimizations for the CPU inside the DGX A100. | nv-iommu-pt | /etc/default/grub.d/iommu.cfg |
crashkernel | Amount of memory to use for crash dumps. | nvidia-crashdump | /etc/default/grub.d/ipmisol.cfg |
console=ttyS[0-1],115200n8 | Set the console to serial port 0 or 1, using 115200 baud, no parity, and 8 data bits. DGX-2, DGX H100/H200, and DGX H800 use console=ttyS0,115200n8; other systems use console=ttyS1,115200n8. | nvidia-ipmisol | kernel cmdline |
net.ipv4.conf.all.arp_announce = 2 | Always use the best local address for this target. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
net.ipv4.conf.default.arp_announce = 2 | Always use the best local address for this target. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
net.ipv4.conf.all.arp_ignore = 1 | Only reply to ARP requests on the interface which contains the target IP address. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
net.ipv4.conf.default.arp_ignore = 1 | Only reply to ARP requests on the interface which contains the target IP address. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
NVreg_EnablePCIERelaxedOrderingMode=1 | Set a reg-key to enable PCIe relaxed ordering in the GPUs. | nvidia-relaxed-ordering-gpu | /etc/modprobe.d/nvidia-relaxed-ordering.conf |
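As a rough way to confirm that settings like those in the table are active on a running system, the sketch below compares expected values against the contents of `/proc/cmdline` and the sysctl files under `/proc/sys`. The function names are hypothetical, and `EXPECTED_CMDLINE` is only an illustrative subset: which parameters apply depends on the platform and on which packages are installed (for example, the table ties iommu=pt to the DGX A100).

```python
from pathlib import Path

# Illustrative subset of boot parameters from the table; adjust per platform.
EXPECTED_CMDLINE = ["transparent_hugepage=madvise", "iommu=pt"]

# ARP-related sysctl values listed for nvidia-kernel-defaults.
EXPECTED_SYSCTL = {
    "net.ipv4.conf.all.arp_announce": "2",
    "net.ipv4.conf.default.arp_announce": "2",
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.default.arp_ignore": "1",
}


def missing_cmdline_params(cmdline, expected):
    """Return the expected parameters that do not appear on the kernel command line."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]


def sysctl_path(key, root="/proc/sys"):
    """Translate a dotted sysctl key into its /proc/sys path.

    Assumes no path component contains a dot, which holds for the
    'all' and 'default' keys used here.
    """
    return Path(root) / key.replace(".", "/")


def read_sysctl(key, root="/proc/sys"):
    """Read the current value of a sysctl key, or None if it is unavailable."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


def sysctl_mismatches(expected, read_value=read_sysctl):
    """Return {key: actual_value} for every key whose value differs from the expectation."""
    mismatches = {}
    for key, wanted in expected.items():
        actual = read_value(key)
        if actual != wanted:
            mismatches[key] = actual
    return mismatches
```

On a Linux host, `missing_cmdline_params(Path("/proc/cmdline").read_text(), EXPECTED_CMDLINE)` and `sysctl_mismatches(EXPECTED_SYSCTL)` both return empty results when the listed settings are in effect. Passing a custom `read_value` callable makes the sysctl comparison testable without touching the live system.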