DGX Software Stack
NVIDIA DGX Software Packages
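The following tables list the packages installed as part of the DGX Software Stack, grouped by metapackage and platform. Rows whose names end in -system-configurations, -system-tools, or -system-tools-extra name the metapackages; the rows that follow each one list the packages it installs. An empty cell means the package is not listed for that platform.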

| DGX A100 and DGX A800 | DGX Station A100 and DGX Station A800 | DGX H100 |
|---|---|---|
| dgx-a100-system-configurations<br>dgx-a800-system-configurations | dgxstation-a100-system-configurations<br>dgxstation-a800-system-configurations | dgx-h100-system-configurations |
| dgx-release | dgx-release | dgx-release |
| nv-cpu-governor | nv-cpu-governor | nv-cpu-governor |
| nv-hugepage | nv-hugepage | nv-hugepage |
| nv-iommu-pt | nv-iommu-pt | nv-iommu-pt |
| nv-ipmi-devintf | nv-ipmi-devintf | nv-ipmi-devintf |
| nv-limits | nv-limits | nv-limits |
| nv-update-disable | nv-update-disable | nv-update-disable |
| nvgpu-services-list | nvgpu-services-list | nvgpu-services-list |
| nvidia-acs-disable | nvidia-acs-disable | nvidia-acs-disable |
| nvidia-crashdump | nvidia-crashdump | nvidia-crashdump |
| nvidia-esm-hook-epilogue | nvidia-esm-hook-epilogue | nvidia-esm-hook-epilogue |
| nvidia-fs-loader | nvidia-fs-loader | nvidia-fs-loader |
| nvidia-kernel-defaults | nvidia-kernel-defaults | nvidia-kernel-defaults |
| nvidia-nvme-smartd | nvidia-nvme-smartd | nvidia-nvme-smartd |
| nvidia-pci-bridge-power | nvidia-pci-bridge-power | nvidia-pci-bridge-power |
| nvidia-pci-norealloc | | |
| nvidia-redfish-config | nvidia-redfish-config | nvidia-redfish-config |
| nvidia-relaxed-ordering-gpu | nvidia-relaxed-ordering-gpu | nvidia-relaxed-ordering-gpu |
| nvidia-relaxed-ordering-nvme | nvidia-relaxed-ordering-nvme | nvidia-relaxed-ordering-nvme |
| dgx-a100-system-tools-extra<br>dgx-a800-system-tools-extra | dgxstation-a100-system-tools-extra<br>dgxstation-a800-system-tools-extra | dgx-h100-system-tools-extra |
| dgx-release | dgx-release | dgx-release |
| ipmitool | ipmitool | ipmitool |
| nv-common-apis | nv-common-apis | nv-common-apis |
| nv-env-paths | nv-env-paths | nv-env-paths |
| nvidia-mig-manager | nvidia-mig-manager | |
| nvidia-raid-config | nvidia-raid-config | nvidia-raid-config |
| nvme-cli | nvme-cli | nvme-cli |
| tpm2-tools | tpm2-tools | tpm2-tools |
| dgx-a100-system-tools<br>dgx-a800-system-tools | dgxstation-a100-system-tools<br>dgxstation-a800-system-tools | dgx-h100-system-tools |
| msecli | msecli | msecli |


| DGX-1 | DGX-2 | DGX Station |
|---|---|---|
| dgx1-system-configurations | dgx2-system-configurations | dgxstation-system-configurations |
| dgx-release | dgx-release | dgx-release |
| nv-ast-modeset | | |
| nv-cpu-governor | nv-cpu-governor | |
| nv-hugepage | nv-hugepage | nv-hugepage |
| nv-iommu-pt | | |
| nv-ipmi-devintf | nv-ipmi-devintf | |
| nv-limits | nv-limits | nv-limits |
| nv-update-disable | nv-update-disable | nv-update-disable |
| nvgpu-services-list | nvgpu-services-list | nvgpu-services-list |
| nvidia-acs-disable | | |
| nvidia-crashdump | nvidia-crashdump | nvidia-crashdump |
| nvidia-esm-hook-epilogue | nvidia-esm-hook-epilogue | nvidia-esm-hook-epilogue |
| nvidia-fs-loader | nvidia-fs-loader | nvidia-fs-loader |
| nvidia-kernel-defaults | nvidia-kernel-defaults | nvidia-kernel-defaults |
| nvidia-nvme-smartd | | |
| nvidia-pci-bridge-power | nvidia-pci-bridge-power | |
| nvidia-redfish-config | | |
| nvidia-relaxed-ordering-gpu | | |
| nvidia-relaxed-ordering-nvme | | |
| dgx1-system-tools | dgx2-system-tools | dgxstation-system-tools |
| dgx-release | dgx-release | dgx-release |
| ipmitool | ipmitool | |
| nv-common-apis | nv-common-apis | nv-common-apis |
| nv-env-paths | nv-env-paths | nv-env-paths |
| nvidia-raid-config | nvidia-raid-config | |
| nvme-cli | | |
| tpm-tools | | |
| dgx1-system-tools-extra | dgx2-system-tools-extra | dgxstation-system-tools-extra |
| msecli | | |
| nvidia-raid-config | | |
| storcli | | |

The following packages are installed by the nvidia-mlnx-ofed-misc metapackage:
- mlnx-fw-updater
- mlnx-pxe-setup
- nvidia-mlnx-config
- nvidia-peermem-loader
The following additional packages are part of the DGX Software Stack:
- nv-docker-options
- nvidia-logrotate
- nvidia-motd
- nvidia-ipmisol
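
To confirm which of the packages listed above are present on a given system, the dpkg database can be queried. The following is a minimal sketch, assuming a Debian-based DGX OS install where `dpkg-query` is on the path. The `EXPECTED_PACKAGES` list and the helper names are illustrative and not part of the DGX Software Stack; adjust the list to match your platform.

```python
import subprocess

# Illustrative selection taken from the lists above; edit for your platform.
EXPECTED_PACKAGES = [
    "mlnx-fw-updater",
    "mlnx-pxe-setup",
    "nvidia-mlnx-config",
    "nvidia-peermem-loader",
    "nv-docker-options",
    "nvidia-logrotate",
    "nvidia-motd",
    "nvidia-ipmisol",
]


def parse_installed(dpkg_output):
    """Return the set of package names whose status is 'install ok installed'."""
    installed = set()
    for line in dpkg_output.splitlines():
        name, _, status = line.partition("\t")
        if status.strip() == "install ok installed":
            installed.add(name.strip())
    return installed


def missing_packages(expected, dpkg_output):
    """Return the expected packages that are not fully installed, in order."""
    installed = parse_installed(dpkg_output)
    return [pkg for pkg in expected if pkg not in installed]


def query_dpkg():
    """Return 'name<TAB>status' lines from dpkg-query, or '' if it is unavailable."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "-f", "${Package}\t${Status}\n"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return ""
    return result.stdout


if __name__ == "__main__":
    for pkg in missing_packages(EXPECTED_PACKAGES, query_dpkg()):
        print(f"not installed: {pkg}")
```

Packages marked optional later in this section may legitimately be reported as not installed.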
The following table describes each package that is installed as part of the system configuration packages:

| Package | Description | 1 | 2 | A | H |
|---|---|---|---|---|---|
| dgx-release | Release information. | R | R | R | R |
| nv-ast-modeset | Disables the Aspeed display driver, which can cause issues with connected monitors. The AST2xxx is the BMC used in our servers. | R | R | R | R |
| nv-enable-nvme-hot-plug | Configures kernel parameters for NVMe hot plug (see also the DGX Kernel Parameters section below). | R | | | |
| nv-hugepage | Sets the “transparent_hugepage=madvise” kernel parameter. | R | R | R | R |
| nv-iommu-pt | Sets iommu=pt for AMD Rome platforms. | R | R | | |
| nv-ipmi-devintf | Adds the ipmi_devintf module for accessing the BMC using ipmitool. | R | R | R | R |
| nv-limits | Increases the process resource limits for users (ulimit nofile 50000). | R | R | R | R |
| nv-update-disable | Disables automatic system upgrades. Users need to explicitly upgrade their systems using apt. | R | R | R | R |
| nvgpu-services-list | Lists GPU-consuming services, such as DCGM or NVSM, in JSON format. Required by the firmware update mechanism. | R | R | R | R |
| nvidia-acs-disable | Disables the PCIe ACS capability to allow for better GPU-direct performance in bare-metal use cases on DGX A100 and DGX H100. | R | R | | |
| nvidia-crashdump | Tools to manage kernel crash dumps. Crash dumps are disabled by default. | R | R | R | R |
| nv-docker-options | Increases SHMEM and other resources. | R | R | R | R |
| nvidia-ipmisol | Enables serial output through the BMC using Serial Over LAN (SOL). | O | O | O | O |
| nvidia-kernel-defaults | Disables ARP for security improvements (net.ipv4.conf). | R | R | R | R |
| nvidia-logrotate | Modifies the logrotate configuration. | O | O | O | O |
| nvidia-motd | Modifies the message of the day (MOTD) to display NVSM health monitoring alerts and release information. | O | O | O | O |
| nvidia-nvme-smartd | Enables SMART monitoring on NVMe devices. By default, smartd skips NVMe devices. | R | R | R | |
| nvidia-pci-bridge-power | Sets the bridge power control setting to “on” for all PCI bridges. | R | R | R | R |
| nvidia-relaxed-ordering-gpu | Sets a registry key to enable PCIe relaxed ordering in the GPUs. | R | R | | |
| nvidia-relaxed-ordering-nvme | Installs a script that users can call to enable relaxed ordering in NVMe devices. | R | R | | |
| nvidia-redfish-config | Configures the Redfish interface with an interface name and IP address. The interface name is “bmc_redfish0”, and the IP address is read from DMI type 42. | R | R | | |

Legend:
- 1: DGX-1
- 2: DGX-2
- A: DGX A100
- H: DGX H100
- R: Required package
- O: Optional package
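
Two of the settings described in the table above, the transparent hugepage mode set by nv-hugepage and the open-file limit raised by nv-limits, can be inspected at runtime. The sketch below assumes a Linux system that exposes the standard sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`; the helper names are illustrative and are not shipped by either package.

```python
import resource
from pathlib import Path

# Standard sysfs location for the transparent hugepage mode on Linux.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode from text like 'always [madvise] never'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    if THP_PATH.exists():
        # nv-hugepage is described as setting this mode to 'madvise'.
        print("transparent_hugepage mode:", active_thp_mode(THP_PATH.read_text()))
    soft, hard = nofile_limits()
    # nv-limits is described as raising nofile to 50000; compare with these values.
    print(f"nofile soft={soft} hard={hard}")
```

The limits reported are those of the process running the script, so run it from a fresh login session of the user whose limits you want to check.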
DGX Kernel Parameters

| Kernel Parameter | Description | Package |
|---|---|---|
| ast.modeset=0 | Disables the Aspeed display driver. The AST2xxx is the BMC used in our servers. [DGX-1, DGX-2, DGX A100, DGX Station A100, DGX H100] | nv-ast-modeset |
| crashkernel=1G-:0M | Does not reserve any memory for crash dumps (when crash dumps are disabled, which is the default). | nvidia-crashdump |
| crashkernel=1G-:512M | Reserves 512 MB for crash dumps (when crash dumps are enabled). | nvidia-crashdump |
| pci=realloc=on | Allows the kernel to reallocate PCI resources if the allocations done by the BIOS are insufficient. This and pcie_ports=native are both required for NVMe hot plug on DGX-2. | nv-enable-nvme-hot-plug |
| pcie_ports=native | Uses Linux native services for PME, AER, DPC, and PCIe hot plug (that is, not firmware first). This and pci=realloc=on are both required for NVMe hot plug on DGX-2. | nv-enable-nvme-hot-plug |
| transparent_hugepage=madvise | Disables huge pages system-wide and enables them only inside MADV_HUGEPAGE madvise regions, to prevent applications from allocating more memory resources than necessary. | nv-hugepage |
| iommu=pt | Enables pass-through mode only and disables DMA translations. This enables optimizations for the CPU inside the DGX A100. | nv-iommu-pt |
| console=ttyS1,115200n8 | Sets the console to serial port 1 at 115200 baud, no parity, 8 data bits. [DGX-2 and DGX H100] | nvidia-ipmisol |
| console=ttyS0,115200n8 | Sets the console to serial port 0 at 115200 baud, no parity, 8 data bits. | nvidia-ipmisol |

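
To check whether the running kernel was booted with the parameters in the table above, the kernel command line in `/proc/cmdline` can be compared against them. This is a minimal sketch: the `EXPECTED` list is an illustrative selection, not a statement of what any particular platform must have, and the simple whitespace split does not handle quoted values that contain spaces.

```python
from pathlib import Path

# Illustrative selection from the table above; which entries apply depends
# on the platform and on which packages are installed.
EXPECTED = [
    "transparent_hugepage=madvise",
    "iommu=pt",
    "pci=realloc=on",
    "pcie_ports=native",
]


def parse_cmdline(cmdline):
    """Map each parameter name to the list of values it was given.

    A name can repeat (for example, several console= entries), so values
    are kept in a list. Bare flags such as 'ro' map to [''].
    """
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so 'pci=realloc=on' becomes
        # name 'pci' with value 'realloc=on'.
        name, _, value = token.partition("=")
        params.setdefault(name, []).append(value)
    return params


def has_param(params, expected):
    """Return True if 'name=value' appears in the parsed command line."""
    name, _, value = expected.partition("=")
    return value in params.get(name, [])


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        params = parse_cmdline(cmdline_path.read_text())
        for item in EXPECTED:
            state = "present" if has_param(params, item) else "absent"
            print(f"{item}: {state}")
```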