DGX Software Stack

NVIDIA DGX Software Packages

The following tables list the packages installed as part of the DGX Software Stack, broken out by metapackage name and platform.

DGX A100, DGX Station A100, DGX A800, DGX Station A800, DGX H100

DGX A100 and DGX A800

DGX Station A100 and DGX Station A800

DGX H100

dgx-a100-system-configurations
dgx-a800-system-configurations
dgxstation-a100-system-configurations
dgxstation-a800-system-configurations
dgx-h100-system-configurations

dgx-release

dgx-release

dgx-release

nv-cpu-governor

nv-cpu-governor

nv-cpu-governor

nv-hugepage

nv-hugepage

nv-hugepage

nv-iommu-pt

nv-iommu-pt

nv-iommu-pt

nv-ipmi-devintf

nv-ipmi-devintf

nv-ipmi-devintf

nv-limits

nv-limits

nv-limits

nv-update-disable

nv-update-disable

nv-update-disable

nvgpu-services-list

nvgpu-services-list

nvgpu-services-list

nvidia-acs-disable

nvidia-acs-disable

nvidia-acs-disable

nvidia-crashdump

nvidia-crashdump

nvidia-crashdump

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-fs-loader

nvidia-fs-loader

nvidia-fs-loader

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-nvme-smartd

nvidia-nvme-smartd

nvidia-nvme-smartd

nvidia-pci-bridge-power

nvidia-pci-bridge-power

nvidia-pci-bridge-power

nvidia-pci-norealloc

nvidia-redfish-config

nvidia-redfish-config

nvidia-redfish-config

nvidia-relaxed-ordering-gpu

nvidia-relaxed-ordering-gpu

nvidia-relaxed-ordering-gpu

nvidia-relaxed-ordering-nvme

nvidia-relaxed-ordering-nvme

nvidia-relaxed-ordering-nvme

dgx-a100-system-tools-extra
dgx-a800-system-tools-extra
dgxstation-a100-system-tools-extra
dgxstation-a800-system-tools-extra
dgx-h100-system-tools-extra

dgx-release

dgx-release

dgx-release

ipmitool

ipmitool

ipmitool

nv-common-apis

nv-common-apis

nv-common-apis

nv-env-paths

nv-env-paths

nv-env-paths

nvidia-mig-manager

nvidia-mig-manager

nvidia-raid-config

nvidia-raid-config

nvidia-raid-config

nvme-cli

nvme-cli

nvme-cli

tpm2-tools

tpm2-tools

tpm2-tools

dgx-a100-system-tools
dgx-a800-system-tools
dgxstation-a100-system-tools
dgxstation-a800-system-tools
dgx-h100-system-tools

msecli

msecli

msecli

DGX-1, DGX-2, and DGX Station

DGX-1

DGX-2

DGX Station

dgx1-system-configurations

dgx2-system-configurations

dgxstation-system-configurations

dgx-release

dgx-release

dgx-release

nv-ast-modeset

nv-cpu-governor

nv-cpu-governor

nv-hugepage

nv-hugepage

nv-hugepage

nv-iommu-pt

nv-ipmi-devintf

nv-ipmi-devintf

nv-limits

nv-limits

nv-limits

nv-update-disable

nv-update-disable

nv-update-disable

nvgpu-services-list

nvgpu-services-list

nvgpu-services-list

nvidia-acs-disable

nvidia-crashdump

nvidia-crashdump

nvidia-crashdump

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-fs-loader

nvidia-fs-loader

nvidia-fs-loader

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-nvme-smartd

nvidia-pci-bridge-power

nvidia-pci-bridge-power

nvidia-redfish-config

nvidia-relaxed-ordering-gpu

nvidia-relaxed-ordering-nvme

dgx1-system-tools

dgx2-system-tools

dgxstation-system-tools

dgx-release

dgx-release

dgx-release

ipmitool

ipmitool

nv-common-apis

nv-common-apis

nv-common-apis

nv-env-paths

nv-env-paths

nv-env-paths

nvidia-raid-config

nvidia-raid-config

nvme-cli

tpm-tools

dgx1-system-tools-extra

dgx2-system-tools-extra

dgxstation-system-tools-extra

msecli

nvidia-raid-config

storcli

The following packages are installed by the nvidia-mlnx-ofed-misc metapackage:

  • mlnx-fw-updater

  • mlnx-pxe-setup

  • nvidia-mlnx-config

  • nvidia-peermem-loader

The following additional packages are part of the DGX Software Stack:

  • nv-docker-options

  • nvidia-logrotate

  • nvidia-motd

  • nvidia-ipmisol
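One way to check which of the packages named in the two lists above are present on a system is to query the Debian package database. The following is a minimal sketch, not an NVIDIA-provided tool: the helper names (`parse_installed`, `missing_packages`, `query_dpkg`) are hypothetical, and it assumes a Debian-based system where `dpkg-query` is available. Some of these packages are optional, so a missing entry is not necessarily an error.

```python
import subprocess

# Package names copied from the two lists above.
EXPECTED_PACKAGES = [
    "mlnx-fw-updater",
    "mlnx-pxe-setup",
    "nvidia-mlnx-config",
    "nvidia-peermem-loader",
    "nv-docker-options",
    "nvidia-logrotate",
    "nvidia-motd",
    "nvidia-ipmisol",
]


def parse_installed(dpkg_output):
    """Return the set of package names whose dpkg status is 'install ok installed'.

    Expects one 'name<TAB>status' record per line.
    """
    installed = set()
    for line in dpkg_output.splitlines():
        name, _, status = line.partition("\t")
        if status.strip() == "install ok installed":
            installed.add(name.strip())
    return installed


def missing_packages(expected, installed):
    """Return the expected package names that are not in the installed set."""
    return [name for name in expected if name not in installed]


def query_dpkg():
    """Run dpkg-query and return its output (assumes dpkg-query is on PATH)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Package}\t${Status}\n"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

On a target system, `missing_packages(EXPECTED_PACKAGES, parse_installed(query_dpkg()))` would return the names that are not currently installed.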

The following table describes in more detail the packages that are installed as part of the system configuration package:

Package

Description

1

2

A

H

dgx-release

Release information

R

R

R

R

nv-ast-modeset

Disable the Aspeed display driver. It can cause issues with connected monitors. The AST2xxx is the BMC used in our servers.

R

R

R

R

nv-enable-nvme-hot-plug

Configure kernel parameters for NVMe hot plug (see also kernel section below).

R

nv-hugepage

Sets the “transparent_hugepage=madvise” kernel parameter.

R

R

R

R

nv-iommu-pt

Sets iommu=pt for AMD Rome platforms.

R

R

nv-ipmi-devintf

Add the ipmi_devintf module for accessing the BMC using ipmitool.

R

R

R

R

nv-limits

Increase the process resource limits for users (ulimit nofile 50000).

R

R

R

R

nv-update-disable

Disable automatic system upgrades. Users need to explicitly upgrade their systems using apt.

R

R

R

R

nvgpu-services-list

Lists GPU-consuming services, such as DCGM or NVSM, in JSON format. Required by the firmware update mechanism.

R

R

R

R

nvidia-acs-disable

Disables the PCIe ACS capability to allow for better GPU-direct performance in bare-metal use cases on DGX A100 and DGX H100.

R

R

nvidia-crashdump

Tools to manage kernel crash dumps. Crash dumps are disabled by default.

R

R

R

R

nv-docker-options

Increases SHMEM and other resources.

R

R

R

R

nvidia-ipmisol

Enables serial output through the BMC using Serial Over LAN (SOL)

O

O

O

O

nvidia-kernel-defaults

Disable ARP for security improvements (net.ipv4.conf).

R

R

R

R

nvidia-logrotate

Modify the logrotate configuration

O

O

O

O

nvidia-motd

Modify message-of-the-day (MOTD) to display NVSM health monitoring alerts and release information.

O

O

O

O

nvidia-nvme-smartd

Enables SMART monitoring on NVMe devices. By default, smartd skips NVMe devices.

R

R

R

nvidia-pci-bridge-power

Sets the bridge power control setting to “on” for all PCI bridges.

R

R

R

R

nvidia-relaxed-ordering-gpu

Sets a registry key to enable PCIe relaxed ordering in the GPUs.

R

R

nvidia-relaxed-ordering-nvme

Installs a script that users can call to enable relaxed ordering in NVMe devices.

R

R

nvidia-redfish-config

Configures the Redfish interface with an interface name and IP address. The interface name is “bmc_redfish0”, while the IP address is read from DMI type 42.

R

R

Legend:

  • 1: DGX-1

  • 2: DGX-2

  • A: DGX A100

  • H: DGX H100

  • R: Required package

  • O: Optional package
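Two of the settings described in the table above can be observed on a running system: the transparent huge page mode set by nv-hugepage and the open-file limit raised by nv-limits. The sketch below is illustrative only. The helper names are hypothetical, and it assumes a Linux kernel that exposes `/sys/kernel/mm/transparent_hugepage/enabled`, where the active mode is shown in square brackets. The table gives 50000 as the nofile value but does not say whether it applies to the soft limit, the hard limit, or both, so the code only reports what it finds.

```python
import re
import resource

# Standard sysfs location of the transparent huge page mode on Linux.
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text):
    """Return the bracketed (active) mode from text such as 'always [madvise] never'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path=THP_PATH):
    """Read the active mode from sysfs; return None if the file cannot be read."""
    try:
        with open(path) as handle:
            return active_thp_mode(handle.read())
    except OSError:
        return None


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

With nv-hugepage in effect, `read_thp_mode()` would be expected to return `"madvise"`. The limits reported by `nofile_limits()` are those of the calling process, so they reflect the session in which the script runs.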

DGX Kernel Parameters

| Kernel Parameter | Description | Package |
|---|---|---|
| ast.modeset=0 | Disables the Aspeed display driver. The AST2xxx is the BMC used in our servers. [DGX-1, DGX-2, DGX A100, DGX Station A100, DGX H100] | nv-ast-modeset |
| crashkernel=1G-:0M | Reserves no memory for crash dumps (the default, when crash dumps are disabled). | nvidia-crashdump |
| crashkernel=1G-:512M | Reserves 512 MB for crash dumps (when crash dumps are enabled). | nvidia-crashdump |
| pci=realloc=on | Allows the kernel to reallocate PCI resources if the allocations done by the BIOS are insufficient. This and pcie_ports=native are both required for NVMe hot-plug on DGX-2. | nv-enable-nvme-hot-plug |
| pcie_ports=native | Uses Linux native services for PME, AER, DPC, and PCIe hotplug, that is, not firmware first. This and pci=realloc=on are both required for NVMe hot-plug on DGX-2. | nv-enable-nvme-hot-plug |
| transparent_hugepage=madvise | Disables transparent huge pages system-wide and enables them only inside MADV_HUGEPAGE madvise regions, to prevent applications from allocating more memory resources than necessary. | nv-hugepage |
| iommu=pt | Enables pass-through mode only and disables DMA translations. This enables optimizations for the CPU inside the DGX A100. | nv-iommu-pt |
| console=ttyS1,115200n8 | Sets the console to serial port 1, using 115200 baud, no parity, 8 data bits. [DGX-2 and DGX H100] | nvidia-ipmisol |
| console=ttyS0,115200n8 | Sets the console to serial port 0, using 115200 baud, no parity, 8 data bits. | nvidia-ipmisol |
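To confirm that the kernel parameters listed above are active, the running kernel's command line can be compared against an expected set. The following is a rough sketch under stated assumptions: the function names are hypothetical, the expected list must be chosen per platform by the reader, and the parser splits on whitespace only, so it does not handle quoted values that contain spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: [values]}.

    A parameter may appear several times (for example, console=), so each
    name maps to a list. Flags without '=' are stored as None.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params.setdefault(name, []).append(value if sep else None)
    return params


def missing_parameters(cmdline, expected):
    """Return the expected 'name=value' items that are absent from cmdline."""
    params = parse_cmdline(cmdline)
    missing = []
    for item in expected:
        name, sep, value = item.partition("=")
        wanted = value if sep else None
        if wanted not in params.get(name, []):
            missing.append(item)
    return missing
```

On Linux, the live command line is available in `/proc/cmdline`, so a check could look like `missing_parameters(open("/proc/cmdline").read(), ["transparent_hugepage=madvise"])`. Splitting on the first `=` only keeps values such as `realloc=on` in `pci=realloc=on` intact.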