DGX Software Stack

NVIDIA DGX Software Packages

The following tables list the packages installed as part of the DGX Software Stack, broken out by metapackage name and platform.

DGX A100, DGX Station A100, DGX A800, DGX Station A800, DGX H100/H200, and DGX H800

DGX A100 and DGX A800

DGX Station A100 and DGX Station A800

DGX H100/H200 and DGX H800

dgx-a100-system-configurations
dgx-a800-system-configurations
dgxstation-a100-system-configurations
dgxstation-a800-system-configurations
dgx-h100-system-configurations
dgx-h200-system-configurations
dgx-h800-system-configurations

dgx-release

dgx-release

dgx-release

nv-cpu-governor

nv-cpu-governor

nv-cpu-governor

nv-hugepage

nv-hugepage

nv-hugepage

nv-iommu-pt

nv-iommu-pt

nv-iommu-pt

nv-ipmi-devintf

nv-ipmi-devintf

nv-ipmi-devintf

nv-limits

nv-limits

nv-limits

nv-update-disable

nv-update-disable

nv-update-disable

nvgpu-services-list

nvgpu-services-list

nvgpu-services-list

nvidia-acs-disable

nvidia-acs-disable

nvidia-crashdump

nvidia-crashdump

nvidia-crashdump

nvidia-disable-opensm

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-fs-loader

nvidia-fs-loader

nvidia-fs-loader

nvidia-kbd-udev

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-mlnx-ofed-netdev-rename

nvidia-nvme-smartd

nvidia-nvme-smartd

nvidia-nvme-smartd

nvidia-pci-bridge-power

nvidia-pci-bridge-power

nvidia-pci-bridge-power

nvidia-pci-no-realloc

nvidia-redfish-config

nvidia-redfish-config

nvidia-redfish-config

nvidia-relaxed-ordering-gpu

nvidia-relaxed-ordering-gpu

nvidia-relaxed-ordering-nvme

nvidia-relaxed-ordering-nvme

nvidia-relaxed-ordering-nvme

dgx-a100-system-tools
dgx-a800-system-tools
dgxstation-a100-system-tools
dgxstation-a800-system-tools
dgx-h100-system-tools
dgx-h200-system-tools
dgx-h800-system-tools

dgx-release

dgx-release

dgx-release

ipmitool

ipmitool

ipmitool

nv-common-apis

nv-common-apis

nv-common-apis

nv-env-paths

nv-env-paths

nv-env-paths

nvdebug

nvidia-mig-manager

nvidia-mig-manager

nvidia-mig-manager

nvidia-raid-config

nvidia-raid-config

nvidia-raid-config

nvme-cli

nvme-cli

nvme-cli

tpm2-tools

tpm2-tools

tpm2-tools

dgx-a100-system-tools-extra
dgx-a800-system-tools-extra
dgxstation-a100-system-tools-extra
dgxstation-a800-system-tools-extra
dgx-h100-system-tools-extra
dgx-h200-system-tools-extra
dgx-h800-system-tools-extra

msecli

msecli

msecli

DGX-1, DGX-2, and DGX Station

DGX-1

DGX-2

DGX Station

dgx1-system-configurations

dgx2-system-configurations

dgxstation-system-configurations

dgx-release

dgx-release

dgx-release

nv-ast-modeset

nv-cpu-governor

nv-cpu-governor

nv-enable-nvme-hot-plug

nv-hugepage

nv-hugepage

nv-hugepage

nv-ipmi-devintf

nv-ipmi-devintf

nv-limits

nv-limits

nv-limits

nv-update-disable

nv-update-disable

nv-update-disable

nvgpu-services-list

nvgpu-services-list

nvgpu-services-list

nvidia-crashdump

nvidia-crashdump

nvidia-crashdump

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-fs-loader

nvidia-fs-loader

nvidia-fs-loader

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-nvme-smartd

nvidia-pci-bridge-power

nvidia-pci-bridge-power

dgx1-system-tools

dgx2-system-tools

dgxstation-system-tools

dgx-release

dgx-release

dgx-release

ipmitool

ipmitool

nv-common-apis

nv-common-apis

nv-common-apis

nv-env-paths

nv-env-paths

nv-env-paths

nvidia-raid-config

nvidia-raid-config

nvme-cli

tpm-tools

dgx1-system-tools-extra

dgx2-system-tools-extra

dgxstation-system-tools-extra

msecli

nvidia-raid-config

storcli
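One way to check which of the packages listed above are present on a given system is to query the dpkg database. The following is a minimal sketch. It assumes a Debian/Ubuntu-based DGX OS host where dpkg-query is available, and it relies on dpkg-query's --showformat option with the ${Package} and ${Status} fields. The helper names parse_dpkg_status and query_packages are hypothetical and are not part of the DGX Software Stack.

```python
"""Sketch: report which DGX Software Stack packages are installed.

Assumption: a Debian/Ubuntu-based host with dpkg-query on the PATH.
The helper names are hypothetical, not part of any NVIDIA tool.
"""
import subprocess


def parse_dpkg_status(output: str) -> dict[str, bool]:
    """Map package name -> True if dpkg reports it as fully installed.

    Expects lines of the form '<package>\t<status>', as produced by
    dpkg-query --showformat='${Package}\t${Status}\n'.
    """
    status = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, state = line.partition("\t")
        # A fully installed package has the status "install ok installed".
        status[name] = state.strip() == "install ok installed"
    return status


def query_packages(packages: list[str]) -> dict[str, bool]:
    """Run dpkg-query for the given packages; unknown packages map to False."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package}\t${Status}\n", *packages],
        capture_output=True,
        text=True,
        check=False,  # dpkg-query exits non-zero if any package is unknown
    )
    found = parse_dpkg_status(result.stdout)
    return {pkg: found.get(pkg, False) for pkg in packages}


# Example use on a DGX host:
#   for pkg, ok in query_packages(["dgx-release", "nv-hugepage", "nv-limits"]).items():
#       print(pkg, "installed" if ok else "missing")
```

Parsing is kept separate from the subprocess call so that the status logic can be checked without a dpkg database.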

The following packages are installed by the nvidia-mlnx-ofed-misc metapackage:

  • mlnx-fw-updater

  • mlnx-pxe-setup

  • nvidia-mlnx-config

  • nvidia-peermem-loader

The following additional packages are part of the DGX Software Stack:

  • nv-docker-options

  • nvidia-logrotate

  • nvidia-motd

  • nvidia-ipmisol

The following table describes each package installed by the system configuration metapackages in more detail:

Package

Description

1

2

A

H

dgx-release

Release information.

R

R

R

R

nv-ast-modeset

Disable the Aspeed display driver, which can cause issues with connected monitors. The AST2xxx is the BMC used in the DGX-1 and DGX-2 servers.

R

nv-cpu-governor

Set the CPU governor mode to performance using a systemd script.

R

R

R

R

nv-docker-options

Increase the shared memory (SHMEM) and other resources available to Docker containers.

R

R

R

R

nv-enable-nvme-hot-plug

Configure kernel parameters for NVMe hot plug (see also the DGX Kernel Parameters section below).

R

nv-hugepage

Set the “transparent_hugepage=madvise” kernel parameter.

R

R

R

R

nv-iommu-pt

Set iommu=pt for AMD Rome platforms.

R

R

nv-ipmi-devintf

Add the ipmi_devintf kernel module so that the BMC can be accessed using ipmitool.

R

R

R

R

nv-limits

Increase the process resource limits for users (ulimit nofile set to 50000).

R

R

R

R

nv-update-disable

Disable automatic system upgrades. Users need to explicitly upgrade their systems using apt.

R

R

R

R

nvgpu-services-list

List GPU-consuming services, such as DCGM or NVSM, in JSON format. This list is required by the firmware update mechanism.

R

R

R

R

nvidia-acs-disable

Disable the PCIe ACS capability to improve GPUDirect performance in bare-metal use cases on DGX A100 and DGX H100/H200.

R

R

nvidia-crashdump

Tools to manage kernel crash dumps. Crash dumps are disabled by default.

R

R

R

R

nvidia-disable-opensm

Disable the opensm service by default.

R

nvidia-esm-hook-epilogue

Add text after the ESM apt upgrade message to clarify the DGX contract and the availability of Extended Security Maintenance (ESM) updates for additional packages in the Ubuntu Universe repository.

R

R

R

R

nvidia-fs-loader

Create a configuration file in /etc/modules-load.d to load the nvidia-fs module.

R

R

R

R

nvidia-ipmisol

Enable serial output through the BMC using Serial Over LAN (SOL).

O

O

O

O

nvidia-kbd-udev

Create a udev rule in /etc/udev/rules.d to send a signal to the BMC when the Caps Lock key is toggled. This tells the BMC to turn on or off the “CAPS” indicator on the virtual console.

R

nvidia-kernel-defaults

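Restrict ARP behavior for improved security by setting the net.ipv4.conf arp_announce and arp_ignore parameters (see the DGX Kernel Parameters section below).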

R

R

R

R

nvidia-logrotate

Modify the logrotate configuration.

O

O

O

O

nvidia-mlnx-ofed-netdev-rename

Reset the Mellanox (mlnx) enp* network devices back to their original names.

R

nvidia-motd

Modify message-of-the-day (MOTD) to display NVSM health monitoring alerts and release information.

O

O

O

O

nvidia-nvme-smartd

Enable SMART monitoring on NVMe devices. By default, smartd skips NVMe devices.

R

R

R

nvidia-pci-bridge-power

Set the bridge power control setting to “on” for all PCI bridges.

R

R

R

R

nvidia-pci-no-realloc

Configure the system to disable PCI resource reallocation.

R

nvidia-redfish-config

Configure the Redfish interface with an interface name and IP address. The interface name is “bmc_redfish0”, and the IP address is read from DMI type 42 (Management Controller Host Interface).

R

R

nvidia-relaxed-ordering-gpu

Set a reg-key to enable PCIe relaxed ordering in the GPUs.

R

nvidia-relaxed-ordering-nvme

Install a script that users can call to enable relaxed ordering in NVMe devices.

R

R

Legend:

1

DGX-1

2

DGX-2

A

DGX A100, DGX A800

H

DGX H100/H200, DGX H800

R

Required package

O

Optional package
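Several of the packages in the table change settings that can be inspected through standard Linux interfaces. The sketch below assumes a Linux host with the usual sysfs layout. It reads the active transparent huge page mode (nv-hugepage), lists CPUs whose cpufreq governor is not performance (nv-cpu-governor), checks the soft open-file limit against the 50000 value given for nv-limits, and parses a modules-load.d file such as the one nvidia-fs-loader creates. All function names are hypothetical. The checks are illustrative only and are not an NVIDIA-provided tool.

```python
"""Sketch: inspect settings applied by some DGX system-configuration packages.

Assumptions: Linux with the standard sysfs layout. The expected values
(THP 'madvise', 'performance' governor, nofile >= 50000) come from the table.
"""
import resource
from pathlib import Path


def active_thp_mode(enabled_text: str) -> str:
    """Return the bracketed (active) mode from
    /sys/kernel/mm/transparent_hugepage/enabled, e.g. 'always [madvise] never'."""
    for token in enabled_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {enabled_text!r}")


def non_performance_cpus(sysfs_cpu: Path = Path("/sys/devices/system/cpu")) -> list[str]:
    """List CPUs whose cpufreq scaling governor is not 'performance'."""
    bad = []
    for gov_file in sorted(sysfs_cpu.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        if gov_file.read_text().strip() != "performance":
            bad.append(gov_file.parent.parent.name)  # e.g. 'cpu3'
    return bad


def nofile_limit_ok(minimum: int = 50000) -> bool:
    """True if this process's soft open-file limit is at least `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= minimum


def modules_to_load(conf_text: str) -> list[str]:
    """Parse a modules-load.d file: one module name per line; blank lines
    and lines starting with '#' or ';' are ignored."""
    modules = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", ";")):
            modules.append(stripped)
    return modules


# Example use on a DGX host (reads live system files):
#   thp = Path("/sys/kernel/mm/transparent_hugepage/enabled").read_text()
#   print("THP mode:", active_thp_mode(thp))
#   print("CPUs not using 'performance':", non_performance_cpus())
#   print("nofile limit OK:", nofile_limit_ok())
```

The sysfs root is a parameter of non_performance_cpus so the function can be pointed at a test directory instead of the live system.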

DGX Kernel Parameters

Kernel Parameter

Description

Package

Location

ast.modeset=0

Disable the Aspeed display driver. The AST2xxx is the BMC used in the DGX-1 and DGX-2 servers.

nv-ast-modeset

/etc/default/grub.d/nomodeset.cfg

pci=realloc=on

Allow the kernel to reallocate PCI resources if the allocations made by the BIOS are insufficient. This parameter and pcie_ports=native are both required for NVMe hot-plug on DGX-2.

nv-enable-nvme-hot-plug

/etc/default/grub.d/enable-nvme-hot-plug.cfg

pcie_ports=native

Use Linux-native services for PME, AER, DPC, and PCIe hotplug instead of firmware-first handling. This parameter and pci=realloc=on are both required for NVMe hot-plug on DGX-2.

nv-enable-nvme-hot-plug

/etc/default/grub.d/enable-nvme-hot-plug.cfg

transparent_hugepage=madvise

Disable transparent huge pages system-wide and enable them only inside MADV_HUGEPAGE madvise regions. This prevents applications from allocating more memory than necessary.

nv-hugepage

/etc/default/grub.d/hugepage.cfg

iommu=pt

Put the IOMMU in pass-through mode, which disables DMA translation for devices used by the host. This enables optimizations for the AMD Rome CPUs in the DGX A100.

nv-iommu-pt

/etc/default/grub.d/iommu.cfg

crashkernel

Amount of memory reserved for the crash kernel that captures kernel crash dumps.

nvidia-crashdump

/etc/default/grub.d/ipmisol.cfg

console=ttyS[0-1],115200n8

Set the console to serial port 0 or 1 at 115200 baud, with no parity and 8 data bits:
DGX-2, DGX H100/H200, and DGX H800: console=ttyS0,115200n8
Other systems: console=ttyS1,115200n8

nvidia-ipmisol

kernel cmdline

net.ipv4.conf.all.arp_announce = 2

Always use the best local address for the target as the source address in ARP requests.

nvidia-kernel-defaults

/etc/sysctl.d/20-nvidia-defaults.conf

net.ipv4.conf.default.arp_announce = 2

Always use the best local address for the target as the source address in ARP requests.

nvidia-kernel-defaults

/etc/sysctl.d/20-nvidia-defaults.conf

net.ipv4.conf.all.arp_ignore = 1

Reply to ARP requests only if the target IP address is configured on the incoming interface.

nvidia-kernel-defaults

/etc/sysctl.d/20-nvidia-defaults.conf

net.ipv4.conf.default.arp_ignore = 1

Reply to ARP requests only if the target IP address is configured on the incoming interface.

nvidia-kernel-defaults

/etc/sysctl.d/20-nvidia-defaults.conf

NVreg_EnablePCIERelaxedOrderingMode=1

Set a reg-key to enable PCIe relaxed-ordering in the GPUs.

nvidia-relaxed-ordering-gpu

/etc/modprobe.d/nvidia-relaxed-ordering.conf
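The kernel command-line parameters above can be compared against /proc/cmdline, and the sysctl settings against the corresponding files under /proc/sys. The following is a minimal sketch under those assumptions. The helper names and the EXPECTED_SYSCTL mapping are hypothetical. Which parameters should be present depends on the platform; for example, iommu=pt applies only to AMD Rome-based systems.

```python
"""Sketch: check DGX kernel parameters and sysctl settings from the table.

Assumptions: /proc/cmdline holds the booted command line; sysctl.d files
use 'key = value' lines. Helper names are hypothetical.
"""
import re
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {name: value}. Only the first '='
    separates name and value, so 'pci=realloc=on' -> {'pci': 'realloc=on'}.
    If a name repeats (e.g. several console= entries), the last one wins."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else ""
    return params


def parse_console(value: str) -> dict:
    """Decode a serial console spec such as 'ttyS1,115200n8' into port,
    baud rate, parity (n/o/e), and data bits. A trailing 'r' is ignored."""
    m = re.fullmatch(r"ttyS(\d+),(\d+)([noe])(\d)r?", value)
    if not m:
        raise ValueError(f"unsupported console spec: {value!r}")
    port, baud, parity, bits = m.groups()
    return {"port": int(port), "baud": int(baud), "parity": parity, "bits": int(bits)}


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, ignoring blank lines and '#'/';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key: str) -> Path:
    """Map a sysctl key to its /proc/sys file. Naive: assumes no key
    component itself contains a dot (e.g. VLAN names like eth0.100)."""
    return Path("/proc/sys") / key.replace(".", "/")


# Values copied from the kernel parameter table above.
EXPECTED_SYSCTL = {
    "net.ipv4.conf.all.arp_announce": "2",
    "net.ipv4.conf.default.arp_announce": "2",
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.default.arp_ignore": "1",
}

# Example use on a DGX host:
#   params = parse_cmdline(Path("/proc/cmdline").read_text())
#   print(params.get("transparent_hugepage"), params.get("iommu"))
#   for key, want in EXPECTED_SYSCTL.items():
#       print(key, sysctl_path(key).read_text().strip() == want)
```

Splitting each token only on its first '=' matters for values such as pci=realloc=on, whose value itself contains an equals sign.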