DGX Software Stack

NVIDIA DGX Software Packages

The following tables list the packages installed as part of the DGX Software Stack, broken out by metapackage name and platform.

DGX A100, DGX Station A100, DGX A800, DGX Station A800, DGX H100/H200, and DGX H800

DGX A100 and DGX A800

DGX Station A100 and DGX Station A800

DGX H100/H200 and DGX H800

dgx-a100-system-configurations
dgx-a800-system-configurations
dgxstation-a100-system-configurations
dgxstation-a800-system-configurations
dgx-h100-system-configurations
dgx-h200-system-configurations
dgx-h800-system-configurations

dgx-release

dgx-release

dgx-release

nv-cpu-governor

nv-cpu-governor

nv-cpu-governor

nv-hugepage

nv-hugepage

nv-hugepage

nv-iommu-pt

nv-iommu-pt

nv-iommu-pt

nv-ipmi-devintf

nv-ipmi-devintf

nv-ipmi-devintf

nv-limits

nv-limits

nv-limits

nv-update-disable

nv-update-disable

nv-update-disable

nvgpu-services-list

nvgpu-services-list

nvgpu-services-list

nvidia-acs-disable

nvidia-acs-disable

nvidia-crashdump

nvidia-crashdump

nvidia-crashdump

nvidia-disable-opensm

nvidia-disable-opensm

nvidia-disable-opensm

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-fs-loader

nvidia-fs-loader

nvidia-fs-loader

nvidia-kbd-udev

nvidia-kbd-udev

nvidia-kbd-udev

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-mlnx-ofed-netdev-rename

nvidia-mlnx-ofed-netdev-rename

nvidia-mlnx-ofed-netdev-rename

nvidia-nvme-smartd

nvidia-nvme-smartd

nvidia-nvme-smartd

nvidia-pci-bridge-power

nvidia-pci-bridge-power

nvidia-pci-bridge-power

nvidia-pci-no-realloc

nvidia-redfish-config

nvidia-redfish-config

nvidia-redfish-config

nvidia-relaxed-ordering-gpu

nvidia-relaxed-ordering-gpu

nvidia-relaxed-ordering-nvme

nvidia-relaxed-ordering-nvme

nvidia-relaxed-ordering-nvme

dgx-a100-system-tools
dgx-a800-system-tools
dgxstation-a100-system-tools
dgxstation-a800-system-tools
dgx-h100-system-tools
dgx-h200-system-tools
dgx-h800-system-tools

dgx-release

dgx-release

dgx-release

ipmitool

ipmitool

ipmitool

nv-common-apis

nv-common-apis

nv-common-apis

nv-env-paths

nv-env-paths

nv-env-paths

nvdebug

nvidia-mig-manager

nvidia-mig-manager

nvidia-mig-manager

nvidia-raid-config

nvidia-raid-config

nvidia-raid-config

nvme-cli

nvme-cli

nvme-cli

tpm2-tools

tpm2-tools

tpm2-tools

dgx-a100-system-tools-extra
dgx-a800-system-tools-extra
dgxstation-a100-system-tools-extra
dgxstation-a800-system-tools-extra
dgx-h100-system-tools-extra
dgx-h200-system-tools-extra
dgx-h800-system-tools-extra

msecli

msecli

msecli

DGX-1, DGX-2, and DGX Station

DGX-1

DGX-2

DGX Station

dgx1-system-configurations
dgx2-system-configurations
dgxstation-system-configurations

dgx-release

dgx-release

dgx-release

nv-ast-modeset

nv-cpu-governor

nv-cpu-governor

nv-enable-nvme-hot-plug

nv-hugepage

nv-hugepage

nv-hugepage

nv-ipmi-devintf

nv-ipmi-devintf

nv-limits

nv-limits

nv-limits

nv-update-disable

nv-update-disable

nv-update-disable

nvgpu-services-list

nvgpu-services-list

nvgpu-services-list

nvidia-crashdump

nvidia-crashdump

nvidia-crashdump

nvidia-disable-opensm

nvidia-disable-opensm

nvidia-disable-opensm

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-esm-hook-epilogue

nvidia-fs-loader

nvidia-fs-loader

nvidia-fs-loader

nvidia-kbd-udev

nvidia-kbd-udev

nvidia-kbd-udev

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-kernel-defaults

nvidia-mlnx-ofed-netdev-rename

nvidia-mlnx-ofed-netdev-rename

nvidia-mlnx-ofed-netdev-rename

nvidia-nvme-smartd

nvidia-pci-bridge-power

nvidia-pci-bridge-power

dgx1-system-tools
dgx2-system-tools
dgxstation-system-tools

dgx-release

dgx-release

dgx-release

ipmitool

ipmitool

nv-common-apis

nv-common-apis

nv-common-apis

nv-env-paths

nv-env-paths

nv-env-paths

nvidia-raid-config

nvidia-raid-config

nvme-cli

tpm-tools

dgx1-system-tools-extra
dgx2-system-tools-extra
dgxstation-system-tools-extra

msecli

nvidia-raid-config

storcli

The following packages are installed by the nvidia-mlnx-ofed-misc metapackage:

  • mlnx-fw-updater

  • mlnx-pxe-setup

  • nvidia-mlnx-config

  • nvidia-peermem-loader

The following additional packages are part of the DGX Software Stack:

  • nv-docker-options

  • nvidia-logrotate

  • nvidia-motd

  • nvidia-ipmisol
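
One way to confirm that the packages in the two lists above are present on a system is to compare them against the output of `dpkg-query`. The following is a minimal sketch, not an NVIDIA tool: the function names (`parse_installed`, `missing_packages`, `query_dpkg`) are hypothetical, and it assumes a Debian-based system where `dpkg-query` is available.

```python
import subprocess

# Package names copied from the two lists above.
MLNX_OFED_MISC = [
    "mlnx-fw-updater",
    "mlnx-pxe-setup",
    "nvidia-mlnx-config",
    "nvidia-peermem-loader",
]
ADDITIONAL_PACKAGES = [
    "nv-docker-options",
    "nvidia-logrotate",
    "nvidia-motd",
    "nvidia-ipmisol",
]


def parse_installed(dpkg_output):
    """Return the names of fully installed packages.

    Expects one tab-separated "package, status" pair per line, as printed by
    dpkg-query with the format string used in query_dpkg() below.
    Only the status "install ok installed" is counted as installed.
    """
    installed = set()
    for line in dpkg_output.splitlines():
        name, _, status = line.partition("\t")
        if status.strip() == "install ok installed":
            installed.add(name.strip())
    return installed


def missing_packages(expected, installed):
    """Return the expected package names that are not installed, in order."""
    return [name for name in expected if name not in installed]


def query_dpkg():
    """Run dpkg-query and return its raw output (requires a dpkg-based system)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Package}\t${Status}\n"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a DGX system, `missing_packages(MLNX_OFED_MISC + ADDITIONAL_PACKAGES, parse_installed(query_dpkg()))` would return the names that are not fully installed; an empty list means all eight packages are present.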

Base OS 6.3.1 Installed Packages

The following table lists the packages installed as part of the system configuration package, with a short description of each:

Package Name

Description

1

2

A

H

containerd.io

An open and reliable container runtime.

X

X

X

X

cuda-compute-repo

CUDA compute repository configuration files.

X

X

X

X

cuda-nvml-dev-12-4

NVML native dev links, headers.

X

X

X

X

dgx-release

Updates the DGX OS release information.

X

X

X

X

dgx-repo

DGX repository configuration files.

X

X

X

X

dgx-server-grub

DGX Server grub customizer.

X

X

X

X

docker-ce

Docker.

X

X

X

X

hpc-sdk-repo

NVIDIA HPC SDK repository configuration files.

X

X

X

X

mlnx-pxe-setup

Provide a script to enable PXE booting using Mellanox cards.

X

X

X

X

msecli

Micron Storage Executive CLI.

X

X

X

nv-ast-modeset

Disable ast driver during boot.

X

nv-common-apis

Install scripts commonly used by NVIDIA systems.

X

X

X

X

nv-cpu-governor

Set CPU governor to performance.

X

X

X

X

nv-docker-options

Docker daemon options.

X

X

X

X

nv-enable-nvme-hot-plug

Set PCIe kernel parameters during boot.

X

nv-env-paths

Configure PATH variable.

X

X

X

X

nv-hugepage

Enable transparent huge pages.

X

X

X

X

nv-iommu-pt

Enable IOMMU in passthrough mode.

X

X

nv-ipmi-devintf

Load the ipmi_devintf module.

X

X

X

X

nv-limits

Increase the file limit.

X

X

X

X

nv-persistence-mode

Enable persistence mode.

X

X

X

X

nv-update-disable

Disable OS update prompt.

X

X

X

X

nvdebug

NVIDIA Debug tool.

X

nvgpu-services-list

List of all GPU-related services.

X

X

X

X

nvidia-acs-disable

Disable the PCIe ACS capability.

X

X

nvidia-chardev-links

Install udev rule that creates symlinks to NVIDIA devices.

X

X

X

X

nvidia-conf-cachefilesd

Systemd settings for cachefilesd.

X

X

X

X

nvidia-crashdump

NVIDIA crash dump policy.

X

X

X

X

nvidia-disable-opensm

Disable opensm.

X

X

X

X

nvidia-esm-hook-epilogue

NVIDIA package to clarify ESM policy.

X

X

X

X

nvidia-fs-loader

Load the nvidia-fs module.

X

X

X

X

nvidia-ipmisol

Enable IPMI Serial-over-LAN.

X

X

X

X

nvidia-kbd-udev

Enable caps lock indicator on BMC virtual console.

X

X

X

X

nvidia-kernel-defaults

sysctl default kernel settings for DGX.

X

X

X

X

nvidia-logrotate

NVIDIA logrotate policy.

X

X

X

X

nvidia-manage-ofed

Tool to manage OFED installations.

X

X

X

X

nvidia-mig-manager

NVIDIA MIG Partition Editor and Systemd Service.

X

X

nvidia-mlnx-config

Configure the MLNX devices.

X

X

X

X

nvidia-mlnx-names

Change the device names of Mellanox devices.

X

X

X

X

nvidia-mlnx-ofed-netdev-rename

Reset mlnx enp* devices back to their original names.

X

X

X

X

nvidia-motd

Custom motd files for NVIDIA platforms.

X

X

X

X

nvidia-mstflint-loader

Load the mstflint-access module.

X

X

X

X

nvidia-nvme-smartd

Enable SMART monitoring on NVMe devices.

X

X

X

nvidia-oem-config-bmc

Ubiquity plugin to configure BMC on NVIDIA platforms.

X

X

X

X

nvidia-oem-config-crypt-passwd

Ubiquity plugin to reset crypt password.

X

X

X

X

nvidia-oem-config-eula

Ubiquity plugin to display EULA.

X

X

X

X

nvidia-oem-config-grub-passwd

Ubiquity plugin to configure GRUB password on NVIDIA platforms.

X

X

X

X

nvidia-oem-config-postact

Ubiquity plugin to complete final actions before booting.

X

X

X

X

nvidia-pci-bridge-power

Set PCI bridge power control to on.

X

X

X

X

nvidia-pci-no-realloc

Disable PCI resource reallocation.

X

nvidia-peermem-loader

Load the nvidia-peermem module.

X

X

X

X

nvidia-raid-config

DGX RAID Configuration.

X

X

X

X

nvidia-redfish-config

Configure Redfish Host Interface.

X

X

nvidia-relaxed-ordering-gpu

Configure PCIe Relaxed Ordering.

X

nvidia-relaxed-ordering-nvme

Configure PCIe Relaxed Ordering.

X

X

nvidia-repo-keys

Add keys to apt trusted.gpg database.

X

X

X

X

nvidia-systemd-reorder

Fix the start-up order for NVIDIA services.

X

X

X

nvipmitool

NVIDIA-customized ipmitool that supports subcommands for NVIDIA platforms.

X

X

X

X

nvsm

REST API services for DGX System Management.

X

X

X

X

storcli

Storage Command Line Tool, manages storage controllers.

X

ubiquity

Ubuntu live CD installer.

X

X

X

X

Legend:

1: DGX-1

2: DGX-2

A: DGX A100, DGX A800

H: DGX H100/H200, DGX H800

DGX Kernel Parameters

| Parameter Name | Description | Package | Location |
|---|---|---|---|
| ast.modeset=0 | Disable the Aspeed display driver. The AST2xxx is the BMC used in the DGX-1 and DGX-2 servers. | nv-ast-modeset | /etc/default/grub.d/nomodeset.cfg |
| pci=realloc=on | Allow the kernel to reallocate PCI resources if the allocations done by the BIOS are insufficient. This and pcie_ports=native are both required for NVMe hot-plug on DGX-2. | nv-enable-nvme-hot-plug | /etc/default/grub.d/enable-nvme-hot-plug.cfg |
| pcie_ports=native | Use Linux native services for PME, AER, DPC, and PCIe hot-plug, that is, not firmware first. This and pci=realloc=on are both required for NVMe hot-plug on DGX-2. | nv-enable-nvme-hot-plug | /etc/default/grub.d/enable-nvme-hot-plug.cfg |
| transparent_hugepage=madvise | Disable huge pages system-wide and enable them only inside MADV_HUGEPAGE madvise regions, to prevent applications from allocating more memory resources than necessary. | nv-hugepage | /etc/default/grub.d/hugepage.cfg |
| iommu=pt | Enable pass-through mode only and disable DMA translations. This enables optimizations for the CPU inside the DGX A100. | nv-iommu-pt | /etc/default/grub.d/iommu.cfg |
| crashkernel | Amount of memory to use for crash dumps. | nvidia-crashdump | /etc/default/grub.d/ipmisol.cfg |
| console=ttyS[0-1],115200n8 | Set the console to serial port 0 or 1, using 115200 baud, no parity, 8 data bits. For DGX-2, DGX H100, and DGX H800: console=ttyS0,115200n8. Other system types: console=ttyS1,115200n8. | nvidia-ipmisol | kernel cmdline |
| net.ipv4.conf.all.arp_announce = 2 | Always use the best local address for this target. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
| net.ipv4.conf.default.arp_announce = 2 | Always use the best local address for this target. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
| net.ipv4.conf.all.arp_ignore = 1 | Only reply to ARP requests on the interface which contains the target IP address. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
| net.ipv4.conf.default.arp_ignore = 1 | Only reply to ARP requests on the interface which contains the target IP address. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
| setpci -d ::207 68.w=5000:f000 | Set MaxReadReq size to 4 KB for all Network (2) InfiniBand (07) devices. | nvidia-mlnx-config | /etc/systemd/system/nvidia-mlnx-config.service |
| setpci -d ::207 68.w | Set MaxReadReq size to 4 KB for all Network (2) InfiniBand (07) devices. | nvidia-mlnx-config | /etc/systemd/system/nvidia-mlnx-config.service |
| NVreg_EnablePCIERelaxedOrderingMode=1 | Set a reg-key to enable PCIe relaxed ordering in the GPUs. | nvidia-relaxed-ordering-gpu | /etc/modprobe.d/nvidia-relaxed-ordering.conf |
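
The `setpci` write listed above uses the `VALUE:MASK` form, in which only the bits set in the mask are changed. The following sketch works through the arithmetic to show why `5000:f000` corresponds to a 4 KB maximum read request size. It assumes that offset 0x68 is the PCIe Device Control register on the targeted adapters (the offset depends on where a device places its PCI Express capability), and it relies on the PCIe encoding in which bits 14:12 of that register hold Max_Read_Request_Size as 128 << n bytes. The helper names are hypothetical and exist only for this illustration.

```python
# Max_Read_Request_Size occupies bits 14:12 of the PCIe Device Control register.
MRRS_SHIFT = 12
MRRS_FIELD_MASK = 0x7


def parse_setpci_write(arg):
    """Split a 16-bit setpci write such as '68.w=5000:f000'.

    Returns (offset, value, mask). A missing mask means all 16 bits are written.
    """
    target, _, assignment = arg.partition("=")
    offset_text, _, width = target.partition(".")
    if width != "w":
        raise ValueError("this sketch only handles 16-bit (.w) accesses")
    value_text, _, mask_text = assignment.partition(":")
    mask = int(mask_text, 16) if mask_text else 0xFFFF
    return int(offset_text, 16), int(value_text, 16), mask


def apply_masked_write(old, value, mask):
    """Change only the bits selected by mask, leaving the others as they were."""
    return ((old & ~mask) | (value & mask)) & 0xFFFF


def max_read_request_bytes(device_control):
    """Decode the Max_Read_Request_Size field into a size in bytes."""
    code = (device_control >> MRRS_SHIFT) & MRRS_FIELD_MASK
    if code > 5:
        raise ValueError("reserved Max_Read_Request_Size encoding")
    return 128 << code


# Example: a register that starts at a 512-byte setting (the starting value
# 0x2810 is made up for illustration).
offset, value, mask = parse_setpci_write("68.w=5000:f000")
updated = apply_masked_write(0x2810, value, mask)
print(hex(offset), hex(updated), max_read_request_bytes(updated))
```

The mask `f000` limits the write to the top four bits of the 16-bit word, so the lower 12 bits of the register keep their previous contents. The value `5` in bits 14:12 decodes to 128 << 5 = 4096 bytes.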