DGX Software Stack#
NVIDIA DGX Software Packages#
The following tables list the packages installed as part of the DGX Software Stack, broken out by metapackage name and platform.
DGX A100 and DGX A800 |
DGX Station A100 and DGX Station A800 |
DGX H100/H200 and DGX H800 |
---|---|---|
dgx-a100-system-configurations
dgx-a800-system-configurations
|
dgxstation-a100-system-configurations
dgxstation-a800-system-configurations
|
dgx-h100-system-configurations
dgx-h200-system-configurations
dgx-h800-system-configurations
|
dgx-release |
dgx-release |
dgx-release |
nv-cpu-governor |
nv-cpu-governor |
nv-cpu-governor |
nv-hugepage |
nv-hugepage |
nv-hugepage |
nv-iommu-pt |
nv-iommu-pt |
nv-iommu-pt |
nv-ipmi-devintf |
nv-ipmi-devintf |
nv-ipmi-devintf |
nv-limits |
nv-limits |
nv-limits |
nv-update-disable |
nv-update-disable |
nv-update-disable |
nvgpu-services-list |
nvgpu-services-list |
nvgpu-services-list |
nvidia-acs-disable |
nvidia-acs-disable |
|
nvidia-crashdump |
nvidia-crashdump |
nvidia-crashdump |
nvidia-disable-opensm |
nvidia-disable-opensm |
nvidia-disable-opensm |
nvidia-esm-hook-epilogue |
nvidia-esm-hook-epilogue |
nvidia-esm-hook-epilogue |
nvidia-fs-loader |
nvidia-fs-loader |
nvidia-fs-loader |
nvidia-kbd-udev |
nvidia-kbd-udev |
nvidia-kbd-udev |
nvidia-kernel-defaults |
nvidia-kernel-defaults |
nvidia-kernel-defaults |
nvidia-mlnx-ofed-netdev-rename |
nvidia-mlnx-ofed-netdev-rename |
nvidia-mlnx-ofed-netdev-rename |
nvidia-nvme-smartd |
nvidia-nvme-smartd |
nvidia-nvme-smartd |
nvidia-pci-bridge-power |
nvidia-pci-bridge-power |
nvidia-pci-bridge-power |
nvidia-pci-no-realloc |
||
nvidia-redfish-config |
nvidia-redfish-config |
nvidia-redfish-config |
nvidia-relaxed-ordering-gpu |
nvidia-relaxed-ordering-gpu |
|
nvidia-relaxed-ordering-nvme |
nvidia-relaxed-ordering-nvme |
nvidia-relaxed-ordering-nvme |
dgx-a100-system-tools
dgx-a800-system-tools
|
dgxstation-a100-system-tools
dgxstation-a800-system-tools
|
dgx-h100-system-tools
dgx-h200-system-tools
dgx-h800-system-tools
|
dgx-release |
dgx-release |
dgx-release |
ipmitool |
ipmitool |
ipmitool |
nv-common-apis |
nv-common-apis |
nv-common-apis |
nv-env-paths |
nv-env-paths |
nv-env-paths |
nvdebug |
||
nvidia-mig-manager |
nvidia-mig-manager |
nvidia-mig-manager |
nvidia-raid-config |
nvidia-raid-config |
nvidia-raid-config |
nvme-cli |
nvme-cli |
nvme-cli |
tpm2-tools |
tpm2-tools |
tpm2-tools |
dgx-a100-system-tools-extra
dgx-a800-system-tools-extra
|
dgxstation-a100-system-tools-extra
dgxstation-a800-system-tools-extra
|
dgx-h100-system-tools-extra
dgx-h200-system-tools-extra
dgx-h800-system-tools-extra
|
msecli |
msecli |
msecli |
DGX-1 |
DGX-2 |
DGX Station |
---|---|---|
dgx1-system-configurations
|
dgx2-system-configurations
|
dgxstation-system-configurations
|
dgx-release |
dgx-release |
dgx-release |
nv-ast-modeset |
||
nv-cpu-governor |
nv-cpu-governor |
|
nv-enable-nvme-hot-plug |
||
nv-hugepage |
nv-hugepage |
nv-hugepage |
nv-ipmi-devintf |
nv-ipmi-devintf |
|
nv-limits |
nv-limits |
nv-limits |
nv-update-disable |
nv-update-disable |
nv-update-disable |
nvgpu-services-list |
nvgpu-services-list |
nvgpu-services-list |
nvidia-crashdump |
nvidia-crashdump |
nvidia-crashdump |
nvidia-disable-opensm |
nvidia-disable-opensm |
nvidia-disable-opensm |
nvidia-esm-hook-epilogue |
nvidia-esm-hook-epilogue |
nvidia-esm-hook-epilogue |
nvidia-fs-loader |
nvidia-fs-loader |
nvidia-fs-loader |
nvidia-kbd-udev |
nvidia-kbd-udev |
nvidia-kbd-udev |
nvidia-kernel-defaults |
nvidia-kernel-defaults |
nvidia-kernel-defaults |
nvidia-mlnx-ofed-netdev-rename |
nvidia-mlnx-ofed-netdev-rename |
nvidia-mlnx-ofed-netdev-rename |
nvidia-nvme-smartd |
||
nvidia-pci-bridge-power |
nvidia-pci-bridge-power |
|
dgx1-system-tools
|
dgx2-system-tools
|
dgxstation-system-tools
|
dgx-release |
dgx-release |
dgx-release |
ipmitool |
ipmitool |
|
nv-common-apis |
nv-common-apis |
nv-common-apis |
nv-env-paths |
nv-env-paths |
nv-env-paths |
nvidia-raid-config |
nvidia-raid-config |
|
nvme-cli |
||
tpm-tools |
||
dgx1-system-tools-extra
|
dgx2-system-tools-extra
|
dgxstation-system-tools-extra
|
msecli |
||
nvidia-raid-config |
||
storcli |
The following packages are installed by the nvidia-mlnx-ofed-misc metapackage:
mlnx-fw-updater
mlnx-pxe-setup
nvidia-mlnx-config
nvidia-peermem-loader
The following additional packages are part of the DGX Software Stack:
nv-docker-options
nvidia-logrotate
nvidia-motd
nvidia-ipmisol
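One way to confirm that the packages in the two lists above are present on a system is to compare them against `dpkg-query` output. The following is a minimal sketch, assuming a dpkg-based DGX OS installation; the function and constant names are hypothetical and not part of the DGX Software Stack.

```python
import subprocess

# Package names copied from the two lists above.
MLNX_OFED_MISC = [
    "mlnx-fw-updater",
    "mlnx-pxe-setup",
    "nvidia-mlnx-config",
    "nvidia-peermem-loader",
]
ADDITIONAL_PACKAGES = [
    "nv-docker-options",
    "nvidia-logrotate",
    "nvidia-motd",
    "nvidia-ipmisol",
]


def parse_installed(dpkg_output):
    """Return the set of package names that dpkg reports as installed.

    Expects one "<status-abbrev> <package>" pair per line. A status
    abbreviation that starts with "ii" means the package is installed.
    """
    installed = set()
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("ii"):
            installed.add(fields[1])
    return installed


def missing_packages(expected, installed):
    """Return the expected package names that are not installed, in order."""
    return [name for name in expected if name not in installed]


def query_dpkg():
    """Run dpkg-query and return its raw output (needs a dpkg-based system)."""
    result = subprocess.run(
        ["dpkg-query", "--show",
         "--showformat=${db:Status-Abbrev} ${Package}\\n"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a live system, `missing_packages(MLNX_OFED_MISC + ADDITIONAL_PACKAGES, parse_installed(query_dpkg()))` returns the names that are not installed. Parsing is kept separate from the `dpkg-query` call so that it can be exercised with captured output.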
Base OS 6.3.1 Installed Packages#
The following table describes each package installed as part of the system configuration packages and marks the platforms on which it is installed:
Package Name |
Description |
1 |
2 |
A |
H |
---|---|---|---|---|---|
containerd.io |
An open and reliable container runtime. |
X |
X |
X |
X |
cuda-compute-repo |
CUDA compute repository configuration files. |
X |
X |
X |
X |
cuda-nvml-dev-12-4 |
NVML native dev links, headers. |
X |
X |
X |
X |
dgx-release |
Update the DGX OS release information. |
X |
X |
X |
X |
dgx-repo |
DGX repository configuration files. |
X |
X |
X |
X |
dgx-server-grub |
DGX Server grub customizer. |
X |
X |
X |
X |
docker-ce |
Docker. |
X |
X |
X |
X |
hpc-sdk-repo |
NVIDIA HPC SDK repository configuration files. |
X |
X |
X |
X |
mlnx-pxe-setup |
Provide a script to enable PXE booting using Mellanox cards. |
X |
X |
X |
X |
msecli |
Micron Storage Executive CLI. |
X |
X |
X |
|
nv-ast-modeset |
Disable ast driver during boot. |
X |
|||
nv-common-apis |
Install common scripts used by NVIDIA systems. |
X |
X |
X |
X |
nv-cpu-governor |
Set CPU governor to performance. |
X |
X |
X |
X |
nv-docker-options |
Docker daemon options. |
X |
X |
X |
X |
nv-enable-nvme-hot-plug |
Set PCIe kernel parameters during boot. |
X |
|||
nv-env-paths |
Configure PATH variable. |
X |
X |
X |
X |
nv-hugepage |
Enable transparent huge pages. |
X |
X |
X |
X |
nv-iommu-pt |
Enable the IOMMU in pass-through mode. |
X |
X |
||
nv-ipmi-devintf |
Load the ipmi_devintf module. |
X |
X |
X |
X |
nv-limits |
Increase the file limit. |
X |
X |
X |
X |
nv-persistence-mode |
Enable persistence mode. |
X |
X |
X |
X |
nv-update-disable |
Disable OS update prompt. |
X |
X |
X |
X |
nvdebug |
NVIDIA Debug tool. |
X |
|||
nvgpu-services-list |
List of all GPU-related services. |
X |
X |
X |
X |
nvidia-acs-disable |
Disable the PCIe ACS capability. |
X |
X |
||
nvidia-chardev-links |
Install udev rule that creates symlinks to NVIDIA devices. |
X |
X |
X |
X |
nvidia-conf-cachefilesd |
Systemd settings for cachefilesd. |
X |
X |
X |
X |
nvidia-crashdump |
NVIDIA crash dump policy. |
X |
X |
X |
X |
nvidia-disable-opensm |
Disable opensm. |
X |
X |
X |
X |
nvidia-esm-hook-epilogue |
NVIDIA package to clarify ESM policy. |
X |
X |
X |
X |
nvidia-fs-loader |
Load the nvidia-fs module. |
X |
X |
X |
X |
nvidia-ipmisol |
Enable IPMI Serial-over-LAN. |
X |
X |
X |
X |
nvidia-kbd-udev |
Enable caps lock indicator on BMC virtual console. |
X |
X |
X |
X |
nvidia-kernel-defaults |
sysctl default kernel settings for DGX. |
X |
X |
X |
X |
nvidia-logrotate |
NVIDIA logrotate policy. |
X |
X |
X |
X |
nvidia-manage-ofed |
Tool to manage OFED installations. |
X |
X |
X |
X |
nvidia-mig-manager |
NVIDIA MIG Partition Editor and Systemd Service. |
X |
X |
||
nvidia-mlnx-config |
Configure the MLNX devices. |
X |
X |
X |
X |
nvidia-mlnx-names |
Change the device names of Mellanox devices. |
X |
X |
X |
X |
nvidia-mlnx-ofed-netdev-rename |
Reset mlnx enp* devices back to their original names. |
X |
X |
X |
X |
nvidia-motd |
Custom motd files for NVIDIA platforms. |
X |
X |
X |
X |
nvidia-mstflint-loader |
Load the mstflint-access module. |
X |
X |
X |
X |
nvidia-nvme-smartd |
Enable SMART monitoring on NVMe devices. |
X |
X |
X |
|
nvidia-oem-config-bmc |
Ubiquity plugin to configure BMC on NVIDIA platforms. |
X |
X |
X |
X |
nvidia-oem-config-crypt-passwd |
Ubiquity plugin to reset crypt password. |
X |
X |
X |
X |
nvidia-oem-config-eula |
Ubiquity plugin to display EULA. |
X |
X |
X |
X |
nvidia-oem-config-grub-passwd |
Ubiquity plugin to configure GRUB password on NVIDIA platforms. |
X |
X |
X |
X |
nvidia-oem-config-postact |
Ubiquity plugin to complete final actions before booting. |
X |
X |
X |
X |
nvidia-pci-bridge-power |
Set PCI bridge power control to on. |
X |
X |
X |
X |
nvidia-pci-no-realloc |
Disable PCI resource reallocation. |
X |
|||
nvidia-peermem-loader |
Load the nvidia-peermem module. |
X |
X |
X |
X |
nvidia-raid-config |
DGX RAID Configuration. |
X |
X |
X |
X |
nvidia-redfish-config |
Configure Redfish Host Interface. |
X |
X |
||
nvidia-relaxed-ordering-gpu |
Configure PCIe Relaxed Ordering. |
X |
|||
nvidia-relaxed-ordering-nvme |
Configure PCIe Relaxed Ordering. |
X |
X |
||
nvidia-repo-keys |
Add keys to apt trusted.gpg database. |
X |
X |
X |
X |
nvidia-systemd-reorder |
Fix the start-up order for NVIDIA services. |
X |
X |
X |
|
nvipmitool |
NVIDIA-customized ipmitool that supports subcommands for NVIDIA platforms. |
X |
X |
X |
X |
nvsm |
REST API services for DGX System Management. |
X |
X |
X |
X |
storcli |
Storage Command Line Tool, manages storage controllers. |
X |
|||
ubiquity |
Ubuntu live CD installer. |
X |
X |
X |
X |
Legend:
- 1: DGX-1
- 2: DGX-2
- A: DGX A100, DGX A800
- H: DGX H100/H200, DGX H800
DGX Kernel Parameters#
Parameter Name | Description | Package | Location |
---|---|---|---|
ast.modeset=0 | Disable the Aspeed display driver. The AST2xxx is the BMC used in the DGX-1 and DGX-2 servers. | nv-ast-modeset | /etc/default/grub.d/nomodeset.cfg |
pci=realloc=on | Allow the kernel to reallocate PCI resources if the allocations done by the BIOS are insufficient. This and pcie_ports=native are both required for NVMe hot-plug on DGX-2. | nv-enable-nvme-hot-plug | /etc/default/grub.d/enable-nvme-hot-plug.cfg |
pcie_ports=native | Use Linux native services for PME, AER, DPC, and PCIe hot-plug (that is, not firmware-first). This and pci=realloc=on are both required for NVMe hot-plug on DGX-2. | nv-enable-nvme-hot-plug | /etc/default/grub.d/enable-nvme-hot-plug.cfg |
transparent_hugepage=madvise | Disable transparent huge pages system-wide and enable them only inside MADV_HUGEPAGE madvise regions, to prevent applications from allocating more memory resources than necessary. | nv-hugepage | /etc/default/grub.d/hugepage.cfg |
iommu=pt | Enable pass-through mode only and disable DMA translations. This enables optimizations for the CPU inside the DGX A100. | nv-iommu-pt | /etc/default/grub.d/iommu.cfg |
crashkernel | Amount of memory to use for crash dumps. | nvidia-crashdump | /etc/default/grub.d/ipmisol.cfg |
console=ttyS[0-1],115200n8 | Set the console to serial port 0 or 1, using 115200 baud, no parity, and 8 data bits. For dgx-2, dgx-h100, and dgx-h800: console=ttyS0,115200n8. For other system types: console=ttyS1,115200n8. | nvidia-ipmisol | kernel cmdline |
net.ipv4.conf.all.arp_announce = 2 | Always use the best local address for this target. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
net.ipv4.conf.default.arp_announce = 2 | Always use the best local address for this target. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
net.ipv4.conf.all.arp_ignore = 1 | Only reply to ARP requests on the interface which contains the target IP address. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
net.ipv4.conf.default.arp_ignore = 1 | Only reply to ARP requests on the interface which contains the target IP address. | nvidia-kernel-defaults | /etc/sysctl.d/20-nvidia-defaults.conf |
setpci -d ::207 68.w=5000:f000 | Set MaxReadReq size to 4KB for all Network (2) InfiniBand (07) devices. | nvidia-mlnx-config | /etc/systemd/system/nvidia-mlnx-config.service |
setpci -d ::207 68.w | Set MaxReadReq size to 4KB for all Network (2) InfiniBand (07) devices. | nvidia-mlnx-config | /etc/systemd/system/nvidia-mlnx-config.service |
NVreg_EnablePCIERelaxedOrderingMode=1 | Set a reg-key to enable PCIe relaxed ordering in the GPUs. | nvidia-relaxed-ordering-gpu | /etc/modprobe.d/nvidia-relaxed-ordering.conf |
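The `68.w=5000:f000` argument in the `setpci` rows above is a `value:mask` pair: only the bits set in the mask are changed, and all other bits keep their current value. The sketch below works through that arithmetic. It assumes that offset 0x68 is the PCIe Device Control register on these devices, as the table's description implies, and it uses the standard encoding of the Max_Read_Request_Size field (bits 14:12, size = 128 bytes shifted left by the field value). The function names and the sample register value are hypothetical.

```python
def apply_masked_write(old, value, mask):
    """Emulate a setpci 16-bit "value:mask" write.

    Bits set in `mask` are taken from `value`; all other bits keep
    the contents of `old`.
    """
    return ((old & ~mask) | (value & mask)) & 0xFFFF


def max_read_request_bytes(devctl):
    """Decode Max_Read_Request_Size (bits 14:12) of a Device Control value."""
    field = (devctl >> 12) & 0x7
    if field > 5:
        raise ValueError("reserved Max_Read_Request_Size encoding: %d" % field)
    return 128 << field


# Hypothetical register contents before the write (512-byte read requests).
before = 0x2936
after = apply_masked_write(before, 0x5000, 0xF000)

print(hex(after))                      # low 12 bits are preserved
print(max_read_request_bytes(before))  # size before the write, in bytes
print(max_read_request_bytes(after))   # size after the write, in bytes
```

With the mask 0xf000, the write touches only the top four bits of the word; the value 0x5000 places 5 in the size field, and 128 shifted left by 5 is 4096 bytes, which matches the 4KB stated in the table.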