DGX Station Software Stack#
This document outlines the software specifications for the NVIDIA DGX Station GB300 product, providing an overview of its operating system and core software stack.
The DGX Station GB300 is designed for AI researchers, data scientists, and developers who require datacenter-class Grace Blackwell performance in a desk-side form factor. It delivers exceptional performance and flexibility for the most demanding AI-driven workloads.
Overview#
The DGX Station offers a robust and versatile software environment tailored for advanced AI applications. It features a base operating system derived from Ubuntu 24.04 and integrates the NVIDIA AI stack, providing access to essential tools and libraries for AI and machine learning workflows.
System Architecture#
Operating System#
Ubuntu with NVIDIA AI Developer Tools: Ubuntu 24.04 server image with desktop packages
Kernel: Linux v6.17 with necessary patches
Boot and Hardware Enablement#
Boot Configuration
Boot Mode: UEFI (default), with USB-based boot support
Initial Setup: Configurable system settings on first boot:
Timezone, language, keyboard layout
Username, password, and hostname
Operational Modes
Desktop Mode: Standard operation with display, keyboard, and mouse
Headless Mode: Network-accessible through SSH or a web server
Firmware and Updates
BSP Firmware: NVIDIA-supported with independent OS updates
Update Methods: OEM established process
System Updates: Repository-based over-the-air (OTA) updates
Hardware Support
Storage: Four internal M.2 NVMe SSDs: two PCIe Gen5 x4 OS drives (M.2 2280) configured for software RAID 1, and two PCIe Gen6 x4 data/cache drives (M.2 2280) for high-speed storage.
System Memory: 496 GB ECC-enabled system memory using LPDDR5X SOCAMM modules.
GPU Driver: NVIDIA Open GPU Kernel driver (nvidia-open) optimized for the Blackwell B300 GPU and the NVIDIA AI stack.
USB Support: USB 3.2 driver support for:
Baseline devices
HID devices
Webcams
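The software RAID 1 pairing of the two OS drives listed under Storage can be checked from the running system. The following is a minimal sketch that assumes the mirror is implemented with the Linux md driver and is therefore reported in /proc/mdstat; the source does not state which RAID implementation is used, and the helper name parse_mdstat is hypothetical.

```python
import re
from pathlib import Path

# Matches array header lines such as: "md0 : active raid1 nvme1n1p2[1] nvme0n1p2[0]"
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (raid\d+) (.+)$")
# Matches the member summary such as: "[2/2] [UU]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return one dict per md array found in /proc/mdstat-style text."""
    arrays = []
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = {
                "name": header.group(1),
                "state": header.group(2),
                "level": header.group(3),
                "healthy": False,  # updated once the member summary is seen
            }
            arrays.append(current)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            # Healthy means every expected member is active and marked "U".
            current["healthy"] = expected == active and "_" not in status.group(3)
    return arrays


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")  # assumption: the md driver is in use
    if mdstat.exists():
        for array in parse_mdstat(mdstat.read_text()):
            print(array)
```

A mirror that has lost a member reports a summary such as [2/1] [U_], which this sketch flags as unhealthy.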
Networking
Ethernet: One 10 GbE RJ45 interface for in-band management and two 400 GbE QSFP ports through an NVIDIA ConnectX-8 NIC, using NVIDIA DOCA host drivers and NVIDIA networking software for Ethernet.
BMC Network: Dedicated 1 GbE RJ45 interface connected to the BMC for out-of-band management.
Wireless: Wi-Fi support varies across OEM systems; networking is primarily provided through wired Ethernet interfaces.
System Recovery
Re-imaging: USB boot media or BMC virtual media based recovery
Image Sources: OEM repositories
Security
Boot and Firmware Security: Secure firmware update workflow using the BMC and UEFI capsule updates through Redfish UpdateService, with OEM-provisioned keys and images for production systems.
Secure Boot and TPM: Support for UEFI Secure Boot and TPM-based attestation as provided by the NVIDIA BaseOS and platform firmware configuration.
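As an illustration of the Redfish UpdateService mentioned above, the sketch below lists the firmware inventory that a BMC exposes. It relies only on the standard DMTF Redfish collection path and the Python standard library. The host name, credentials, and helper names are hypothetical placeholders, and the resources that a particular OEM BMC actually exposes may differ.

```python
import base64
import json
import ssl
import urllib.request

# Standard DMTF Redfish path for the firmware inventory collection.
FIRMWARE_INVENTORY_PATH = "/redfish/v1/UpdateService/FirmwareInventory"


def inventory_url(bmc_host):
    """Build the HTTPS URL of the firmware inventory collection."""
    return f"https://{bmc_host}{FIRMWARE_INVENTORY_PATH}"


def member_paths(collection):
    """Extract the @odata.id of each member from a Redfish collection."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


def fetch_json(url, username, password, verify_tls=True):
    """GET a Redfish resource using HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for lab BMCs that still use self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

A call such as member_paths(fetch_json(inventory_url("bmc.example.local"), user, password)) would return the resource paths of the individual firmware components, each of which can be fetched the same way to read its version.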
Display and Desktop Interface#
Display Capabilities#
Video Outputs
Display output for the host operating system is provided by a PCIe add-in GPU installed in the PCIe x16 Gen5 slot.
The BMC Mini DisplayPort output is reserved for BMC console access and platform management and is not used for the primary desktop display.
Audio Support
USB Audio
Bluetooth Audio
Desktop Experience#
Interface: Regular Ubuntu desktop
Pre-installed: NVIDIA Container Toolkit, NVIDIA CUDA Toolkit, Data Center GPU Manager, NVIDIA DOCA-OFED, NVIDIA GPU Driver, NVIDIA Optimized Kernel
Graphics: Ubuntu (Xorg) GUI desktop with preinstalled browser
Acceleration: Desktop and application acceleration using OpenGL/Vulkan
Video: Desktop video acceleration (nvenc/nvdec) for browsers and media players (VLC)
DRM Content Support
Browser playback in fallback resolutions
Enhanced copy protection consistent with Ubuntu 24.04 and NVIDIA GPU driver capabilities.
Performance and Power Management#
RTD3: Runtime D3 support
Power States: Product-defined PStates for optimized performance
Suspend/Resume: Basic functionality support
Software Stack#
Core AI Libraries#
NVIDIA AI Software
NCCL (NVIDIA Collective Communications Library)
cuDNN (CUDA Deep Neural Network library)
TensorRT-LLM
TensorRT
All supported toolkits and math libraries
CUDA Toolkit
CUDA 13.1
Latest fully-tested CUDA Toolkit, with CUDA examples included
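To confirm which CUDA Toolkit release is installed, the output of nvcc --version can be parsed. This is a minimal sketch that assumes nvcc is on PATH and prints its usual "release X.Y" line; the function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its (major, minor) release."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # toolkit not installed or not on PATH
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=True
    )
    return parse_nvcc_release(result.stdout)
```

Comparing the returned tuple against (13, 1) gives a simple check that the system carries at least the toolkit release listed here.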
Development Tools#
Linux Development Tools
build-essential
gdb, vim
Support for C, C++, Perl, Python development
GPU Development Tools
Nsight Systems
Nsight Compute
Nsight Graphics
Nsight Deep Learning Designer
JupyterLab extensions
CUDA GDB
Container and Orchestration#
Docker Support
NVIDIA Docker containers
NVIDIA Container Runtime for Docker included
Support for running multiple containers on bare metal
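With the NVIDIA Container Runtime in place, a container gains GPU access through Docker's --gpus option. The sketch below only assembles the command line. The image name ubuntu:24.04 and the helper names are illustrative assumptions, not something this specification prescribes.

```python
import shlex


def gpu_container_command(image, command=("nvidia-smi",), gpus="all"):
    """Build a `docker run` argument list that exposes GPUs to the container."""
    return ["docker", "run", "--rm", "--gpus", gpus, image, *command]


def as_shell(argv):
    """Render an argument list as a copy-pasteable shell command."""
    return shlex.join(argv)


# Illustrative image name; substitute whichever image you actually use.
print(as_shell(gpu_container_command("ubuntu:24.04")))
```

Passing the list returned by gpu_container_command to subprocess.run would start the container. Running nvidia-smi inside it is a common smoke test that the GPU is visible from the container.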
Data Science and Analytics#
RAPIDS OSS project support
cuDF
cuML
cuGraph
XGBoost
Deep Learning Frameworks
vLLM
SGLang
PyTorch
TensorRT
cuDNN
Compute Support
OpenCL support included
Additional Software Support#
Omniverse: NVIDIA Omniverse support
GPU Driver: GSP-RM/OpenRM-based NVIDIA Open GPU Kernel (nvidia-open) driver as the default configuration.
System Management#
Monitoring and Diagnostics#
System Monitoring
nvidia-smi for basic system health monitoring
GPU and CPU telemetry through system monitoring agents
Out-of-band telemetry made available through the Baseboard Management Controller
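Basic health monitoring with nvidia-smi can be scripted through its CSV query mode. The following sketch assumes the listed query fields are reported on this platform (unsupported fields typically come back as "[N/A]"), so values are kept as strings rather than converted to numbers. The field selection and function names are illustrative.

```python
import csv
import io

# Query fields accepted by `nvidia-smi --query-gpu`; the selection is illustrative.
FIELDS = ("name", "temperature.gpu", "utilization.gpu", "power.draw")


def query_command(fields=FIELDS):
    """Build the nvidia-smi argument list for a headerless, unit-free CSV query."""
    return [
        "nvidia-smi",
        f"--query-gpu={','.join(fields)}",
        "--format=csv,noheader,nounits",
    ]


def parse_query_output(output, fields=FIELDS):
    """Map each CSV row to a dict keyed by field name; values stay as strings."""
    rows = csv.reader(io.StringIO(output), skipinitialspace=True)
    return [
        dict(zip(fields, (value.strip() for value in row)))
        for row in rows
        if row  # skip blank lines
    ]
```

The command from query_command can be run with subprocess.run(..., capture_output=True, text=True), and its stdout passed to parse_query_output to obtain one dictionary per GPU.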
Hardware Diagnostics
Hardware error recording (FDR) for error history
Field diagnostics software for RMA flow management
Manufacturing diagnostics with external SOC support for MODS
CPU and GPU testing capabilities
Remote Management
Secure out-of-band remote management through BMC (web UI and Redfish)
Security Features#
Secure Boot and TPM
Secure boot and TPM (Trusted Platform Module) support
Default: ON
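Whether Secure Boot is actually enabled on a running system can be read from the UEFI SecureBoot variable. This sketch assumes the kernel exposes efivarfs at /sys/firmware/efi/efivars, where each variable file starts with four attribute bytes followed by the value; the function names are hypothetical.

```python
from pathlib import Path

# SecureBoot variable under the EFI global variable GUID, as exposed by efivarfs.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/"
    "SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def secure_boot_enabled(raw):
    """Decode efivarfs bytes: 4 attribute bytes followed by a 1-byte value."""
    if len(raw) < 5:
        raise ValueError("unexpected SecureBoot variable length")
    return raw[4] == 1


def read_secure_boot_state(path=SECURE_BOOT_VAR):
    """Return True or False, or None when the variable cannot be read."""
    try:
        return secure_boot_enabled(path.read_bytes())
    except (FileNotFoundError, PermissionError):
        return None


if __name__ == "__main__":
    print("Secure Boot enabled:", read_secure_boot_state())
```

A result of None means the variable was not readable, for example on a system booted without UEFI, so it should not be interpreted as Secure Boot being disabled.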
Firmware Security
Signing infrastructure for all firmware/BSP components
Secure firmware updates through BMC and UEFI
Performance Specifications#
Chip Features#
CPU Configuration
Architecture: 72-core NVIDIA Grace CPU based on Arm Neoverse V2.
CPU–GPU Topology: Grace CPU connected to a single Blackwell B300 GPU through NVIDIA NVLink Chip-to-Chip, as defined in the DGX Station GB300 reference platform.