DGX Station Software Stack#

This document outlines the software specifications for the NVIDIA DGX Station GB300 product, providing an overview of its operating system and core software stack.

The DGX Station GB300 is designed for AI researchers, data scientists, and developers who require datacenter-class Grace Blackwell performance in a desk-side form factor. It delivers exceptional performance and flexibility for the most demanding AI-driven workloads.

Overview#

The DGX Station software environment is built on a base operating system derived from Ubuntu 24.04. It integrates the NVIDIA AI stack, which provides the tools and libraries needed for AI and machine learning workflows.

System Architecture#

Operating System#

  • Ubuntu with NVIDIA AI Developer Tools: Ubuntu 24.04 server image with desktop packages

  • Kernel: Linux v6.17 with necessary patches

Boot and Hardware Enablement#

Boot Configuration

  • Boot Mode: UEFI (default), with USB-based boot support

  • Initial Setup: Configurable system settings on first boot:

    • Timezone, language, keyboard layout

    • Username, password, and hostname

Operational Modes

  • Desktop Mode: Standard operation with display, keyboard, and mouse

  • Headless Mode: Network-accessible through SSH or a web server
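
As a minimal sketch of checking headless reachability, the following Python function tests whether a TCP port (22 for SSH by default) accepts connections. The host name used in the usage note is hypothetical, and the check only confirms that the port is open, not that login will succeed.

```python
import socket


def is_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `is_port_open("dgx-station.example")` would report whether SSH is reachable on a system with that (hypothetical) host name.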

Firmware and Updates

  • BSP Firmware: NVIDIA-supported with independent OS updates

  • Update Methods: OEM-established process

  • System Updates: Repository-based over-the-air (OTA) updates

Hardware Support

  • Storage: Four internal M.2 NVMe SSDs: two PCIe Gen5 x4 OS drives (M.2 2280) configured for software RAID 1, and two PCIe Gen6 x4 data/cache drives (M.2 2280) for high-speed storage.

  • System Memory: 496 GB ECC-enabled system memory using LPDDR5X SOCAMM modules.

  • GPU Driver: NVIDIA Open GPU Kernel driver (nvidia-open) optimized for the Blackwell B300 GPU and the NVIDIA AI stack.

  • USB Support: USB 3.2 driver support for:

    • Baseline devices

    • HID devices

    • Webcams
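
The software RAID 1 configuration of the OS drives listed under Hardware Support can be inspected from the running system. The sketch below assumes the mirror is a Linux md array (the usual mechanism for software RAID on Ubuntu, although this document does not name it) and parses the text of `/proc/mdstat`. The array and device names in the sample are illustrative, not taken from a real DGX Station.

```python
import re
from pathlib import Path

# Header line, e.g. "md0 : active raid1 nvme1n1p2[1] nvme0n1p2[0]"
ARRAY_RE = re.compile(r"^(md\w+) : (\S+) (raid\d+) (.+)$")
# Status fragment, e.g. "[2/2] [UU]" (devices in array / active, per-device flags)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")

# Illustrative sample only; names and sizes are assumptions.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      1874716672 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def parse_mdstat(text: str) -> list[dict]:
    """Return one dict per md array with its level, members, and health."""
    arrays = []
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, state, level, members = header.groups()
            current = {
                "name": name,
                "state": state,
                "level": level,
                # Strip the "[n]" role suffix from each member device.
                "members": [m.split("[")[0] for m in members.split()],
                "healthy": None,
            }
            arrays.append(current)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None and current["healthy"] is None:
            total, active, flags = status.groups()
            # Healthy means every device is active and no "_" slot is missing.
            current["healthy"] = total == active and "_" not in flags
    return arrays


def read_mdstat(path: str = "/proc/mdstat") -> list[dict]:
    """Parse the live mdstat file; only meaningful on a Linux host."""
    return parse_mdstat(Path(path).read_text())
```

A degraded mirror would show a status such as `[2/1] [U_]`, which this sketch reports as `healthy: False`.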

Networking

  • Ethernet: One 10 GbE RJ45 interface for in-band management and two 400 GbE QSFP ports on an NVIDIA ConnectX-8 NIC, supported by NVIDIA DOCA host drivers and NVIDIA networking software for Ethernet.

  • BMC Network: Dedicated 1 GbE RJ45 interface connected to the BMC for out-of-band management.

  • Wireless: Wi-Fi support varies across OEM systems; networking is primarily provided through wired Ethernet interfaces.

System Recovery

  • Re-imaging: Recovery from USB boot media or BMC virtual media

  • Image Sources: OEM repositories

Security

  • Boot and Firmware Security: Secure firmware update workflow using the BMC and UEFI capsule updates through Redfish UpdateService, with OEM-provisioned keys and images for production systems.

  • Secure Boot and TPM: Support for UEFI Secure Boot and TPM-based attestation as provided by the NVIDIA BaseOS and platform firmware configuration.
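
To illustrate the Redfish side of the firmware workflow, the following sketch lists the entries of the standard Redfish `UpdateService/FirmwareInventory` collection. The resource path comes from the Redfish standard; the BMC address, credentials, and the member names in the test data are assumptions, and an OEM BMC may require session authentication or a trusted certificate instead of the HTTP Basic authentication shown here.

```python
import base64
import json
import urllib.request


def firmware_inventory_url(bmc_host: str) -> str:
    """Build the standard Redfish firmware inventory collection URL."""
    return f"https://{bmc_host}/redfish/v1/UpdateService/FirmwareInventory"


def member_uris(collection: dict) -> list[str]:
    """Extract the @odata.id link of every member in a Redfish collection."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


def fetch_json(url: str, username: str, password: str, timeout: float = 10.0) -> dict:
    """GET a Redfish resource with HTTP Basic authentication (an assumption)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call chain would be `member_uris(fetch_json(firmware_inventory_url(host), user, password))`, where `host`, `user`, and `password` are supplied by the administrator. BMCs with self-signed certificates will fail TLS verification unless the certificate is added to the trust store.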

Display and Desktop Interface#

Display Capabilities#

Video Outputs

  • Display output for the host operating system is provided by a PCIe add-in GPU installed in the PCIe x16 Gen5 slot.

  • The BMC Mini DisplayPort output is reserved for BMC console access and platform management and is not used for the primary desktop display.

Audio Support

  • USB Audio

  • Bluetooth Audio

Desktop Experience#

  • Interface: Regular Ubuntu desktop

  • Pre-installed: NVIDIA Container Toolkit, NVIDIA CUDA Toolkit, Data Center GPU Manager, NVIDIA DOCA-OFED, NVIDIA GPU Driver, NVIDIA Optimized Kernel

  • Graphics: Ubuntu (Xorg) GUI desktop with preinstalled browser

  • Acceleration: Desktop and application acceleration using OpenGL/Vulkan

  • Video: Desktop video acceleration (NVENC/NVDEC) for browsers and media players such as VLC

DRM Content Support

  • Browser playback in fallback resolutions

  • Enhanced copy protection consistent with Ubuntu 24.04 and NVIDIA GPU driver capabilities.

Performance and Power Management#

  • RTD3: Runtime D3 support

  • Power States: Product-defined P-states for optimized performance

  • Suspend/Resume: Basic functionality support

Software Stack#

Core AI Libraries#

NVIDIA AI Software

  • NCCL (NVIDIA Collective Communications Library)

  • cuDNN (CUDA Deep Neural Network library)

  • TensorRT-LLM

  • TensorRT

  • All supported toolkits and math libraries

CUDA Toolkit

  • CUDA 13.1

  • Latest fully-tested CUDA Toolkit, with CUDA examples included

Development Tools#

Linux Development Tools

  • build-essential

  • gdb, vim

  • Support for C, C++, Perl, Python development

GPU Development Tools

  • Nsight Systems

  • Nsight Compute

  • Nsight Graphics

  • Nsight Deep Learning Designer

  • JupyterLab extensions

  • CUDA GDB

Container and Orchestration#

Docker Support

  • NVIDIA Docker containers

  • NVIDIA Container Runtime for Docker included

  • Support for running multiple containers on bare metal
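
As a sketch of how GPU access is typically requested for a Docker container, the function below assembles a `docker run` command that uses the `--gpus` option enabled by the NVIDIA Container Toolkit. The image name in the usage note is a placeholder, not an image named by this document.

```python
import shlex
import subprocess


def gpu_container_command(image: str, command=("nvidia-smi",), gpus: str = "all") -> list[str]:
    """Build a 'docker run' argument list that exposes GPUs to the container."""
    # --rm removes the container on exit; --gpus selects which GPUs to expose.
    return ["docker", "run", "--rm", "--gpus", gpus, image, *command]


def run_gpu_container(image: str, command=("nvidia-smi",)) -> str:
    """Run the container and return its standard output (requires Docker)."""
    args = gpu_container_command(image, command)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def as_shell(args: list[str]) -> str:
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(args)
```

For example, `as_shell(gpu_container_command("registry.example/cuda-image:tag"))` yields a command line that can be pasted into a terminal, with the placeholder replaced by a real CUDA-enabled image.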

Data Science and Analytics#

RAPIDS OSS project support

  • cuDF

  • cuML

  • cuGraph

  • XGBoost

Deep Learning Frameworks

  • vLLM

  • SGLang

  • PyTorch

  • TensorRT

  • cuDNN
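
To see which of the listed data science and deep learning packages are importable in a given Python environment, a check along the following lines can help. It assumes the usual import names for each project; several of these packages may be delivered in containers rather than in the host Python environment, so a `False` result does not mean the package is unsupported.

```python
from importlib.util import find_spec

# Display name -> Python import name (the customary names for each project).
MODULES = {
    "PyTorch": "torch",
    "vLLM": "vllm",
    "SGLang": "sglang",
    "TensorRT": "tensorrt",
    "cuDF": "cudf",
    "cuML": "cuml",
    "cuGraph": "cugraph",
    "XGBoost": "xgboost",
}


def available_packages(modules: dict[str, str] = MODULES) -> dict[str, bool]:
    """Report whether each module can be located without importing it."""
    found = {}
    for label, name in modules.items():
        try:
            found[label] = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialized modules.
            found[label] = False
    return found
```

Because `find_spec` only locates a module, this check avoids the start-up cost and GPU initialization that a full import of these frameworks would trigger.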

Compute Support

  • OpenCL support included

Additional Software Support#

  • Omniverse: NVIDIA Omniverse support

  • GPU Driver: GSP-RM/OpenRM-based NVIDIA Open GPU Kernel (nvidia-open) driver as the default configuration.

System Management#

Monitoring and Diagnostics#

System Monitoring

  • nvidia-smi for basic system health monitoring

  • GPU and CPU telemetry through system monitoring agents

  • Out-of-band telemetry made available through the Baseboard Management Controller
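
The `nvidia-smi` health check mentioned above can be scripted through its CSV query mode. The sketch below requests a handful of standard `--query-gpu` fields and parses each output row into a dictionary. Values are kept as strings because some fields can report `[N/A]` on certain platforms, and the sample row in the test is invented.

```python
import csv
import io
import subprocess

# Standard nvidia-smi query fields; adjust to the telemetry of interest.
FIELDS = (
    "name",
    "temperature.gpu",
    "utilization.gpu",
    "memory.used",
    "memory.total",
    "power.draw",
)


def parse_gpu_csv(text: str) -> list[dict[str, str]]:
    """Turn 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in row))))
    return rows


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi and return parsed telemetry (requires the NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--query-gpu={','.join(FIELDS)}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on the system returns one entry per GPU, which can then be logged or forwarded to a monitoring agent.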

Hardware Diagnostics

  • Hardware error recording (FDR) for error history

  • Field diagnostics software for RMA flow management

  • Manufacturing diagnostics with external SOC support for MODS

  • CPU and GPU testing capabilities

Remote Management

  • Secure out-of-band remote management through BMC (web UI and Redfish)

Security Features#

Secure Boot and TPM

  • Secure boot and TPM (Trusted Platform Module) support

  • Default: ON

Firmware Security

  • Signing infrastructure for all firmware/BSP components

  • Secure firmware updates through BMC and UEFI

Performance Specifications#

Chip Features#

CPU Configuration

  • Architecture: 72-core NVIDIA Grace CPU based on Arm Neoverse V2.

  • CPU–GPU Topology: Grace CPU connected to a single Blackwell B300 GPU through NVIDIA NVLink Chip-to-Chip, as defined in the DGX Station GB300 reference platform.