NVIDIA DGX Station Development Guide#

This guide describes NVIDIA DGX Station, a deskside AI computer built on the NVIDIA GB300 Grace Blackwell Desktop Superchip with up to 748 GB of coherent memory for developing and running large-scale AI workloads locally.

What is DGX Station?#

As AI spreads across industry domains, the technologies behind it are reshaping the AI developer landscape. Hundreds of thousands of publicly available AI models make it easier for developers to prototype, experiment with, and deploy AI-powered solutions.

Demand for AI at the edge requires developers to build small language models, quantize larger models, and fine-tune models for domain-specific use cases. Multimodal AI models, which understand and process several types of data such as text, speech, images, and video, are rapidly growing in popularity. Recently introduced reasoning models, which generate internal chains of thought before producing answers, require more compute performance.

Agentic solutions execute complex multi-step tasks and often rely on several AI models running concurrently. The common thread across the AI developer landscape is the need for computing platforms that can drive this diverse set of workflows in the cloud, on on-premises servers and workstations, and on local machines at the edge.

As developers navigate the continually evolving AI technology landscape, the NVIDIA CUDA-powered AI software stack provides a common foundation for computing devices, from the cloud to the edge, that supports innovation and a seamless transition across computing platforms. Even so, challenges for AI developers remain.

Developers often use desktop and laptop computers as their primary development platform and run into two main issues: insufficient local GPU memory and compute for their AI workflows, or lack of access to the software required to complete their work. When this happens, they are often forced to shift work to the cloud or a shared datacenter. That shift creates two further problems. First, it pulls costly and scarce GPU resources away from production workloads, making development and prototyping slow and expensive. Second, many researchers, developers, and small startups do not have reliable access to the same compute architecture that will ultimately run their workloads in production. To address these challenges, developers need a new class of accessible AI computers purpose-built for building and running AI locally.

DGX Station - Designed to Build and Run AI#

NVIDIA DGX™ Station defines a new class of computing devices designed to build and run AI. Powered by the NVIDIA B300 Blackwell GPU connected to a Grace CPU through an NVLink-C2C link, NVIDIA DGX Station delivers up to 20 petaFLOPs of sparse FP4 AI performance for demanding AI workloads. With up to 748 GB of coherent system memory (up to 252 GB of GPU HBM3e and 496 GB of CPU LPDDR5X), developers can experiment with, fine-tune, or run inference on large models. They can also run several different models concurrently to build agentic AI applications. The included NVIDIA ConnectX-8 SuperNIC connects multiple NVIDIA DGX Station systems so they can take on even bigger workloads and run distributed compute applications.

NVIDIA DGX Station supports the same software stack that runs on datacenter GPUs and is already familiar to most AI developers. The operating system, Ubuntu with NVIDIA AI Developer Tools, comes preconfigured with CUDA, cuDNN, TensorRT, the NVIDIA Container Toolkit, and many other standard components of the NVIDIA AI software stack. Developers can get started out of the box with common tools and libraries such as PyTorch, Jupyter, vLLM, SGLang, and Ollama to prototype, fine-tune, and run inference.
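As a quick sanity check of a preconfigured environment, a short script can report which tools are visible. The following is a minimal sketch, not an official NVIDIA utility: the function name `describe_environment` and the choice of what to probe are assumptions made for illustration. It only uses standard-library lookups plus PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`) when PyTorch happens to be installed.

```python
import importlib.util
import shutil


def describe_environment():
    """Report which common AI tools appear to be present on this machine.

    Hypothetical helper for illustration; it probes the PATH and the
    Python environment and never raises if something is missing.
    """
    report = {
        # Path to the driver's CLI tool, or None if it is not on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
        # Path to the CUDA compiler, or None if it is not on PATH.
        "nvcc": shutil.which("nvcc"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "gpu_name": None,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
            if report["cuda_available"]:
                report["gpu_name"] = torch.cuda.get_device_name(0)
        except Exception:
            # A broken PyTorch install should not crash the check.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

On a machine without an NVIDIA driver or PyTorch, the script still runs and simply reports `None` or `False` for the missing pieces.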

Compatibility with domain-specific software platforms such as NVIDIA Omniverse, NVIDIA Isaac, and RAPIDS extends DGX Station as a development platform into areas such as robotics, vision processing, and data science.

Built on NVIDIA Grace Blackwell Architecture#

At the heart of DGX Station is the GB300 Grace Blackwell Desktop Superchip. It combines a server-class Blackwell Ultra GPU with fifth-generation Tensor Cores and a 72-core Grace Arm CPU, delivering up to 20 petaFLOPs of sparse FP4 AI compute. Support for new precision formats such as NVFP4 lets developers unlock greater performance and experiment with lower precisions before taking their AI recipes to AI factories and cloud infrastructure for large-scale deployments.

The CPU and GPU are connected through NVLink™-C2C, creating a coherent memory model with up to 5x higher bandwidth than PCIe Gen 5. This design accelerates preprocessing, orchestration, and real-time inference in a single, tightly integrated system. The GPU's 252 GB of HBM3e delivers up to 7.1 TB/s of GPU memory bandwidth, and the 496 GB of LPDDR5X CPU memory delivers up to 396 GB/s of system memory bandwidth. The dedicated NVIDIA ConnectX-8 chip in DGX Station delivers 800 Gbps of connectivity when paired with a second DGX Station.
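To see why these bandwidth figures matter, consider a common rule of thumb for single-stream text generation: each new token requires reading the model weights roughly once, so memory bandwidth divided by weight size gives an upper bound on tokens per second. The sketch below applies that rule to the bandwidth numbers quoted above. The 120-billion-parameter dense model stored at 4 bits per weight is a hypothetical example, and the estimate ignores KV-cache traffic, batching, and compute limits, so real throughput will be lower.

```python
GB = 1e9
TB = 1e12

# Peak figures quoted in this guide, converted to bytes per second.
HBM_BANDWIDTH = 7.1 * TB        # GPU HBM3e
LPDDR_BANDWIDTH = 396 * GB      # CPU LPDDR5X
LINK_BANDWIDTH = 800e9 / 8      # 800 Gbps ConnectX-8 link, bits -> bytes


def seconds_to_move(num_bytes, bandwidth):
    """Ideal time to stream num_bytes once at the given bandwidth."""
    return num_bytes / bandwidth


def decode_tokens_per_second_bound(weight_bytes, bandwidth):
    """Rule-of-thumb upper bound: one full read of the weights per token."""
    return bandwidth / weight_bytes


if __name__ == "__main__":
    # Hypothetical dense model: 120e9 parameters at 0.5 bytes each.
    weight_bytes = 120e9 * 0.5
    for name, bw in [("HBM3e", HBM_BANDWIDTH), ("LPDDR5X", LPDDR_BANDWIDTH)]:
        bound = decode_tokens_per_second_bound(weight_bytes, bw)
        print(f"{name}: at most {bound:.1f} tokens/s")
    transfer = seconds_to_move(weight_bytes, LINK_BANDWIDTH)
    print(f"Copying the weights over the link: at least {transfer:.2f} s")
```

The gap between the two memory tiers suggests keeping the most frequently read weights in HBM3e where possible, although actual placement is handled by the software stack rather than by a calculation like this one.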

Up to 748 GB of Coherent Memory for up to 1T Parameter Models#

DGX Station is equipped with up to 748 GB of coherent system memory, enabling developers to run large LLMs locally (such as DeepSeek-R1, GPT-OSS-120B, and Qwen3-235B). Developers can run inference on models as large as 1T parameters and perform full fine-tuning locally.
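A back-of-the-envelope calculation shows how parameter count and precision relate to this memory capacity. The sketch below multiplies parameters by bytes per parameter and compares the result with 748 GB. It is a rough estimate under stated assumptions: the helper names are made up for illustration, and the numbers cover weights only, ignoring KV cache, activations, optimizer state, and the small per-block scale overhead of formats such as NVFP4.

```python
# Approximate storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "fp4": 0.5}

HBM_GB = 252
LPDDR_GB = 496
COHERENT_MEMORY_GB = HBM_GB + LPDDR_GB  # 748 GB, as quoted in this guide


def weight_footprint_gb(num_params, precision):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_in_memory(num_params, precision, capacity_gb=COHERENT_MEMORY_GB):
    """True if the weights alone fit in the given capacity."""
    return weight_footprint_gb(num_params, precision) <= capacity_gb


if __name__ == "__main__":
    for params, label in [(235e9, "235B"), (1e12, "1T")]:
        for precision in ("bf16", "fp8", "fp4"):
            size = weight_footprint_gb(params, precision)
            verdict = "fits" if fits_in_memory(params, precision) else "does not fit"
            print(f"{label} @ {precision}: {size:.0f} GB, {verdict}")
```

Under these assumptions, a 1T-parameter model fits only at 4-bit precision, which is consistent with the emphasis on low-precision formats elsewhere in this guide.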

Beyond raw compute, DGX Station is compatible with a large ecosystem of tools: from leading frameworks like PyTorch, TensorFlow, and Hugging Face Transformers, to optimized inference engines such as SGLang and vLLM, to fine-tuning frameworks like LLaMA-Factory and Unsloth, as well as agent frameworks, local inference apps, and end-to-end DevOps tooling.

For workloads that require even larger models, DGX Station includes NVIDIA ConnectX™ networking, allowing two systems to be clustered together for seamless multi-node scaling. This enables developers to scale beyond a single node to handle larger models and more demanding tasks, while maintaining a consistent software environment.

In addition, the Grace Blackwell platform supports optimizations such as NVFP4 precision and speculative decoding, which improve efficiency and responsiveness when running large-scale generative AI models. These optimizations let developers prototype and iterate quickly while preserving the ability to scale to larger NVIDIA platforms.
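To give a feel for what a 4-bit block-scaled format does to the numbers, the following is a simplified pure-Python sketch of NVFP4-style quantization. It assumes the commonly described layout: values are rounded to the FP4 (E2M1) grid in blocks of 16 elements that share one scale factor. This is an illustration only, not NVIDIA's implementation. In particular, the scale is kept as an ordinary Python float here, whereas the real format stores it in reduced precision, and ties are resolved toward the smaller magnitude for simplicity.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # elements sharing one scale (assumed NVFP4 block size)


def quantize_block(values):
    """Return (scale, codes) so that scale * code approximates each value."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude in the block onto the largest E2M1 value.
    scale = max_abs / E2M1_MAGNITUDES[-1] if max_abs > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        # Nearest grid point; ties go to the smaller magnitude in this sketch.
        magnitude = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-magnitude if v < 0 else magnitude)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from a quantized block."""
    return [scale * c for c in codes]


def quantize(values):
    """Split a flat list into blocks and quantize each independently."""
    return [
        quantize_block(values[start:start + BLOCK_SIZE])
        for start in range(0, len(values), BLOCK_SIZE)
    ]


if __name__ == "__main__":
    weights = [0.02 * i - 0.3 for i in range(32)]
    restored = []
    for scale, codes in quantize(weights):
        restored.extend(dequantize_block(scale, codes))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"worst absolute error: {worst:.4f}")
```

Because each block carries its own scale, a few large values in one part of a tensor do not force the rest of the tensor onto a coarse grid, which is the main reason block scaling preserves accuracy better than a single per-tensor scale.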

Bringing Blackwell Architecture from the Cloud to Your Desk#

DGX Station gives developers a high-performance deskside AI supercomputer for prototyping models and AI applications locally. This frees up compute resources in their cluster environments, which are better suited to training and deploying production models. As prototypes mature, developers can connect to a cloud instance through NVIDIA Sync and continue their work on the same NVIDIA Blackwell architecture.