Copyright © 2026, NVIDIA Corporation.

LWS

LeaderWorkerSet integration for multinode Dynamo deployments

Dynamo can use LeaderWorkerSet (LWS) as the Kubernetes orchestration layer for multinode workloads. LWS is the lightweight path for spanning one Dynamo worker service across multiple nodes; Dynamo pairs it with Volcano for gang scheduling, so that all pods in a multinode group are scheduled together or not at all.

Use LWS when you want a simpler multinode orchestrator than Grove, or when your cluster already standardizes on LWS and Volcano. Grove remains the default when both Grove and LWS are available.

Prerequisites

  • Kubernetes cluster with GPU nodes.
  • LWS version 0.7.0 or newer.
  • Volcano installed for gang scheduling.
  • Dynamo Kubernetes Platform installed.

The installation guide includes the exact Helm commands for LWS and Volcano.
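If you want to verify the LWS prerequisite in a script, the check is a plain version comparison against 0.7.0. The sketch below is illustrative only: the helper names are hypothetical, it assumes you already have the installed LWS version as a string (for example, from your Helm release), and it handles only plain MAJOR.MINOR.PATCH values with an optional leading "v".

```python
from typing import Tuple

# Documented minimum LWS version for Dynamo multinode deployments.
MIN_LWS_VERSION = (0, 7, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn 'v0.7.0' or '0.7.0' into (0, 7, 0).

    Pre-release or build suffixes (e.g. '0.7.0-rc1') are not handled and
    raise ValueError.
    """
    parts = version.strip().lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def lws_version_supported(version: str) -> bool:
    """True if the given LWS version is 0.7.0 or newer."""
    # Tuples compare element by element, so (0, 10, 0) > (0, 7, 0),
    # which a naive string comparison would get wrong.
    return parse_version(version) >= MIN_LWS_VERSION


if __name__ == "__main__":
    for v in ["v0.6.2", "0.7.0", "v0.10.1"]:
        print(v, lws_version_supported(v))
```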

Orchestrator Selection

For multinode deployments, the Dynamo operator selects an orchestrator based on what is installed:

| Cluster state | Operator behavior |
|---|---|
| Grove and LWS installed | Uses Grove by default. |
| Grove and LWS installed, DGD has nvidia.com/enable-grove: "false" | Uses LWS. |
| Only LWS installed | Uses LWS. |
| Neither Grove nor LWS installed | Rejects multinode deployments. |
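The selection table can be read as a small decision function. The following Python sketch restates the documented rules; it is not the operator's code, and the function name and arguments are hypothetical. The combination "Grove installed, Grove disabled by annotation, LWS not installed" is not covered by the table, so the sketch simply assumes that case is rejected.

```python
from typing import Mapping, Optional

GROVE_ANNOTATION = "nvidia.com/enable-grove"


def select_orchestrator(
    grove_installed: bool,
    lws_installed: bool,
    annotations: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "grove" or "lws" for a multinode DGD, following the table.

    Raises ValueError when the multinode deployment would be rejected.
    """
    annotations = annotations or {}
    # Only the exact string "false" opts out of Grove in this sketch.
    grove_disabled = annotations.get(GROVE_ANNOTATION) == "false"

    if grove_installed and not grove_disabled:
        return "grove"  # Grove is the default whenever it is available.
    if lws_installed:
        return "lws"  # Grove opted out via annotation, or only LWS installed.
    if grove_installed:
        # Not covered by the table; this sketch assumes rejection.
        raise ValueError("Grove disabled by annotation and LWS not installed")
    raise ValueError("multinode deployments require Grove or LWS")


if __name__ == "__main__":
    print(select_orchestrator(True, True))
    print(select_orchestrator(True, True, {GROVE_ANNOTATION: "false"}))
    print(select_orchestrator(False, True))
```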

To force the LWS path when Grove is also present:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
  annotations:
    nvidia.com/enable-grove: "false"
spec:
  # ...
```

Multinode Spec

Set multinode.nodeCount on the service that should span nodes. The total GPU count is multinode.nodeCount multiplied by the per-node GPU limit:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: qwen3-multinode
  annotations:
    nvidia.com/enable-grove: "false"
spec:
  services:
    backend:
      multinode:
        nodeCount: 2
      resources:
        limits:
          gpu: "4"
      extraPodSpec:
        mainContainer:
          args:
            - "--tp-size"
            - "8"
```

In this example, Dynamo asks LWS to place the backend across 2 nodes with 4 GPUs per node, for 8 GPUs total. Make sure your backend’s tensor-parallel or other distributed-execution flags match that total; here, --tp-size 8 equals 2 nodes x 4 GPUs.
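The GPU arithmetic, and a check that the parallelism flag agrees with it, can be sketched as follows. These are hypothetical helpers, not part of Dynamo; they assume pure tensor parallelism (no pipeline or data parallelism, so --tp-size should equal the GPU total) and only recognize the separate-token "--tp-size 8" form used in the example above.

```python
from typing import Optional, Sequence


def total_gpus(node_count: int, gpus_per_node: int) -> int:
    """Total GPUs for a multinode service: nodeCount x per-node GPU limit."""
    return node_count * gpus_per_node


def flag_value(args: Sequence[str], flag: str) -> Optional[str]:
    """Value following `flag` in an args list, or None if absent.

    Only the "--flag value" form is recognized, not "--flag=value".
    """
    # Skip the last element: a trailing flag has no value after it.
    for i, arg in enumerate(args[:-1]):
        if arg == flag:
            return args[i + 1]
    return None


def check_tp_size(node_count: int, gpu_limit: str, args: Sequence[str]) -> int:
    """Raise if --tp-size disagrees with the GPU total; return the total."""
    # The YAML GPU limit is a quoted string such as "4".
    expected = total_gpus(node_count, int(gpu_limit))
    tp = flag_value(args, "--tp-size")
    if tp is None:
        raise ValueError("--tp-size not found in args")
    if int(tp) != expected:
        raise ValueError(
            f"--tp-size {tp} != {node_count} nodes x {gpu_limit} GPUs = {expected}"
        )
    return expected


if __name__ == "__main__":
    # Values from the qwen3-multinode example: 2 nodes, "4" GPUs each, --tp-size 8.
    print(check_tp_size(2, "4", ["--tp-size", "8"]))
```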

Backend Behavior

The operator injects backend-specific multinode settings into the generated LeaderWorkerSet:

| Backend | LWS behavior |
|---|---|
| vLLM | Uses Ray for multinode tensor or pipeline parallelism, and injects data-parallel flags for DP deployments. |
| SGLang | Injects --dist-init-addr, --nnodes, and per-node --node-rank. |
| TensorRT-LLM | Wraps the leader command with mpirun and configures worker nodes with SSH. |
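For SGLang, the injected flags amount to the same --dist-init-addr and --nnodes on every node plus a distinct --node-rank per node. The sketch below shows only that shape; the operator generates the real values. The function names and the leader address are hypothetical placeholders, and the sketch assumes the leader is rank 0 with workers numbered 1 to nodeCount - 1.

```python
from typing import List, Sequence


def sglang_multinode_args(dist_init_addr: str, node_count: int, node_rank: int) -> List[str]:
    """Flags for one node: shared rendezvous address and node count, own rank."""
    if not 0 <= node_rank < node_count:
        raise ValueError(f"node_rank {node_rank} outside 0..{node_count - 1}")
    return [
        "--dist-init-addr", dist_init_addr,  # same on every node
        "--nnodes", str(node_count),         # same on every node
        "--node-rank", str(node_rank),       # differs per node
    ]


def per_node_args(base_args: Sequence[str], dist_init_addr: str, node_count: int) -> List[List[str]]:
    """User args plus multinode flags for each rank (rank 0 assumed to be the leader)."""
    return [
        list(base_args) + sglang_multinode_args(dist_init_addr, node_count, rank)
        for rank in range(node_count)
    ]


if __name__ == "__main__":
    # Hypothetical leader address; 2 nodes as in the qwen3-multinode example.
    for rank, args in enumerate(per_node_args(["--tp-size", "8"], "leader.example:5000", 2)):
        print(rank, " ".join(args))
```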

For detailed backend-specific behavior and examples, see the Multinode Deployments guide.