LWS | NVIDIA Dynamo Documentation

Dynamo can use LeaderWorkerSet (LWS) as the Kubernetes orchestration layer for multinode workloads. LWS is the lightweight path for spanning one Dynamo worker service across multiple nodes; Dynamo pairs it with Volcano for gang scheduling.

Use LWS when you want a simpler multinode orchestrator than Grove, or when your cluster already standardizes on LWS and Volcano. Grove remains the default when both Grove and LWS are available.

Prerequisites

- Kubernetes cluster with GPU nodes.
- LWS version 0.7.0 or newer.
- Volcano installed for gang scheduling.
- Dynamo Kubernetes Platform installed.

The installation guide includes the exact Helm commands for LWS and Volcano.

Orchestrator Selection

For multinode deployments, the Dynamo operator selects an orchestrator based on what is installed:

| Cluster state | Operator behavior |
| --- | --- |
| Grove and LWS installed | Uses Grove by default. |
| Grove and LWS installed, DGD has `nvidia.com/enable-grove: "false"` | Uses LWS. |
| Only LWS installed | Uses LWS. |
| Neither Grove nor LWS installed | Rejects multinode deployments. |
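
The selection rules in the table can be sketched as a small function. This is an illustrative model only, not the operator's actual code: the names `Orchestrator` and `select_orchestrator` are hypothetical, and the handling of cases the table does not list (for example, only Grove installed) is an assumption.

```python
from enum import Enum
from typing import Dict, Optional


class Orchestrator(Enum):
    GROVE = "grove"
    LWS = "lws"


# Annotation key as documented; all other names in this sketch are hypothetical.
ENABLE_GROVE_ANNOTATION = "nvidia.com/enable-grove"


def select_orchestrator(
    grove_installed: bool,
    lws_installed: bool,
    annotations: Optional[Dict[str, str]] = None,
) -> Orchestrator:
    """Model the documented orchestrator choice for a multinode deployment."""
    annotations = annotations or {}
    grove_disabled = annotations.get(ENABLE_GROVE_ANNOTATION) == "false"

    # Grove is the default whenever it is installed and not opted out of.
    # Assumption: this also covers the "only Grove installed" case.
    if grove_installed and not grove_disabled:
        return Orchestrator.GROVE

    # LWS is used when Grove is absent or explicitly disabled.
    if lws_installed:
        return Orchestrator.LWS

    # Assumption: anything else is rejected, matching the "neither installed" row.
    raise ValueError("no multinode orchestrator available")
```

For example, `select_orchestrator(True, True, {"nvidia.com/enable-grove": "false"})` follows the second row of the table and returns `Orchestrator.LWS`.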

To force the LWS path when Grove is also present:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-multinode-deployment
  annotations:
    nvidia.com/enable-grove: "false"
spec:
  # ...
```

Multinode Spec

Set `multinode.nodeCount` on the service that should span nodes. The total GPU count is `multinode.nodeCount` multiplied by the per-node GPU limit (`resources.limits.gpu`):

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: qwen3-multinode
  annotations:
    nvidia.com/enable-grove: "false"
spec:
  services:
    backend:
      multinode:
        nodeCount: 2
      resources:
        limits:
          gpu: "4"
      extraPodSpec:
        mainContainer:
          args:
            - "--tp-size"
            - "8"
```

In this example, Dynamo asks LWS to place the backend across 2 nodes with 4 GPUs per node, for 8 GPUs total. Make sure your backend’s tensor-parallel or distributed-execution flags (here, `--tp-size 8`) match that total.
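
The arithmetic can be checked before deploying. The sketch below is a hypothetical pre-flight check, not part of Dynamo: it assumes the service spec has already been loaded into a plain dictionary shaped like the `backend` entry above, and the helper names `total_gpus` and `flag_value` are made up for illustration.

```python
from typing import List, Optional


def total_gpus(service: dict) -> int:
    """Return nodeCount multiplied by the per-node GPU limit."""
    node_count = int(service["multinode"]["nodeCount"])
    gpus_per_node = int(service["resources"]["limits"]["gpu"])
    return node_count * gpus_per_node


def flag_value(args: List[str], flag: str) -> Optional[str]:
    """Return the argument that follows `flag`, or None if the flag is absent."""
    for i, arg in enumerate(args[:-1]):
        if arg == flag:
            return args[i + 1]
    return None


# Dictionary mirroring the `backend` service from the example spec.
backend = {
    "multinode": {"nodeCount": 2},
    "resources": {"limits": {"gpu": "4"}},
    "extraPodSpec": {"mainContainer": {"args": ["--tp-size", "8"]}},
}

gpus = total_gpus(backend)
tp_size = flag_value(backend["extraPodSpec"]["mainContainer"]["args"], "--tp-size")

# Fail early if the parallelism flag does not match the GPUs requested.
if tp_size is None or int(tp_size) != gpus:
    raise ValueError(f"--tp-size {tp_size} does not match {gpus} total GPUs")
```

With the values from the example, the check passes because 2 nodes times 4 GPUs equals the `--tp-size` of 8.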

Backend Behavior

The operator injects backend-specific multinode settings into the generated LeaderWorkerSet:

| Backend | LWS behavior |
| --- | --- |
| vLLM | Uses Ray for multinode tensor or pipeline parallelism, and injects data-parallel flags for DP deployments. |
| SGLang | Injects `--dist-init-addr`, `--nnodes`, and per-node `--node-rank`. |
| TensorRT-LLM | Wraps the leader command with `mpirun` and configures worker nodes with SSH. |
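
To make the SGLang row concrete, the sketch below shows what per-node argument lists could look like once the three flags are filled in. It only illustrates the shape of the result and is not the operator's implementation: the function name, the leader address, and the assumption that the leader takes rank 0 are all hypothetical.

```python
from typing import List


def sglang_multinode_args(leader_addr: str, node_count: int) -> List[List[str]]:
    """Build one flag list per node; index 0 is assumed to be the leader."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        [
            "--dist-init-addr", leader_addr,  # same rendezvous address on every node
            "--nnodes", str(node_count),      # total number of nodes in the group
            "--node-rank", str(rank),         # unique per node
        ]
        for rank in range(node_count)
    ]


# Hypothetical leader address, used for illustration only.
per_node_args = sglang_multinode_args("leader.example:29500", 2)
```

Every node receives the same `--dist-init-addr` and `--nnodes` values, and only `--node-rank` differs between nodes.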

For detailed backend-specific behavior and examples, see the Multinode Deployments guide.