This guide covers how rolling updates work for DynamoGraphDeployment (DGD) resources. Rolling updates allow you to update worker configurations (images, resources, environment variables, etc.) with minimal downtime by gradually replacing old pods with new ones.
The behavior of rolling updates depends on the backing resource type of your deployment. DGDs backed by Kubernetes Deployments benefit from managed rolling updates with namespace isolation, while Grove and LWS-backed deployments use their native update mechanisms.
Consider a disaggregated deployment with separate prefill and decode workers. You want to update the tensor parallelism of the decode worker to 2.
Before — original deployment:
After — updated with parallelism tuning:
Apply the update:
Monitor rolling update progress:
For DGDs backed by Grove (PodCliques, PodCliqueSets) or LWS (LeaderWorkerSets), the operator does not manage rolling updates directly. Instead, these deployments rely on the native rolling update mechanisms of their underlying resources.
Grove performs rolling updates with maxUnavailable: 1 and maxSurge: 0. LWS follows the same maxUnavailable: 1 and maxSurge: 0 strategy.

The following diagram illustrates the rolling update of the decode worker in a Grove PodCliqueSet (PCS). Only the decode PodClique is updated; the frontend and prefill PodCliques are unaffected:
Because old and new workers share the same Dynamo namespace, they are grouped together by the router. In a disaggregated setup, this can lead to cross-generation communication — for example, the router might send a request from a newly deployed prefill worker to an old decode worker (or vice versa). If the old and new versions are incompatible, this can result in errors.
For Grove and LWS deployments with disaggregated prefill/decode workers, be aware that during a rolling update, new workers may communicate with old workers. Ensure that your worker versions are backward-compatible, or consider using Deployment-backed DGDs which provide namespace isolation during updates.
Managed rolling updates with namespace isolation are planned for Grove and LWS-backed deployments in a future release. See Future Work for details.
For DGDs backed by Kubernetes Deployments (single-node, non-multinode services), the Dynamo operator implements managed rolling updates with namespace isolation. This is tracked in the DGD status and provides stronger guarantees for disaggregated deployments.
Spec change detection — The operator computes a hash of all worker service specs (prefill, decode, and worker component types). When this hash changes, a rolling update is triggered.
Namespace isolation — New worker DynamoComponentDeployments (DCDs) are created with the spec hash appended to their Dynamo namespace. This means new workers register in a different Dynamo namespace than old workers, preventing cross-generation discovery. A new prefill worker will only discover and route to new decode workers, avoiding compatibility issues.
Gradual replacement — The operator gradually scales up new worker DCDs and scales down old ones, respecting maxSurge and maxUnavailable constraints. When a worker service is updated (all new replicas are ready, all old replicas are terminated), it is marked as completed.
Cleanup — Once all worker services have completed the transition, old worker DCDs are deleted and the rolling update is marked as completed.
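The gradual replacement step can be illustrated with a small simulation. This is only a sketch of the general surge/unavailable bookkeeping, not the operator's actual reconciliation logic: the function name simulate_rollout is hypothetical, and the model assumes new replicas become ready immediately after they are created.

```python
from typing import List, Tuple


def simulate_rollout(replicas: int, max_surge: int, max_unavailable: int) -> List[Tuple[int, int]]:
    """Return the (old, new) replica counts after each step of a rollout.

    Simplifying assumption: a new replica is ready as soon as it is created.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("at least one of max_surge / max_unavailable must be > 0")

    old, new = replicas, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: total pods may not exceed replicas + max_surge.
        room = replicas + max_surge - (old + new)
        new += max(0, min(replicas - new, room))

        # Scale down: available pods may not drop below replicas - max_unavailable.
        spare = (old + new) - (replicas - max_unavailable)
        old -= max(0, min(old, spare))

        history.append((old, new))
    return history


if __name__ == "__main__":
    for old, new in simulate_rollout(replicas=4, max_surge=1, max_unavailable=0):
        print(f"old={old} new={new}")
```

With 4 replicas, maxSurge 1 and maxUnavailable 0, the model never drops below 4 available replicas and never exceeds 5 pods in total. Real rollouts take longer because new pods need time to become ready.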
Only worker component types (worker, prefill, decode) participate in managed rolling updates. Non-worker components like frontend are updated in-place without namespace isolation.
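The effect of namespace isolation on discovery can be sketched as follows. The "-" separator between the base namespace and the hash, the in-memory registry, and the function names are all assumptions made for illustration; the source only states that the spec hash is appended to the Dynamo namespace of worker DCDs.

```python
from typing import Dict, List

WORKER_TYPES = {"worker", "prefill", "decode"}


def dynamo_namespace(base_namespace: str, component_type: str, spec_hash: str) -> str:
    """Worker components get a hash-suffixed namespace; other components keep the base one."""
    if component_type in WORKER_TYPES:
        # Assumption: the hash is appended with a "-" separator.
        return f"{base_namespace}-{spec_hash}"
    return base_namespace


def discoverable_peers(me: Dict[str, str], registry: List[Dict[str, str]]) -> List[str]:
    """Return the names of other components registered in the same namespace as `me`."""
    return [
        peer["name"]
        for peer in registry
        if peer["namespace"] == me["namespace"] and peer["name"] != me["name"]
    ]


if __name__ == "__main__":
    old_hash, new_hash = "a1b2c3d4", "f5e6d7c8"
    registry = [
        {"name": "prefill-old", "namespace": dynamo_namespace("vllm-disagg", "prefill", old_hash)},
        {"name": "decode-old", "namespace": dynamo_namespace("vllm-disagg", "decode", old_hash)},
        {"name": "prefill-new", "namespace": dynamo_namespace("vllm-disagg", "prefill", new_hash)},
        {"name": "decode-new", "namespace": dynamo_namespace("vllm-disagg", "decode", new_hash)},
    ]
    # The new prefill worker only sees the new decode worker.
    print(discoverable_peers(registry[2], registry))
```

Because each generation registers under its own namespace, a new prefill worker in this model never sees an old decode worker, which is the property the managed rolling update relies on.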
The rolling update progress is tracked in .status.rollingUpdate with the following phases:
The status also tracks:
startTime: When the rolling update began.
endTime: When the rolling update completed.
updatedServices: List of worker services that have completed the transition.

You can configure the rolling update strategy per service using annotations:
Values can be absolute integers (e.g., "1", "2") or percentages (e.g., "25%", "50%"). Percentages are resolved against the desired replica count — rounding up for maxSurge and rounding down for maxUnavailable. The operator ensures at least one of maxSurge or maxUnavailable is greater than zero to guarantee forward progress.
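The rounding rules above can be written out as a short sketch. The function names are hypothetical. The fallback used when both values resolve to zero (setting maxUnavailable to 1) is an assumption borrowed from the Kubernetes Deployment controller; the source only states that the operator guarantees at least one value is greater than zero.

```python
from typing import Tuple


def resolve(value: str, replicas: int, round_up: bool) -> int:
    """Resolve an absolute ("2") or percentage ("25%") value against the replica count."""
    value = value.strip()
    if value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic: ceiling division when rounding up, floor otherwise.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def resolve_strategy(max_surge: str, max_unavailable: str, replicas: int) -> Tuple[int, int]:
    """Return (maxSurge, maxUnavailable) as absolute replica counts."""
    surge = resolve(max_surge, replicas, round_up=True)              # rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # rounds down
    if surge == 0 and unavailable == 0:
        # Assumption: fall back to one unavailable replica so the rollout can progress.
        unavailable = 1
    return surge, unavailable


if __name__ == "__main__":
    # 25% of 10 replicas is 2.5: maxSurge rounds up, maxUnavailable rounds down.
    print(resolve_strategy("25%", "25%", replicas=10))
```

For 10 replicas, "25%" resolves to 3 for maxSurge and 2 for maxUnavailable, since 2.5 is rounded in opposite directions.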
Example — zero-downtime update with surge capacity:
This ensures that all 4 existing prefill replicas remain available while 1 new replica is brought up at a time.
Example — fast update allowing temporary capacity reduction:
This avoids creating extra pods but allows up to 2 decode replicas to be unavailable at a time, speeding up the transition.
Worker DCDs always include a hash suffix derived from the worker specs: {dgd-name}-{service-name}-{hash} (e.g., vllm-disagg-vllmdecodeworker-a1b2c3d4). During a rolling update, the new worker DCDs are created with the new spec hash while the old DCDs retain the previous hash, allowing both generations to coexist:
vllm-disagg-vllmdecodeworker-a1b2c3d4 (previous hash)
vllm-disagg-vllmdecodeworker-f5e6d7c8 (new hash)

The hash is computed from a SHA-256 digest of all worker service specs (excluding non-pod-template fields like replicas, autoscaling, and ingress). This means that a change limited to those excluded fields leaves the hash unchanged.
The current worker hash is stored as the annotation nvidia.com/current-worker-hash on the DGD resource, and individual worker DCDs are labeled with nvidia.com/dynamo-worker-hash for filtering.
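A minimal sketch of how such a hash and the resulting DCD names could be derived is shown below. The source confirms only that SHA-256 is used and that replicas, autoscaling, and ingress are excluded. The JSON serialization, the componentType field name, truncation to 8 hex characters, and lowercasing of the service name are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Dict

WORKER_TYPES = {"worker", "prefill", "decode"}
EXCLUDED_FIELDS = {"replicas", "autoscaling", "ingress"}


def worker_spec_hash(services: Dict[str, Dict[str, Any]]) -> str:
    """Hash the worker service specs, ignoring excluded fields and non-worker services."""
    relevant = {
        name: {k: v for k, v in spec.items() if k not in EXCLUDED_FIELDS}
        for name, spec in services.items()
        if spec.get("componentType") in WORKER_TYPES  # assumed field name
    }
    # Assumption: canonical JSON with sorted keys gives a stable byte representation.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    # Assumption: the name suffix is the first 8 hex characters of the digest.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def dcd_name(dgd_name: str, service_name: str, spec_hash: str) -> str:
    """Build a name of the form {dgd-name}-{service-name}-{hash}."""
    return f"{dgd_name}-{service_name.lower()}-{spec_hash}"


if __name__ == "__main__":
    services = {
        "Frontend": {"componentType": "frontend", "replicas": 1},
        "VllmDecodeWorker": {"componentType": "decode", "replicas": 2, "image": "example/worker:v1"},
    }
    before = worker_spec_hash(services)
    services["VllmDecodeWorker"]["replicas"] = 4            # excluded field
    print(worker_spec_hash(services) == before)
    services["VllmDecodeWorker"]["image"] = "example/worker:v2"
    print(worker_spec_hash(services) == before)
    print(dcd_name("vllm-disagg", "VllmDecodeWorker", before))
```

In this model, scaling a worker or editing the frontend leaves the hash untouched, while changing a worker image produces a new hash and therefore a new generation of DCDs.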
During a rolling update, the DGD status aggregates information from both old and new worker DCDs:
This provides a holistic view of the deployment’s health during the transition.
The following enhancements are planned for future releases:
Support for configuring maxSurge and maxUnavailable at the DGD API level, regardless of the backing resource type.