# NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes

> NVIDIA Dynamo Snapshot combines CUDA and host checkpointing to restore warm inference workers quickly on Kubernetes.

![Kubernetes checkpoint and restore lifecycle with NVIDIA Dynamo Snapshot.](https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/d4167d87ad413eab02b80d20dec1ac3a3b2382ba6e37695f4430b8e65c735b69/digest/snapshot/dynamo-snapshot-lifecycle.webp)

Cold-starting inference replicas on Kubernetes can take minutes while engines load weights, warm kernels, and compile graphs. In our blog post, [NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes](https://developer.nvidia.com/blog/nvidia-dynamo-snapshot-fast-startup-for-inference-workloads-on-kubernetes/), we introduce Dynamo Snapshot. This checkpoint/restore approach combines `cuda-checkpoint`, CRIU, and a privileged `snapshot-agent` DaemonSet to restore warm workers from shared storage. We also walk through KV cache unmapping, CRIU restore optimizations, and GPU Memory Service (GMS). Together, these bring startup time for the `gpt-oss-120b` prototype below five seconds, a 21x reduction.