
Copyright © 2026, NVIDIA Corporation.


DynoSim

Simulate Dynamo deployment choices before spending GPU time

DynoSim is Dynamo’s simulation stack for exploring serving configurations before validating them on real clusters. It is not a separate service; it is the product surface that connects workload-driven simulation runs, configuration sweeps, the mocker engine, Planner simulation, Router simulation, and AIC-backed timing models into one workflow.

Use DynoSim when you want to answer questions such as:

  • Which aggregated or disaggregated topology should this workload use?
  • How many prefill and decode workers fit within my GPU budget?
  • How sensitive is the deployment to startup time, queue pressure, prefix reuse, or router tuning?
  • Which candidates should I validate with AIPerf on real GPUs?
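The GPU-budget question above reduces to simple arithmetic before any simulation runs. The sketch below is illustrative only and is not part of DynoSim: `enumerate_splits`, `prefill_tp`, and `decode_tp` are hypothetical names, and it assumes each worker occupies exactly as many GPUs as its tensor-parallel size.

```python
from itertools import product


def enumerate_splits(gpu_budget, prefill_tp, decode_tp):
    """Yield (prefill_workers, decode_workers, gpus_used) that fit the budget.

    Assumption: each prefill worker occupies `prefill_tp` GPUs and each decode
    worker occupies `decode_tp` GPUs. A disaggregated topology needs at least
    one worker of each kind.
    """
    max_prefill = gpu_budget // prefill_tp
    max_decode = gpu_budget // decode_tp
    for p, d in product(range(1, max_prefill + 1), range(1, max_decode + 1)):
        used = p * prefill_tp + d * decode_tp
        if used <= gpu_budget:
            yield p, d, used


# Hypothetical example: 8 GPUs, prefill workers at TP=2, decode workers at TP=4.
splits = list(enumerate_splits(gpu_budget=8, prefill_tp=2, decode_tp=4))
for prefill_workers, decode_workers, gpus_used in splits:
    print(prefill_workers, decode_workers, gpus_used)
```

Each surviving split is only a candidate. Whether it meets latency or throughput targets is what a simulation run or sweep is meant to estimate.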

Components

| Component | Entry Point | Role |
| --- | --- | --- |
| DynoSim run | python -m dynamo.replay | Runs one workload against one simulated Dynamo configuration and emits metrics plus a report |
| DynoSim sweep | dynamo.profiler.utils.replay_optimize | Sweeps many simulation trials across TP shape, worker split, router knobs, SLA constraints, and GPU budget |
| Live simulation with Mocker | python -m dynamo.mocker | Runs simulated workers inside a live Dynamo deployment path, including worker registration and KV event publishing |
| Mocker core | lib/mocker | Models engine scheduling, KV allocation, prefix caching, preemption, and timing |
| AIC | AI Configurator SDK | Supplies calibrated timing and candidate-shape data for supported model/backend/GPU tuples |
| Planner simulation | --planner-config on DynoSim runs | Runs Planner decisions in the simulation loop to study scaling behavior and SLA compliance |

Workflow

Start with a single DynoSim run to verify the workload shape and engine arguments. Use DynoSim sweeps when you want to search the design space. Use live Mocker deployments when you need to exercise the real Dynamo frontend, router, worker registration, KV events, and planner paths without running model inference. Validate the shortlist on real GPUs before production rollout.
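The step from sweep results to a shortlist can be pictured as a filter followed by a ranking. The sketch below is a generic illustration, not the output format or API of DynoSim sweeps: `TrialResult`, its fields, and `shortlist` are hypothetical, and the numbers are placeholders rather than measured or simulated results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    """Hypothetical summary of one simulated trial."""
    name: str
    gpus: int
    p99_ttft_ms: float
    tokens_per_s: float


def shortlist(trials, max_gpus, ttft_sla_ms, top_k=2):
    """Keep trials within the GPU budget and SLA, ranked by throughput per GPU."""
    feasible = [
        t for t in trials
        if t.gpus <= max_gpus and t.p99_ttft_ms <= ttft_sla_ms
    ]
    ranked = sorted(feasible, key=lambda t: t.tokens_per_s / t.gpus, reverse=True)
    return ranked[:top_k]


# Placeholder values for illustration only.
trials = [
    TrialResult("agg-tp4x2", gpus=8, p99_ttft_ms=420.0, tokens_per_s=9000.0),
    TrialResult("disagg-2p1d", gpus=8, p99_ttft_ms=310.0, tokens_per_s=10400.0),
    TrialResult("disagg-1p1d", gpus=6, p99_ttft_ms=380.0, tokens_per_s=8400.0),
    TrialResult("agg-tp8x2", gpus=16, p99_ttft_ms=250.0, tokens_per_s=17000.0),
]
picked = shortlist(trials, max_gpus=8, ttft_sla_ms=400.0)
print([t.name for t in picked])
```

Ranking by throughput per GPU is one possible objective. A real shortlist may weigh other metrics, and every candidate on it still needs validation on real GPUs.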

Where AIC Fits

AIC provides performance models and candidate-shape information. DynoSim uses those models as one timing source inside the mocker engine and sweep optimizer. Mocker still owns the scheduler and KV-memory simulation: batching, prefix-cache hits, preemption, block allocation, and request lifecycle are simulated by Dynamo’s mocker core, while AIC-backed timing predicts how long prefill and decode work should take for supported model/backend/GPU combinations.
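The division of labor described above can be sketched as a scheduler that owns cache and lifecycle decisions and delegates only duration estimates to a pluggable timing function. This is a toy model under stated assumptions, not the mocker core or the AIC SDK: `TimingFn`, `linear_timing`, `Request`, and `simulate` are hypothetical names, the per-token costs are placeholders, and requests are served one at a time with no batching or preemption.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Timing source: maps (phase, token count) to a duration in milliseconds.
TimingFn = Callable[[str, int], float]


def linear_timing(phase: str, tokens: int) -> float:
    """Placeholder timing model; a calibrated source would replace this."""
    per_token_ms = {"prefill": 0.5, "decode": 20.0}[phase]
    return per_token_ms * tokens


@dataclass
class Request:
    # Assumption: block ids already encode their prefix, so equal ids mean
    # equal KV content.
    prompt_blocks: List[str]
    output_tokens: int


def simulate(requests: List[Request], timing: TimingFn, block_size: int = 16) -> List[float]:
    """Toy scheduler: sequential requests sharing one prefix cache."""
    cached: Set[str] = set()
    clock_ms = 0.0
    finish_times = []
    for req in requests:
        # Scheduler-side decision: count leading blocks already in the cache.
        hits = 0
        for block in req.prompt_blocks:
            if block not in cached:
                break
            hits += 1
        misses = req.prompt_blocks[hits:]
        cached.update(misses)
        # Timing-side estimate: how long the remaining work takes.
        clock_ms += timing("prefill", len(misses) * block_size)
        clock_ms += timing("decode", req.output_tokens)
        finish_times.append(clock_ms)
    return finish_times


requests = [Request(["sys", "a"], 10), Request(["sys", "b"], 10)]
finish_times = simulate(requests, linear_timing)
print(finish_times)
```

The second request reuses the cached `sys` block, so the scheduler asks the timing function to price only one block of prefill. Swapping `linear_timing` for another callable changes the durations without touching the cache logic, which is the separation the paragraph above describes.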

Choosing an Entry Point

| Goal | Start Here |
| --- | --- |
| Run one trace or synthetic workload through one config | DynoSim Runs |
| Sweep topology and router choices under SLA/GPU constraints | DynoSim Sweeps |
| Exercise a live frontend/router setup without GPUs | Live Simulation with Mocker |
| Study Planner scaling decisions against a trace | Planner DynoSim Benchmarking |
| Generate a deployable Kubernetes config from model/SLA intent | Model Deployment Guide |

DynoSim narrows the search space; it does not replace real-hardware validation. Use it to move quickly, find promising candidates, and understand failure modes before spending cluster time.