Copyright © 2026, NVIDIA Corporation.

DynoSim

Simulate Dynamo deployment choices before spending GPU time

DynoSim is Dynamo’s simulation stack for exploring serving configurations before validating them on real clusters. It is not a separate service; it is the product surface that connects workload-driven simulation runs, configuration sweeps, the mocker engine, Planner simulation, Router simulation, and timing models backed by AI Configurator (AIC) into one workflow.

Use DynoSim when you want to answer questions such as:

  • Which aggregated or disaggregated topology should this workload use?
  • How many prefill and decode workers fit within my GPU budget?
  • How sensitive is the deployment to startup time, queue pressure, prefix reuse, or router tuning?
  • Which candidates should I validate with AIPerf on real GPUs?
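To make the GPU-budget question concrete, the sketch below enumerates every prefill/decode worker split that fits a budget. It is a standalone illustration, not DynoSim code: the `Split` type, the `enumerate_splits` helper, and the assumption that each worker occupies exactly its tensor-parallel size in GPUs are all hypothetical simplifications.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Split:
    prefill_workers: int
    decode_workers: int
    gpus_used: int


def enumerate_splits(gpu_budget: int, prefill_tp: int, decode_tp: int) -> list[Split]:
    """List every (prefill, decode) worker count that fits in the GPU budget.

    Assumption: each worker occupies exactly `tp` GPUs, and a disaggregated
    deployment needs at least one worker of each kind.
    """
    splits = []
    max_prefill = gpu_budget // prefill_tp
    max_decode = gpu_budget // decode_tp
    for p, d in product(range(1, max_prefill + 1), range(1, max_decode + 1)):
        used = p * prefill_tp + d * decode_tp
        if used <= gpu_budget:
            splits.append(Split(p, d, used))
    return splits


# Example: 8 GPUs, prefill workers at TP=2, decode workers at TP=1.
candidates = enumerate_splits(gpu_budget=8, prefill_tp=2, decode_tp=1)
# Splits that use the whole budget.
full = [s for s in candidates if s.gpus_used == 8]
print(len(candidates), [(s.prefill_workers, s.decode_workers) for s in full])
```

Even this small budget yields a dozen candidate splits before TP shape or router settings are varied, which is the kind of search space a sweep is meant to cover.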

Components

| Component | Entry Point | Role |
| --- | --- | --- |
| DynoSim run | python -m dynamo.replay | Runs one workload against one simulated Dynamo configuration and emits metrics plus a report |
| DynoSim sweep | dynamo.profiler.utils.replay_optimize | Sweeps many simulation trials across TP shape, worker split, router knobs, SLA constraints, and GPU budget |
| Live simulation with Mocker | python -m dynamo.mocker | Runs simulated workers inside a live Dynamo deployment path, including worker registration and KV event publishing |
| Mocker core | lib/mocker | Models engine scheduling, KV allocation, prefix caching, preemption, and timing |
| AIC | AI Configurator SDK | Supplies calibrated timing and candidate-shape data for supported model/backend/GPU tuples |
| Planner simulation | --planner-config on DynoSim runs | Runs Planner decisions in the simulation loop to study scaling behavior and SLA compliance |

Workflow

Start with a single DynoSim run to verify the workload shape and engine arguments. Use DynoSim sweeps when you want to search the design space. Use live Mocker deployments when you need to exercise the real Dynamo frontend, router, worker registration, KV events, and planner paths without running model inference. Validate the shortlist on real GPUs before production rollout.
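One way to turn sweep output into a shortlist for real-GPU validation is to drop SLA violators and keep only the non-dominated trials. The sketch below assumes a hypothetical `Trial` record with three metrics; the field names, configuration labels, and numbers are made up for illustration and are not DynoSim's report schema or measured results.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    name: str            # hypothetical label for a simulated configuration
    gpus: int            # GPUs consumed (lower is better)
    throughput: float    # simulated tokens/s (higher is better)
    p99_ttft_ms: float   # simulated p99 time to first token (lower is better)


def dominates(a: Trial, b: Trial) -> bool:
    """True if `a` is at least as good as `b` on every metric and strictly better on one."""
    no_worse = (a.gpus <= b.gpus and a.throughput >= b.throughput
                and a.p99_ttft_ms <= b.p99_ttft_ms)
    better = (a.gpus < b.gpus or a.throughput > b.throughput
              or a.p99_ttft_ms < b.p99_ttft_ms)
    return no_worse and better


def shortlist(trials: list[Trial], ttft_sla_ms: float) -> list[Trial]:
    """Drop trials that violate the TTFT SLA, then keep only non-dominated ones."""
    feasible = [t for t in trials if t.p99_ttft_ms <= ttft_sla_ms]
    return [t for t in feasible if not any(dominates(o, t) for o in feasible)]


# Illustrative inputs only; these are not measurements.
trials = [
    Trial("agg-tp4x2", 8, 9000.0, 420.0),
    Trial("disagg-2p4d", 8, 11000.0, 380.0),
    Trial("disagg-1p2d", 4, 6000.0, 450.0),
    Trial("disagg-3p2d", 8, 10000.0, 900.0),
]
picked = shortlist(trials, ttft_sla_ms=500.0)
print([t.name for t in picked])
```

Here `disagg-3p2d` is removed for missing the SLA and `agg-tp4x2` is removed because `disagg-2p4d` matches or beats it on every metric; the cheaper `disagg-1p2d` survives because nothing else beats it on GPU count.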

Where AIC Fits

AIC provides performance models and candidate-shape information. DynoSim uses those models as one timing source inside the mocker engine and sweep optimizer. Mocker still owns the scheduler and KV-memory simulation: batching, prefix-cache hits, preemption, block allocation, and request lifecycle are simulated by Dynamo’s mocker core, while AIC-backed timing predicts how long prefill and decode work should take for supported model/backend/GPU combinations.
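The division of labor described above can be pictured as a scheduler that decides how much work exists and a pluggable timing source that prices it. The toy model below is only a sketch of that separation: `TimingModel`, `LinearTiming`, `simulate_request`, and every coefficient are hypothetical stand-ins, not the mocker core or AIC interfaces.

```python
from typing import Protocol


class TimingModel(Protocol):
    """Anything that can price prefill and decode work (hypothetical interface)."""

    def prefill_ms(self, new_tokens: int) -> float: ...
    def decode_ms(self, batch_size: int) -> float: ...


class LinearTiming:
    """Stand-in timing source with made-up coefficients (not AIC data)."""

    def prefill_ms(self, new_tokens: int) -> float:
        return 0.25 * new_tokens

    def decode_ms(self, batch_size: int) -> float:
        return 8.0 + 0.5 * batch_size


def simulate_request(prompt_tokens: int, cached_prefix_tokens: int,
                     output_tokens: int, batch_size: int,
                     timing: TimingModel) -> tuple[float, float]:
    """Return (time to first token, total latency) in milliseconds."""
    # Scheduler/KV side: a prefix-cache hit removes tokens from the prefill work.
    new_tokens = max(prompt_tokens - cached_prefix_tokens, 0)
    ttft = timing.prefill_ms(new_tokens)
    # One decode step per output token after the first, at a fixed batch size.
    decode = (output_tokens - 1) * timing.decode_ms(batch_size)
    return ttft, ttft + decode


timing = LinearTiming()
warm = simulate_request(1000, 600, 11, 4, timing)  # 600 prompt tokens already cached
cold = simulate_request(1000, 0, 11, 4, timing)    # no prefix reuse
print(warm, cold)
```

Swapping `LinearTiming` for a calibrated model would change the predicted durations without touching the cache-hit logic, which is the point of keeping the two concerns separate.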

Choosing an Entry Point

| Goal | Start Here |
| --- | --- |
| Run one trace or synthetic workload through one config | DynoSim Runs |
| Sweep topology and router choices under SLA/GPU constraints | DynoSim Sweeps |
| Exercise a live frontend/router setup without GPUs | Live Simulation with Mocker |
| Study Planner scaling decisions against a trace | Planner DynoSim Benchmarking |
| Generate a deployable Kubernetes config from model/SLA intent | Model Deployment Guide |

DynoSim narrows the search space; it does not replace real-hardware validation. Use it to move quickly, find promising candidates, and understand failure modes before spending cluster time.