Recipes | NVIDIA Dynamo Documentation

Start with the model, runtime, and hardware you need to run. Model cards are sorted newest to oldest by release date. Use Feature Benchmarks when you want evidence that a Dynamo feature or topology helps under controlled traffic.

Recipe catalog

15 model families

Filter by provider, runtime, and hardware. Each card notes its serving technique and workload shape.

42 deployable configurations

Kimi-K3

Moonshot / NVIDIA

Day-0 vLLM recipes for Moonshot’s Kimi-K3 on GB200 and GB300, aggregated or with prefill/decode disaggregation and KV-aware routing.

vLLM · GB200 · GB300 · Day-0

GLM-5.2

Z.AI / NVIDIA

SGLang recipes for long-context agentic traffic on B200 (NVFP4) or H200 (FP8), aggregated or disaggregated, with KV-aware routing, EAGLE MTP speculation, and HiCache CPU offload.

SGLang · 4x/12x B200 · 8x/16x H200 · Agentic · up to 500K context

Inkling NVFP4

Thinking Machines

Day-0 aggregated SGLang recipe for Thinking Machines' first open-weights model, a multimodal MoE with controllable reasoning effort, served at TP8 with EAGLE speculation.

SGLang · 8x B200 · Text + image + audio · Day-0

Kimi-K2.6

Moonshot / NVIDIA

Multi-target vLLM matrix covering B200 and H200 for chat and agentic traffic, with Eagle3 MLA speculation and KV-aware routing.

vLLM · 4x B200 / 8x H200 · Chat + agentic

Nemotron 3 Ultra

NVIDIA

Aggregated vLLM targets for B200 and H200 chat and agentic traffic with MTP speculative decoding and trace-backed benchmarks.

vLLM · 4x B200 / 8x H200 · Chat + agentic

Nemotron 3 Super

NVIDIA

NVFP4 and FP8 vLLM targets for B200 and H200 chat and agentic traffic with MTP speculative decoding and KV-aware routing.

vLLM · 4x B200 / 4x H200 · Chat + agentic

GLM-5 NVFP4

NVIDIA / Z.AI

20x GB200 SGLang P/D recipe for 1K input / 8K output traffic with EAGLE speculative decoding, plus an AWS EFA variant.

SGLang · 20x GB200 · Long output / static ISL-OSL

DeepSeek-V4-Pro

NVIDIA checkpoint / DeepSeek

vLLM agentic recipe for a 1.6T-parameter MoE (49B active) on B200 (NVFP4, 1M context) and H200 (FP8), aggregated or disaggregated, with MTP-2 and KV-aware routing.

vLLM · 8–32x B200/H200 · Agentic

DeepSeek-V4-Flash

NVIDIA checkpoint / DeepSeek

vLLM agentic recipe for a 284B-parameter MoE (13B active) on B200 (NVFP4) and H200 (FP8), aggregated or disaggregated (2P1D / 4P3D), with KV-aware routing.

vLLM · 4–28x B200/H200 · Agentic

DeepSeek V3.2 NVFP4

NVIDIA checkpoint / DeepSeek

32x GB200 TensorRT-LLM recipe with P/D split, KV-aware routing, and WideEP for long-context coding traces.

TRT-LLM · 32x GB200 · Long-context reuse · Related benchmark

GPT-OSS-120B

OpenAI

TensorRT-LLM GB200 targets for static traffic plus vLLM B200/H200 agentic targets (agg + disagg) with KV-aware routing and EAGLE3 speculative decoding.

TRT-LLM + vLLM · GB200 · B200 · H200 · Static + agentic

Qwen3-32B

Qwen

Disaggregated vLLM serving with KV-aware routing on 16x H200, for multi-turn conversational traffic with prefix reuse.

vLLM · 16x H200 · Multi-turn conversation · Related benchmark

Qwen3-235B-A22B FP8

Qwen

16-GPU TensorRT-LLM recipe matrix covering Hopper and Blackwell GPUs, with aggregated and P/D serving.

TRT-LLM · 16x H100/H200 · Static ISL-OSL

Qwen3-32B FP8

Qwen

FP8 recipe set spanning TensorRT-LLM aggregate, TensorRT-LLM P/D, and vLLM P/D targets.

TRT-LLM · 2–8x GPU · Static ISL-OSL

Llama-3.3-70B FP8

Meta / Red Hat

Llama FP8 topology recipe set for 8K input / 1K output traffic on H100/H200.

vLLM · 4x H100/H200 · Long output / static ISL-OSL · Related benchmark

Use Feature Benchmarks when your question starts with a feature or performance claim, such as KV routing, embedding cache, speculative decoding, or topology. Recipe pages link back to Feature Benchmark pages whenever a recipe came from a comparison or feature-stack run.

Related Feature Benchmarks