Recipes

Pick a model, choose the hardware/runtime target, and deploy the matching Dynamo recipe.


Start with the model you need to run and the runtime and hardware you plan to use. Model cards are sorted newest to oldest by release date. Use Feature Benchmarks when you want evidence that a Dynamo feature or topology helps under controlled traffic.

Recipe catalog

10 model families

Filter by provider, runtime, and hardware. Each card notes its serving technique and workload shape.

29 deployable configurations

Kimi-K2.6

Moonshot / NVIDIA

Multi-target vLLM matrix covering B200 and H200 for chat and agentic traffic, with Eagle3 MLA speculation and KV-aware routing.

vLLM | 4x B200 / 8x H200 | Chat + agentic

Nemotron 3 Ultra

NVIDIA

Aggregated vLLM targets for B200 and H200 chat and agentic traffic with MTP speculative decoding and trace-backed benchmarks.

vLLM | 4x B200 / 8x H200 | Chat + agentic

Nemotron 3 Super

NVIDIA

NVFP4 and FP8 vLLM targets for B200 and H200 chat and agentic traffic with MTP speculative decoding and KV-aware routing.

vLLM | 4x B200 / 4x H200 | Chat + agentic

GLM-5 NVFP4

NVIDIA / Z.AI

20x GB200 SGLang P/D recipe for 1K input / 8K output traffic with EAGLE speculative decoding, plus an AWS EFA variant.

SGLang | 20x GB200 | Long output / static ISL-OSL

DeepSeek V3.2 NVFP4

NVIDIA checkpoint / DeepSeek

32x GB200 TensorRT-LLM recipe with P/D split, KV-aware routing, and WideEP for long-context coding traces.

TRT-LLM | 32x GB200 | Long-context reuse | Related benchmark

GPT-OSS-120B

OpenAI

TensorRT-LLM GB200 recipe pair organized by traffic target: aggregated serving and a disaggregated P/D split.

TRT-LLM | 4x GB200 | Static ISL-OSL

Qwen3-32B

Qwen

Disaggregated vLLM serving with KV-aware routing on 16x H200, for multi-turn conversational traffic with prefix reuse.

vLLM | 16x H200 | Multi-turn conversation | Related benchmark

Qwen3-235B-A22B FP8

Qwen

16-GPU TensorRT-LLM recipe matrix for Hopper/Blackwell and aggregate/P-D serving.

TRT-LLM | 16x H100/H200 | Static ISL-OSL

Qwen3-32B FP8

Qwen

FP8 recipe set spanning TensorRT-LLM aggregate, TensorRT-LLM P/D, and vLLM P/D targets.

TRT-LLM | 2-8x GPU | Static ISL-OSL

Llama-3.3-70B FP8

Meta / Red Hat

Llama FP8 topology recipe set for 8K input / 1K output traffic on H100/H200.

vLLM | 4x H100/H200 | Long output / static ISL-OSL | Related benchmark

Use Feature Benchmarks when your question starts with a feature or performance claim, such as KV routing, embedding cache, speculative decoding, or topology. Recipe pages link back to Feature Benchmark pages whenever a recipe came from a comparison or feature-stack run.
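
To make the catalog's provider, runtime, and hardware filters concrete, the sketch below models a few of the cards above as plain records and filters them in Python. The `Recipe` class, its field names, and the `filter_recipes` helper are hypothetical illustrations and not part of Dynamo; only the card values are copied from the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical record for one catalog card (not a Dynamo type)."""
    model: str
    provider: str
    runtime: str
    hardware: str
    workload: str


# A subset of the cards listed above, copied as-is.
CATALOG = [
    Recipe("Kimi-K2.6", "Moonshot / NVIDIA", "vLLM",
           "4x B200 / 8x H200", "Chat + agentic"),
    Recipe("GLM-5 NVFP4", "NVIDIA / Z.AI", "SGLang",
           "20x GB200", "Long output / static ISL-OSL"),
    Recipe("GPT-OSS-120B", "OpenAI", "TRT-LLM",
           "4x GB200", "Static ISL-OSL"),
    Recipe("Qwen3-32B", "Qwen", "vLLM",
           "16x H200", "Multi-turn conversation"),
]


def filter_recipes(catalog, provider=None, runtime=None, hardware=None):
    """Return cards matching every filter that is set.

    Matching is a naive case-insensitive substring test, so the
    hardware filter "B200" would also match "GB200" cards.
    """
    def matches(value, wanted):
        return wanted is None or wanted.lower() in value.lower()

    return [
        r for r in catalog
        if matches(r.provider, provider)
        and matches(r.runtime, runtime)
        and matches(r.hardware, hardware)
    ]


if __name__ == "__main__":
    for recipe in filter_recipes(CATALOG, runtime="vLLM", hardware="H200"):
        print(recipe.model, "|", recipe.hardware, "|", recipe.workload)
```

With the four sample cards, filtering on runtime `vLLM` and hardware `H200` keeps Kimi-K2.6 and Qwen3-32B. A stricter filter would split the hardware string into separate GPU entries before comparing, which avoids the `B200` / `GB200` overlap noted in the code.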