
# Recipes

Start with the model you need to run, then narrow by runtime and hardware. Model cards are sorted newest to oldest by release date. Use [Feature Benchmarks](/dynamo/dev/benchmarks) when you want evidence that a Dynamo feature or topology helps under controlled traffic.

<form>
  <input type="radio" id="provider-nvidia" name="provider" />

  <input type="radio" id="provider-qwen" name="provider" />

  <input type="radio" id="provider-deepseek" name="provider" />

  <input type="radio" id="provider-moonshot" name="provider" />

  <input type="radio" id="provider-meta" name="provider" />

  <input type="radio" id="provider-openai" name="provider" />

  <input type="radio" id="provider-zai" name="provider" />

  <input type="radio" id="runtime-vllm" name="runtime" />

  <input type="radio" id="runtime-trtllm" name="runtime" />

  <input type="radio" id="runtime-sglang" name="runtime" />

  <input type="radio" id="hardware-h100" name="hardware" />

  <input type="radio" id="hardware-h200" name="hardware" />

  <input type="radio" id="hardware-gb200" name="hardware" />

  <input type="radio" id="hardware-b200" name="hardware" />

  <p>
    Recipe catalog
  </p>

  <h3>
    10 model families
  </h3>

  <p>
    Filter by provider, runtime, and hardware. Each card notes its serving technique and workload shape.
  </p>

  <strong>
    29
  </strong>

  Deployable configurations

  <button type="reset">
    Reset filters
  </button>

  Provider: All, NVIDIA, Qwen, DeepSeek, Moonshot, Meta, OpenAI, Z.AI

  Runtime: All, vLLM, TensorRT-LLM, SGLang

  Hardware: All, H100, H200, GB200, B200

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/6c9ba98efd7d603b5476db9313617ebffd57a53acb0f9ef724ef4c7288a8a527/pages-dev/assets/img/recipes/providers/moonshotai.webp" alt="" />

  <h3>Kimi-K2.6</h3><p>Moonshot / NVIDIA</p>

  <p>
    Multi-target vLLM matrix covering B200 and H200 for chat and agentic traffic, with Eagle3 MLA speculation and KV-aware routing.
  </p>

  vLLM · 4x B200 / 8x H200 · Chat + agentic

  <a href="/dynamo/dev/recipes/kimi-k2-6" aria-label="Open the Kimi-K2.6 recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/75f85aa1b5c95694f9199ae163b0a42ccd45c8ef2870d7dc494c1733b869444d/pages-dev/assets/img/recipes/providers/nvidia.webp" alt="" />

  <h3>Nemotron 3 Ultra</h3><p>NVIDIA</p>

  <p>
    Aggregated vLLM targets for B200 and H200 chat and agentic traffic with MTP speculative decoding and trace-backed benchmarks.
  </p>

  vLLM · 4x B200 / 8x H200 · Chat + agentic

  <a href="/dynamo/dev/recipes/nemotron-3-ultra" aria-label="Open the Nemotron 3 Ultra recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/75f85aa1b5c95694f9199ae163b0a42ccd45c8ef2870d7dc494c1733b869444d/pages-dev/assets/img/recipes/providers/nvidia.webp" alt="" />

  <h3>Nemotron 3 Super</h3><p>NVIDIA</p>

  <p>
    NVFP4 and FP8 vLLM targets for B200 and H200 chat and agentic traffic with MTP speculative decoding and KV-aware routing.
  </p>

  vLLM · 4x B200 / 4x H200 · Chat + agentic

  <a href="/dynamo/dev/recipes/nemotron-3-super" aria-label="Open the Nemotron 3 Super recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/e60742a0b37f805e654249dedb7b6ca11c785a60a2dd9ae33864668dce217693/pages-dev/assets/img/recipes/providers/zai-org.svg" alt="" />

  <h3>GLM-5 NVFP4</h3><p>NVIDIA / Z.AI</p>

  <p>
    20x GB200 SGLang P/D recipe for 1K input / 8K output traffic with EAGLE speculative decoding, plus an AWS EFA variant.
  </p>

  SGLang · 20x GB200 · Long output / static ISL-OSL

  <a href="/dynamo/dev/recipes/glm-5-nvfp4" aria-label="Open the GLM-5 NVFP4 recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/4cd5dd62104f4bfa1caa6bfa3a13e06b1577d24ea03bfd1e78e4c5aea8ceed9f/pages-dev/assets/img/recipes/providers/deepseek-ai.webp" alt="" />

  <h3>DeepSeek V3.2 NVFP4</h3><p>NVIDIA checkpoint / DeepSeek</p>

  <p>
    32x GB200 TensorRT-LLM recipe with P/D split, KV-aware routing, and WideEP for long-context coding traces.
  </p>

  TRT-LLM · 32x GB200 · Long-context reuse · Related benchmark

  <a href="/dynamo/dev/recipes/deepseek-v3-2-nvfp4" aria-label="Open the DeepSeek V3.2 NVFP4 recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/51767fd618f19941f3f2ae5a88c526ccc4ca874a0444226f09f6ab6da7346897/pages-dev/assets/img/recipes/providers/openai.webp" alt="" />

  <h3>GPT-OSS-120B</h3><p>OpenAI</p>

  <p>
    TensorRT-LLM GB200 recipe pair organized by traffic target: aggregated serving and a disaggregated P/D split.
  </p>

  TRT-LLM · 4x GB200 · Static ISL-OSL

  <a href="/dynamo/dev/recipes/gpt-oss-120b" aria-label="Open the GPT-OSS-120B recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/588266b8e1b85bad9da8849973fb14ea59b7422a2b30b9dbf7e7e8db5c246321/pages-dev/assets/img/recipes/providers/qwen.webp" alt="" />

  <h3>Qwen3-32B</h3><p>Qwen</p>

  <p>
    Disaggregated vLLM serving with KV-aware routing on 16x H200, for multi-turn conversational traffic with prefix reuse.
  </p>

  vLLM · 16x H200 · Multi-turn conversation · Related benchmark

  <a href="/dynamo/dev/recipes/qwen3-32b" aria-label="Open the Qwen3-32B recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/588266b8e1b85bad9da8849973fb14ea59b7422a2b30b9dbf7e7e8db5c246321/pages-dev/assets/img/recipes/providers/qwen.webp" alt="" />

  <h3>Qwen3-235B-A22B FP8</h3><p>Qwen</p>

  <p>
    16-GPU TensorRT-LLM recipe matrix for Hopper/Blackwell, covering aggregated and P/D serving.
  </p>

  TRT-LLM · 16x H100/H200 · Static ISL-OSL

  <a href="/dynamo/dev/recipes/qwen3-235b-a22b-fp8" aria-label="Open the Qwen3-235B-A22B FP8 recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/588266b8e1b85bad9da8849973fb14ea59b7422a2b30b9dbf7e7e8db5c246321/pages-dev/assets/img/recipes/providers/qwen.webp" alt="" />

  <h3>Qwen3-32B FP8</h3><p>Qwen</p>

  <p>
    FP8 recipe set spanning TensorRT-LLM aggregated, TensorRT-LLM P/D, and vLLM P/D targets.
  </p>

  TRT-LLM · 2-8x GPU · Static ISL-OSL

  <a href="/dynamo/dev/recipes/qwen3-32b-fp8" aria-label="Open the Qwen3-32B FP8 recipe">
    Open recipe
  </a>

  <img src="https://files.buildwithfern.com/dynamo.docs.buildwithfern.com/dynamo/b202ff7373a9f14706f9cd4afd55cbb4c513e0f53763675dd93552693a623bf8/pages-dev/assets/img/recipes/providers/meta-llama.webp" alt="" />

  <h3>Llama-3.3-70B FP8</h3><p>Meta / Red Hat</p>

  <p>
    Llama FP8 topology recipe set for 8K input / 1K output traffic on H100/H200.
  </p>

  vLLM · 4x H100/H200 · Long output / static ISL-OSL · Related benchmark

  <a href="/dynamo/dev/recipes/llama-3-3-70b" aria-label="Open the Llama-3.3-70B FP8 recipe">
    Open recipe
  </a>

  <p data-empty-state hidden>
    No recipes match the selected filters.
  </p>
</form>

## Related Feature Benchmarks

Use [Feature Benchmarks](/dynamo/dev/benchmarks) when your question starts with a feature or performance claim, such as KV routing, embedding cache, speculative decoding, or topology. Recipe pages link back to Feature Benchmark pages whenever the recipe originated in a comparison or feature-stack run.