# Multinode Examples
For general TensorRT-LLM features and engine configuration, see the Reference Guide.
## Recommended Path
For multinode TensorRT-LLM deployments, start from the checked-in Kubernetes recipes under `recipes/`. Those manifests are the supported entrypoints for launching multi-node workers, frontend services, and related routing components.
The main TRT-LLM recipe entrypoints are:
- DeepSeek-R1 WideEP on GB200
- Qwen3-235B-A22B-FP8 aggregated
- Qwen3-235B-A22B-FP8 disaggregated
- Qwen3-32B-FP8 aggregated
- Qwen3-32B-FP8 disaggregated
- GPT-OSS-120B aggregated
- GPT-OSS-120B disaggregated
- Nemotron-3-Super-FP8 disaggregated
For model-level setup, prerequisites, and hardware notes, use the recipe README files:
- DeepSeek-R1 recipes
- Qwen3-235B-A22B-FP8 recipes
- Qwen3-32B-FP8 recipes
- GPT-OSS-120B recipes
- Kimi-K2.5 recipes
## Quick Start
At a high level, the Kubernetes workflow is:
- Install the Dynamo platform on Kubernetes. See the Kubernetes Deployment Guide.
- Create a namespace and any required secrets such as a Hugging Face token.
- Apply the recipe’s model cache and model download manifests when the recipe includes them.
- Apply the recipe's `deploy.yaml`.
- Port-forward the frontend service and send test requests to `/v1/models` or `/v1/chat/completions`.
Example flow:
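A minimal sketch of these steps, assuming the namespace `dynamo` and a Qwen3-32B-FP8 aggregated recipe; the namespace, secret name, and directory paths below are placeholders, so substitute the values from the recipe README you are following:

```shell
# Create a namespace for the deployment (name is a placeholder).
kubectl create namespace dynamo

# Store a Hugging Face token as a secret (needed for gated model downloads).
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-hf-token> \
  -n dynamo

# Apply the recipe's model cache / download manifests, if the recipe
# includes them, then the deployment itself (paths are illustrative).
kubectl apply -f recipes/<model>/model-cache/ -n dynamo
kubectl apply -f recipes/<model>/agg/deploy.yaml -n dynamo

# Wait for the pods to become ready before sending traffic.
kubectl get pods -n dynamo -w
```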
After the deployment is ready, port-forward the frontend service named by the recipe and send a test request:
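For example, assuming the frontend service is named `frontend` and listens on port 8000 (check the recipe manifest for the actual service name, port, and served model name):

```shell
# Forward the frontend service to localhost (name/port are placeholders).
kubectl port-forward svc/frontend 8000:8000 -n dynamo

# In another terminal: list the served models...
curl http://localhost:8000/v1/models

# ...then send a test chat completion request.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-32B-FP8",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```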
## Notes
- The TRT-LLM engine config files used by launch and deploy flows live under `examples/backends/trtllm/engine_configs/`.
- If you need to customize model parallelism, replica counts, or routing mode, edit the recipe-local manifest rather than introducing a separate scheduler-specific guide.
- For the current catalog of supported recipes, see `recipes/README.md`.
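To illustrate the kind of recipe-local edit meant above, a trimmed deployment fragment might look like the following. The resource kind, service names, and field layout here are assumptions for illustration; always check the actual `deploy.yaml` shipped with the recipe before editing:

```yaml
# Illustrative fragment only -- names and structure follow a typical
# recipe deploy.yaml, but your recipe's manifest is the source of truth.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: trtllm-agg          # placeholder deployment name
spec:
  services:
    Frontend:
      replicas: 1           # scale the frontend/routing tier here
    TRTLLMWorker:
      replicas: 2           # scale engine workers here, not in a separate guide
```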