Additional ResourcesTensorRT-LLM Details

Multinode Examples

View as Markdown

For general TensorRT-LLM features and engine configuration, see the Reference Guide.

For multinode TensorRT-LLM deployments, start from the checked-in Kubernetes recipes under recipes/. Those manifests are the supported entrypoints for launching multi-node workers, frontend services, and related routing components.

The main TRT-LLM recipe entrypoints are:

For model-level setup, prerequisites, and hardware notes, use the recipe README files:

Quick Start

At a high level, the Kubernetes workflow is:

  1. Install the Dynamo platform on Kubernetes. See the Kubernetes Deployment Guide.
  2. Create a namespace and any required secrets such as a Hugging Face token.
  3. Apply the recipe’s model cache and model download manifests when the recipe includes them.
  4. Apply the recipe’s deploy.yaml.
  5. Port-forward the frontend service and send test requests to /v1/models or /v1/chat/completions.

Example flow:

$export NAMESPACE=dynamo-demo
$kubectl create namespace ${NAMESPACE}
$
$kubectl create secret generic hf-token-secret \
> --from-literal=HF_TOKEN="your-token-here" \
> -n ${NAMESPACE}
$
$# Example: deploy DeepSeek-R1 TRT-LLM WideEP on GB200.
$kubectl apply -f recipes/deepseek-r1/model-cache/model-cache.yaml -n ${NAMESPACE}
$kubectl apply -f recipes/deepseek-r1/model-cache/model-download.yaml -n ${NAMESPACE}
$kubectl wait --for=condition=Complete job/model-download -n ${NAMESPACE} --timeout=7200s
$kubectl apply -f recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml -n ${NAMESPACE}

After the deployment is ready, port-forward the frontend service named by the recipe and send a test request:

$kubectl port-forward svc/<frontend-service> 8000:8000 -n ${NAMESPACE}
$
$curl http://localhost:8000/v1/models

Notes

  • The TRT-LLM engine config files used by launch and deploy flows live under examples/backends/trtllm/engine_configs/.
  • If you need to customize model parallelism, replica counts, or routing mode, edit the recipe-local manifest rather than introducing a separate scheduler-specific guide.
  • For the current catalog of supported recipes, see recipes/README.md.