AIConfigurator is a performance optimization tool that helps you find the optimal configuration for deploying LLMs with Dynamo. It automatically determines the best number of prefill and decode workers, parallelism settings, and deployment parameters to meet your SLA targets while maximizing throughput.
When deploying LLMs with Dynamo, you need to make several critical decisions:
AIConfigurator answers these questions in seconds, providing:
Models: GPT, LLAMA2/3, QWEN2.5/3, Mixtral, DEEPSEEK_V3
GPUs: H100, H200, A100, B200 (preview), GB200 (preview)
Backend: TensorRT-LLM (vLLM and SGLang coming soon)
Model name mismatch: Use the exact model name that matches your deployment.
GPU allocation: Verify that the number of available GPUs matches the value passed to --total_gpus.
Performance variance: Results are estimates; benchmark the actual deployment to confirm them.
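
The GPU allocation check can be scripted before running AIConfigurator. The following is a minimal sketch, not part of AIConfigurator itself: it assumes a single node with the NVIDIA driver installed, counts GPUs from the output of `nvidia-smi -L`, and compares that count with the value you intend to pass to --total_gpus. The helper names `count_gpus`, `visible_gpus`, and `gpu_allocation_ok` are hypothetical.

```python
import shutil
import subprocess


def count_gpus(listing: str) -> int:
    """Count physical GPUs in the text printed by `nvidia-smi -L`."""
    # Each physical GPU appears on an unindented line such as
    # "GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-...)".
    # Indented MIG device lines are deliberately not counted.
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def visible_gpus() -> int:
    """Return the GPU count reported by nvidia-smi, or 0 if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return count_gpus(result.stdout) if result.returncode == 0 else 0


def gpu_allocation_ok(total_gpus: int, available: int) -> bool:
    """True when the available GPU count equals the planned --total_gpus value."""
    # Assumption: a single-node deployment, so the local count is the full budget.
    return total_gpus > 0 and available == total_gpus
```

For example, `gpu_allocation_ok(8, visible_gpus())` returns `False` on a node that exposes only 4 GPUs. For multi-node deployments the per-node counts would need to be summed first, which this sketch does not attempt.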