The Dynamo Profiler is an automated performance analysis tool that measures model inference characteristics to optimize deployment configurations. It determines optimal tensor parallelism (TP) settings for prefill and decode phases, generates performance interpolation data, and enables SLA-driven autoscaling through the Planner.
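To make the role of performance interpolation data concrete, here is a minimal sketch of how profiled points could be interpolated and checked against a latency SLA. It is not the profiler's actual implementation. The data layout, the numbers, and the function names (`PREFILL_PROFILE`, `interpolate`, `smallest_tp_meeting_sla`) are all hypothetical, and picking the smallest TP that meets the target is a simplifying assumption rather than the profiler's documented selection rule.

```python
from bisect import bisect_left

# Hypothetical profiled points: TP size -> [(input sequence length, TTFT in ms)].
# These numbers are illustrative only, not measured results.
PREFILL_PROFILE = {
    1: [(1000, 120.0), (2000, 260.0), (4000, 560.0)],
    2: [(1000, 70.0), (2000, 150.0), (4000, 320.0)],
    4: [(1000, 45.0), (2000, 95.0), (4000, 200.0)],
}


def interpolate(points, x):
    """Piecewise-linear interpolation over sorted (x, y) points, clamped at both ends."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, x)  # index of the first profiled x that is >= the query
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def smallest_tp_meeting_sla(profile, isl, ttft_sla_ms):
    """Return the smallest TP whose interpolated TTFT meets the SLA, or None if none does."""
    for tp in sorted(profile):
        if interpolate(profile[tp], isl) <= ttft_sla_ms:
            return tp
    return None


if __name__ == "__main__":
    # Estimate TTFT at an input length that was not profiled directly.
    print(interpolate(PREFILL_PROFILE[2], 3000))
    print(smallest_tp_meeting_sla(PREFILL_PROFILE, 3000, 250.0))
```

The same lookup pattern would apply to decode-phase data, with inter-token latency in place of time to first token.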
The recommended way to profile models is through DynamoGraphDeploymentRequests (DGDRs), which automate the entire profiling and deployment workflow.
AI Configurator (AIC) enables rapid offline profiling (~30 seconds) and supports all backends (vLLM, SGLang, TensorRT-LLM). Because searchStrategy: rapid is the default, AIC is used automatically unless you explicitly set searchStrategy: thorough.
The profiler generates:
Example recommendations: