Model Analyzer

The Triton Model Analyzer is a tool that uses perf_client to apply load to your model while measuring GPU memory and compute utilization. The Model Analyzer is especially useful for characterizing the GPU memory requirements of your model under different batching and model instance configurations. Once you have this GPU memory usage information, you can make better-informed decisions about how to combine multiple models on the same GPU while remaining within the memory capacity of the GPU.
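As a rough illustration of that last step, the sketch below checks which models could share a GPU given their measured peak memory. It is not part of the Model Analyzer: the model names, the memory figures, the helper functions fits_on_gpu and feasible_pairs, and the 10% headroom are all hypothetical and stand in for values you would take from your own measurements.

```python
from itertools import combinations

# Hypothetical peak GPU memory per model, in MiB. In practice these numbers
# would come from your own Model Analyzer measurements for the batching and
# instance configuration you intend to deploy.
MEASURED_PEAK_MIB = {
    "model_a": 3000,
    "model_b": 5000,
    "model_c": 9000,
}


def fits_on_gpu(models, gpu_capacity_mib, headroom_fraction=0.1):
    """Return True if the summed peak memory of `models` fits on the GPU.

    `headroom_fraction` is an assumed safety margin that is kept free; it is
    an illustrative choice, not a recommendation from the Model Analyzer.
    """
    usable_mib = gpu_capacity_mib * (1.0 - headroom_fraction)
    total_mib = sum(MEASURED_PEAK_MIB[name] for name in models)
    return total_mib <= usable_mib


def feasible_pairs(gpu_capacity_mib, headroom_fraction=0.1):
    """List every pair of models whose combined peak memory fits on the GPU."""
    return [
        pair
        for pair in combinations(sorted(MEASURED_PEAK_MIB), 2)
        if fits_on_gpu(pair, gpu_capacity_mib, headroom_fraction)
    ]


if __name__ == "__main__":
    # Example: a GPU with 16384 MiB of memory.
    for pair in feasible_pairs(16384):
        print(pair)
```

Summing peak usage is a conservative simplification: it assumes every model reaches its peak at the same time and ignores any memory used outside the measured models.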

The Model Analyzer repository is available at github.com/triton-inference-server/model_analyzer. A detailed explanation is provided in the blog post "Maximizing Deep Learning Inference Performance with NVIDIA Model Analyzer".