Model Analyzer

The Triton Model Analyzer is a tool that uses Performance Analyzer to send requests to your model while measuring GPU memory and compute utilization. It is especially useful for characterizing your model's GPU memory requirements under different batching and model instance configurations. With this memory usage information, you can make better-informed decisions about how to combine multiple models on the same GPU while staying within its memory capacity.
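As a rough illustration of that last step, the sketch below shows one way to reason about co-locating models once you have per-model peak GPU memory figures. It is not part of Model Analyzer or its API: the function names, the `headroom` parameter, and the first-fit-decreasing heuristic are all assumptions made for this example, and the memory values you pass in would come from your own measurements.

```python
from typing import Dict, List


def fits_on_gpu(peak_mem_mb: Dict[str, float],
                gpu_capacity_mb: float,
                headroom: float = 0.1) -> bool:
    """Return True if all models fit on one GPU together.

    peak_mem_mb maps a (hypothetical) model name to its measured peak GPU
    memory in MB. headroom is an assumed safety margin, expressed as a
    fraction of capacity that is left unused.
    """
    budget = gpu_capacity_mb * (1.0 - headroom)
    return sum(peak_mem_mb.values()) <= budget


def pack_models(peak_mem_mb: Dict[str, float],
                gpu_capacity_mb: float,
                headroom: float = 0.1) -> List[List[str]]:
    """Group models onto identical GPUs with a first-fit-decreasing heuristic.

    This is a simple greedy approximation, not an optimal packing.
    """
    budget = gpu_capacity_mb * (1.0 - headroom)
    gpus: List[dict] = []  # each entry: {"models": [...], "used": float}

    # Place the largest models first; each goes on the first GPU with room.
    ordered = sorted(peak_mem_mb.items(), key=lambda kv: kv[1], reverse=True)
    for name, mem in ordered:
        if mem > budget:
            raise ValueError(f"{name} does not fit on a single GPU")
        for gpu in gpus:
            if gpu["used"] + mem <= budget:
                gpu["models"].append(name)
                gpu["used"] += mem
                break
        else:
            # No existing GPU has room, so allocate another one.
            gpus.append({"models": [name], "used": mem})

    return [gpu["models"] for gpu in gpus]
```

Calling `pack_models` with your measured values and your GPU's memory capacity returns one list of model names per GPU. The headroom margin is there because memory use can vary with load, so a plan that exactly fills the GPU on paper may not be safe in practice.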

For more information, see the Model Analyzer repository and the detailed explanation in Maximizing Deep Learning Inference Performance with NVIDIA Model Analyzer.