Best Practices
This guide provides best practices for optimizing performance with TensorRT. It covers benchmarking, profiling, optimization techniques, and hardware/software configuration for achieving optimal inference performance.
What You Will Learn:
- How to benchmark performance using trtexec
- Techniques for analyzing and profiling your models
- Strategies for optimizing inference speed
- Hardware and software environment best practices
This guide is organized from basic benchmarking to advanced optimization techniques, allowing you to progressively improve your model’s performance.
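As a preview of the benchmarking workflow covered below, a minimal trtexec session might look like the following sketch. The model path is a placeholder and the flag values are illustrative, not recommendations:

```shell
# Build a TensorRT engine from an ONNX model and run a first benchmark.
# (model.onnx is a placeholder path.)
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16

# Re-run the saved engine with a longer measurement window.
# --warmUp: milliseconds of warm-up runs excluded from timing
# --duration: minimum seconds of timed inference
# --useCudaGraph: capture inference into a CUDA graph to reduce
#                 enqueue overhead (see "Enqueue-Bound Workloads")
trtexec --loadEngine=model.plan --warmUp=500 --duration=10 --useCudaGraph
```

trtexec reports throughput and latency percentiles at the end of the run; the sections below explain how to make those numbers stable and reproducible.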
- Performance Benchmarking using trtexec
- Advanced Performance Measurement Techniques
- Hardware/Software Environment for Performance Measurements
  - GPU Information Query and GPU Monitoring
  - GPU Clock Locking and Floating Clock
  - GPU Power Consumption and Power Throttling
  - GPU Temperature and Thermal Throttling
  - H2D/D2H Data Transfers and PCIe Bandwidth
  - TCC Mode and WDDM Mode
  - Enqueue-Bound Workloads and CUDA Graphs
  - BlockingSync and SpinWait Synchronization Modes
- Optimizing TensorRT Performance
  - Overhead of Shape Change and Optimization Profile Switching
- Improving Model Accuracy
- Optimizing Builder Performance