Reporting TensorRT Issues

If you encounter issues when using TensorRT, first check the FAQs and Understanding Error Messages sections for similar failure patterns. For example, many engine-building failures can be resolved by sanitizing and constant-folding the ONNX model with Polygraphy using the following command:

polygraphy surgeon sanitize model.onnx --fold-constants --output model_folded.onnx

If you have not already done so, try the latest TensorRT release before filing an issue, because the problem may already be fixed there.

Channels for TensorRT Issue Reporting

If neither the FAQs nor the Understanding Error Messages sections help, you can report the issue through the NVIDIA Developer Forum or the TensorRT GitHub Issue page. These channels are actively monitored so that the team can provide feedback on the issues you report.

Here are the steps to report an issue on the NVIDIA Developer Forum:

  1. Register on the NVIDIA Developer website.

  2. Log in to the developer site.

  3. Click your name in the upper right corner.

  4. Click My Account > My Bugs and select Submit a New Bug.

  5. Fill out the bug reporting page. Be descriptive and provide the steps to reproduce the problem.

  6. Click Submit a bug.

When reporting an issue, provide setup details and include the following information:

  • Environment information:

    • OS or Linux distro and version

    • GPU type

    • NVIDIA driver version

    • CUDA version

    • cuDNN version

    • Python version (if Python is used)

    • TensorFlow, PyTorch, and ONNX versions (if any of them are used)

    • TensorRT version

    • NGC TensorRT container version (if the TensorRT container is used)

    • Jetson OS and hardware versions (if Jetson is used)

  • A thorough description of the issue.

  • Steps to reproduce the issue:

    • ONNX file (if ONNX is used).

    • A TensorRT API Capture (if ONNX is not used). For more information, refer to the TensorRT API Capture and Replay section.

    • Minimal commands or scripts to trigger the issue

    • Verbose logs, captured by enabling the kVERBOSE severity level in ILogger

Depending on the type of issue, providing the additional information listed in the following sections can expedite the response and debugging process.

Reporting a Functional Issue

When reporting functional issues, such as linker errors, segmentation faults, engine-building failures, or inference failures, provide the scripts and commands that reproduce the issue and a detailed description of the environment. The more details you provide, the faster we can debug the issue.

If you are not using ONNX, we recommend capturing the TensorRT API calls and attaching the resulting JSON and BIN files. For more information, refer to the TensorRT API Capture and Replay section.

Because a TensorRT engine is specific to the TensorRT version and the GPU type it was built with, do not build an engine in one environment and run it in another environment that has a different GPU or a different software stack (for example, a different TensorRT, CUDA, or cuDNN version). Also, check the LD_LIBRARY_PATH environment variable (or %PATH% on Windows) to ensure that the application is linked against the correct TensorRT and cuDNN shared libraries.
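One way to check which TensorRT and cuDNN libraries are visible is to list the matching files in each `LD_LIBRARY_PATH` directory. The sketch below is Linux-oriented and assumes that the library file names start with `libnvinfer.so` and `libcudnn.so`. `find_libraries` is a hypothetical helper; on Windows you would pass the contents of `%PATH%` and DLL name prefixes instead.

```python
import glob
import os


def find_libraries(prefixes=("libnvinfer.so", "libcudnn.so"), search_path=None):
    """Map each library name prefix to matching files, in search-path order."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    # Empty entries are skipped; os.pathsep is ':' on Linux and ';' on Windows.
    dirs = [d for d in search_path.split(os.pathsep) if d]
    found = {}
    for prefix in prefixes:
        matches = []
        for d in dirs:
            # Matches the unversioned name and versioned names such as libnvinfer.so.10.
            pattern = os.path.join(glob.escape(d), prefix + "*")
            matches.extend(sorted(glob.glob(pattern)))
        found[prefix] = matches
    return found


if __name__ == "__main__":
    for prefix, paths in find_libraries().items():
        print(prefix, paths or "not found on LD_LIBRARY_PATH")
```

If a prefix matches files in more than one directory, several versions are visible. The dynamic loader generally searches `LD_LIBRARY_PATH` directories in order, but RPATH/RUNPATH entries and the system library cache also affect which copy is loaded, so confirm the result with `ldd` on your application binary.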

Reporting an Accuracy Issue

When reporting an accuracy issue, provide the scripts and commands used to calculate the accuracy metrics. Describe the expected accuracy level and share the steps to reproduce the expected results with another framework, such as ONNX-Runtime.

The Polygraphy tool can help debug accuracy issues and reduce them to a minimal failing case. For instructions, refer to the documentation on Debugging TensorRT Accuracy Issues. A Polygraphy command that reproduces the accuracy issue, or a minimal failing case, shortens the time it takes us to debug it.

Do not expect bitwise-identical results between TensorRT and other frameworks such as PyTorch, TensorFlow, or ONNX-Runtime, even in FP32 precision. Floating-point arithmetic is not associative, so computing the same operations in a different order can produce slightly different output values. In practice, such small numeric differences should not significantly affect the application's accuracy metric, such as the mAP score for object-detection networks or the BLEU score for translation networks. If you observe a significant drop in the accuracy metric between TensorRT and another framework such as PyTorch, TensorFlow, or ONNX-Runtime, it may indicate a genuine TensorRT bug.
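The following pure-Python sketch illustrates both points: reordering a floating-point sum changes the result, so outputs should be compared with a tolerance rather than for bitwise equality. `compare_outputs` is a hypothetical helper that applies the same combined rule as `numpy.allclose`. The default tolerances are illustrative; appropriate values depend on your model and precision.

```python
import math


def compare_outputs(actual, expected, rtol=1e-5, atol=1e-8):
    """Compare two flat sequences of floats with a combined tolerance.

    An element passes if |actual - expected| <= atol + rtol * |expected|,
    the rule numpy.allclose uses. Returns (all_close, max_abs_diff).
    A NaN in either output, or an infinity in both, counts as a mismatch.
    """
    if len(actual) != len(expected):
        raise ValueError("outputs have different lengths")
    all_close = True
    max_abs_diff = 0.0
    for a, e in zip(actual, expected):
        diff = abs(a - e)
        if math.isnan(diff):
            return False, math.nan
        max_abs_diff = max(max_abs_diff, diff)
        if diff > atol + rtol * abs(e):
            all_close = False
    return all_close, max_abs_diff


if __name__ == "__main__":
    # Floating-point addition is not associative: the grouping changes the result.
    left = (0.1 + 0.2) + 0.3
    right = 0.1 + (0.2 + 0.3)
    print(left == right)  # False
    # The difference is tiny, so a tolerance-based comparison still passes.
    print(compare_outputs([left], [right]))
```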

If the TensorRT engine outputs NaN or infinite values when FP16/BF16 precision is enabled, intermediate layer outputs in the network may be overflowing in FP16/BF16. Approaches that can help mitigate this include:

  • Restricting network weights and inputs to a reasonably narrow range (such as [-1, 1] instead of [-100, 100]). This may require changes to the network and retraining.

    • Consider pre-processing inputs by scaling or clipping them to the restricted range before passing them to the network for inference.

  • Overriding the precision of layers that are vulnerable to overflow (such as Reduce and Element-Wise Power ops) to FP32.
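To see why narrowing the input range helps, the sketch below simulates a sum-of-squares reduction (a Reduce over an element-wise power) with every intermediate value rounded to FP16, whose largest finite value is 65504. It emulates FP16 rounding on the CPU with Python's `struct` half-precision format `e` and does not call TensorRT. The helper names are hypothetical, and only FP16 is modeled, not BF16.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round a Python float to the nearest FP16 value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def sum_of_squares_fp16(values):
    """Sum of squares with each square and each partial sum rounded to FP16."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v * v))
    return acc


def scale_and_clip(values, max_abs):
    """Pre-process inputs: divide by max_abs, then clip to [-1, 1]."""
    return [min(1.0, max(-1.0, v / max_abs)) for v in values]


if __name__ == "__main__":
    raw = [100.0] * 16  # inputs at the edge of a [-100, 100] range
    print(sum_of_squares_fp16(raw))  # inf: the partial sums exceed FP16_MAX
    print(sum_of_squares_fp16(scale_and_clip(raw, 100.0)))  # 16.0
```

The unscaled reduction would need 160000, which exceeds 65504, whereas the scaled inputs stay far below the limit.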

Polygraphy can help you diagnose common problems with reduced precision. Refer to Polygraphy's Working with Reduced Precision how-to guide for more information.

Reporting a Performance Issue

If you are reporting a performance issue, share the full trtexec logs produced by the following command:

trtexec --onnx=<onnx_file> <precision_and_shape_flags> --verbose --profilingVerbosity=detailed --dumpLayerInfo --dumpProfile --duration=60

The verbose logs help us identify the performance issue. If possible, also share the Nsight Systems profiling files generated with these commands:

trtexec --onnx=<onnx_file> <precision_and_shape_flags> --verbose --profilingVerbosity=detailed --dumpLayerInfo --saveEngine=<engine_path>
nsys profile --cuda-graph-trace=node -o <output_profile> trtexec --loadEngine=<engine_path> <precision_and_shape_flags> --warmUp=0 --duration=0 --iterations=20

Refer to the trtexec section for more information on using the trtexec tool and the meaning of these flags.

If you do not use trtexec to measure performance, provide the scripts and commands you use to measure it. Compare the measurement from your script with the one from trtexec. If the two numbers differ, the measurement methodology in your script may be flawed.
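Two common pitfalls in hand-written benchmarks are skipping warm-up runs and stopping the timer before asynchronous GPU work has finished. The following sketch shows one way to avoid both. `measure_latency` is a hypothetical helper, and `run_once` and `synchronize` are placeholders you would supply (for example, an inference launch and a CUDA stream synchronize call). All statistics are reported in milliseconds.

```python
import statistics
import time


def measure_latency(run_once, synchronize=lambda: None, warmup=10, iterations=100):
    """Time repeated calls to run_once and return latency statistics in ms.

    run_once: launches one inference (placeholder supplied by the caller).
    synchronize: blocks until queued GPU work has finished; without it,
        asynchronous launches return early and latency is underestimated.
    """
    # Warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        run_once()
    synchronize()

    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        synchronize()  # wait for the GPU before stopping the timer
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    n = len(latencies_ms)
    p99_index = (99 * n + 99) // 100 - 1  # nearest-rank 99th percentile
    return {
        "min": latencies_ms[0],
        "median": statistics.median(latencies_ms),
        "mean": statistics.fmean(latencies_ms),
        "p99": latencies_ms[p99_index],
        "max": latencies_ms[-1],
    }
```

When comparing against trtexec, compare the same statistic (for example, the median latency) and use the same input shapes and precision in both measurements.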

Refer to the Hardware/Software Environment for Performance Measurements section for environmental factors that can affect performance.