NVIDIA Optimized Frameworks

Deep Learning Profiler 20.08 Release Notes

Description

The 20.08 release of DLProf is available in the NVIDIA TensorFlow 1.x, TensorFlow 2.x, and PyTorch NGC containers.

Driver Requirements

Release 20.08 is based on NVIDIA CUDA 11.0.194, which requires NVIDIA Driver release 450 or later. However, if you are running on Tesla (for example, T4 or any other Tesla board), you may use NVIDIA driver release 418.xx or 440.30. The CUDA driver's compatibility package only supports particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
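
As a rough illustration of the driver rule above, the following Python sketch checks a driver version string against the stated requirements. The names `parse_driver_version` and `driver_meets_requirement` and the `is_tesla` flag are hypothetical and not part of DLProf or CUDA. Reading "440.30" as a minimum within the 440 branch is an assumption; the CUDA compatibility documentation remains the authoritative list.

```python
def parse_driver_version(version: str) -> tuple:
    """Split a driver version such as '450.51.06' into integer parts."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_requirement(version: str, is_tesla: bool = False) -> bool:
    """Return True if the driver satisfies the 20.08 requirement as read here."""
    parts = parse_driver_version(version)
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0

    # Release 450 or later is sufficient on any board.
    if major >= 450:
        return True

    # Tesla boards may also use the 418.xx or 440.30 drivers.
    if is_tesla:
        if major == 418:
            return True
        # Assumption: "440.30" is treated as a minimum within the 440 branch.
        if major == 440 and minor >= 30:
            return True

    return False
```

On a machine with the NVIDIA driver installed, the version string can typically be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `driver_meets_requirement`.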

New Features

The key features of DLProf v0.14.0 / r20.08 are:

  • Released in the TensorFlow 1.x 20.08, TensorFlow 2.x 20.08, and PyTorch 20.08 NGC containers.
  • Latest DLProf build is based on TensorFlow 1.15.2, TensorBoard 1.15.0, PyTorch 1.6.0, and Nsight Systems 2020.3.2.
  • DLProf support for PyTorch is now feature equivalent to the DLProf support for TensorFlow 1.x:
    • Correctly identifies forward and backward operations.
    • Tensor Core eligible operations list corrected for improved accuracy.
    • Stack trace and direction added to reports.
    • Data loader detector improved to provide more details.
    • Methodology added to differentiate between calls to the same function from the same stack trace.
  • Testing and detection of Tensor Core kernels have been improved, providing a high level of confidence in Tensor Core (TC) usage metrics across all frameworks.
  • Tensor Core utilization calculation improved for XLA runs.

Known Issues

  • This software is only accessible in the NGC TensorFlow and PyTorch containers.
  • This software is only supported for TensorFlow 1.15, PyTorch 1.6, and TensorBoard 1.15.
  • Partial simple mode profiling is supported for TensorFlow 2.
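
The version constraints listed above could be checked before profiling with a small helper like the one below. This is a minimal sketch, not a DLProf feature: the `SUPPORTED` table and the `is_supported` function are hypothetical names, and the check compares only the major and minor components so that build suffixes on the version string are ignored.

```python
import re

# Supported major.minor versions, taken from the Known Issues list.
# The package keys are illustrative import names chosen for this sketch.
SUPPORTED = {
    "tensorflow": (1, 15),
    "torch": (1, 6),
    "tensorboard": (1, 15),
}


def is_supported(package: str, version: str) -> bool:
    """Return True if the package's major.minor matches the supported version."""
    expected = SUPPORTED.get(package)
    match = re.match(r"(\d+)\.(\d+)", version)
    if expected is None or match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) == expected
```

Inside a container, the helper could be called as `is_supported("torch", torch.__version__)`. A TensorFlow 2.x version returns False here because only partial simple mode profiling is listed for TensorFlow 2.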

Resolved Issues

  • None
© Copyright 2024, NVIDIA. Last updated on Jul 26, 2024.