Deep Learning Profiler 20.06 Release Notes

Description

Deep Learning Profiler (DLProf) is a tool for profiling deep learning models. It helps data scientists understand and improve the performance of their models, either visually through TensorBoard or by analyzing text reports. It also helps them understand resource usage while models are being trained.

Driver Requirements

Release 20.06 is based on NVIDIA CUDA 11.0.167, which requires NVIDIA Driver release 450.36.06. However, if you are running on Tesla (for example, T4 or any other Tesla board), you may use NVIDIA driver release 410, 418.xx or 440.30. The CUDA driver's compatibility package only supports particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
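
As a rough way to check a machine against the driver requirement above, the sketch below compares the installed driver version with 450.36.06. It assumes nvidia-smi is on the PATH and that GNU sort (with -V version ordering) is available. The version_ge helper is a hypothetical name introduced for this illustration and is not part of DLProf or CUDA.

```shell
# Minimum driver for CUDA 11.0.167, taken from the requirement above.
REQUIRED_DRIVER="450.36.06"

# version_ge A B: succeeds when version A >= version B.
# Hypothetical helper for this sketch; relies on GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Read the driver version reported for the first GPU.
    INSTALLED_DRIVER="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "${INSTALLED_DRIVER}" "${REQUIRED_DRIVER}"; then
        echo "Driver ${INSTALLED_DRIVER} meets the ${REQUIRED_DRIVER} requirement."
    else
        echo "Driver ${INSTALLED_DRIVER} is older than ${REQUIRED_DRIVER}; see the Tesla exceptions above."
    fi
else
    echo "nvidia-smi not found; cannot determine the driver version."
fi
```

A driver older than 450.36.06 is not necessarily unsupported: on Tesla boards the older releases listed above may still work through the CUDA compatibility package.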

New Features

The key features of DLProf v0.12.0 / r20.06 are:
  • Released in the TensorFlow 20.06 and PyTorch 20.06 NGC containers.
  • Latest DLProf build is based on TensorFlow 1.15.2, TensorBoard 1.15.0, PyTorch 1.6.0, and Nsight Systems 2020.1.1.
  • PyTorch support for the DLProf build released in the PyTorch NGC container.
    • Depends on PyProf and Nsight Systems, which are provided in the container.
    • Generates all reports and a GPU Event file that can be viewed with the NVIDIA DLProf TensorBoard plugin, which is included for TensorBoard visualization.
  • New command line switch options.
    • Aligned CLI switch options and style with the Nsight Systems CLI.
    • Improved help messages from dlprof --help.
    • Known issue: boolean options now require specifying true or false, e.g. --force=true.
  • Support for multiple Frameworks and NGC containers.
    • Each supported NGC container has a customized DLProf that is optimized for the Deep Learning framework in the container.
    • Supported Frameworks: TensorFlow 1.15 and PyTorch 1.6.
  • Recommendations for shape and data type are improved in TensorBoard and on the command line.
  • Reporting enhancements.
    • Print all reports with --reports=all.
    • Added start and stop iterations to Summary report.
    • Added more detailed Tensor Core GPU time to the Tensor report.
    • Fixed issue in kernel report that was over-counting the number of unique kernels.
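
The switches named in the list above can be combined in a single invocation. The following is a minimal sketch, assuming DLProf is run inside one of the supported NGC containers and wraps an ordinary Python training command. The script name train.py is a hypothetical placeholder, and only switches mentioned in these notes (--force=true and --reports=all) are used.

```shell
# Hypothetical training script; substitute your own entry point.
TRAIN_SCRIPT="train.py"

# Boolean switches take an explicit value in this release (--force=true),
# and --reports=all requests every available report.
DLPROF_CMD="dlprof --force=true --reports=all python ${TRAIN_SCRIPT}"

if command -v dlprof >/dev/null 2>&1; then
    # Intentionally unquoted so the shell splits the string into arguments.
    ${DLPROF_CMD}
else
    echo "dlprof not found; would run: ${DLPROF_CMD}"
fi
```

Running dlprof --help lists the full set of switches available in the installed build.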

Known Issues

  • This software is only accessible in the NGC TensorFlow and PyTorch containers.
  • This software is only supported for TensorFlow 1.15, PyTorch 1.6, and TensorBoard 1.15.

Resolved Issues

  • None