Deep Learning Profiler 20.03 Release Notes
Description
Deep Learning Profiler (DLProf) is a tool for profiling deep learning models. It helps data scientists understand and improve the performance of their models, either visually through TensorBoard or by analyzing text reports. It also helps them understand resource usage while models are trained.
Driver Requirements
Release 20.03 is based on NVIDIA CUDA 10.2.89, which requires NVIDIA Driver release 440.33.01. However, if you are running on Tesla (for example, T4 or any other Tesla board), you may use NVIDIA driver release 396, 384.111+, 410, 418.xx, or 440.30. The CUDA driver's compatibility package only supports particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
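The driver rule above can be sketched as a small check. This is only an illustration under stated assumptions, not an NVIDIA-provided tool: it assumes "requires 440.33.01" means that version or later, that "384.111+" means the 384 branch at minor version 111 or higher, and that "396", "410", and "418.xx" each cover any driver in that branch. The function names are hypothetical.

```python
# Minimum driver for CUDA 10.2.89, taken from the paragraph above.
MIN_DRIVER = (440, 33, 1)


def parse_driver_version(version):
    """Turn a string such as '440.33.01' into (440, 33, 1) for comparison."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supported(version, is_tesla=False):
    """Return True if the driver version satisfies the stated requirement.

    Assumption: the branch interpretations described in the lead-in.
    """
    parsed = parse_driver_version(version)
    if parsed >= MIN_DRIVER:
        return True
    if not is_tesla:
        return False

    # Tesla boards may also use the compatibility branches listed above.
    major = parsed[0]
    minor = parsed[1] if len(parsed) > 1 else 0
    if major in (396, 410, 418):
        return True
    if major == 384 and minor >= 111:
        return True
    return (major, minor) == (440, 30)


if __name__ == "__main__":
    for candidate, tesla in [("440.33.01", False), ("418.87.00", False), ("418.87.00", True)]:
        print(candidate, "tesla" if tesla else "non-tesla", driver_supported(candidate, tesla))
```

The authoritative list of supported drivers remains the CUDA Application Compatibility topic; the check only mirrors the versions named in this section.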
New Features
The key features of DLProf v0.10.0 / r20.03 are:
- Released in the TensorFlow 20.03 NGC container.
- Latest DLProf build is based on TensorFlow 1.15.2, TensorBoard 1.15.0, and Nsight Systems 2020.1.1.
- Expert Systems feature that analyzes performance results, looks for common performance issues, and recommends fixes that may improve performance.
- Support for additional domains from custom NVTX markers.
  - Reports are generated for the domain specified using markers.
  - Data is aggregated only from NVTX markers in the same domain.
- Passing a GraphDef is now optional. Users can specify a GraphDef with --graphdef, or set it to auto to have a TensorBoard graph event file created.
- System information is gathered in the background and is exposed in the summary report, database, and TensorBoard event files.
- Consistent command line arguments.
Known Issues
- This software is only accessible in the NGC TensorFlow container.
- This software is only supported for TensorFlow 1.15 and TensorBoard 1.15.
- The following command line options have been changed:
  - --in_nsys_db_filename is now --nsys_database
  - --in_saved_model was removed
  - --nsys_base_output_name is now --nsys_base_name
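For scripts that still pass the old option names, the changes listed above could be applied with a small helper like the one below. This is a minimal sketch, not part of DLProf: the function name `migrate_args` and the file names in the usage example are hypothetical, and it assumes each option is written in the `--option=value` form.

```python
# Option names taken from the list of changed command line options.
RENAMED = {
    "--in_nsys_db_filename": "--nsys_database",
    "--nsys_base_output_name": "--nsys_base_name",
}
REMOVED = {"--in_saved_model"}


def migrate_args(args):
    """Return a new argument list using the 20.03 option names.

    Assumption: options are passed as single '--option=value' tokens.
    """
    migrated = []
    for arg in args:
        flag, sep, value = arg.partition("=")
        if flag in REMOVED:
            # The option no longer exists, so drop it entirely.
            continue
        # Rename the option if needed; everything else passes through unchanged.
        migrated.append(RENAMED.get(flag, flag) + sep + value)
    return migrated


if __name__ == "__main__":
    old_args = [
        "--in_nsys_db_filename=run.sqlite",  # hypothetical file name
        "--in_saved_model=model_dir",        # hypothetical path
        "--nsys_base_output_name=profile",   # hypothetical base name
    ]
    print(migrate_args(old_args))
```

Options that are not in either table, such as --graphdef, are left as they are.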
Resolved Issues
- Fixed issue with XLA kernels and nodes not being aggregated correctly.