Deep Learning Profiler 21.03 Release Notes
This is the DLProf release for 21.03. It is available in the NVIDIA TensorFlow 1.x, TensorFlow 2.x, and PyTorch NGC containers, and as a Python wheel on the NVIDIA PY Index.
Release 21.03 is based on NVIDIA CUDA 11.2.1, which requires NVIDIA Driver release 460.32.03 or later. However, if you are running on a Tesla GPU (for example, T4 or any other Tesla board), you may use NVIDIA driver release 418.xx, 440.30, 450.xx, or 455.xx. The CUDA driver's compatibility package supports only particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
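The 460.32.03 minimum can be checked on a host before pulling a 21.03 container. The following is a minimal sketch, assuming GNU coreutils (for sort -V) and nvidia-smi on the PATH. The function names version_ge, installed_driver, and check_driver are hypothetical helpers, and the sketch tests only the 460.32.03 minimum. It does not encode the Tesla driver-branch exceptions listed above.

```shell
#!/usr/bin/env bash
# Minimum driver for CUDA 11.2.1, as stated in these release notes.
MIN_DRIVER="460.32.03"

# Succeeds if version $1 >= version $2, using GNU sort's version ordering:
# the smaller of the two sorts first, so $1 >= $2 when $2 comes out on top.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the driver version reported for the first GPU.
installed_driver() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Hypothetical helper: compares an explicit version (or the installed one,
# if no argument is given) against MIN_DRIVER. Tesla exceptions are NOT checked.
check_driver() {
  local installed="${1:-$(installed_driver)}"
  if version_ge "$installed" "$MIN_DRIVER"; then
    echo "ok: driver $installed >= $MIN_DRIVER"
  else
    echo "warning: driver $installed < $MIN_DRIVER"
    return 1
  fi
}
```

Running check_driver with no argument queries the local GPU. Passing a version string, for example check_driver 455.45.01, lets you test the comparison without a GPU. A "warning" result on a Tesla board is not necessarily a problem, because older compatible branches are allowed there.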
The key features of DLProf v1.0.0 / r21.03 are:
- Released in the TensorFlow 1.x 21.03, TensorFlow 2.x 21.03, and PyTorch 21.03 NGC containers.
- Latest DLProf build is based on TensorFlow 1.15.5, TensorBoard 1.15.0, TensorFlow 2.3.1, TensorBoard 2.3.0, PyTorch 1.8.0, and Nsight Systems 2020.4.3.
- Expert Systems can detect when GPU memory is underutilized and recommend increasing the batch size.
- DLProf can now profile TensorRT models.
- DLProf can detect NCCL events and properly associate GPU activity to them.
Compatibility:
- This software is accessible in the NGC TensorFlow and PyTorch containers and as a separate pip wheel.
- This software is supported only for TensorFlow 1.15, TensorFlow 2.3, PyTorch 1.8, TensorBoard 1.15, and TensorBoard 2.3.
Known issues:
- When profiling on multiple GPUs, DLProf can, in rare cases, hang after printing "DLprof completed system call successfully". The recommended workaround is to run on a single-GPU system or to use version 21.02.
- When launching TensorBoard in a TensorFlow 2.x container, the --bind_all argument must be passed on the command line. For example:
# tensorboard --bind_all --logdir /path/to/event_files
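For the rare multi-GPU hang, one way to approximate a single-GPU system on a multi-GPU machine is to hide all but one device with the CUDA_VISIBLE_DEVICES environment variable before launching DLProf. This is a minimal sketch under that assumption. These notes do not confirm that restricting visible devices avoids the hang in every case. The wrapper single_gpu_profile and the DLPROF_BIN override, which exists only to allow dry runs without DLProf installed, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: exposes a single GPU to the profiled process.
# Assumption: hiding the other GPUs approximates a single-GPU system
# for the purpose of the multi-GPU workaround.
single_gpu_profile() {
  local gpu="${1:?usage: single_gpu_profile GPU_INDEX COMMAND...}"
  shift
  # DLPROF_BIN is a hypothetical override for dry runs; defaults to dlprof.
  CUDA_VISIBLE_DEVICES="$gpu" "${DLPROF_BIN:-dlprof}" "$@"
}
```

For example, single_gpu_profile 0 python train.py profiles only on GPU 0, where python train.py stands in for your own training command. Falling back to the 21.02 release remains the other documented workaround.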