TensorFlow Release 21.06

The NVIDIA container image of TensorFlow, release 21.06, is available on NGC.

Contents of the TensorFlow container

This container image includes the complete source of the NVIDIA version of TensorFlow in /opt/tensorflow. It is pre-built and installed as a system Python module.

To achieve optimum TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). You may need to modify the sample script to fit your application.

The container also includes the following:

Driver Requirements

Release 21.06 is based on NVIDIA CUDA 11.3.1, which requires NVIDIA Driver release 465.19.01 or later. However, if you are running on Data Center GPUs (formerly Tesla), for example, T4, you may use NVIDIA driver release 418.40 (or later R418), 440.33 (or later R440), 450.51 (or later R450), or 460.27 (or later R460). The CUDA driver's compatibility package only supports particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades and NVIDIA CUDA and Drivers Support.
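The driver rule above can be written as a small check. This is a sketch, assuming you already have the installed driver version string (for example from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`) and know whether the GPU is a Data Center part. The function and constant names are hypothetical, and the CUDA Application Compatibility topic remains the authoritative list of supported drivers.

```python
"""Sketch: does a driver version satisfy the release 21.06 requirements?"""

# Minimum driver for CUDA 11.3.1 on any GPU.
MIN_DRIVER = (465, 19, 1)

# Older branches usable on Data Center GPUs through the CUDA compatibility
# package: branch number -> minimum version within that branch.
DATACENTER_BRANCH_MINIMUMS = {
    418: (418, 40),
    440: (440, 33),
    450: (450, 51),
    460: (460, 27),
}


def parse_driver_version(version: str) -> tuple:
    """Turn a string such as '465.19.01' into (465, 19, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supported(version: str, datacenter_gpu: bool) -> bool:
    """Return True if the driver meets the requirements stated above."""
    v = parse_driver_version(version)
    if v >= MIN_DRIVER:
        return True
    if datacenter_gpu:
        minimum = DATACENTER_BRANCH_MINIMUMS.get(v[0])
        # Must be in one of the listed branches and at or above its minimum.
        return minimum is not None and v >= minimum
    return False


if __name__ == "__main__":
    for version, dc in [("465.19.01", False), ("460.32.03", False),
                        ("460.32.03", True), ("450.36.06", True)]:
        label = "datacenter" if dc else "other"
        print(version, label, driver_supported(version, dc))
```

Comparing integer tuples rather than raw strings avoids lexical mistakes such as treating "465.9" as newer than "465.19".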

GPU Requirements

Release 21.06 supports CUDA compute capability 6.0 and higher. This corresponds to GPUs in the NVIDIA Pascal, Volta, Turing, and Ampere Architecture GPU families. For a list of GPUs with these compute capabilities, see CUDA GPUs. For additional support details, see the Deep Learning Frameworks Support Matrix.
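The compute capability requirement can be checked with a simple tuple comparison. This is a sketch, assuming you already know the GPU's (major, minor) compute capability, for example from the CUDA GPUs page (a T4 is 7.5). The architecture table is illustrative rather than exhaustive, and the helper names are hypothetical.

```python
"""Sketch: is a GPU's compute capability supported by release 21.06?"""

MIN_COMPUTE_CAPABILITY = (6, 0)

# Compute capabilities of the GPU families named above.
# Illustrative only; see the CUDA GPUs page for the full list.
ARCHITECTURES = {
    (6, 0): "Pascal", (6, 1): "Pascal", (6, 2): "Pascal",
    (7, 0): "Volta", (7, 2): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere", (8, 6): "Ampere",
}


def is_supported(compute_capability) -> bool:
    """True if (major, minor) is 6.0 or higher."""
    return tuple(compute_capability) >= MIN_COMPUTE_CAPABILITY


def architecture(compute_capability) -> str:
    """Family name for a known compute capability, else 'unknown'."""
    return ARCHITECTURES.get(tuple(compute_capability), "unknown")


if __name__ == "__main__":
    for cc in [(5, 2), (6, 0), (7, 5), (8, 0)]:
        print(cc, architecture(cc), is_supported(cc))
```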

Key Features and Enhancements

This TensorFlow release includes the following key features and enhancements.
  • TensorFlow container images version 21.06 are based on TensorFlow 1.15.5 and 2.5.0.
  • Fixed an issue that caused XLA to initialize TensorFlow on all visible GPUs leading to OOM errors in Horovod and other multi-process configurations.
  • Fixed an issue in the FakeQuantizeAndDequantize op that would result in non-symmetric quantization when max=-min.
  • Implemented GPU kernels for ops common in recommender model input pipelines: SparseApplyFtrl, [Sparse]ApplyProximalAdagrad, SparseReshape, and SparseToDense.
  • Vectorized GPU Gather op to improve performance.
  • Introduced the environment variable TF_CPP_VLOG_FILENAME to direct VLOG output to a file.
  • Improved cuDNN kernel selection by switching to the CUDNN_HEUR_B kernel selector.
  • Updated tensorflow-addons to r0.13.
  • Added support for the FusedBatchNormGrad op to optimize side-inputs and activations.
  • Patched recently announced vulnerabilities in TF 1.15.5: CVE-2021-29591, CVE-2021-29605, CVE-2021-29606, and CVE-2021-29614.
  • Ubuntu 20.04 with May 2021 updates.
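Because TF_CPP_VLOG_FILENAME is read from the process environment, one way to use it is to set it when launching a training job. This is a minimal sketch, assuming a hypothetical training script; it only sets the file path, so VLOG messages are written only at whatever VLOG verbosity you have enabled separately. Setting the variable in the child's environment guarantees it is present before the child imports TensorFlow.

```python
"""Sketch: launch a TensorFlow job with VLOG output directed to a file."""
import os
import subprocess
import sys


def env_with_vlog_file(log_path, base_env=None):
    """Return a copy of the environment with TF_CPP_VLOG_FILENAME set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TF_CPP_VLOG_FILENAME"] = os.fspath(log_path)
    return env


def launch(script, log_path, *args):
    """Run `script` in a child Python process with the variable set.

    The child receives the variable at startup, so it is already in place
    when the child imports TensorFlow. Raises CalledProcessError on failure.
    """
    cmd = [sys.executable, script, *args]
    return subprocess.run(cmd, env=env_with_vlog_file(log_path), check=True)


# Example (train.py is a hypothetical script):
# launch("train.py", "/tmp/tf_vlog.txt")
```

Copying the environment instead of mutating `os.environ` keeps the launcher's own process unchanged.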

Tensor Core Examples

The Tensor Core examples provided on GitHub focus on achieving the best performance and convergence by using the latest deep learning example networks and model scripts for training.

Each example model trains with mixed precision using Tensor Cores on Volta, so you can get results much faster than training without Tensor Cores. These models are tested against each monthly NGC container release to ensure consistent accuracy and performance over time. This container includes the following Tensor Core examples.

Known Issues

Note: If you encounter functional or performance issues when XLA is enabled, please refer to the XLA Best Practices document. It offers pointers on how to diagnose symptoms and possibly address them.
  • The TF1 and TF2 containers include a version of Django with a known vulnerability that was discovered late in our QA process. See CVE-2021-31542 for details. This will be fixed in the next release.
  • The TF1 container includes a version of Pillow with known vulnerabilities discovered late in our QA process. See CVE-2021-25287, CVE-2021-28676, CVE-2021-28677, and CVE-2021-25288 for details. This will be fixed in the next release.
  • In certain cases, TensorFlow may claim too much memory on Pascal-based GPUs, leading to out-of-memory (OOM) failures and potentially an application hang. To work around this, set the environment variable TF_DEVICE_MIN_SYS_MEMORY_IN_MB to 675. This will be fixed in the 21.07 release.
  • A known regression can reduce the training performance of VGG-16 by up to 12% at certain batch sizes.
  • There is a known performance regression of up to 30% when training SSD models with fp32 data type on T4 GPUs. This will be addressed in a future release.
  • There is a known issue where attempting to convert some models using TF-TRT produces an error "Failed to import metagraph". This issue is still under investigation and will be resolved in a future release.
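The Pascal memory workaround listed above is an environment variable, so it has to be in place before TensorFlow initializes the GPUs. This is a minimal sketch, assuming you know the GPU's compute capability (Pascal GPUs report 6.x); the helper name is hypothetical, while the variable name and the value 675 come from the known issue itself.

```python
"""Sketch: apply the 21.06 Pascal OOM workaround only where it is needed."""
import os

WORKAROUND_VAR = "TF_DEVICE_MIN_SYS_MEMORY_IN_MB"
WORKAROUND_VALUE = "675"


def apply_pascal_workaround(compute_capability, env=None):
    """Set the workaround variable in `env` (default: os.environ) on Pascal.

    Pascal GPUs have compute capability 6.x. An existing value is kept so
    that a user-supplied setting is not overwritten. Returns True if the
    GPU is Pascal-based.
    """
    env = os.environ if env is None else env
    is_pascal = compute_capability[0] == 6
    if is_pascal:
        env.setdefault(WORKAROUND_VAR, WORKAROUND_VALUE)
    return is_pascal


# Call before importing TensorFlow, e.g. on a Tesla P100 (compute capability 6.0):
# apply_pascal_workaround((6, 0))
# import tensorflow as tf
```

Modifying `os.environ` in Python also updates the process environment, so TensorFlow sees the value as long as it is set before TensorFlow reads it.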