TensorFlow Release 21.12


The NVIDIA container image of TensorFlow, release 21.12, is available on NGC.

Contents of the TensorFlow container

This container image includes the complete source of the NVIDIA version of TensorFlow in /opt/tensorflow. It is pre-built and installed as a system Python module.

To achieve optimum TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). The sample script may need to be modified to fit your application. The container also includes the following:

Driver Requirements

Release 21.12 is based on NVIDIA CUDA 11.5.0, which requires NVIDIA Driver release 495 or later. However, if you are running on a Data Center GPU (for example, T4 or any other Tesla board), you may use NVIDIA driver release 418.40 (or later R418), 440.33 (or later R440), 450.51 (or later R450), 460.27 (or later R460), or 470.57 (or later R470). The CUDA driver's compatibility package only supports particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades and NVIDIA CUDA and Drivers Support.
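The driver rules above can be sketched as a small check. This is a minimal illustration, not an NVIDIA tool: the function and constant names are hypothetical, and the table only covers the driver branches named in this section (the CUDA Application Compatibility topic remains the authoritative list). One way to obtain the installed version string is `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
# Minimum driver versions copied from the paragraph above.
# All names here are illustrative assumptions, not part of any NVIDIA API.
DEFAULT_MIN = (495, 0)  # "Driver release 495 or later"; minor version assumed 0

# Older branches that the text allows on Data Center GPUs only.
DATA_CENTER_MIN = {
    418: (418, 40),
    440: (440, 33),
    450: (450, 51),
    460: (460, 27),
    470: (470, 57),
}


def parse_driver_version(text):
    """Turn a string such as '470.57.02' into the tuple (470, 57)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def driver_is_listed(version_text, data_center_gpu=False):
    """Return True if the driver meets one of the minimums listed above."""
    version = parse_driver_version(version_text)
    if version >= DEFAULT_MIN:
        return True
    if data_center_gpu:
        minimum = DATA_CENTER_MIN.get(version[0])
        return minimum is not None and version >= minimum
    return False
```

For example, `driver_is_listed("470.82.01", data_center_gpu=True)` passes the R470 rule, while the same driver on a non-Data Center GPU does not meet the 495 minimum.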

GPU Requirements

Release 21.12 supports CUDA compute capability 6.0 and higher. This corresponds to GPUs in the NVIDIA Pascal, Volta, Turing, and Ampere Architecture GPU families. For a list of GPUs that this compute capability corresponds to, see CUDA GPUs. For additional support details, see Deep Learning Frameworks Support Matrix.
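As a rough sketch, the compute capability requirement can be checked from Python. The helper names below are hypothetical. `report_local_gpus` assumes the TensorFlow 2 variant of the container, because it relies on `tf.config.experimental.get_device_details`, which is not available in TensorFlow 1.15.

```python
# Minimum compute capability stated in the paragraph above.
MIN_COMPUTE_CAPABILITY = (6, 0)


def is_supported(compute_capability):
    """Return True if a (major, minor) compute capability is 6.0 or higher."""
    return tuple(compute_capability) >= MIN_COMPUTE_CAPABILITY


def report_local_gpus():
    """Print each visible GPU and whether it meets the requirement.

    Assumes TensorFlow 2.x; the import is local so that is_supported()
    can be used without TensorFlow installed.
    """
    import tensorflow as tf

    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        capability = details.get("compute_capability")
        if capability is None:
            print(f"{gpu.name}: compute capability not reported")
        else:
            status = "supported" if is_supported(capability) else "not supported"
            print(f"{gpu.name}: {capability[0]}.{capability[1]} ({status})")
```

Calling `report_local_gpus()` inside the container lists the visible GPUs; `is_supported((5, 2))` returns `False` because 5.2 is below the 6.0 minimum.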

Key Features and Enhancements

This TensorFlow release includes the following key features and enhancements.

  • TensorFlow container images version 21.12 are based on TensorFlow 1.15.5 and 2.6.2.
  • The environment variable TF_DISABLE_REDUCED_PRECISION_REDUCTION=1 can now be set to disable intermediate reductions in lower precision than the requested math type.
  • Patched the following CVEs in TensorFlow 1.15.5: CVE-2021-29571, CVE-2021-29592, CVE-2021-29601, CVE-2021-29608, CVE-2021-29609, CVE-2021-29613, CVE-2021-22876, CVE-2021-22897, CVE-2021-22898, CVE-2021-22901, CVE-2021-37636, CVE-2021-37640, CVE-2021-37642, CVE-2021-37644, CVE-2021-37646, CVE-2021-37653, CVE-2021-37660, CVE-2021-37661, CVE-2021-37668, CVE-2021-37669, CVE-2021-37670, CVE-2021-37672, CVE-2021-37673, CVE-2021-37674, CVE-2021-37675, CVE-2021-37684, CVE-2021-37686, CVE-2021-37690, CVE-2021-37691, CVE-2021-41195, CVE-2021-41196, CVE-2021-41197, CVE-2021-41198, CVE-2021-41199, CVE-2021-41200, CVE-2021-41201, CVE-2021-41202, CVE-2021-41203, CVE-2021-41204, CVE-2021-41206, CVE-2021-41207, CVE-2021-41208, CVE-2021-41213, CVE-2021-41215, CVE-2021-41216, CVE-2021-41217, CVE-2021-41218, CVE-2021-41219, CVE-2021-41221, CVE-2021-41222, CVE-2021-41223, CVE-2021-41224, CVE-2021-41225, CVE-2021-41228, CVE-2021-22922, CVE-2021-22923, CVE-2021-22924, CVE-2021-22925, CVE-2021-22926.


  • DLProf v1.8, which is included in the 21.12 container, will be the last release of DLProf. Starting with the 22.01 container, DLProf will no longer be included. It can still be manually installed via a pip wheel on the nvidia-pyindex.
  • Starting with the 21.10 release, a beta version of the TensorFlow 1 and 2 containers is available for the Arm SBSA platform. Pulling the Docker image nvcr.io/nvidia/tensorflow:21.12-tf2-py3 on an Arm SBSA machine will automatically fetch the Arm-specific image.
  • The TensorCore example models are no longer provided in the core container (previously shipped in /workspace/nvidia-examples). Instead, they can be obtained from GitHub or the NVIDIA GPU Cloud (NGC). Some Python packages that were included in previous containers to support these example models have also been removed. Depending on their specific use cases, users may need to add some packages that were previously pre-installed.
  • Support for SLURM PMI2 is deprecated and will be removed after the 21.12 release. PMIX is supported by the container, but is not supported by default in SLURM. Users depending on SLURM integration may need to configure SLURM for PMIX in the base OS as appropriate to their OS distribution (for Ubuntu 20.04, the required package is slurm-wlm-basic-plugins).
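The TF_DISABLE_REDUCED_PRECISION_REDUCTION variable from the list above could be applied to a training process as follows. This is a minimal sketch: the helper name and the `train.py` script are hypothetical, and it assumes the variable is read when TensorFlow starts, so it is set in the environment of a freshly launched process rather than changed mid-run.

```python
import os

# Variable name and value come from the release note above.
FLAG = "TF_DISABLE_REDUCED_PRECISION_REDUCTION"


def training_env(base=None):
    """Return a copy of the environment with the flag set to "1".

    The caller's mapping (or os.environ) is left unmodified.
    """
    env = dict(os.environ if base is None else base)
    env[FLAG] = "1"
    return env


# Example: launch a hypothetical training script with the flag applied.
# import subprocess, sys
# subprocess.run([sys.executable, "train.py"], env=training_env(), check=True)
```

Building a copy of the environment keeps the setting scoped to the launched process, which makes it easy to compare runs with and without the flag.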

NVIDIA TensorFlow Container Versions

The following table shows what versions of Ubuntu, CUDA, TensorFlow, and TensorRT are supported in each of the NVIDIA containers for TensorFlow. For older container versions, refer to the Frameworks Support Matrix.

Tensor Core Examples

The Tensor Core examples provided on GitHub focus on achieving the best performance and convergence by using the latest deep learning example networks and model scripts for training. Each example model trains with mixed precision using Tensor Cores on Volta, so you can get results much faster than training without Tensor Cores. Each model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.

Known Issues


If you encounter functional or performance issues when XLA is enabled, please refer to the XLA Best Practices document. It offers pointers on how to diagnose symptoms and possibly address them.

  • For TensorFlow 1.15, TF-TRT inference throughput may regress for certain models by up to 37% compared to the 21.06-tf1 release. This will be fixed in a future release.
  • A cuDNN performance regression can cause slowdowns of up to 15% in certain ResNet models. This will be fixed in a future release.
  • There is a known performance regression affecting UNet Medical 3D model training by up to 23%. This will be addressed in a future release.
  • TF-TRT native segment fallback has a known issue that causes a crash. It occurs when TF-TRT converts a model containing a subgraph that is converted to TensorRT but then fails to build. Instead of falling back to native TensorFlow, TF-TRT crashes. Setting export TF_TRT_OP_DENYLIST="ProblematicOp" can help by preventing conversion of the op that triggers the native segment fallback.
  • The version of OpenUCX included with TensorFlow container image version 21.11 has known issues with RAPIDS UCX-Py. When using Dask with this container version, pass protocol="tcp" to LocalCUDACluster(), not protocol="ucx", to work around these issues. Additionally, LocalCUDACluster UCX-specific configurations must remain unspecified; they are: enable_tcp_over_ucx, enable_nvlink, enable_infiniband, enable_rdmacm and ucx_net_devices.
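The Dask workaround in the last item can be sketched as a small wrapper that forces the TCP protocol and drops the UCX-specific options before they reach LocalCUDACluster(). The helper names are hypothetical; only `LocalCUDACluster`, its `protocol` argument, and the five option names are taken from the text above.

```python
# UCX-specific options that the note above says must remain unspecified.
UCX_ONLY_OPTIONS = (
    "enable_tcp_over_ucx",
    "enable_nvlink",
    "enable_infiniband",
    "enable_rdmacm",
    "ucx_net_devices",
)


def tcp_cluster_kwargs(**kwargs):
    """Return cluster arguments with protocol="tcp" and no UCX-only options."""
    cleaned = {k: v for k, v in kwargs.items() if k not in UCX_ONLY_OPTIONS}
    cleaned["protocol"] = "tcp"
    return cleaned


def start_cluster(**kwargs):
    """Create a LocalCUDACluster using the sanitized arguments.

    Requires dask_cuda and at least one GPU; imported here so that
    tcp_cluster_kwargs() can be used on its own.
    """
    from dask_cuda import LocalCUDACluster

    return LocalCUDACluster(**tcp_cluster_kwargs(**kwargs))
```

For instance, `tcp_cluster_kwargs(protocol="ucx", enable_nvlink=True, n_workers=2)` yields `{"n_workers": 2, "protocol": "tcp"}`, so existing UCX-oriented configuration can be reused without edits while the issue persists.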
© Copyright 2024, NVIDIA. Last updated on Jul 3, 2024.