TensorFlow Release 19.10
The NVIDIA container image of TensorFlow, release 19.10, is available on NGC.
Contents of the TensorFlow container
This container image contains the complete source of the version of NVIDIA TensorFlow in /opt/tensorflow. It is pre-built and installed as a system Python module.
To achieve optimum TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). The sample script may need to be modified to fit your application. The container also includes the following:
- Ubuntu 18.04
- NVIDIA CUDA 10.1.243 including cuBLAS 10.2.1.243
- NVIDIA cuDNN 7.6.4
- NVIDIA NCCL 2.4.8 (optimized for NVLink™)
- Horovod 0.18.1
- OpenMPI 3.1.4
- TensorBoard 1.14.0+nv
- OpenSeq2Seq at commit 2e0b1d8
- TensorRT 6.0.1
- DALI 0.14.0 Beta
- DLProf 19.10
- Nsight Compute 2019.4.0
- Nsight Systems 2019.5.1
- Tensor Core optimized example:
- Jupyter and JupyterLab:
Release 19.10 is based on NVIDIA CUDA 10.1.243, which requires NVIDIA Driver release 418.xx. However, if you are running on Tesla (for example, T4 or any other Tesla board), you may use NVIDIA driver release 396, 384.111+, or 410. If you are running on Tesla V100, Tesla P4, Tesla P40, or Tesla P100, you may use NVIDIA driver release 384.111+ or 410. The CUDA driver's compatibility package only supports particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
Release 19.10 supports CUDA compute capability 6.0 and higher. This corresponds to GPUs in the Pascal, Volta, and Turing families. For a list of GPUs that this compute capability corresponds to, see CUDA GPUs. For additional support details, see Deep Learning Frameworks Support Matrix.
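As a rough illustration of the compute capability requirement above, the following sketch compares a capability string against the 6.0 minimum. The names parse_capability and is_supported_capability, and the "major.minor" string format, are assumptions made for this example. Querying the capability of a real device is not shown.

```python
MIN_COMPUTE_CAPABILITY = (6, 0)  # minimum stated for release 19.10


def parse_capability(text):
    """Turn a string such as "7.5" into a (major, minor) tuple of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_supported_capability(text, minimum=MIN_COMPUTE_CAPABILITY):
    """Return True when the given compute capability meets the minimum."""
    # Tuples compare element by element, so (7, 0) >= (6, 0) is True.
    return parse_capability(text) >= minimum
```

For example, 7.0 (Tesla V100) and 7.5 (Tesla T4) pass this check, while 5.2 (a Maxwell-generation value) does not.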
Key Features and Enhancements
- TensorFlow container image version 19.10 is based on TensorFlow 1.14.0.
- Latest version of NVIDIA cuDNN 7.6.4
- Latest version of Horovod 0.18.1
- Latest version of DALI 0.14.0 Beta
- Latest version of DLProf 19.10
- Latest version of Nsight Systems 2019.5.1
- Latest version of Jupyter Client 5.3.3
- Dilated convolutions will now be evaluated using cuDNN by default.
- Automatic Mixed Precision will correctly handle
- Automatic Mixed Precision can now evaluate softmax and activation functions in FP16.
- Ubuntu 18.04 with September 2019 updates
Accelerating Inference In TensorFlow With TensorRT (TF-TRT)
For step-by-step instructions on how to use TF-TRT, see Accelerating Inference In TensorFlow With TensorRT User Guide.
- Deprecated Features
The old API of TF-TRT is deprecated. It still works in TensorFlow 1.14 and 1.15; however, it is removed in TensorFlow 2.0. The old API is a Python function named create_inference_graph, which is now replaced by the Python class TrtGraphConverter with a number of methods. Refer to the TF-TRT User Guide for more information about the API and how to use it.
- Known Issues
TensorRT INT8 calibration algorithm (see the TF-TRT User Guide for more information about how to use INT8) is very slow for certain models such as NASNet and Inception. We are working on optimizing the calibration algorithm in TensorRT.
The pip package of TensorFlow 1.14 released by Google is missing TensorRT. This will be fixed in the next release of TensorFlow by Google. In the meantime, you can use the more recent versions of TensorFlow pip packages released by Google (1.15 and 2.0) or the NVIDIA container for TensorFlow.
The accuracy of Faster RCNN with a ResNet-50 backbone using TensorRT 6.0 INT8 calibration is lower than expected. We are investigating the issue.
The following sentence that appears in the log of TensorRT 6.0 can be safely ignored. It will be removed in future releases of TensorRT.
Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
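For the API change described under Deprecated Features, the following is a minimal sketch of converting a SavedModel with the TrtGraphConverter class as it exists in TensorFlow 1.14. The helper name convert_saved_model and its directory arguments are hypothetical. The sketch covers FP32 and FP16 only, because INT8 needs an additional calibration step that is described in the TF-TRT User Guide and not shown here.

```python
SUPPORTED_MODES = ("FP32", "FP16")  # INT8 needs a calibration step not shown here


def convert_saved_model(input_dir, output_dir, precision_mode="FP16"):
    """Convert a SavedModel with TrtGraphConverter and write the result."""
    if precision_mode not in SUPPORTED_MODES:
        raise ValueError("unsupported precision_mode: %r" % (precision_mode,))

    # Imported lazily so the argument check above works without TensorFlow.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverter(
        input_saved_model_dir=input_dir,
        precision_mode=precision_mode,
    )
    converter.convert()         # replaces supported subgraphs with TensorRT ops
    converter.save(output_dir)  # writes the converted SavedModel
    return output_dir
```

Inside the container, a call such as convert_saved_model("/models/in", "/models/out") would be the typical usage; both paths are placeholders.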
Automatic Mixed Precision (AMP)
- a loss scaling optimizer
- graph rewriter
For models already using an optimizer from tf.keras.optimizers for apply_gradients() operations (for example, by calling model.fit()), automatic mixed precision can be enabled by wrapping the optimizer with a function provided by TensorFlow.
For more information on this function, see the TensorFlow documentation here.
For backward compatibility with previous container releases, AMP can also be enabled for tf.train optimizers by defining the following environment variable:
For more information about how to access and enable Automatic mixed precision for TensorFlow, see Automatic Mixed Precision Training In TensorFlow from the TensorFlow User Guide, along with Training With Mixed Precision.
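The two ways of enabling AMP described in this section can be sketched as follows. The text above does not show the wrapper function or the environment variable by name, so both names used here are assumptions: tf.train.experimental.enable_mixed_precision_graph_rewrite is the TensorFlow 1.14 function for wrapping an optimizer, and TF_ENABLE_AUTO_MIXED_PRECISION is the variable name used in NVIDIA's mixed precision documentation. Verify both against the linked guides before relying on them. The helper names are hypothetical.

```python
import os

AMP_ENV_VAR = "TF_ENABLE_AUTO_MIXED_PRECISION"  # assumed name, see the note above


def enable_amp_via_env(env=None):
    """Set the AMP environment variable; do this before training starts."""
    target = os.environ if env is None else env
    target[AMP_ENV_VAR] = "1"
    return target


def wrap_optimizer_for_amp(optimizer):
    """Wrap an optimizer so loss scaling and the graph rewrite are applied."""
    import tensorflow as tf  # lazy import: only needed on this path

    # Assumed TensorFlow 1.14 API; returns a loss-scaling optimizer wrapper.
    return tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)
```

Only one of the two paths is needed for a given training script. The optional env argument exists so the helper can be tried on a plain dictionary without changing the real process environment.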
Tensor Core Examples
The Tensor Core examples provided on GitHub focus on achieving the best performance and convergence by using the latest deep learning example networks and model scripts for training. Each example model trains with mixed precision Tensor Cores on Volta; therefore, you can get results much faster than training without Tensor Cores. Each model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time. This container includes the following Tensor Core examples.
- U-Net Medical model. The U-Net model is a convolutional neural network for 2D image segmentation. This repository contains a U-Net implementation as described in the paper U-Net: Convolutional Networks for Biomedical Image Segmentation, without any alteration. This model script is available on GitHub as well as NVIDIA GPU Cloud (NGC).
- SSD320 v1.2 model. The SSD320 v1.2 model is based on the SSD: Single Shot MultiBox Detector paper, which describes an SSD as “a method for detecting objects in images using a single deep neural network”. Our implementation is based on the existing model from the TensorFlow models repository. This model script is available on GitHub as well as NVIDIA GPU Cloud (NGC).
- Neural Collaborative Filtering (NCF) model. The NCF model is a neural network that provides collaborative filtering based on implicit feedback; specifically, it provides product recommendations based on user and item interactions. The training data for this model should contain a sequence of user ID, item ID pairs indicating that the specified user has interacted with the specified item (for example, by giving it a rating or clicking on it). This model script is available on GitHub as well as NVIDIA GPU Cloud (NGC).
- BERT model. BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. This model is based on the BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper. NVIDIA's BERT is an optimized version of Google's official implementation, leveraging mixed precision arithmetic and Tensor Cores on V100 GPUs for faster training times while maintaining target accuracy. This model script is available on GitHub as well as NVIDIA GPU Cloud (NGC).
- U-Net Industrial Defect Segmentation model. This U-Net model is adapted from the original version of the U-Net model, which is a convolutional auto-encoder for 2D image segmentation. U-Net was first introduced by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in the paper U-Net: Convolutional Networks for Biomedical Image Segmentation. This work proposes a modified version of U-Net, called TinyUNet, which performs efficiently and with very high accuracy on the industrial anomaly dataset DAGM2007. This model script is available on GitHub as well as NVIDIA GPU Cloud (NGC).
- GNMT v2 model. The GNMT v2 model is similar to the one discussed in the Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation paper. The most important difference between the two models is in the attention mechanism. In our model, the output from the first LSTM layer of the decoder goes into the attention module, then the re-weighted context is concatenated with inputs to all subsequent LSTM layers in the decoder at the current timestep. This model script is available on GitHub as well as NVIDIA GPU Cloud (NGC).
- ResNet-50 v1.5 model. The ResNet-50 v1.5 model is a modified version of the original ResNet-50 v1 model. The difference between v1 and v1.5 is in the bottleneck blocks that require downsampling: v1 has stride = 2 in the first 1x1 convolution, whereas v1.5 has stride = 2 in the 3x3 convolution. The following features were implemented in this model: data-parallel multi-GPU training with Horovod, Tensor Cores (mixed precision) training, and static loss scaling for Tensor Cores (mixed precision) training. This model script is available on GitHub as well as NVIDIA GPU Cloud (NGC).
Known Issues
- There are known TF-TRT INT8 accuracy issues. See the Accelerating Inference In TensorFlow With TensorRT (TF-TRT) section above for more information.
- There is a known performance regression in TensorFlow 1.14.0 affecting a variety of models. Affected models include GNMT, SSD, and NCF. Performance regressions can be as high as 20% compared to TensorFlow 1.13.1 in the 19.06 release.
- For BERT Large training with the 19.08 release on Tesla V100 boards with 16 GB memory, performance with batch size 3 per GPU is lower than expected; batch size 2 per GPU may be a better choice for this model on these GPUs with the 19.08 release. 32 GB GPUs are not affected.
- TensorBoard has a bug in its IPv6 support which can result in the following error: Tensorboard could not bind to unsupported address family ::. To work around this error, pass the --host <IP> flag when starting TensorBoard.
- Automatic Mixed Precision (AMP) does not support the Keras LearningRateScheduler in the 19.08 release. A fix will be included in a future release.
- A known issue in TensorFlow results in the error Cannot take the length of Shape with unknown rank when training variable sized images with the Keras model.fit API. Details are provided here and a fix will be available in a future release.
- Support for cuDNN float32 Tensor Op Math mode, first introduced in the 18.09 release, is now deprecated in favor of Automatic Mixed Precision. It is scheduled to be removed after the 19.11 release.
- In the 19.10 release, when your NVIDIA driver release is older than 418.xx, the Nsight Systems profiling tool (for example, nsys) might cause a CUDA runtime API error. A fix will be included in a future release.
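For the TensorBoard IPv6 issue listed above, the workaround of passing the --host flag can be sketched as a small helper that builds the launch command. The function name tensorboard_command is hypothetical, and the log directory and address in the usage note are placeholders. The --logdir flag is the standard TensorBoard option for the log location.

```python
def tensorboard_command(logdir, host):
    """Build the argument list for launching TensorBoard bound to a given host."""
    if not host:
        raise ValueError("host must be a non-empty address")
    # Passing --host explicitly avoids the default bind to the IPv6 address "::".
    return ["tensorboard", "--logdir", logdir, "--host", host]
```

The returned list can be handed to subprocess.run, for example subprocess.run(tensorboard_command("/results", "0.0.0.0")), where both arguments are illustrative values.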