TensorFlow Release 19.08

The NVIDIA container image of TensorFlow, release 19.08, is available on NGC.

Contents of the TensorFlow container

This container image contains the complete source of the NVIDIA version of TensorFlow in /opt/tensorflow. It is pre-built and installed as a system Python module.

To help achieve optimal TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). You may need to modify the sample script to fit your application.

Driver Requirements

Release 19.08 is based on NVIDIA CUDA 10.1.243, which requires NVIDIA driver release 418.87. However, if you are running on Tesla GPUs (Tesla V100, Tesla P4, Tesla P40, or Tesla P100), you may use NVIDIA driver release 384.111+ or 410. The CUDA driver's compatibility package supports only particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
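The driver rule above can be expressed as a small check. This is a minimal sketch that encodes only the versions stated in this section; the function names are hypothetical, and the CUDA Application Compatibility topic remains the authoritative list of supported drivers.

```python
def parse_driver_version(version: str) -> tuple:
    """Turn a driver string such as "418.87" into a comparable tuple (418, 87)."""
    return tuple(int(part) for part in version.split("."))


def driver_ok_for_19_08(version: str, is_tesla: bool) -> bool:
    """Hypothetical check mirroring the 19.08 driver requirements stated above."""
    v = parse_driver_version(version)
    # Any GPU: driver 418.87 or later satisfies CUDA 10.1.243 directly.
    if v >= (418, 87):
        return True
    # Tesla GPUs only: the CUDA compatibility package also accepts the
    # 384 branch from 384.111 onward, or the 410 branch.
    if is_tesla:
        if v[0] == 384 and v >= (384, 111):
            return True
        if v[0] == 410:
            return True
    return False


print(driver_ok_for_19_08("410.48", is_tesla=True))   # True
print(driver_ok_for_19_08("410.48", is_tesla=False))  # False
```

Comparing tuples rather than floats avoids treating "384.111" as smaller than "384.90".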

GPU Requirements

Release 19.08 supports CUDA compute capability 6.0 and higher. This corresponds to GPUs in the Pascal, Volta, and Turing families. For a list of GPUs with this compute capability, see CUDA GPUs. For additional support details, see Deep Learning Frameworks Support Matrix.
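Because compute capabilities compare as (major, minor) pairs, the 6.0 minimum reduces to a tuple comparison. The sketch below is illustrative only: the helper and dictionary names are hypothetical, and the CUDA GPUs page is the reference for capability values.

```python
# Illustrative compute capabilities for a few GPUs (one older Kepler part
# is included for contrast).
EXAMPLE_GPUS = {
    "Tesla K80": (3, 7),    # Kepler: below the 6.0 minimum
    "Tesla P100": (6, 0),   # Pascal
    "Tesla P40": (6, 1),    # Pascal
    "Tesla V100": (7, 0),   # Volta
    "Tesla T4": (7, 5),     # Turing
}

MIN_COMPUTE_CAPABILITY = (6, 0)


def supported_by_19_08(capability: tuple) -> bool:
    """True if a (major, minor) compute capability meets the 19.08 minimum."""
    return tuple(capability) >= MIN_COMPUTE_CAPABILITY


for name, cc in EXAMPLE_GPUS.items():
    status = "supported" if supported_by_19_08(cc) else "not supported"
    print(f"{name}: {cc[0]}.{cc[1]} -> {status}")
```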

Key Features and Enhancements

This TensorFlow release includes the following key features and enhancements.

Announcements

We will stop support for Python 2.7 in a future TensorFlow container release.

Accelerating Inference In TensorFlow With TensorRT (TF-TRT)

For step-by-step instructions on how to use TF-TRT, see Accelerating Inference In TensorFlow With TensorRT User Guide.
Key Features And Enhancements
  • Migrated TensorRT conversion sources from the contrib directory to the compiler directory in preparation for TensorFlow 2.0. The Python code can be found at //tensorflow/python/compiler/tensorrt.

  • Added a user-friendly TrtGraphConverter API for TensorRT conversion.

  • Expanded support for TensorFlow operators in TensorRT conversion (for example, Gather, Slice, Pack, Unpack, ArgMin, ArgMax, DepthSpaceShuffle). Refer to the TF-TRT User Guide for a complete list of supported operators.

  • Added support for the TensorFlow operator CombinedNonMaxSuppression in TensorRT conversion, which significantly accelerates SSD object detection models.

  • Integrated TensorRT 5.1.5 into TensorFlow. See the TensorRT 5.1.5 Release Notes for a full list of new features.
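The TrtGraphConverter API can be driven in a few calls. The following is a minimal sketch assuming TensorFlow 1.14 with TensorRT (as in this container) and an existing SavedModel; the function name, directory arguments, and the FP16 default are hypothetical choices. The TF-TRT User Guide documents the full set of converter options.

```python
def convert_saved_model_with_trt(input_saved_model_dir: str,
                                 output_saved_model_dir: str,
                                 precision_mode: str = "FP16") -> None:
    """Convert a SavedModel with TF-TRT (TensorFlow 1.14 TrtGraphConverter API)."""
    # Imported lazily so this module can be loaded where TensorFlow is absent.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverter(
        input_saved_model_dir=input_saved_model_dir,
        precision_mode=precision_mode,  # must be uppercase in TF 1.14
    )
    converter.convert()                     # replace supported subgraphs with TensorRT ops
    converter.save(output_saved_model_dir)  # write the optimized SavedModel
```

For example, `convert_saved_model_with_trt("/models/resnet50/1", "/models/resnet50_trt/1")`, where both paths are placeholders.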

Deprecated Features
  • The old API of TF-TRT is deprecated. It still works in TensorFlow 1.14; however, it may be removed in TensorFlow 2.0. The old API is a Python function named create_inference_graph, which is now replaced by the Python class TrtGraphConverter with a number of methods. Refer to the TF-TRT User Guide for more information about the API and how to use it.
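For frozen graphs, migrating from create_inference_graph mostly means moving its arguments onto the TrtGraphConverter constructor. A sketch assuming TensorFlow 1.14 and an already frozen GraphDef; the function name and the output node names you pass in are hypothetical.

```python
def convert_frozen_graph_with_trt(frozen_graph_def, output_node_names):
    """Return a TF-TRT optimized GraphDef for a frozen graph (TF 1.14)."""
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Deprecated style, shown for comparison only:
    #   trt.create_inference_graph(input_graph_def=frozen_graph_def,
    #                              outputs=output_node_names)
    converter = trt.TrtGraphConverter(
        input_graph_def=frozen_graph_def,
        # Output nodes are listed here so they are kept out of TensorRT segments.
        nodes_blacklist=output_node_names,
    )
    return converter.convert()
```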

Known Issues
  • Precision mode in the TF-TRT API is a string with one of the following values: FP32, FP16, or INT8. In TensorFlow 1.13, these strings were supported in lowercase; however, in TensorFlow 1.14 only uppercase is supported.
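Scripts written for TensorFlow 1.13 can avoid this issue by upper-casing the value before passing it to TF-TRT. A minimal sketch; the helper name is hypothetical and not part of TF-TRT.

```python
VALID_PRECISION_MODES = ("FP32", "FP16", "INT8")


def normalize_precision_mode(mode: str) -> str:
    """Upper-case a TF-TRT precision mode so TF 1.13-style values keep working."""
    normalized = mode.strip().upper()
    if normalized not in VALID_PRECISION_MODES:
        raise ValueError(f"unsupported precision mode {mode!r}; "
                         f"expected one of {VALID_PRECISION_MODES}")
    return normalized


print(normalize_precision_mode("fp16"))  # FP16
```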

  • INT8 calibration (see the TF-TRT User Guide for more information about how to use INT8) is a very slow process that can take an hour, depending on the model. We are working on optimizing this algorithm in TensorRT.

  • The pip package of TensorFlow 1.14 released by Google is missing TensorRT. This will be fixed in the next release of TensorFlow by Google. In the meantime, you can use the NVIDIA container for TensorFlow.

Automatic Mixed Precision (AMP)

Automatic mixed precision converts certain float32 operations to run in float16, which can be much faster on Tensor Cores. Automatic mixed precision is built on two components:
  • a loss scaling optimizer
  • a graph rewriter
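The loss scaling component exists because small gradient values can underflow in float16. The pure-Python sketch below uses the struct module's half-precision format rather than TensorFlow and a hypothetical static scale of 1024; TensorFlow's loss scaling optimizer wrapper adjusts the scale dynamically by default.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to IEEE half precision and back (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", x))[0]


grad = 1e-8          # a small float32 gradient value
loss_scale = 1024.0  # hypothetical static loss scale

# Without scaling, the value underflows to zero in float16.
unscaled_fp16 = to_float16(grad)

# With scaling: multiply the loss (and hence every gradient) by loss_scale,
# keep the gradient in float16, then divide by loss_scale in float32 before the update.
recovered = to_float16(grad * loss_scale) / loss_scale

print(unscaled_fp16)  # 0.0
print(recovered)      # about 1.0e-08
```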
For models already using a tf.train.Optimizer or tf.keras.optimizers.Optimizer for both compute_gradients() and apply_gradients() operations, automatic mixed precision can be enabled by wrapping the optimizer with tf.train.experimental.enable_mixed_precision_graph_rewrite(). For backward compatibility with AMP in previous containers, AMP can also be enabled by defining the following environment variable before calling the usual float32 training script:
export TF_ENABLE_AUTO_MIXED_PRECISION=1
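Both switches described above can be applied from Python. A minimal sketch assuming TensorFlow 1.14 in graph mode; the function names, the MomentumOptimizer choice, and the learning rate are illustrative assumptions. Only one of the two mechanisms is needed in a given script.

```python
import os


def enable_amp_via_env() -> None:
    """Legacy switch from earlier NGC containers; call before importing TensorFlow."""
    os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"


def build_amp_train_op(loss, learning_rate: float = 0.01):
    """Wrap an existing optimizer for AMP (TensorFlow 1.14 graph mode)."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 1.14+

    opt = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    # Returns the optimizer wrapped with dynamic loss scaling and turns on
    # the float16 graph rewrite for sessions created afterward.
    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
    return opt.minimize(loss)
```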
Models implementing their own optimizers can use the graph rewriter on its own (while implementing loss scaling manually) by setting the following flag in the tf.Session config:
config.graph_options.rewrite_options.auto_mixed_precision=1
Alternatively, for backward compatibility with AMP in previous NGC containers, set the following environment variable:
export TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE=1
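The rewrite-only path can be sketched the same way. This assumes the TensorFlow 1.x session API; the helper names are hypothetical, and loss scaling is left to the model as the text describes.

```python
import os


def amp_rewrite_only_session_config():
    """Session config that enables only the AMP graph rewriter (no loss scaling)."""
    import tensorflow as tf  # TensorFlow 1.x API

    config = tf.ConfigProto()
    # 1 == RewriterConfig.ON; loss scaling must be implemented by the model.
    config.graph_options.rewrite_options.auto_mixed_precision = 1
    return config


def enable_amp_rewrite_via_env() -> None:
    """Environment-variable equivalent kept for older NGC container scripts."""
    os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE"] = "1"
```

A training script would then open its session with `tf.Session(config=amp_rewrite_only_session_config())`.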

For more information about how to access and enable Automatic Mixed Precision for TensorFlow, see Automatic Mixed Precision Training In TensorFlow from the TensorFlow User Guide, along with Training With Mixed Precision.

Tensor Core Examples

These examples focus on achieving the best performance and convergence from NVIDIA Volta Tensor Cores by using the latest deep learning example networks for training.

Each example model trains with mixed precision using Tensor Cores on Volta, so you can get results much faster than training without Tensor Cores. Each model is tested against every monthly NGC container release to ensure consistent accuracy and performance over time. This container includes the following Tensor Core examples.

Known Issues

  • There is a known performance regression in TensorFlow 1.14.0 affecting a variety of models. Affected models include GNMT, SSD, and UNet. Performance regressions can be as high as 20% compared to TensorFlow 1.13.1 in the 19.06 release.
  • For BERT Large training with the 19.08 release on Tesla V100 boards with 16 GB memory, performance with batch size 3 per GPU is lower than expected; batch size 2 per GPU may be a better choice for this model on these GPUs with the 19.08 release. 32 GB GPUs are not affected.
  • TensorBoard has a bug in its IPv6 support that can result in the following error: Tensorboard could not bind to unsupported address family ::. To work around this error, pass the --host <IP> flag when starting TensorBoard.
  • In previous containers, libtensorflow_framework.so was available in the /usr/local/lib/tensorflow directory. This was redundant with the libraries installed by the TensorFlow pip package. To find the TensorFlow library directory, use tf.sysconfig.get_lib().
  • Automatic Mixed Precision (AMP) does not support the Keras LearningRateScheduler in the 19.08 release. A fix will be included in a future release.
  • A known issue in TensorFlow results in the error Cannot take the length of Shape with unknown rank when training variable sized images with the Keras model.fit API. Details are provided here and a fix will be available in a future release.
  • Support for cuDNN float32 Tensor Op Math mode, first introduced in the 18.09 release, is now deprecated in favor of Automatic Mixed Precision. It is scheduled to be removed after the 19.11 release.
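Two of the workarounds above can be scripted. A minimal sketch: the helper names are hypothetical, the default host 0.0.0.0 (all IPv4 interfaces) is an assumption suited to reaching TensorBoard through a published container port (use 127.0.0.1 for local-only access), and the glob pattern allows for a possible version suffix on the library file name.

```python
import glob
import os


def tensorboard_command(logdir: str, host: str = "0.0.0.0") -> list:
    """Build a TensorBoard command line that binds to an explicit IPv4 address."""
    # Passing --host sidesteps the IPv6 "unsupported address family ::" error.
    return ["tensorboard", "--logdir", logdir, "--host", host]


def find_tensorflow_framework_libs() -> list:
    """Locate libtensorflow_framework.so* in the pip-installed TensorFlow."""
    import tensorflow as tf  # imported lazily

    lib_dir = tf.sysconfig.get_lib()
    return sorted(glob.glob(os.path.join(lib_dir, "libtensorflow_framework.so*")))


print(tensorboard_command("/tmp/logs"))
# ['tensorboard', '--logdir', '/tmp/logs', '--host', '0.0.0.0']
```

The list returned by tensorboard_command can be passed directly to subprocess.run.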