TensorFlow Release 17.09

The NVIDIA container image of TensorFlow, release 17.09, is available.

TensorFlow container image version 17.09 is based on TensorFlow 1.3.0.

## Contents of TensorFlow

This container image contains the complete source of the NVIDIA version of TensorFlow in /opt/tensorflow. TensorFlow is prebuilt and installed into the /usr/local/bin and /usr/local/lib directories of the container image.

To achieve optimum TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). The sample script may need to be modified to fit your application. The container also includes the following:

## Driver Requirements

Release 17.09 is based on CUDA 9, which requires NVIDIA Driver release 384.xx.
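The driver requirement can be checked by comparing the installed driver's major version against 384. The sketch below assumes the version string has already been obtained, for example from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The helper name `driver_is_supported` is hypothetical, and accepting newer major versions is an assumption of this sketch rather than something these notes state.

```python
REQUIRED_DRIVER_MAJOR = 384  # from the release notes: CUDA 9 requires 384.xx


def driver_is_supported(version: str,
                        required_major: int = REQUIRED_DRIVER_MAJOR) -> bool:
    """Return True if the driver's major version meets the requirement.

    `version` is a string such as "384.81". Treating newer major
    versions as acceptable is an assumption of this sketch.
    """
    major_text = version.strip().split(".")[0]
    if not major_text.isdigit():
        raise ValueError(f"unrecognized driver version: {version!r}")
    return int(major_text) >= required_major


if __name__ == "__main__":
    for candidate in ("375.66", "384.81"):
        print(candidate, driver_is_supported(candidate))
```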

## Key Features and Enhancements

This TensorFlow release includes the following key features and enhancements.

• Tensor Core operation support in TensorFlow is enabled by default on Volta for FP16 convolutions and matrix multiplications, which should speed up FP16 models.
• FP16 training in nvidia-examples/cnn/nvcnn.py
• FP16 input/output in the fused batch normalization operation (tf.nn.fused_batch_norm)
• Tensor Core operation in FP16 convolutions and matrix multiplications
    • Added the TF_ENABLE_TENSOR_OP_MATH parameter, which enables and disables Tensor Core operation (defaults to enabled).
• Tensor Core operation in FP32 matrix multiplications
    • Added the TF_ENABLE_TENSOR_OP_MATH_FP32 parameter, which enables and disables Tensor Core operation for float32 matrix multiplications (defaults to disabled because it reduces precision).
• Increased the TF_AUTOTUNE_THRESHOLD parameter, which improves auto-tune stability.
• Increased the CUDA_DEVICE_MAX_CONNECTIONS parameter, which solves performance issues related to streams on Tesla K80 GPUs.
• Enhancements to nvidia-examples/cnn/nvcnn.py
    • Fixed a bug where the final layer was wrong when running in evaluation mode.
    • Changed is_training to a constant instead of a placeholder for better performance and reduced memory use.
    • Merged gradients for all layers into a single NCCL call for better performance.
    • Disabled use of XLA by default for better performance.
    • Disabled zero_debias_moving_mean in the batch normalization operation.
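The two Tensor Core switches listed above can be set before TensorFlow starts. The sketch below assumes they are read as environment variables when TensorFlow initializes, and that "1" and "0" are the values that enable and disable them. Neither detail is confirmed by these notes. The helper name `tensor_op_env` is hypothetical.

```python
import os


def tensor_op_env(fp16_tensor_ops: bool = True,
                  fp32_tensor_ops: bool = False) -> dict:
    """Build settings for the two Tensor Core switches.

    Defaults mirror the release notes: FP16 Tensor Core math enabled,
    FP32 Tensor Core math disabled. The "1"/"0" encoding is an assumption.
    """
    return {
        "TF_ENABLE_TENSOR_OP_MATH": "1" if fp16_tensor_ops else "0",
        "TF_ENABLE_TENSOR_OP_MATH_FP32": "1" if fp32_tensor_ops else "0",
    }


# Example: opt in to FP32 Tensor Core math, accepting the reduced
# precision that the release notes warn about.
# Set the variables before importing TensorFlow so they are in place
# when the library initializes (assumed to be when they are read).
os.environ.update(tensor_op_env(fp32_tensor_ops=True))

# import tensorflow as tf  # would follow here inside the container
```

Leaving both arguments at their defaults reproduces the behavior described in the list, so the call is only needed when deviating from it.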
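The idea behind merging the gradients of all layers into a single NCCL call can be illustrated without NCCL: pack every layer's gradient into one flat buffer, reduce that buffer once across workers, then unpack it. The sketch below is a pure-Python stand-in, not the code in nvcnn.py. The helpers `pack`, `unpack`, `all_reduce_sum`, and `merged_all_reduce` are hypothetical, and the element-wise sum over in-process lists only simulates what a collective library would do across GPUs.

```python
from typing import Dict, List, Tuple

Gradients = Dict[str, List[float]]
Layout = List[Tuple[str, int]]


def pack(grads: Gradients) -> Tuple[List[float], Layout]:
    """Concatenate per-layer gradients into one flat buffer plus a layout."""
    flat: List[float] = []
    layout: Layout = []
    for name, values in grads.items():
        layout.append((name, len(values)))
        flat.extend(values)
    return flat, layout


def unpack(flat: List[float], layout: Layout) -> Gradients:
    """Split a flat buffer back into per-layer gradients."""
    grads: Gradients = {}
    offset = 0
    for name, size in layout:
        grads[name] = flat[offset:offset + size]
        offset += size
    return grads


def all_reduce_sum(buffers: List[List[float]]) -> List[float]:
    """Simulated collective: element-wise sum of one buffer per worker."""
    return [sum(column) for column in zip(*buffers)]


def merged_all_reduce(per_worker: List[Gradients]) -> Gradients:
    """Reduce all layers with a single simulated collective call."""
    packed = [pack(g) for g in per_worker]
    layout = packed[0][1]  # assumes every worker has the same layers and sizes
    reduced = all_reduce_sum([flat for flat, _ in packed])
    return unpack(reduced, layout)


# Two simulated workers, each holding gradients for the same two layers.
workers = [
    {"conv1": [1.0, 2.0], "fc": [0.5]},
    {"conv1": [3.0, 4.0], "fc": [1.5]},
]
summed = merged_all_reduce(workers)
```

One reduction over a single large buffer replaces one reduction per layer, which is the likely source of the performance gain the notes mention.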