TensorFlow Release 17.09
The NVIDIA container image of TensorFlow, release 17.09, is available.
TensorFlow container image version 17.09 is based on TensorFlow 1.3.0.
Contents of TensorFlow
This container image contains the complete source of the version of NVIDIA TensorFlow in /opt/tensorflow. It is pre-built and installed into the /usr/local/[bin,lib] directories in the container image.
For optimum TensorFlow performance in image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). The sample script may need to be modified to fit your application. The container also includes the following:
- Ubuntu 16.04
- NVIDIA CUDA® 9.0
- NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.0.2
- NVIDIA® Collective Communications Library™ (NCCL) 2.0.5 (optimized for NVLink™)
Driver Requirements
Release 17.09 is based on CUDA 9, which requires NVIDIA Driver release 384.xx.
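The driver requirement above can be checked with a few lines of Python. This is a minimal sketch, not an official tool: the helper names are hypothetical, and it assumes that "384.xx" refers to the leading branch number of the driver version and that drivers from a newer branch are also acceptable. The version string itself would typically come from a command such as `nvidia-smi --query-gpu=driver_version --format=csv,noheader`; here it is passed in as plain text.

```python
# Branch number taken from the release notes ("384.xx").
REQUIRED_DRIVER_BRANCH = 384


def driver_branch(version):
    """Return the leading branch number of a driver version string such as '384.81'."""
    return int(version.strip().split(".")[0])


def meets_driver_requirement(version, required=REQUIRED_DRIVER_BRANCH):
    """Report whether a driver version satisfies the required branch.

    Assumption (not stated in the release notes): drivers from a branch
    newer than 384 are treated as acceptable as well.
    """
    return driver_branch(version) >= required


if __name__ == "__main__":
    # Example version strings; replace with the value reported on your system.
    for candidate in ("375.66", "384.81"):
        print(candidate, meets_driver_requirement(candidate))
```

Keeping the comparison in a pure function makes it easy to reuse in a container start-up script, where the version string is read once and the check decides whether to continue.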
Key Features and Enhancements
This TensorFlow release includes the following key features and enhancements.
- Tensor Core operation support in TensorFlow is enabled by default on Volta for FP16 convolutions and matrix multiplies, which should give a speedup for FP16 models.
- Added experimental support for:
  - FP16 training in nvidia-examples/cnn/nvcnn.py
  - FP16 input/output in the fused batch normalization operation (tf.nn.fused_batch_norm)
  - Tensor Core operation in FP16 convolutions and matrix multiplications
    - Added the TF_ENABLE_TENSOR_OP_MATH parameter which enables and disables Tensor Core operation (defaults to enabled).
  - Tensor Core operation in FP32 matrix multiplications
    - Added the TF_ENABLE_TENSOR_OP_MATH_FP32 parameter which enables and disables Tensor Core operation for float32 matrix multiplications (defaults to disabled because it reduces precision).
- Increased the TF_AUTOTUNE_THRESHOLD parameter which improves auto-tune stability.
- Increased the CUDA_DEVICE_MAX_CONNECTIONS parameter which solves performance issues related to streams on Tesla K80 GPUs.
- Enhancements to nvidia-examples/cnn/nvcnn.py
  - Fixed a bug where the final layer was wrong when running in evaluation mode.
  - Changed is_training to a constant instead of a placeholder for better performance and reduced memory use.
  - Merged gradients for all layers into a single NCCL call for better performance.
  - Disabled use of XLA by default for better performance.
  - Disabled zero_debias_moving_mean in batch normalization operation.
- Latest version of CUDA
- Latest version of cuDNN
- Latest version of NCCL
- Ubuntu 16.04 with August 2017 updates
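The TF_ENABLE_TENSOR_OP_MATH and TF_ENABLE_TENSOR_OP_MATH_FP32 parameters listed above are named in these notes, but their accepted values are not. The sketch below assumes they are environment variables that take "1" to enable and "0" to disable, and that they must be set before TensorFlow is imported. The helper names are hypothetical and only illustrate one way to keep the two settings together.

```python
import os


def tensor_op_env(fp16_tensor_ops=True, fp32_tensor_ops=False):
    """Build the Tensor Core settings as a dict of environment variables.

    The defaults mirror the release notes: FP16 Tensor Core math enabled,
    FP32 Tensor Core math disabled because it reduces precision.
    Assumption: "1" enables and "0" disables each parameter.
    """
    return {
        "TF_ENABLE_TENSOR_OP_MATH": "1" if fp16_tensor_ops else "0",
        "TF_ENABLE_TENSOR_OP_MATH_FP32": "1" if fp32_tensor_ops else "0",
    }


def apply_env(settings, environ=os.environ):
    """Copy the settings into the given environment mapping and return it."""
    environ.update(settings)
    return environ


if __name__ == "__main__":
    # Assumption: this runs before TensorFlow is imported in the same process.
    apply_env(tensor_op_env(fp32_tensor_ops=False))
    print(os.environ["TF_ENABLE_TENSOR_OP_MATH"])
    print(os.environ["TF_ENABLE_TENSOR_OP_MATH_FP32"])
```

Passing a plain dict as `environ` lets you inspect the resulting settings without touching the real process environment, for example when preparing arguments for a container launch.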
Known Issues
There are no known issues in this release.