TensorFlow Release 17.09

The NVIDIA container image of TensorFlow, release 17.09, is available.

TensorFlow container image version 17.09 is based on TensorFlow 1.3.0.

Contents of TensorFlow

This container image contains the complete source of the NVIDIA version of TensorFlow in /opt/tensorflow. TensorFlow is pre-built and installed into the /usr/local/[bin,lib] directories in the container image.

To achieve optimum TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). The sample script may need to be modified to fit your application. The container also includes the following:

Driver Requirements

Release 17.09 is based on CUDA 9, which requires NVIDIA Driver release 384.xx.
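
The driver requirement above can be checked before the container is started. The following is a minimal sketch, not an NVIDIA-provided tool. It assumes that any driver with a major version of 384 or later satisfies the "384.xx" requirement, and the helper names (driver_major, meets_requirement, installed_driver_version) are hypothetical. The nvidia-smi query is only run when installed_driver_version() is called.

```python
import re
import subprocess

# From the release note: CUDA 9 requires NVIDIA Driver release 384.xx.
# Assumption: newer major versions also satisfy the requirement.
MIN_DRIVER_MAJOR = 384


def driver_major(version):
    """Return the major component of a driver version string such as '384.81'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError("unrecognized driver version: %r" % (version,))
    return int(match.group(1))


def meets_requirement(version, minimum=MIN_DRIVER_MAJOR):
    """Return True if the driver major version is at least `minimum`."""
    return driver_major(version) >= minimum


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU on the host."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return output.splitlines()[0].strip()
```

On a host with the NVIDIA driver installed, meets_requirement(installed_driver_version()) reports whether the driver is new enough under the assumption stated above.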

Key Features and Enhancements

This TensorFlow release includes the following key features and enhancements.

  • Tensor Core operation support in TensorFlow is enabled by default on Volta for FP16 convolutions and matrix multiplications, which should speed up FP16 models.
  • Added experimental support for:
    • FP16 training in nvidia-examples/cnn/nvcnn.py
    • FP16 input/output in the fused batch normalization operation (tf.nn.fused_batch_norm)
    • Tensor Core operation in FP16 convolutions and matrix multiplications
      • Added the TF_ENABLE_TENSOR_OP_MATH parameter, which enables or disables Tensor Core operation (defaults to enabled).
    • Tensor Core operation in FP32 matrix multiplications
      • Added the TF_ENABLE_TENSOR_OP_MATH_FP32 parameter, which enables or disables Tensor Core operation for float32 matrix multiplications (defaults to disabled because it reduces precision).
  • Increased the TF_AUTOTUNE_THRESHOLD parameter, which improves auto-tune stability.
  • Increased the CUDA_DEVICE_MAX_CONNECTIONS parameter, which solves performance issues related to streams on Tesla K80 GPUs.
  • Enhancements to nvidia-examples/cnn/nvcnn.py
    • Fixed a bug where the final layer was wrong when running in evaluation mode.
    • Changed is_training to a constant instead of a placeholder for better performance and reduced memory use.
    • Merged gradients for all layers into a single NCCL call for better performance.
    • Disabled use of XLA by default for better performance.
    • Disabled zero_debias_moving_mean in the batch normalization operation.
  • Latest version of CUDA
  • Latest version of cuDNN
  • Latest version of NCCL
  • Ubuntu 16.04 with August 2017 updates
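
The two Tensor Core parameters listed above can be set from a Python training script. The following is a minimal sketch under two assumptions that this release note does not state: that the parameters are read from the process environment, and that "1" and "0" mean enabled and disabled. The helper names (tensor_op_env, apply_tensor_op_env) are hypothetical.

```python
import os


def tensor_op_env(fp16=True, fp32=False):
    """Build settings for the two Tensor Core parameters named in this release.

    The defaults mirror the release note: FP16 Tensor Core math is enabled,
    FP32 Tensor Core math is disabled because it reduces precision.
    Assumption: "1" means enabled and "0" means disabled.
    """
    return {
        "TF_ENABLE_TENSOR_OP_MATH": "1" if fp16 else "0",
        "TF_ENABLE_TENSOR_OP_MATH_FP32": "1" if fp32 else "0",
    }


def apply_tensor_op_env(fp16=True, fp32=False, environ=os.environ):
    """Write the settings into `environ` and return what was written.

    Assumption: the values must be in place before TensorFlow is imported,
    so call this at the very top of the script.
    """
    settings = tensor_op_env(fp16, fp32)
    environ.update(settings)
    return settings
```

For example, calling apply_tensor_op_env(fp32=True) before importing tensorflow would opt in to Tensor Core operation for float32 matrix multiplications, accepting the reduced precision noted above.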

Known Issues

There are no known issues in this release.

© Copyright 2024, NVIDIA. Last updated on Jul 3, 2024.