TensorFlow Release 18.05


The NVIDIA container image of TensorFlow, release 18.05, is available.

Contents of TensorFlow

This container image contains the complete source of the version of NVIDIA TensorFlow in /opt/tensorflow. It is pre-built and installed as a system Python module.

To achieve optimum TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). You may need to modify the sample script to fit your application. The container also includes the following:

Ubuntu 16.04

Note:

Container image 18.05-py2 contains Python 2.7; 18.05-py3 contains Python 3.5.
NVIDIA CUDA 9.0.176 (see Errata section and 2.1) including CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 9.0.333 (see section 2.3.1)
NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.1.2
NCCL 2.1.15 (optimized for NVLink™)
Horovod™ 0.12.1
OpenMPI™ 3.0.0
TensorBoard 1.7.0
MLNX_OFED 3.4
OpenSeq2Seq v0.2

Driver Requirements

Release 18.05 is based on CUDA 9, which requires NVIDIA Driver release 384.xx.
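A quick way to check a host against this requirement is to compare the major component of the installed driver version with 384. The sketch below is illustrative only: the helper name is hypothetical, it assumes that "384.xx" means the 384 branch or any newer one, and it expects you to supply the version string yourself (for example, as reported by nvidia-smi).

```python
def driver_meets_requirement(version, minimum_major=384):
    """Return True if a driver version string such as '384.111' is on the
    minimum_major branch or newer (assumption: newer branches are accepted)."""
    major_text = version.strip().split(".")[0]
    if not major_text.isdigit():
        raise ValueError("unrecognized driver version: %r" % version)
    return int(major_text) >= minimum_major


if __name__ == "__main__":
    # Example version strings, not an authoritative list of driver releases.
    for candidate in ("375.66", "384.111", "390.30"):
        print(candidate, driver_meets_requirement(candidate))
```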

Key Features and Enhancements

This TensorFlow release includes the following key features and enhancements.

TensorFlow container image version 18.05 is based on TensorFlow 1.7.0.
For developers who need more visibility between network layer calls and CUDA kernel calls, we've added support for basic NVTX ranges to the TensorFlow executor. With NVTX ranges enabled, Nsight Systems or the NVIDIA Visual Profiler can display each TensorFlow op demarcated by an NVTX range named after the op. NVTX ranges are enabled by default but can be disabled by setting the environment variable TF_DISABLE_NVTX_RANGES=1.
Optimized input pipeline in nvcnn.py and nvcnn_hvd.py by casting back to uint8 immediately after image preprocessing.
Added OpenSeq2Seq v0.2 to the base container.
Includes integration with TensorRT 3.0.4
Ubuntu 16.04 with April 2018 updates
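As an illustration of the TF_DISABLE_NVTX_RANGES setting described above, the following minimal sketch builds an environment with the variable set and passes it to a child process. The helper name is hypothetical, and the child command here only echoes the variable; in practice you would substitute your own training script.

```python
import os
import subprocess
import sys


def env_without_nvtx_ranges(base_env=None):
    """Return a copy of the environment with NVTX ranges disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["TF_DISABLE_NVTX_RANGES"] = "1"
    return env


if __name__ == "__main__":
    # Placeholder child process that prints the variable it received;
    # replace the command with the script you want to profile.
    subprocess.check_call(
        [sys.executable, "-c",
         "import os; print(os.environ['TF_DISABLE_NVTX_RANGES'])"],
        env=env_without_nvtx_ranges(),
    )
```

Building a separate environment dictionary leaves the parent process untouched, so other runs launched from the same shell or script keep the default behavior.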

Accelerating Inference In TensorFlow With TensorRT (TF-TRT)

For step-by-step instructions on how to use TF-TRT, see Accelerating Inference In TensorFlow With TensorRT User Guide.

Attention:

Support for accelerating TensorFlow with TensorRT 3.x will be removed in a future release (likely TensorFlow 1.13). The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be retargeted if you want to run them on a different GPU. Therefore, models that were accelerated using TensorRT 3.x will no longer run. If you have a production model that was accelerated with TensorRT 3.x, you will need to convert your model again with TensorRT 4.x or later.

For more information, see the Note in Serializing A Model In C++ or Serializing A Model In Python.

Key Features And Enhancements

TensorRT backend accelerates inference performance for frozen TensorFlow models.
Automatic segmenter that recognizes TensorRT-compatible subgraphs and converts them into TensorRT engines. TensorRT engines are wrapped with TensorFlow custom ops that move the execution of the subgraph to the TensorRT backend for optimized performance, while falling back to TensorFlow for ops that are not TensorRT compatible.
Supported networks are slim classification networks including ResNet, VGG, and Inception.
Mixed precision and quantization are supported.
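The conversion flow described above can be sketched as follows. This is a hedged illustration rather than the documented procedure: it assumes the TF-TRT entry point is tensorflow.contrib.tensorrt.create_inference_graph with the keyword arguments shown, as in TensorFlow 1.7. The helper names, the default workspace size, and the list of precision strings are assumptions made for the example. See the user guide referenced above for the authoritative steps.

```python
# Precision strings assumed to be accepted by the converter.
SUPPORTED_PRECISIONS = ("FP32", "FP16", "INT8")


def build_trt_kwargs(output_names, max_batch_size, workspace_bytes=1 << 30,
                     precision_mode="FP32"):
    """Validate and collect conversion settings (hypothetical helper)."""
    precision_mode = precision_mode.upper()
    if precision_mode not in SUPPORTED_PRECISIONS:
        raise ValueError("unsupported precision: %s" % precision_mode)
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return {
        "outputs": list(output_names),
        "max_batch_size": max_batch_size,
        "max_workspace_size_bytes": workspace_bytes,
        "precision_mode": precision_mode,
    }


def convert_frozen_graph(frozen_graph_def, **trt_kwargs):
    """Replace TensorRT-compatible subgraphs with TensorRT engine ops."""
    # Imported lazily so build_trt_kwargs works without TensorFlow installed.
    import tensorflow.contrib.tensorrt as trt
    return trt.create_inference_graph(input_graph_def=frozen_graph_def,
                                      **trt_kwargs)
```

A typical call would pass a frozen GraphDef and the settings from build_trt_kwargs, for example convert_frozen_graph(graph_def, **build_trt_kwargs(["logits"], 8, precision_mode="FP16")), where "logits" is a placeholder output node name.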

Limitations

Conversion relies on static shape inference: the frozen graph should provide explicit dimensions for every axis other than the first (batch) dimension.
The batch size for converted TensorRT engines is fixed at conversion time. Inference can only run with a batch size smaller than the specified number.
Currently supported models are limited to CNNs. Object detection models and RNNs are not yet supported.
Resource management is not integrated; therefore, limit the memory claimed by TensorFlow so that TensorRT can acquire the resources it needs. To limit the memory, set per_process_gpu_memory_fraction to a value less than 1.0 and pass it to session creation, for example:
```
            
            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333) sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
        
```

Deprecated Features

In the 18.05 container, you need to create a TensorFlow session with the per_process_gpu_memory_fraction option. Once resource management is fully integrated, you will no longer need to reserve GPU memory from TensorFlow, and the option will not be necessary for a mixed TensorFlow-TensorRT (TF-TRT) model.

Known Issues

The TensorRT engine only accepts input tensors of rank 4.
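The static-shape limitation and the rank-4 restriction above can be checked before conversion with a small helper like the one below. It is a sketch under stated assumptions: the function name is hypothetical, shapes are passed as plain Python sequences, and None stands for a dimension that is not statically known.

```python
def check_trt_input_shape(shape):
    """Return a list of problems for an input shape given as a sequence of
    dimensions, where None marks an unknown dimension."""
    problems = []
    dims = list(shape)
    if len(dims) != 4:
        problems.append("rank is %d, expected 4" % len(dims))
    # Every dimension except the leading batch dimension must be explicit.
    for index, dim in enumerate(dims[1:], start=1):
        if dim is None:
            problems.append("dimension %d is not statically known" % index)
    return problems


if __name__ == "__main__":
    # Only the batch dimension is unknown here.
    print(check_trt_input_shape([None, 224, 224, 3]))
    # Height and width are unknown as well.
    print(check_trt_input_shape([None, None, None, 3]))
```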

Announcements

Starting with the next major CUDA release, we will no longer provide Python 2 containers and will only maintain Python 3 containers.

Known Issues

There are no known issues in this release.