TensorFlow Release 18.05
The NVIDIA container image of TensorFlow, release 18.05, is available.
Contents of TensorFlow
This container image contains the complete source of the version of NVIDIA TensorFlow in
/opt/tensorflow. It is pre-built and installed as a system Python module.
To achieve optimum TensorFlow performance, for image based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). The sample script may need to be modified to fit your application. The container also includes the following:
- Ubuntu 16.04
Note: Container image 18.05-py2 contains Python 2.7; 18.05-py3 contains Python 3.5.
- NVIDIA CUDA 9.0.176 (see Errata section and 2.1) including CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 9.0.333 (see section 2.3.1)
- NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.1.2
- NCCL 2.1.15 (optimized for NVLink™)
- Horovod™ 0.12.1
- OpenMPI™ 3.0.0
- TensorBoard 1.7.0
- MLNX_OFED 3.4
- OpenSeq2Seq v0.2
Release 18.05 is based on CUDA 9, which requires NVIDIA Driver release 384.xx.
Key Features and Enhancements
This TensorFlow release includes the following key features and enhancements.
- TensorFlow container image version 18.05 is based on TensorFlow 1.7.0.
- For developers needing more visibility between network layer calls and CUDA kernel calls, we've added support for basic NVTX ranges to the TensorFlow executor. With NVTX ranges enabled, Nsight Systems or the NVIDIA Visual Profiler can display each TensorFlow op demarcated by an NVTX range named after the op. NVTX ranges are enabled by default but can be disabled via an environment variable.
- Optimized the input pipeline in nvcnn_hvd.py by casting back to uint8 immediately after image preprocessing.
- Added OpenSeq2Seq v0.2 to the base container.
- Includes integration with TensorRT 3.0.4
- Ubuntu 16.04 with April 2018 updates
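The uint8 optimization listed above can be understood with some simple arithmetic: a uint8 image batch moves 4x less data through the input pipeline than the same batch held as float32. The following standalone sketch illustrates the saving; the 224x224x3 image shape and batch size of 256 are illustrative assumptions, not values taken from nvcnn_hvd.py.

```python
# Illustrative sketch (not part of the container) of why casting images
# back to uint8 after preprocessing reduces input-pipeline bandwidth.
# Batch shape below is an assumed example, not from nvcnn_hvd.py.

def batch_bytes(batch, height, width, channels, bytes_per_element):
    """Size in bytes of one image batch with the given element width."""
    return batch * height * width * channels * bytes_per_element

# uint8 uses 1 byte per element; float32 uses 4 bytes per element.
uint8_size = batch_bytes(256, 224, 224, 3, 1)
float32_size = batch_bytes(256, 224, 224, 3, 4)

print(float32_size // uint8_size)  # 4: float32 batches are 4x larger
```

Keeping the pipeline in uint8 until the data reaches the GPU therefore cuts host-side copies and PCIe traffic by a factor of four for the same batch.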
Accelerating Inference In TensorFlow With TensorRT (TF-TRT)
For step-by-step instructions on how to use TF-TRT, see Accelerating Inference In TensorFlow With TensorRT User Guide.
Support for accelerating TensorFlow with TensorRT 3.x will be removed in a future release (likely TensorFlow 1.13). Generated plan files are not portable across platforms or TensorRT versions; a plan is specific to the exact GPU model it was built on (in addition to the platform and the TensorRT version) and must be re-generated to run on a different GPU. Consequently, models accelerated with TensorRT 3.x will no longer run after that removal. If you have a production model that was accelerated with TensorRT 3.x, you will need to convert it again with TensorRT 4.x or later.
For more information, see the Note in Serializing A Model In C++ or Serializing A Model In Python.
- Key Features And Enhancements
TensorRT backend accelerates inference performance for frozen TensorFlow models.
An automatic segmenter recognizes TensorRT-compatible subgraphs and converts them into TensorRT engines. Each TensorRT engine is wrapped in a TensorFlow custom op that moves execution of the subgraph to the TensorRT backend for optimized performance, while falling back to TensorFlow for ops that are not TensorRT compatible.
Supported networks are slim classification networks including ResNet, VGG, and Inception.
Mixed precision and quantization are supported.
Conversion relies on static shape inference; the frozen graph should provide explicit dimensions for all ranks other than the first (batch) dimension.
The batch size for a converted TensorRT engine is fixed at conversion time. Inference can only run with batch sizes smaller than or equal to the specified number.
Current supported models are limited to CNNs. Object detection models and RNNs are not yet supported.
Resource management is not integrated; therefore, ensure you limit the memory claimed by TensorFlow so that TensorRT can acquire the resources it needs. To limit the memory, set per_process_gpu_memory_fraction to a value < 1.0 and pass it at session creation, for example:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
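To see how the per_process_gpu_memory_fraction setting above interacts with TensorRT's need for GPU memory, the arithmetic can be sketched as follows. The helper name, the reserve parameter, and the 16 GiB GPU figure are illustrative assumptions, not an NVIDIA-provided API.

```python
def trt_available_bytes(total_gpu_bytes, tf_fraction, reserve_bytes=0):
    """Rough estimate of GPU memory left for TensorRT after TensorFlow
    claims its per_process_gpu_memory_fraction. reserve_bytes covers
    overhead such as the CUDA context. This is a sketch for intuition,
    not an NVIDIA API."""
    if not 0.0 < tf_fraction < 1.0:
        raise ValueError("tf_fraction must be in (0, 1)")
    remaining = int(total_gpu_bytes * (1.0 - tf_fraction)) - reserve_bytes
    return max(remaining, 0)

# Example: on a hypothetical 16 GiB GPU, the fraction 0.333 used above
# leaves roughly two thirds of memory for TensorRT engines and workspace.
total = 16 * 1024**3
print(trt_available_bytes(total, 0.333) / 1024**3)  # ~10.67 GiB
```

The smaller the fraction TensorFlow claims, the larger the workspace TensorRT can allocate for its engines, so the 0.333 in the session example is a starting point to tune rather than a required value.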
- Deprecated Features
In the 18.05 container, you still need to create a TensorFlow session with the per_process_gpu_memory_fraction option. Once resource management is fully integrated, you will no longer need to reserve GPU memory from TensorFlow, and this option will not be necessary for mixed TensorFlow-TensorRT (TF-TRT) models.
- Known Issues
The TensorRT engine only accepts input tensors with rank == 4.
Starting with the next major version of CUDA release, we will no longer provide Python 2 containers and will only maintain Python 3 containers.
Aside from the TF-TRT issues noted above, there are no other known issues in this release.
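The rank == 4 constraint noted above (batched image tensors, e.g. NHWC or NCHW) can be checked before handing inputs to a TF-TRT engine. The following standalone helper is an illustrative sketch using plain nested lists, not part of the container or of TensorFlow.

```python
def nested_rank(x):
    """Rank (number of dimensions) of a regular nested Python list --
    the same notion of rank the TensorRT engine checks for == 4.
    Illustrative helper only; assumes a rectangular nesting."""
    rank = 0
    while isinstance(x, list):
        rank += 1
        x = x[0] if x else None
    return rank

# A single 2x2 RGB image batched as NHWC has shape [1, 2, 2, 3]: rank 4.
image_batch = [[[[0, 0, 0], [1, 1, 1]],
                [[2, 2, 2], [3, 3, 3]]]]
print(nested_rank(image_batch))  # 4 -> accepted by the TensorRT engine
```

An unbatched image (rank 3) would fail this check; adding a leading batch dimension of 1 brings it to the required rank 4.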