TensorFlow Release 18.08
The NVIDIA container image of TensorFlow, release 18.08, is available.
Contents of TensorFlow
This container image contains the complete source of the NVIDIA version of TensorFlow in /opt/tensorflow. It is pre-built and installed as a system Python module.
For optimum TensorFlow performance in image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). You may need to modify the sample script to fit your application. The container also includes the following:
- Ubuntu 16.04
- NVIDIA CUDA 9.0.176 (see Errata section and 2.1) including CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 9.0.425
- NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.2.1
- NCCL 2.2.13 (optimized for NVLink™)
- Horovod™ 0.12.1
- OpenMPI™ 3.0.0
- TensorBoard 1.9.0
- MLNX_OFED 3.4
- OpenSeq2Seq v0.5 at commit 83e96551
- TensorRT 4.0.1
- DALI 0.1.2 Beta
Release 18.08 is based on CUDA 9, which requires NVIDIA Driver release 384.xx.
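The driver requirement above can be checked before the container is started. The following is a minimal sketch in plain Python that does not query the driver itself; it expects a version string such as the one reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The helper names and the rule that any branch number at or above 384 is acceptable are assumptions made for illustration.

```python
import re

# Minimum driver branch, taken from the requirement stated above.
MIN_DRIVER_BRANCH = 384


def driver_branch(version: str) -> int:
    """Extract the leading branch number from a string such as '384.81'."""
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError(f"unrecognized driver version: {version!r}")
    return int(match.group(1))


def driver_meets_requirement(version: str, minimum: int = MIN_DRIVER_BRANCH) -> bool:
    """Assumption: any branch number at or above the minimum is acceptable."""
    return driver_branch(version) >= minimum
```

For example, `driver_meets_requirement("375.66")` returns `False`, while `driver_meets_requirement("384.81")` returns `True`.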
Key Features and Enhancements
- TensorFlow container image version 18.08 is based on TensorFlow 1.9.0.
- Latest version of cuDNN 7.2.1.
- Latest version of DALI 0.1.2 Beta.
- Latest version of TensorBoard 1.9.0.
- Added experimental support for the float16 data type in Horovod, allowing functions such as all_reduce to accept tensors in float16 precision. (This functionality is not yet integrated into the multi-GPU training examples.)
- Ubuntu 16.04 with July 2018 updates
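To illustrate what reducing tensors in float16 precision means in practice (see the Horovod item above), the following is a pure-Python simulation. It does not call Horovod or TensorFlow; the function names are hypothetical, and rounding after every addition is an assumption about how a half-precision reduction might behave, not a description of the Horovod implementation.

```python
import struct
from typing import List, Sequence


def to_float16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def simulated_allreduce_sum(worker_tensors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum across workers, rounding to float16 after every addition."""
    length = len(worker_tensors[0])
    if any(len(t) != length for t in worker_tensors):
        raise ValueError("all workers must contribute tensors of the same length")
    result = [0.0] * length
    for tensor in worker_tensors:
        for i, value in enumerate(tensor):
            # Round both the incoming value and the running sum to half precision.
            result[i] = to_float16(result[i] + to_float16(value))
    return result
```

Small integer values reduce exactly, but `simulated_allreduce_sum([[2048.0], [1.0]])` yields `[2048.0]` because float16 cannot represent 2049. This is the kind of rounding to keep in mind when summing float16 gradients across workers.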
Accelerating Inference In TensorFlow With TensorRT (TF-TRT)
For step-by-step instructions on how to use TF-TRT, see Accelerating Inference In TensorFlow With TensorRT User Guide.
- Key Features And Enhancements
TensorRT conversion has been integrated into the optimization pass. The tensorflow/contrib/tensorrt/test/test_tftrt.py script has an example showing the use of the optimization pass.
TensorRT conversion relies on static shape inference, so the frozen graph should provide an explicit size for every dimension other than the first (batch) dimension.
The batch size for converted TensorRT engines is fixed at conversion time. Inference can only run with a batch size smaller than the specified number.
Currently supported models are limited to CNNs. Object detection models and RNNs are not yet supported.
The current optimization pass does not yet support INT8.
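The shape and batch-size constraints above can be expressed as a small pre-flight check. The following is a minimal sketch in plain Python; it does not call TensorFlow or TensorRT, and the helper names are hypothetical. It reads "smaller than the specified number" literally as a strict comparison, which is an assumption about the intended bound.

```python
from typing import Optional, Sequence


def has_static_non_batch_dims(shape: Sequence[Optional[int]]) -> bool:
    """Return True if every dimension after the first (batch) one has a known size.

    None stands for an unknown dimension, as it might appear in a graph's shape metadata.
    """
    if len(shape) < 1:
        return False
    return all(isinstance(d, int) and d > 0 for d in shape[1:])


def batch_fits_engine(batch_size: int, conversion_batch_size: int) -> bool:
    """Check a runtime batch size against the batch size fixed at conversion time.

    Assumption: the bound is strict, following the wording of the notes above.
    """
    return 0 < batch_size < conversion_batch_size
```

For example, a shape of `[None, 224, 224, 3]` passes the first check, while `[None, None, 224, 3]` does not because its height is unknown.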
- Known Issues
Input tensors are required to have rank 4 for quantization mode (INT8 precision).
Starting with the next major CUDA release, we will no longer provide updated Python 2 containers and will only update Python 3 containers.
- The DALI-integrated ResNet-50 samples in the 18.08 NGC TensorFlow container have lower than expected accuracy and performance results. We are working to address the issue in the next release.
- There is a known performance regression in the inference benchmarks for ResNet-50. We haven't seen this regression in the inference benchmarks for VGG or training benchmarks for any network. The cause of the regression is still under investigation.