TensorFlow Release 18.11
The NVIDIA container image of TensorFlow, release 18.11, is available.
Contents of TensorFlow
This container image contains the complete source of the version of NVIDIA TensorFlow in /opt/tensorflow. It is pre-built and installed as a system Python module.
To achieve optimum TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). The sample script may need to be modified to fit your application. The container also includes the following:
- Ubuntu 16.04
  Note: Container image 18.11-py2 contains Python 2.7; 18.11-py3 contains Python 3.5.
- NVIDIA CUDA 10.0.130 including CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 10.0.130
- NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.4.1
- NCCL 2.3.7 (optimized for NVLink™)
- Horovod 0.15.1
- OpenMPI 3.1.2
- TensorBoard 1.12.0
- MLNX_OFED 3.4
- OpenSeq2Seq v18.11 at commit 4b95346
- TensorRT 5.0.2
- DALI 0.4.1 Beta
Driver Requirements
Release 18.11 is based on CUDA 10, which requires NVIDIA Driver release 410.xx. However, if you are running on Tesla (Tesla V100, Tesla P4, Tesla P40, or Tesla P100), you may use NVIDIA driver release 384. For more information, see CUDA Compatibility and Upgrades.
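The driver rule above can be expressed as a small check. This is a minimal sketch, not an NVIDIA tool: the function names are hypothetical, matching GPU names by substring is an assumption, accepting drivers newer than 410 is an assumption based on driver backward compatibility, and only the 384 branch is accepted for the listed Tesla GPUs because that is the only branch the note names.

```python
import re

# GPU names copied from the driver note above. Substring matching against the
# reported GPU name is an assumption made for this sketch.
TESLA_EXCEPTIONS = ("Tesla V100", "Tesla P4", "Tesla P40", "Tesla P100")


def driver_branch(version):
    """Return the major driver branch, e.g. 410 for '410.72'."""
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError("unrecognized driver version: %r" % version)
    return int(match.group(1))


def driver_supported(version, gpu_name):
    """Apply the 18.11 driver rule to one driver version and GPU name."""
    branch = driver_branch(version)
    # CUDA 10 needs the 410 branch; newer branches are assumed to work too.
    if branch >= 410:
        return True
    # The listed Tesla GPUs may stay on the 384 branch.
    is_listed_tesla = any(name in gpu_name for name in TESLA_EXCEPTIONS)
    return is_listed_tesla and branch == 384
```

The two inputs can be read on the host with `nvidia-smi --query-gpu=driver_version,name --format=csv,noheader`. For authoritative compatibility details, rely on CUDA Compatibility and Upgrades rather than on this sketch.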
Key Features and Enhancements
This TensorFlow release includes the following key features and enhancements.
- TensorFlow container image version 18.11 is based on TensorFlow 1.12.0-rc2.
- Latest version of Horovod 0.15.1.
- Latest version of NCCL 2.3.7.
- Latest version of NVIDIA cuDNN 7.4.1.
- Latest version of TensorRT 5.0.2.
- Latest version of DALI 0.4.1 Beta.
- Bug fixes and improvements for TensorFlow-TensorRT (TF-TRT) integration.
- Added an object detection example to workspace/nvidia-examples/inference/object-detection.
- Ubuntu 16.04 with October 2018 updates
Accelerating Inference In TensorFlow With TensorRT (TF-TRT)
For step-by-step instructions on how to use TF-TRT, see Accelerating Inference In TensorFlow With TensorRT User Guide.
- Key Features And Enhancements
  - Added support for dilated convolution.
  - Fixed a bug in the Identity op.
  - Fixed a bug in the Relu6 op.
  - Support added to allow empty const tensor.
  - Added object detection example to nvidia-examples/inference.
- Known Issues
  - In the TF-TRT API, the default value of the minimum_segment_size argument is 3. The image classification examples under nvidia-examples/inference define a command-line argument for minimum_segment_size that has its own default value. In 18.10, that default was 7; in 18.11, we changed it to 2. Smaller values cause more TensorFlow nodes to be converted to TensorRT, which typically improves performance; however, we have observed cases where performance gets worse. In particular, ResNet-50 with smaller batch sizes is slower with minimum_segment_size=2 than with minimum_segment_size=7.
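If the new example default of 2 slows down your model, the value can be set explicitly. The following is a minimal sketch of how a script might expose and forward the option; it is not the code of the examples under nvidia-examples/inference. The names build_parser and convert are hypothetical, and the call assumes the tensorflow.contrib.tensorrt.create_inference_graph signature as it exists in TensorFlow 1.12.

```python
import argparse


def build_parser():
    """Command-line options for a TF-TRT conversion script (sketch)."""
    parser = argparse.ArgumentParser(description="TF-TRT conversion options")
    # 2 mirrors the 18.11 example default; the TF-TRT API itself defaults to 3.
    parser.add_argument(
        "--minimum_segment_size", type=int, default=2,
        help="minimum number of nodes a subgraph needs to be converted")
    return parser


def convert(frozen_graph_def, output_names, minimum_segment_size):
    """Forward the option to TF-TRT (assumes the TensorFlow 1.12 contrib API)."""
    # Imported lazily so the option parsing works without TensorFlow installed.
    import tensorflow.contrib.tensorrt as trt
    return trt.create_inference_graph(
        input_graph_def=frozen_graph_def,
        outputs=output_names,
        minimum_segment_size=minimum_segment_size)


# Restore the 18.10 example behavior by passing the old default explicitly.
args = build_parser().parse_args(["--minimum_segment_size", "7"])
print(args.minimum_segment_size)
```

Because the best value depends on the model and batch size, it is worth benchmarking both settings before choosing one.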
Announcements
Support for accelerating TensorFlow with TensorRT 3.x will be removed in a future release (likely TensorFlow 1.13). The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be retargeted to the specific GPU if you want to run them on a different GPU. Therefore, models that were accelerated using TensorRT 3.x will no longer run. If you have a production model that was accelerated with TensorRT 3.x, you will need to convert it again with TensorRT 4.x or later.
For more information, see the Note in Serializing A Model In C++ or Serializing A Model In Python.
Known Issues
OpenSeq2Seq is only supported in the Python 3 container.