TensorFlow Release 18.09
The NVIDIA container image of TensorFlow, release 18.09, is available.
Contents of TensorFlow
This container image contains the complete source of the version of NVIDIA TensorFlow in
/opt/tensorflow. It is pre-built and installed as a system Python module.
To achieve optimum TensorFlow performance for image-based training, the container includes a sample script that demonstrates efficient training of convolutional neural networks (CNNs). The sample script may need to be modified to fit your application. The container also includes the following:
- Ubuntu 16.04
- NVIDIA CUDA 10.0.130 including CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 10.0.130
- NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.3.0
- NCCL 2.3.4 (optimized for NVLink™)
- Horovod™ 0.13.10
- OpenMPI 3.0.0
- TensorBoard 1.10.0
- MLNX_OFED 3.4
- OpenSeq2Seq v18.09 at commit 694a230
- TensorRT 5.0.0 RC
- DALI 0.2 Beta
Release 18.09 is based on CUDA 10, which requires NVIDIA Driver release 410.xx. However, if you are running on Tesla (Tesla V100, Tesla P4, Tesla P40, or Tesla P100), you may use NVIDIA driver release 384. For more information, see CUDA Compatibility and Upgrades.
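The driver rule above can be sketched as a small check. This is a hypothetical helper, not part of the container; the version string and GPU name would typically come from nvidia-smi:

```python
def driver_is_compatible(driver_version, gpu_name):
    """Check whether an NVIDIA driver satisfies the CUDA 10 requirement:
    release 410.xx, or release 384 on Tesla-class GPUs (via the CUDA
    compatibility path). Pure string logic on nvidia-smi-style values."""
    major = int(driver_version.split(".")[0])
    if major >= 410:
        return True
    # Driver release 384 is acceptable only on Tesla GPUs.
    return major >= 384 and gpu_name.startswith("Tesla")
```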
Key Features and Enhancements
- TensorFlow container image version 18.09 is based on TensorFlow 1.10.0.
- Latest version of cuDNN 7.3.0.
- Latest version of CUDA 10.0.130 which includes support for DGX-2, Turing, and Jetson Xavier.
- Latest version of cuBLAS 10.0.130.
- Latest version of NCCL 2.3.4.
- Latest version of TensorRT 5.0.0 RC.
- Latest version of TensorBoard 1.10.0.
- Latest version of DALI 0.2 Beta.
- Added support for CUDNN float32 Tensor Op Math mode, which enables float32 models to use Tensor Cores on supported hardware, at the cost of reduced precision. This is disabled by default, but can be enabled by setting the environment variable TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=1 (for convolutions) or TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=1 (for RNNs that use the cudnn_rnn op). This feature is currently considered experimental.
- Renamed the existing environment variable TF_ENABLE_TENSOR_OP_MATH_FP32 to TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32. When using any of the TF_ENABLE_*_TENSOR_OP_MATH_FP32 environment variables, it is recommended that models also use loss scaling to avoid numerical issues during training. For more information about loss scaling, see Training With Mixed Precision.
- Enhanced tf.contrib.layers.layer_norm by adding a use_fused_batch_norm parameter that improves performance. This parameter is disabled by default, but can be enabled by setting it to True.
- Ubuntu 16.04 with August 2018 updates
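As a minimal sketch of how the Tensor Op Math variables above might be enabled from Python (variable names are taken from the notes above; they must be set before TensorFlow initializes cuDNN):

```python
import os

# Opt in to the experimental float32 Tensor Core paths. Set these before
# importing TensorFlow so they are visible when cuDNN is initialized.
os.environ["TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32"] = "1"      # convolutions
os.environ["TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32"] = "1"  # cudnn_rnn RNNs

# import tensorflow as tf  # import only after the variables are set
```

Remember that models trained with these variables enabled should also use loss scaling, as noted above.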
Accelerating Inference In TensorFlow With TensorRT (TF-TRT)
For step-by-step instructions on how to use TF-TRT, see Accelerating Inference In TensorFlow With TensorRT User Guide.
- Key Features And Enhancements
New examples at nvidia-examples/tftrt with good accuracy and performance.
Built TF-TRT with TensorRT 5.0.0 which introduces the new TensorRT APIs into TF-TRT.
Not all of the new TensorRT 5.0.0 features are supported in TF-TRT yet, including INT8 quantization ranges and the plugin registry.
We have only tested image classification models with TF-TRT, including the ones provided in our examples inside the container (nvidia-examples/tftrt). This means object detection and translation models (convolutional and recurrent based) are not yet supported due to either functionality or performance limitations.
TF-TRT has an implementation that optimizes the TensorFlow graph by specifying appropriate TensorFlow session arguments without using the Python TF-TRT API (create_inference_graph); however, we have not thoroughly tested this functionality yet, so we do not support it.
TF-TRT has an implementation of dynamic conversion of a TensorFlow graph, but we have not thoroughly tested this functionality yet, so we do not support it.
- Known Issues
Running inference with batch sizes larger than the maximum batch size is not supported by TensorRT.
Certain TF-TRT logs (errors or warnings) can be misleading and suggest that the TensorRT graph is broken when it is not. It is recommended to check whether there are any TensorRT ops in the graph (their op type is TRTEngineOp). If there are no TensorRT ops in the graph, no conversion has happened and inference falls back to native TensorFlow. Currently, the best way to verify that a frozen graph resulting from the conversion is not broken is to run inference on it and check the accuracy of the results.
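A minimal sketch of this check, assuming a frozen GraphDef is already loaded (iterating node.op is standard GraphDef structure; the helper name is ours):

```python
def count_trt_engine_ops(graph_def):
    """Return the number of TRTEngineOp nodes in a frozen GraphDef.
    A count of zero means no TensorRT conversion happened and inference
    will fall back to native TensorFlow."""
    return sum(1 for node in graph_def.node if node.op == "TRTEngineOp")
```

If the count is zero, the misleading log messages can be ignored as far as graph integrity is concerned; the graph simply was not converted.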
There are operators that are not supported by either TensorRT or the conversion algorithm. The converter is supposed to skip these ops, but the skip may not happen properly due to bugs. One way to work around this problem is to increase the value of the minimum_segment_size parameter so that the subgraphs containing those ops are too small to be converted and remain in native TensorFlow.
We have observed functionality problems in optimizing:
- NASNet models with TF-TRT in FP16 precision mode.
- ResNet, MobileNet, and NASNet models with TF-TRT in INT8 precision mode.
TF-TRT cannot optimize certain models such as ResNet in INT8 precision mode because of a missing feature in TensorRT regarding the dimensionality of tensors. Increasing the value of minimum_segment_size is usually a workaround, as it keeps the subgraphs with those unsupported dimensions out of the TensorRT sub-graph.
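As an illustrative sketch of the minimum_segment_size workaround (the parameter names follow the TF-TRT Python API shipped in this container; the wrapper function itself and its default values are hypothetical):

```python
def convert_with_larger_segments(frozen_graph_def, output_names,
                                 min_segment_size=10):
    """Convert a frozen graph with TF-TRT, raising minimum_segment_size so
    that small subgraphs (often the ones containing unsupported ops or
    tensor dimensions) stay in native TensorFlow instead of TensorRT."""
    # Imported lazily; tensorflow.contrib.tensorrt ships in this container.
    import tensorflow.contrib.tensorrt as trt
    return trt.create_inference_graph(
        input_graph_def=frozen_graph_def,
        outputs=output_names,
        max_batch_size=8,                  # inference batches must not exceed this
        max_workspace_size_bytes=1 << 30,  # 1 GiB of TensorRT workspace
        precision_mode="FP16",
        minimum_segment_size=min_segment_size,
    )
```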
TF-TRT doesn’t work with TensorFlow Lite due to a TensorRT bug that causes Flatbuffer symbols to be exposed. This means you cannot import both TF-TRT and tf.lite in the same process.
We have observed slightly lower accuracy on image classification models with TF-TRT on Jetson AGX Xavier.
INT8 calibration on mobilenet_v2 using TF-TRT fails if the calibration dataset has only one element.
Support for accelerating TensorFlow with TensorRT 3.x will be removed in a future release (likely TensorFlow 1.13). The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platforms and the TensorRT version) and must be retargeted to the specific GPU in case you want to run them on a different GPU. Therefore, models that were accelerated using TensorRT 3.x will no longer run. If you have a production model that was accelerated with TensorRT 3.x, you will need to convert your model with TensorRT 4.x or later again.
- OpenSeq2Seq is only supported in the Python 3 container.
- The build_imagenet_data scripts have a missing dependency on the axel application. This can be resolved by issuing the following command:
apt-get update && apt-get install axel