Accelerating TensorFlow 1.10 With TensorRT 5.0.0 RC Using The 18.09 Or 18.10 Container

These release notes are for accelerating TensorFlow 1.10 with TensorRT version 5.0.0 Release Candidate (RC) using the TensorFlow 18.09 or TensorFlow 18.10 container. For specific details about TensorRT, see the TensorRT 5.0.0 RC Release Notes.

Key Features and Enhancements

This release includes the following key features and enhancements.
  • Added new examples under nvidia-examples/tftrt that demonstrate good accuracy and performance.

  • TF-TRT is now built against TensorRT 5.0.0, which brings the new TensorRT 5 APIs into TF-TRT.

  • In 18.10, we added support for the TensorFlow RELU6 operator, implemented as Relu6(x) = min(Relu(x), 6); see the sketch after this list.

  • In 18.10, we made improvements to the image classification example, such as bug fixes and use of the dynamic_op feature.
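
As an illustration of the RELU6 support mentioned above, the following minimal sketch (not the converter's actual code) checks that Relu6 is equivalent to the decomposition min(Relu(x), 6); all names in it are illustrative only:

    import numpy as np
    import tensorflow as tf

    # Minimal sketch: Relu6(x) == min(Relu(x), 6), the decomposition used to
    # support the RELU6 operator in TF-TRT.
    x = tf.placeholder(tf.float32, shape=[None], name="x")
    relu6_native = tf.nn.relu6(x)                      # native TensorFlow op
    relu6_decomposed = tf.minimum(tf.nn.relu(x), 6.0)  # TF-TRT-friendly form

    with tf.Session() as sess:
        values = np.array([-3.0, 2.5, 7.0], dtype=np.float32)
        a, b = sess.run([relu6_native, relu6_decomposed], feed_dict={x: values})
        assert np.allclose(a, b)  # both yield [0.0, 2.5, 6.0]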

Compatibility

Limitations Of Accelerating TensorFlow With TensorRT

There are some limitations you may experience after accelerating TensorFlow 1.10 with TensorRT 5.0.0 RC, such as:
  • Not all of the new TensorRT 5.0.0 features are supported in TF-TRT yet, including INT8 quantization ranges and the plugin registry.

  • We have only tested image classification models with TF-TRT, including the ones provided in our examples inside the container (nvidia-examples/tftrt). This means object detection and translation models (both convolutional and recurrent based) are not yet supported due to either functionality or performance limitations.

  • TF-TRT can also optimize a TensorFlow graph through appropriate TensorFlow session arguments, without using the Python TF-TRT API (create_inference_graph, illustrated in the sketch after this list). However, we have not thoroughly tested this functionality yet, so it is not supported.

  • In 18.09, TF-TRT includes an implementation of dynamic conversion of a TensorFlow graph, but we have not thoroughly tested this functionality yet, so it is not supported.
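
For reference, the supported Python conversion path uses create_inference_graph from tf.contrib.tensorrt. The following is a minimal sketch assuming a frozen image classification graph; the file name "frozen_graph.pb", the output node name "logits", and the parameter values are placeholders, not recommendations:

    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt

    # Minimal sketch of the supported Python TF-TRT conversion path.
    # "frozen_graph.pb" and "logits" are hypothetical placeholders.
    with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=["logits"],
        max_batch_size=8,                  # inference batches must not exceed this
        max_workspace_size_bytes=1 << 30,
        precision_mode="FP16")

    # The converted GraphDef is imported and run like any TensorFlow graph.
    with tf.Graph().as_default():
        tf.import_graph_def(trt_graph, name="")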

Deprecated Features

  • Support for accelerating TensorFlow with TensorRT 3.x will be removed in a future release (likely TensorFlow 1.13). The generated plan files are not portable across platforms or TensorRT versions; plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be re-generated to run on a different GPU. Therefore, models that were accelerated using TensorRT 3.x will no longer run. If you have a production model that was accelerated with TensorRT 3.x, you will need to convert it again with TensorRT 4.x or later.

    For more information, see the Note in Serializing A Model In C++ or Serializing A Model In Python.

Known Issues

  • Running inference with batch sizes larger than the maximum batch size specified at conversion time (max_batch_size) is not supported by TensorRT.

  • Certain TF-TRT logs (errors or warnings) can be misleading and suggest that the converted graph is broken when it is not. It is recommended to check whether the graph contains any TensorRT ops (ops of type TRTEngineOp); see the first sketch after this list. If there are no TensorRT ops in the graph, no conversion has happened and inference falls back to native TensorFlow. Currently, the best way to verify that a frozen graph produced by the conversion is not broken is to run inference on it and check the accuracy of the results.

  • Some operators are not supported by either TensorRT or the conversion algorithm. The converter is supposed to skip these ops, but the skip may not happen properly due to bugs. One way to work around this problem is to increase the value of the minimum_segment_size parameter so that the subgraphs containing those ops are too small to be converted and remain outside the TensorRT segments; see the second sketch after this list.

  • We have observed functionality problems in optimizing:
    • NASNet models with TF-TRT in FP16 precision mode.
    • ResNet, MobileNet, and NASNet models with TF-TRT in INT8 precision mode.
    Note: TF-TRT cannot optimize certain models, such as ResNet in INT8 precision mode, because of a missing feature in TensorRT related to tensor dimensionality. Increasing the value of minimum_segment_size is usually a workaround, since it keeps the subgraphs with unsupported dimensions out of the TensorRT conversion.

  • TF-TRT doesn’t work with TensorFlow Lite due to a TensorRT bug that causes Flatbuffer symbols to be exposed. This means you cannot import both tf.contrib.tensorrt and tf.lite in the same process.

  • We have observed slightly lower accuracy on image classification models with TF-TRT on Jetson AGX Xavier.

  • INT8 calibration on mobilenet_v1 and mobilenet_v2 using TF-TRT fails if the calibration dataset has only one element.
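
To help with the misleading-logs issue above (the first sketch referenced in the list), the following minimal snippet counts TRTEngineOp nodes in a converted GraphDef; trt_graph is assumed to be the output of create_inference_graph, as in the sketch under Limitations:

    # Minimal sketch: check whether the conversion produced any TensorRT engines.
    # trt_graph is assumed to be the GraphDef returned by create_inference_graph.
    num_trt_ops = sum(1 for node in trt_graph.node if node.op == "TRTEngineOp")
    print("TRTEngineOp nodes found: %d" % num_trt_ops)
    if num_trt_ops == 0:
        print("No conversion happened; inference falls back to native TensorFlow.")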
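
For the unsupported-operator and INT8 dimensionality issues above (the second sketch referenced in the list), a possible workaround is to re-run the conversion with a larger minimum_segment_size so that small subgraphs containing problematic ops stay in native TensorFlow; the value 10 below is only an example and may need tuning per model:

    # Minimal sketch of the minimum_segment_size workaround; frozen_graph and
    # the "logits" output name are the same hypothetical placeholders as above.
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=["logits"],
        max_batch_size=8,
        max_workspace_size_bytes=1 << 30,
        precision_mode="FP32",
        minimum_segment_size=10)  # larger values keep small subgraphs out of TensorRT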