Accelerating TensorFlow 1.7 With TensorRT 3.0.4 Using The 18.05 Container

These release notes are for accelerating TensorFlow 1.7 with TensorRT version 3.0.4 using the TensorFlow 18.05 container. For specific details about TensorRT, see the TensorRT 3.0.4 Release Notes.
Attention: Support for accelerating TensorFlow with TensorRT 3.x will be removed in a future release (likely TensorFlow 1.13). The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be retargeted to the specific GPU if you want to run them on a different GPU. As a result, models that were accelerated with TensorRT 3.x will no longer run. If you have a production model that was accelerated with TensorRT 3.x, you will need to convert it again with TensorRT 4.x or later.

For more information, see the Note in Serializing A Model In C++ or Serializing A Model In Python.

Key Features and Enhancements

This release includes the following key features and enhancements.
  • TensorRT backend accelerates inference performance for frozen TensorFlow models.

  • Automatic segmenter that recognizes TensorRT-compatible subgraphs and converts them into TensorRT engines. Each engine is wrapped in a TensorFlow custom op that moves execution of the subgraph to the TensorRT backend for optimized performance, while falling back to TensorFlow for ops that TensorRT does not support. See the conversion sketch after this list.

  • Supported networks are TF-Slim classification networks, including ResNet, VGG, and Inception.

  • Mixed precision and quantization are supported.
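
The following is the conversion sketch referenced above. It assumes the tensorflow.contrib.tensorrt API available in TensorFlow 1.7; the frozen GraphDef name (frozen_graph_def), the output node name ("logits"), and the parameter values are illustrative, not required values:

    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt

    # frozen_graph_def is an already-frozen tf.GraphDef; "logits" is an assumed
    # output node name for this sketch.
    trt_graph_def = trt.create_inference_graph(
        input_graph_def=frozen_graph_def,
        outputs=["logits"],
        max_batch_size=16,                 # engines are built for a fixed maximum batch size
        max_workspace_size_bytes=1 << 30,  # scratch memory TensorRT may allocate
        precision_mode="FP16")             # "FP32", "FP16", or "INT8"

    # TensorRT-compatible subgraphs are now replaced by custom engine ops;
    # unsupported ops remain as regular TensorFlow ops.
    tf.import_graph_def(trt_graph_def, name="")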

Compatibility

Limitations Of Accelerating TensorFlow With TensorRT

You may experience the following limitations after accelerating TensorFlow 1.7 with TensorRT 3.0.4:
  • Conversion relies on static shape inference; the frozen graph must provide explicit sizes for all dimensions other than the first (batch) dimension, as illustrated in the sketch after this list.

  • The batch size for converted TensorRT engines is fixed at conversion time. Inference can only run with a batch size smaller than the specified number.

  • Currently supported models are limited to CNNs; object detection models and RNNs are not yet supported.

  • Resource management is not integrated; therefore, limit the memory claimed by TensorFlow so that TensorRT can acquire the resources it needs. To limit the memory, set per_process_gpu_memory_fraction to a value < 1.0 and pass it at session creation, for example:
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
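
The static-shape and fixed-batch-size constraints above (and the rank == 4 input requirement noted under Known Issues) can be met by declaring every dimension except the batch dimension explicitly in the graph that is later frozen. A minimal sketch (the placeholder name and sizes are illustrative):

    import tensorflow as tf

    # Rank-4 NHWC input: only the batch dimension is left dynamic; height, width,
    # and channels are explicit so that static shape inference can succeed.
    images = tf.placeholder(tf.float32, shape=[None, 224, 224, 3], name="input")

    # At inference time, feed batches no larger than the max_batch_size that was
    # specified when the TensorRT engines were created.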

Known Issues

  • The TensorRT engine only accepts input tensors with rank == 4.