Accelerating TensorFlow 1.8 With TensorRT 4.0.1 Using The 18.06 Or 18.07 Container

These release notes are for accelerating TensorFlow 1.8 with TensorRT version 4.0.1 using either the TensorFlow 18.06 or TensorFlow 18.07 container. For details specific to TensorRT, see the TensorRT 4.0.1 Release Notes.

Key Features and Enhancements

This release includes the following key features and enhancements.
  • Added TensorRT 4.0 API support with extended layer coverage, including the FullyConnected layer and the BatchedMatMul op.

  • Added resource management: GPU memory allocation is now managed uniformly by TensorFlow.

  • Bug fixes and improved error handling during graph conversion.
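The conversion entry point in the TensorFlow 1.8 contrib API can be sketched as follows. The graph path, output node names, and batch size below are illustrative assumptions, not values from this release:

```python
# Sketch of TF-TRT conversion using the TensorFlow 1.8 contrib API.
# File names and node names are hypothetical examples.

def convert_frozen_graph(frozen_graph_path, output_node_names,
                         max_batch_size=8, precision_mode="FP32"):
    """Rewrite TensorRT-compatible subgraphs (e.g. those containing
    FullyConnected or BatchedMatMul) of a frozen GraphDef into
    TRTEngineOp nodes."""
    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt  # available in the TF-TRT containers

    graph_def = tf.GraphDef()
    with tf.gfile.GFile(frozen_graph_path, "rb") as f:
        graph_def.ParseFromString(f.read())

    # create_inference_graph replaces supported subgraphs with TensorRT
    # engines; with integrated resource management, engine memory is
    # allocated through TensorFlow's allocator.
    return trt.create_inference_graph(
        input_graph_def=graph_def,
        outputs=output_node_names,
        max_batch_size=max_batch_size,     # fixed at conversion time
        max_workspace_size_bytes=1 << 30,  # scratch space for TensorRT
        precision_mode=precision_mode)     # "FP32", "FP16", or "INT8"
```

This is a sketch for the 1.8-era `tensorflow.contrib.tensorrt` interface; it assumes a frozen (variable-free) GraphDef as input.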


Limitations Of Accelerating TensorFlow With TensorRT

There are some limitations you may experience after accelerating TensorFlow 1.8 with TensorRT 4.0.1, such as:
  • TensorRT conversion relies on static shape inference: the frozen graph must provide explicit dimensions for every rank other than the first (batch) dimension.

  • The batch size of a converted TensorRT engine is fixed at conversion time. Inference can only run with a batch size no larger than the specified size.

  • Currently supported models are limited to CNNs. Object detection models and RNNs are not yet supported.
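The first two limitations can be checked before attempting a conversion. The helper below is an illustrative, framework-free sketch (the shape convention, with None standing for the unknown batch dimension, is an assumption):

```python
def check_convertible(input_shape, requested_batch_size, max_batch_size):
    """Illustrative pre-flight check mirroring the TF-TRT limitations:
    every dimension except the first (batch) must be explicit, and the
    runtime batch size must not exceed the size fixed at conversion."""
    # Static shape inference: every non-batch dimension must be a known int.
    if any(d is None for d in input_shape[1:]):
        return False, "non-batch dimensions must be explicit"
    # The engine's batch size is fixed at conversion time.
    if requested_batch_size > max_batch_size:
        return False, "batch size exceeds the size fixed at conversion"
    return True, "ok"

# An NHWC CNN input with unknown batch but explicit H, W, C is acceptable:
print(check_convertible((None, 224, 224, 3), 4, 8))  # (True, 'ok')
# An unknown spatial dimension blocks static shape inference:
print(check_convertible((None, None, 224, 3), 4, 8))
```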

Deprecated Features

In the 18.05 container, you needed to create a TensorFlow session with the per_process_gpu_memory_fraction option to reserve GPU memory for TensorRT. Now that resource management is fully integrated, you no longer need to reserve GPU memory outside of TensorFlow; therefore, the option is no longer necessary for mixed TensorFlow-TensorRT (TF-TRT) models.
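For reference, the now-unnecessary 18.05-era session setup looked roughly like this (the 0.67 fraction is an illustrative value, not a recommendation from this release):

```python
def make_legacy_session():
    """Sketch of the 18.05-style session that carved a fraction of GPU
    memory out of TensorFlow so TensorRT could allocate the remainder.
    With integrated resource management (18.06+), a plain tf.Session()
    suffices for TF-TRT models."""
    import tensorflow as tf  # TF 1.x API

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.67)
    config = tf.ConfigProto(gpu_options=gpu_options)
    return tf.Session(config=config)
```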

Known Issues

  • Input tensors are required to have rank 4 for quantization mode (INT8 precision).
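A quick, framework-free illustration of the rank requirement (the NCHW shape below is an assumed example):

```python
def int8_input_ok(shape):
    """INT8 (quantization) mode currently requires rank-4 input tensors."""
    return len(shape) == 4

print(int8_input_ok((8, 3, 224, 224)))  # True: rank-4 NCHW image batch
print(int8_input_ok((8, 1024)))         # False: rank-2 input is rejected
```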