Triton Inference Server Release 19.03

The TensorRT Inference Server container image, release 19.03, is available on NGC and is open source on GitHub.

Contents of the Triton Inference Server Container

The TensorRT Inference Server Docker image contains the inference server executable and related shared libraries in /opt/tensorrtserver.


Driver Requirements

Release 19.03 is based on CUDA 10.1, which requires NVIDIA Driver release 418.xx+. However, if you are running on Tesla (Tesla V100, Tesla P4, Tesla P40, or Tesla P100), you may use NVIDIA driver release 384.111+ or 410. The CUDA driver's compatibility package only supports particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.

GPU Requirements

Release 19.03 supports CUDA compute capability 6.0 and higher, which corresponds to GPUs in the Pascal, Volta, and Turing families. For the list of GPUs with these compute capabilities, see CUDA GPUs. For additional support details, see the Deep Learning Frameworks Support Matrix.

Key Features and Enhancements

This Inference Server release includes the following key features and enhancements.
  • The inference server container image version 19.03 is based on NVIDIA TensorRT Inference Server 1.0.0, TensorFlow 1.13.1, and Caffe2 0.8.2.
  • 19.03 is the first GA release of TensorRT Inference Server. See the README in the GitHub project for information on the backward-compatibility guarantees for this and future releases.
  • Added support for “stateful” models and backends that require that multiple inference requests be routed to the same model instance/batch slot. The new sequence batcher provides scheduling and batching capabilities for this class of models.
  • Added GRPC streaming protocol support for inference requests.
  • The HTTP front end is now asynchronous, enabling lower-latency, higher-throughput handling of inference requests.
  • Enhanced perf_client to support “stateful”/sequence models and backends.
  • Latest version of NVIDIA CUDA 10.1.105 including cuBLAS 10.1.105
  • Latest version of NVIDIA cuDNN 7.5.0
  • Latest version of NVIDIA NCCL 2.4.3
  • Latest version of TensorRT 5.1.2 RC
  • Ubuntu 16.04 with February 2019 updates
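A model opts in to the new sequence batcher through its model configuration. The following config.pbtxt fragment is an illustrative sketch only, following the 1.0 model-configuration schema; the idle timeout value and the control tensor names "START" and "READY" are this example's choices, and your model's configuration may differ:

```
sequence_batching {
  # Cancel a sequence's batch slot if it is idle longer than this.
  max_sequence_idle_microseconds: 5000000
  control_input [
    {
      # Tensor the server sets to 1 on the first request of a sequence.
      name: "START"
      control [
        {
          kind: CONTROL_SEQUENCE_START
          fp32_false_true: [ 0, 1 ]
        }
      ]
    },
    {
      # Tensor the server sets to 1 when a batch slot holds a valid request.
      name: "READY"
      control [
        {
          kind: CONTROL_SEQUENCE_READY
          fp32_false_true: [ 0, 1 ]
        }
      ]
    }
  ]
}
```

With such a configuration, the server routes all requests that share a correlation ID to the same batch slot, so a stateful backend can maintain per-sequence state between requests.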

Known Issues

  • If you are using or upgrading to a driver with a 3-part version, for example, a driver of the form xxx.yy.zz, you will receive a Failed to detect NVIDIA driver version. message. This is due to a known bug in the entry-point script's parsing of 3-part driver versions. The message is non-fatal and can be ignored. This will be fixed in the 19.04 release.
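The parsing bug above can be illustrated by contrast: a version parser that accepts both 2-part and 3-part driver versions avoids the failure. This is a sketch for your own tooling, not the server's actual entry-point code, and the function name parse_driver_version is this example's own:

```python
import re

def parse_driver_version(version):
    """Parse an NVIDIA driver version with two or three parts.

    Accepts both "384.111" and "418.40.04"; returns a tuple of ints,
    padding a missing patch component with 0 so versions compare cleanly.
    """
    m = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized driver version: {version!r}")
    major, minor, patch = m.group(1), m.group(2), m.group(3) or "0"
    return (int(major), int(minor), int(patch))
```

Because the tuples are comparable, a minimum-driver check such as `parse_driver_version(v) >= (418, 0, 0)` works for either version format.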