Triton Inference Server Release 19.04
The TensorRT Inference Server container image, release 19.04, is available on NGC and is open source on GitHub.
Contents of the Triton Inference Server container
The TensorRT Inference Server Docker image contains the inference server executable and related shared libraries in /opt/tensorrtserver.
- Ubuntu 16.04
- NVIDIA CUDA 10.1.105 including cuBLAS 10.1.0.105
- NVIDIA cuDNN 7.5.0
- NVIDIA NCCL 2.4.6 (optimized for NVLink™)
- OpenMPI 3.1.3
- TensorRT 5.1.2 RC
Driver Requirements
Release 19.04 is based on CUDA 10.1, which requires NVIDIA driver release 418.xx+. However, if you are running on Tesla GPUs (for example, Tesla V100, Tesla P4, Tesla P40, or Tesla P100), you may use NVIDIA driver release 384.111+ or 410. The CUDA driver's compatibility package supports only particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
GPU Requirements
Release 19.04 supports CUDA compute capability 6.0 and later. This corresponds to GPUs in the Pascal, Volta, and Turing families. For a list of GPUs to which this compute capability corresponds, see CUDA GPUs. For additional support details, see the Deep Learning Frameworks Support Matrix.
Key Features and Enhancements
- The inference server container image version 19.04 is based on NVIDIA TensorRT Inference Server 1.1.0, TensorFlow 1.13.1, and Caffe2 0.8.2.
- Latest version of NVIDIA NCCL 2.4.6
- Latest version of cuBLAS 10.1.0.105
- Client libraries and examples now build with a separate Makefile (a Dockerfile is also included for convenience).
- Input or output tensors with variable-size dimensions (indicated by -1 in the model configuration) can now represent tensors where the variable dimension has value 0 (zero); see the configuration sketch after this list.
- Zero-sized input and output tensors are now supported for batching models. This enables the inference server to support models whose inputs and outputs have shape [ batch-size ].
- TensorFlow custom operations (C++) can now be built into the inference server. An example and documentation are included in this release; a minimal kernel sketch also follows this list.
- Ubuntu 16.04 with March 2019 updates
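To illustrate the variable-size dimension support above, the following is a minimal model configuration sketch, not a configuration shipped with the release; the model name, platform, and tensor names are hypothetical. The -1 entries mark variable-size dimensions, which as of this release may resolve to 0 at inference time.

```
name: "variable_model"           # hypothetical model name
platform: "tensorflow_graphdef"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]                 # variable-size dimension; may now be 0
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]                 # variable-size dimension; may now be 0
  }
]
```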
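For the TensorFlow custom operations feature, the sketch below shows the general shape of a TensorFlow 1.13-era custom op in C++. It is not the example included with this release: the op name ScaleByTwo and its kernel are hypothetical, and the resulting shared library must still be built into the server as described in the included documentation.

```cpp
// Minimal sketch of a TensorFlow (1.13-era) custom op in C++.
// The op name "ScaleByTwo" and its kernel are hypothetical.
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"

using namespace tensorflow;

// Declare the op interface: one float tensor in, one float tensor out,
// with the output shape matching the input shape.
REGISTER_OP("ScaleByTwo")
    .Input("in: float")
    .Output("out: float")
    .SetShapeFn([](shape_inference::InferenceContext* c) {
      c->set_output(0, c->input(0));
      return Status::OK();
    });

// CPU kernel: multiply every element of the input by 2.
class ScaleByTwoOp : public OpKernel {
 public:
  explicit ScaleByTwoOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}

  void Compute(OpKernelContext* ctx) override {
    const Tensor& input = ctx->input(0);
    Tensor* output = nullptr;
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, input.shape(), &output));

    auto in = input.flat<float>();
    auto out = output->flat<float>();
    for (int64 i = 0; i < in.size(); ++i) {
      out(i) = 2.0f * in(i);
    }
  }
};

REGISTER_KERNEL_BUILDER(Name("ScaleByTwo").Device(DEVICE_CPU), ScaleByTwoOp);
```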
Known Issues
There are no known issues in this release.