Triton Inference Server Release 19.04
The TensorRT Inference Server container image, release 19.04, is available on NGC and is open source on GitHub.
Contents of the Triton Inference Server container
The TensorRT Inference Server Docker image contains the inference server executable and related shared libraries in /opt/tensorrtserver.
- Ubuntu 16.04
- NVIDIA CUDA 10.1.105 including cuBLAS 10.1.0.105
- NVIDIA cuDNN 7.5.0
- NVIDIA NCCL 2.4.6 (optimized for NVLink™)
- OpenMPI 3.1.3
- TensorRT 5.1.2 RC
Driver Requirements
Release 19.04 is based on CUDA 10.1, which requires NVIDIA driver release 418.xx+. However, if you are running on Tesla GPUs (for example, Tesla V100, Tesla P4, Tesla P40, or Tesla P100), you may use NVIDIA driver release 384.111+ or 410. The CUDA driver's compatibility package supports only particular drivers. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
GPU Requirements
Release 19.04 supports CUDA compute capability 6.0 and later. This corresponds to GPUs in the Pascal, Volta, and Turing families. For a list of GPUs to which this compute capability corresponds, see CUDA GPUs. For additional support details, see the Deep Learning Frameworks Support Matrix.
Key Features and Enhancements
- The inference server container image version 19.04 is based on NVIDIA TensorRT Inference Server 1.1.0, TensorFlow 1.13.1, and Caffe2 0.8.2.
- Latest version of NVIDIA NCCL 2.4.6
- Latest version of cuBLAS 10.1.0.105
- Client libraries and examples now build with a separate Makefile (a Dockerfile is also included for convenience).
- Input or output tensors with variable-size dimensions (indicated by -1 in the model configuration) can now represent tensors where the variable dimension has value 0 (zero); see the configuration sketch after this list.
- Zero-sized input and output tensors are now supported for batching models. This enables the inference server to support models whose inputs and outputs have shape [ batch-size ].
- TensorFlow custom operations (C++) can now be built into the inference server. An example and documentation are included in this release; a minimal kernel sketch also follows this list.
- Ubuntu 16.04 with March 2019 updates
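To illustrate the variable-size dimension support above, the following is a minimal model configuration sketch, not a configuration shipped with the release; the model name, platform, and tensor names are hypothetical. The -1 entries mark variable-size dimensions, which as of this release may resolve to 0 at inference time.

```
name: "variable_model"           # hypothetical model name
platform: "tensorflow_graphdef"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]                 # variable-size dimension; may now be 0
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]                 # variable-size dimension; may now be 0
  }
]
```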
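For the TensorFlow custom operations feature, the sketch below shows the general shape of a TensorFlow 1.13-era custom op in C++. It is not the example included with this release: the op name ScaleByTwo and its kernel are hypothetical, and the resulting shared library must still be built into the server as described in the included documentation.

```cpp
// Minimal sketch of a TensorFlow (1.13-era) custom op in C++.
// The op name "ScaleByTwo" and its kernel are hypothetical.
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"

using namespace tensorflow;

// Declare the op interface: one float tensor in, one float tensor out,
// with the output shape matching the input shape.
REGISTER_OP("ScaleByTwo")
    .Input("in: float")
    .Output("out: float")
    .SetShapeFn([](shape_inference::InferenceContext* c) {
      c->set_output(0, c->input(0));
      return Status::OK();
    });

// CPU kernel: multiply every element of the input by 2.
class ScaleByTwoOp : public OpKernel {
 public:
  explicit ScaleByTwoOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}

  void Compute(OpKernelContext* ctx) override {
    const Tensor& input = ctx->input(0);
    Tensor* output = nullptr;
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, input.shape(), &output));

    auto in = input.flat<float>();
    auto out = output->flat<float>();
    for (int64 i = 0; i < in.size(); ++i) {
      out(i) = 2.0f * in(i);
    }
  }
};

REGISTER_KERNEL_BUILDER(Name("ScaleByTwo").Device(DEVICE_CPU), ScaleByTwoOp);
```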
Known Issues
There are no known issues in this release.