TensorRT inference server Release 18.08 Beta
The NVIDIA container image of the TensorRT inference server, release 18.08, is available as a beta release.
Contents of the TensorRT inference server
This container image contains the TensorRT inference server executable in /opt/inference_server.
The container also includes the following:
- Ubuntu 16.04
- NVIDIA CUDA 9.0.176 (see Errata section and 2.1) including CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 9.0.425
- NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.2.1
- NCCL 2.2.13 (optimized for NVLink™ )
- TensorRT 4.0.1
Key Features and Enhancements
This TensorRT inference server release includes the following key features and enhancements.
- The TensorRT inference server container image version 18.08 is based on NVIDIA Inference Server 0.5.0 Beta, TensorFlow 1.9.0, and Caffe2 0.8.1.
- Latest version of cuDNN 7.2.1.
- Added support for Kubernetes-compatible ready and live endpoints.
- Added support for Prometheus metrics. A load metric is reported that can be used for Kubernetes-style auto-scaling.
- Enhanced the example perf_client application to generate latency vs. inferences/second results.
- Improved performance of TensorRT models by allowing multiple TensorRT model instances to execute simultaneously.
- Improved HTTP client performance by reusing connections for multiple inference requests.
- Ubuntu 16.04 with July 2018 updates.
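The ready/live endpoints and the HTTP connection-reuse improvement above can be sketched together as a small client-side example. This is a minimal illustration, not the server's actual implementation: the endpoint paths `/api/health/ready` and `/api/health/live` and the stub server are assumptions made so the example is self-contained, and a local stand-in replaces the real inference server.

```python
# Sketch: Kubernetes-style health probes issued over a single reused
# HTTP connection. A stub server stands in for the inference server;
# the health endpoint paths are assumptions for illustration.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer 200 for the assumed health endpoints, 404 otherwise.
        if self.path in ("/api/health/ready", "/api/health/live"):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example's output clean


# Start the stub server on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), StubHealthHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Open one keep-alive connection and reuse it for both probes,
# rather than reconnecting per request.
conn = http.client.HTTPConnection("127.0.0.1", port)
statuses = {}
for path in ("/api/health/live", "/api/health/ready"):
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()  # drain the body so the connection can be reused
    statuses[path] = resp.status
conn.close()
server.shutdown()

print(statuses)
```

A Kubernetes deployment would point its livenessProbe and readinessProbe at such endpoints, letting the orchestrator restart or withhold traffic from unhealthy server instances.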