Inference Server Release 18.08 Beta

The NVIDIA container image of the inference server, release 18.08, is available as a beta release.

Contents of the Inference Server

This container image contains the inference server executable in /opt/inference_server.

The container also includes the following:
  • Ubuntu 16.04
  • NVIDIA CUDA 9
  • NVIDIA cuDNN 7.2.1

Driver Requirements

Release 18.08 is based on CUDA 9, which requires NVIDIA Driver release 384.xx.

Key Features and Enhancements

This inference server release includes the following key features and enhancements.
  • The inference server container image version 18.08 is based on NVIDIA Inference Server 0.5.0 beta, TensorFlow 1.9.0, and Caffe2 0.8.1.
  • Latest version of cuDNN 7.2.1.
  • Added support for Kubernetes-compatible ready and live endpoints (see the health-check sketch after this list).
  • Added support for Prometheus metrics. A load metric is reported that can be used for Kubernetes-style auto-scaling; it is also shown in the sketch after this list.
  • Enhanced the example perf_client application to generate latency vs. inferences/second results (a simplified measurement sketch follows this list).
  • Improved the performance of TensorRT models by allowing multiple TensorRT model instances to execute simultaneously.
  • Improved HTTP client performance by reusing connections across multiple inference requests (the client sketch after this list shows the reuse pattern).
  • Ubuntu 16.04 with July 2018 updates.
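
The ready and live endpoints and the Prometheus metrics can be exercised directly over HTTP. The sketch below is a minimal Python health and metrics check; the paths /api/health/live, /api/health/ready, and /metrics, along with ports 8000 and 8002, are assumptions about this release's HTTP API, so verify them against the inference server documentation for your version.

    import requests

    SERVER = "http://localhost:8000"  # assumed HTTP endpoint of the inference server

    def is_live() -> bool:
        # Liveness: the server process is up and answering HTTP requests.
        try:
            return requests.get(f"{SERVER}/api/health/live", timeout=2).status_code == 200
        except requests.RequestException:
            return False

    def is_ready() -> bool:
        # Readiness: the server is up and its models are loaded and able to serve.
        try:
            return requests.get(f"{SERVER}/api/health/ready", timeout=2).status_code == 200
        except requests.RequestException:
            return False

    def print_metrics() -> None:
        # Prometheus metrics are plain text; a Kubernetes-style autoscaler
        # scrapes this endpoint and keys off the reported load metric.
        # Port 8002 and the /metrics path are assumptions.
        text = requests.get("http://localhost:8002/metrics", timeout=2).text
        for line in text.splitlines():
            if not line.startswith("#"):  # skip HELP/TYPE comment lines
                print(line)

    if __name__ == "__main__":
        print("live:", is_live(), "ready:", is_ready())
        print_metrics()

A Kubernetes liveness probe would target the live endpoint and a readiness probe the ready endpoint, so a pod only receives traffic once its models are loaded.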

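The connection-reuse improvement and perf_client's latency vs. inferences/second output can both be illustrated with a short client. In the sketch below, requests.Session reuses one HTTP connection across requests (the keep-alive pattern behind the client improvement), and a simple loop reports average latency alongside achieved inferences/second; the inference URL, model name, and payload are placeholders rather than the server's actual API.

    import statistics
    import time

    import requests

    # A Session pools connections, so repeated requests to the same host reuse
    # one TCP connection (HTTP keep-alive) instead of reconnecting per request.
    session = requests.Session()

    def infer(payload: bytes) -> requests.Response:
        # Hypothetical inference endpoint and model name, for illustration only.
        return session.post("http://localhost:8000/api/infer/my_model",
                            data=payload, timeout=5)

    def measure(num_requests: int) -> None:
        # Record per-request latency, then report average latency alongside
        # achieved inferences/second, in the style of perf_client's output.
        latencies = []
        start = time.perf_counter()
        for _ in range(num_requests):
            t0 = time.perf_counter()
            infer(b"dummy-input")
            latencies.append(time.perf_counter() - t0)
        elapsed = time.perf_counter() - start
        print(f"avg latency {statistics.mean(latencies) * 1e3:.1f} ms, "
              f"{num_requests / elapsed:.1f} inferences/sec")

    if __name__ == "__main__":
        measure(100)

Note that the actual perf_client varies the request concurrency to trace out the full latency vs. inferences/second curve; this sketch measures only a single setting.
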
Known Issues

  • This is a beta release of the inference server. All features are expected to be available; however, some aspects of functionality and performance will likely be limited compared to a non-beta release.
  • There is a known performance regression in the inference benchmarks for ResNet-50. We have not seen this regression in the inference benchmarks for VGG or in the training benchmarks for any network. The cause of the regression is still under investigation.