Inference Server Release 18.04 Beta
The NVIDIA container image of the Inference Server, release 18.04, is available as a beta release.
Contents of the Inference Server
This container image contains the Inference Server executable in /opt/inference_server.
The container also includes the following:
- Ubuntu 16.04 including the Python 2.7 environment
- NVIDIA CUDA 9.0.176 (see the Errata section and section 2.1), including the CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 9.0.333 (see section 2.3.1)
- NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.1.1
- NCCL 2.1.15 (optimized for NVLink™)
Key Features and Enhancements
This Inference Server release includes the following key features and enhancements.
- This is the beta release of the Inference Server container.
- The Inference Server container image version 18.04 is based on NVIDIA Inference Server 0.1.0 beta.
- Multiple model support. The Inference Server can manage any number and mix of models, limited only by system disk and memory resources. The server supports the TensorRT and TensorFlow GraphDef model formats.
- Multi-GPU support. The server can distribute inferencing across all system GPUs.
- Multi-tenancy support. Multiple models (or multiple instances of the same model) can run simultaneously on the same GPU.
- Batching support.
- Latest version of NCCL, 2.1.15
- Ubuntu 16.04 with March 2018 updates
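To illustrate multiple model support with both supported formats, a model store served by the Inference Server might be organized as sketched below. This layout is an assumption for illustration only: the directory names, the per-model configuration file (config.pbtxt), and the version-subdirectory convention follow conventions used by NVIDIA inference server releases but are not specified in these release notes, and the model names shown are hypothetical.

```
model_store/
  resnet50_plan/            # a TensorRT model (hypothetical name)
    config.pbtxt            # assumed per-model configuration file
    1/                      # assumed version subdirectory
      model.plan            # serialized TensorRT engine
  inception_graphdef/       # a TensorFlow model (hypothetical name)
    config.pbtxt
    1/
      model.graphdef        # frozen TensorFlow GraphDef
```

Under a layout like this, the server could load any mix of TensorRT and TensorFlow GraphDef models from one store, subject to the disk and memory limits noted above.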
Known Issues
This is a beta release of the Inference Server. All features are expected to be available; however, some aspects of functionality and performance will likely be limited compared with a non-beta release.