Triton Inference Server Overview

The NVIDIA Triton Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs.

The Triton Inference Server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.
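As a minimal sketch of how a remote client uses that endpoint, the example below sends an inference request over HTTP with the tritonclient Python library (Triton serves HTTP on port 8000 by default). The model name "my_model" and the tensor names, shape, and datatype are placeholders; the real values come from the deployed model's configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server running locally (HTTP is served on port 8000 by default).
client = httpclient.InferenceServerClient(url="localhost:8000")

# "INPUT0", the [1, 16] shape, and FP32 are placeholders; the real values
# come from the model's configuration on the server.
infer_input = httpclient.InferInput("INPUT0", [1, 16], "FP32")
infer_input.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

requested_output = httpclient.InferRequestedOutput("OUTPUT0")

# Send the request and read back the output tensor as a NumPy array.
result = client.infer(model_name="my_model",
                      inputs=[infer_input],
                      outputs=[requested_output])
print(result.as_numpy("OUTPUT0"))
```

The same request can be issued over gRPC by swapping in the tritonclient.grpc module, which exposes an equivalent client interface on Triton's default gRPC port (8001).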

The Triton Inference Server itself is included in the Triton Inference Server container. C++ and Python client libraries, along with additional documentation, are available external to the container at GitHub: Inference Server.

This document describes the key features, software enhancements and improvements, any known issues, and how to run this container.