Triton Inference Server Overview

The NVIDIA® Triton™ Inference Server provides a cloud and edge inferencing solution that is optimized for both CPUs and GPUs. Triton supports HTTP/REST and gRPC protocols that allow remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton is also available as a shared library with a C API that allows the complete Triton functionality to be included directly in an application.
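As a hedged illustration of the HTTP/REST protocol, the following sketch sends an inference request to a locally running server with Python's requests package. The model name ("simple_model"), tensor names, shapes, and port 8000 (Triton's default HTTP port) are assumptions for the example and must match your actual deployment and model configuration.

    # Minimal sketch of an inference request over Triton's HTTP/REST protocol.
    # Assumes a server listening on localhost:8000 and a hypothetical model
    # "simple_model" that takes an FP32 tensor "INPUT0" of shape [1, 4]
    # and returns a tensor named "OUTPUT0".
    import requests

    BASE = "http://localhost:8000"

    # Check that the server is up and ready to serve models.
    ready = requests.get(f"{BASE}/v2/health/ready")
    print("server ready:", ready.status_code == 200)

    # Build a KServe v2-style inference request body.
    payload = {
        "inputs": [
            {
                "name": "INPUT0",          # input tensor name (model-specific)
                "shape": [1, 4],           # tensor shape (model-specific)
                "datatype": "FP32",
                "data": [1.0, 2.0, 3.0, 4.0],
            }
        ],
        "outputs": [{"name": "OUTPUT0"}],  # requested output (model-specific)
    }

    resp = requests.post(f"{BASE}/v2/models/simple_model/infer", json=payload)
    resp.raise_for_status()
    print(resp.json()["outputs"])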

The Triton Inference Server itself is included in the Triton Inference Server container. Additional C++ and Python client libraries are provided externally to the container, and you can find additional documentation at GitHub: Inference Server.
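For reference, a minimal sketch of sending a request through the Python client library (tritonclient, installable with pip install tritonclient[http]) might look like the following; the model name, tensor names, shapes, and datatypes are placeholders that must match your deployed model.

    # Sketch of an inference request using the Python client library.
    # Assumes a server on localhost:8000 and a hypothetical model "simple_model"
    # with an FP32 input "INPUT0" of shape [1, 4] and an output "OUTPUT0".
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Prepare the input tensor from a NumPy array.
    input0 = httpclient.InferInput("INPUT0", [1, 4], "FP32")
    input0.set_data_from_numpy(np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32))

    # Ask the server to return only the named output tensor.
    output0 = httpclient.InferRequestedOutput("OUTPUT0")

    response = client.infer(model_name="simple_model",
                            inputs=[input0],
                            outputs=[output0])
    print(response.as_numpy("OUTPUT0"))

The gRPC client (tritonclient.grpc) follows the same pattern against the server's gRPC port.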

This document describes the key features, software enhancements and improvements, known issues, and how to run this container.