Installing the Server

The Triton Inference Server is available as a pre-built Docker container, or you can build it from source.

Installing Prebuilt Containers

The inference server is provided as a pre-built container on the NVIDIA GPU Cloud (NGC).

Before you can pull a container from the NGC container registry, you must have Docker and nvidia-docker installed. For DGX users, this setup is explained in the Preparing to use NVIDIA Containers Getting Started Guide. For all other users, follow the nvidia-docker installation documentation to install the most recent versions of CUDA, Docker, and nvidia-docker.
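As a quick sanity check that the prerequisites are in place, you can confirm that Docker is running and that the NVIDIA container runtime can expose your GPUs. This is only a sketch: the nvidia/cuda image tag below is an example and may need to be replaced with a tag currently published on Docker Hub, and older Docker versions may require --runtime=nvidia instead of the --gpus flag.

# Confirm Docker is installed and the daemon is reachable
docker version

# Confirm the NVIDIA runtime can see the GPUs
# (nvidia/cuda:11.0-base is an example tag; substitute any available CUDA image)
docker run --gpus all --rm nvidia/cuda:11.0-base nvidia-smi

If nvidia-smi prints your GPU table, the container toolkit is working and you are ready to pull the inference server image.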

After performing the above setup, you can pull the Triton Inference Server container using the following command:

docker pull nvcr.io/nvidia/tritonserver:20.03.1-py3

Replace 20.03.1 with the version of the inference server that you want to pull.
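After the pull completes, you can verify that the image is present locally and that the container can access your GPUs. This is a minimal check, not a full server launch; the exact command and flags for starting the server vary by release, so consult the documentation for the version you pulled.

# List local images to confirm the pull succeeded
docker images nvcr.io/nvidia/tritonserver

# Run nvidia-smi inside the container to confirm GPU access
docker run --gpus all --rm nvcr.io/nvidia/tritonserver:20.03.1-py3 nvidia-smi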