Installing the Server

The TensorRT Inference Server is available as a pre-built Docker container, or you can build it from source.

Installing Prebuilt Containers

The inference server is provided as a pre-built container on the NVIDIA GPU Cloud (NGC). Before pulling the container you must have access to, and be logged into, the NGC container registry, as explained in the NGC Getting Started Guide.
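
Assuming you have already generated an NGC API key, logging in typically looks like the following sketch, where $oauthtoken is entered literally as the username and your API key is the password; confirm the exact steps for your account in the NGC Getting Started Guide:

docker login nvcr.io
Username: $oauthtoken
Password: <your NGC API key>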

Before you can pull a container from the NGC container registry, you must have Docker and nvidia-docker installed. For DGX users, this is explained in the Preparing to use NVIDIA Containers Getting Started Guide. For users other than DGX, follow the nvidia-docker installation documentation to install the most recent version of CUDA, Docker, and nvidia-docker.
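
As a quick sanity check (a sketch that assumes nvidia-docker is on your PATH and that the nvidia/cuda image can be pulled from Docker Hub), you can confirm that your GPUs are visible from inside a container:

nvidia-docker run --rm nvidia/cuda nvidia-smi

If the command prints a table listing your GPUs, the container runtime is working.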

After performing the above setup, you can pull the TensorRT Inference Server container using the following command:

docker pull nvcr.io/nvidia/tensorrtserver:18.11-py3

Replace 18.11 with the version of the inference server that you want to pull.
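
To confirm that the image is now available locally (assuming the pull above completed successfully), you can list it with:

docker images nvcr.io/nvidia/tensorrtserver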