Quickstart

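The Triton Inference Server is available in two ways: as a pre-built Docker container from NVIDIA GPU Cloud (NGC), or as source code that you build yourself, either with Docker or with CMake.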

Prerequisites

Regardless of which method you choose (starting with a pre-built container from NGC or building from source), you must perform the following prerequisite steps:

  • Clone the Triton Inference Server GitHub repo. Even if you choose to get the pre-built Triton container from NGC, you need the GitHub repo for the example model repository. Go to https://github.com/NVIDIA/triton-inference-server and select the Clone or download drop-down button. After cloning the repo, check out the r<xx.yy> release branch that corresponds to the version of Triton you want to use, for example:

    $ git checkout r20.07
    
  • Create a model repository containing one or more models that you want Triton to serve. An example model repository is included in the docs/examples/model_repository directory of the GitHub repo. Before using the repository, you must fetch any missing model definition files from their public model zoos via the provided docs/examples/fetch_models.sh script:

    $ cd docs/examples
    $ ./fetch_models.sh
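The example repository follows Triton's model repository layout: one directory per model, containing numbered version subdirectories and, usually, a config.pbtxt model configuration file. As a quick sanity check after running fetch_models.sh, a minimal Python sketch like the following can list what Triton will find. The function name list_models is hypothetical, and it only reports whether config.pbtxt is present rather than requiring it.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def list_models(repo: str) -> Dict[str, Tuple[bool, List[int]]]:
    """Return {model_name: (has_config_pbtxt, sorted_version_numbers)}.

    Assumes the layout <repo>/<model_name>/<version>/..., where version
    directories are named with integers. Plain files at the top level are ignored.
    """
    models: Dict[str, Tuple[bool, List[int]]] = {}
    for model_dir in sorted(Path(repo).iterdir()):
        if not model_dir.is_dir():
            continue  # skip stray files such as READMEs
        has_config = (model_dir / "config.pbtxt").is_file()
        # Only integer-named subdirectories count as model versions.
        versions = sorted(
            int(child.name)
            for child in model_dir.iterdir()
            if child.is_dir() and child.name.isdigit()
        )
        models[model_dir.name] = (has_config, versions)
    return models
```

For example, calling list_models("docs/examples/model_repository") from the repo root returns one entry per model; a model whose version list is empty is likely to fail to load.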
    

If you are starting with a pre-built NGC container, perform these additional steps:

Using A Prebuilt Docker Container

Use docker pull to get the Triton Inference Server container from NGC:

$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3

Where <xx.yy> is the version of Triton that you want to pull. Once you have the container, follow these steps to run Triton and the example client applications.

  1. Run Triton.

  2. Verify that Triton is running correctly.

  3. Get the example client applications.

  4. Run the image classification example.

Building With Docker

Make sure you complete the steps in Prerequisites before attempting to build Triton. To build Triton from source, change to the root directory of the GitHub repo and check out the release branch for the version you want to build (or the master branch if you want to build the under-development version):

$ git checkout r20.07

Then use docker to build:

$ docker build --pull -t tritonserver .

After the build completes, follow these steps to run Triton and the example client applications.

  1. Run Triton.

  2. Verify that Triton is running correctly.

  3. Get the example client applications.

  4. Run the image classification example.

Building With CMake

Make sure you complete the steps in Prerequisites before attempting to build Triton. To build with CMake you must decide which features of Triton you want, build any required dependencies, and then build Triton itself. See Building Triton with CMake for details on how to build with CMake.

After the build completes, follow these steps to run Triton and the example client applications.

  1. Run Triton.

  2. Verify that Triton is running correctly.

  3. Get the example client applications.

  4. Run the image classification example.

Run Triton Inference Server

Assuming the example model repository is available in /full/path/to/example/model/repository, if you are using the pre-built NGC container or built Triton using Docker, use the following command to run the Triton container:

$ docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/example/model/repository:/models <docker image> tritonserver --model-repository=/models

Where <docker image> is nvcr.io/nvidia/tritonserver:<xx.yy>-py3 if you pulled the Triton container from NGC, or is tritonserver if you built Triton from source.
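To make the purpose of each flag explicit, or to launch Triton from a script, the same command can be assembled in Python. This is a sketch: the helper name triton_docker_cmd is hypothetical, and the comments describe what the standard Docker options do rather than anything Triton-specific.

```python
import shlex
from typing import List


def triton_docker_cmd(image: str, repo_path: str, gpus: int = 1) -> List[str]:
    """Build the argument list for the docker run command shown above."""
    return [
        "docker", "run",
        f"--gpus={gpus}",               # number of GPUs exposed to the container
        "--rm",                         # delete the container when it exits
        "--shm-size=1g",                # size of /dev/shm inside the container
        "--ulimit", "memlock=-1",       # no limit on locked (pinned) memory
        "--ulimit", "stack=67108864",   # 64 MiB stack size limit
        "-p8000:8000",                  # HTTP endpoint
        "-p8001:8001",                  # gRPC endpoint
        "-p8002:8002",                  # metrics endpoint
        f"-v{repo_path}:/models",       # mount the host model repository at /models
        image,
        "tritonserver", "--model-repository=/models",
    ]


if __name__ == "__main__":
    cmd = triton_docker_cmd("tritonserver", "/full/path/to/example/model/repository")
    print(shlex.join(cmd))
    # To actually start Triton: import subprocess; subprocess.run(cmd, check=True)
```

Printing shlex.join(cmd) reproduces the command line above; passing the list to subprocess.run avoids shell quoting issues when the repository path contains spaces.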

If you built using CMake, run Triton directly on your host system:

$ tritonserver --model-repository=/full/path/to/example/model/repository

In either case, after you start Triton you will see output on the console showing the server starting up and loading the model. When you see output like the following, Triton is ready to accept inference requests:

I0828 23:42:45.635957 1 main.cc:417] Starting endpoints, 'inference:0' listening on
I0828 23:42:45.649580 1 grpc_server.cc:1730] Started GRPCInferenceService at 0.0.0.0:8001
I0828 23:42:45.649647 1 http_server.cc:1125] Started HTTPService at 0.0.0.0:8000
I0828 23:42:45.693758 1 http_server.cc:1139] Started Metrics Service at 0.0.0.0:8002
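If you launch Triton from a script, you may want to detect these startup messages programmatically. The sketch below extracts service names and ports from lines shaped like the sample output above. The exact log wording is an assumption based on that sample and may change between releases, so the ready endpoint described in the next section is a more reliable signal; the function name started_services is hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches the tail of lines such as:
#   I0828 23:42:45.649580 1 grpc_server.cc:1730] Started GRPCInferenceService at 0.0.0.0:8001
_STARTED = re.compile(r"\]\s+Started (.+?) at (\S+):(\d+)\s*$")


def started_services(log_lines: Iterable[str]) -> Dict[str, int]:
    """Map each service named in a 'Started <service> at <host>:<port>' line to its port."""
    services: Dict[str, int] = {}
    for line in log_lines:
        match = _STARTED.search(line)
        if match:
            services[match.group(1)] = int(match.group(3))
    return services
```

Fed the four sample lines above, it returns {'GRPCInferenceService': 8001, 'HTTPService': 8000, 'Metrics Service': 8002}; the "Starting endpoints" line does not match and is skipped.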

For more information, see Running Triton.

Verify Triton Is Running Correctly

Use Triton’s ready endpoint to verify that the server and the models are ready for inference. From the host system, use curl to access the HTTP endpoint that indicates server readiness. For example:

$ curl -v localhost:8000/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain

The HTTP request returns status 200 if Triton is ready and non-200 if it is not ready. For more information, see Checking Triton Status.
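A script that starts Triton and then sends requests typically needs to wait for readiness first. The sketch below polls the same ready endpoint using only the Python standard library, and can instead poll a per-model endpoint of the form /v2/models/<model>/ready from the same v2 HTTP API. The function name wait_until_ready, the default timeout, and the polling interval are illustrative choices, not Triton defaults.

```python
import time
import urllib.request
from typing import Optional


def wait_until_ready(base_url: str = "http://localhost:8000",
                     model: Optional[str] = None,
                     timeout_s: float = 60.0,
                     interval_s: float = 1.0) -> bool:
    """Poll Triton's readiness endpoint until it returns 200 or the timeout expires."""
    path = f"/v2/models/{model}/ready" if model else "/v2/health/ready"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(base_url + path, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError (connection refused) and HTTPError (non-2xx status) are
            # both OSError subclasses: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_until_ready(model="resnet50_netdef") returns True once that model has loaded, and False if it is still not ready after 60 seconds.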

Getting The Client Examples

Use docker pull to get the client libraries and examples container from NGC:

$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-clientsdk

Where <xx.yy> is the version that you want to pull. Run the client image:

$ docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-clientsdk

It is also possible to build the client examples with or without Docker. For more information, see Getting the Client Examples.

Running The Image Classification Example

From within the nvcr.io/nvidia/tritonserver:<xx.yy>-py3-clientsdk image, run the example image_client application to perform image classification using the example resnet50_netdef model.

To send a request for the resnet50_netdef (Caffe2) model from the example model repository for an image from the /workspace/images directory:

$ /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image 'images/mug.jpg':
    0.723992 (504) = COFFEE MUG
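The -s INCEPTION option selects how pixel values are scaled before they are sent to the model. Conventional Inception-style preprocessing maps each 8-bit channel value x to x / 127.5 - 1, so inputs fall in [-1, 1]. The sketch below shows that transform on a flat list of pixel values; that image_client applies exactly this formula for INCEPTION is an assumption here, so check the client source before relying on it. The function name inception_scale is hypothetical.

```python
from typing import List


def inception_scale(pixels: List[int]) -> List[float]:
    """Rescale 8-bit pixel values (0..255) to the range [-1.0, 1.0]."""
    # 0 maps to -1.0, 255 maps to 1.0, and the midpoint 127.5 would map to 0.0.
    return [p / 127.5 - 1.0 for p in pixels]
```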

The Python version of image_client accepts the same command-line arguments:

$ python /workspace/install/python/image_client.py -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
Request 1, batch size 1
    0.777365 (504) = COFFEE MUG
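Both clients print one line per returned class in the form score (class index) = label. If you wrap the example client in a script, a small parser like the following can turn that text into structured results. It assumes exactly the line format shown above, and the Classification type and parse_results function are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Classification(NamedTuple):
    score: float
    class_index: int
    label: str


# Matches result lines such as "    0.723992 (504) = COFFEE MUG".
_RESULT = re.compile(
    r"^\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s+\((\d+)\)\s+=\s+(.+?)\s*$"
)


def parse_results(output: str) -> List[Classification]:
    """Extract (score, class index, label) tuples, ignoring header lines."""
    results: List[Classification] = []
    for line in output.splitlines():
        match = _RESULT.match(line)
        if match:
            results.append(Classification(
                float(match.group(1)), int(match.group(2)), match.group(3)))
    return results
```

Applied to the C++ client output above, it returns a single entry with score 0.723992, class index 504, and label COFFEE MUG; the "Request" and "Image" header lines are skipped.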

For more information, see Image Classification Example Application.