This project demonstrates how to run C API applications using Triton Inference Server as a shared library. We also show how to build and execute such applications on Jetson.
In our example, we placed the contents of the downloaded release directory under `/opt/tritonserver`.
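For reference, the commands below sketch one way to put the release contents in that location. The tarball name is a placeholder, not an actual file name; substitute the Jetson release archive you downloaded for your JetPack version.

```shell
# Illustrative only: replace the placeholder tarball name with the actual
# Jetson release archive downloaded from the Triton releases page.
sudo mkdir -p /opt/tritonserver
sudo tar -xzf tritonserver<version>-jetpack<version>.tgz -C /opt/tritonserver
```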
## Part 1. Concurrent inference and dynamic batching
The purpose of the sample located under `concurrency_and_dynamic_batching` is to demonstrate two important features of Triton Inference Server: concurrent model execution and dynamic batching. To do that, we implemented a people detection application using the C API and Triton Inference Server as a shared library.
## Part 2. Analyzing model performance with perf_analyzer
To analyze model performance on Jetson, we use the perf_analyzer tool. perf_analyzer is included in the release tar file, or it can be compiled from source.
From this directory of the repository, execute the following to evaluate model performance:
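As a rough illustration, a perf_analyzer invocation in C API mode might look like the sketch below. The model name (`peoplenet`), model repository path, and concurrency range are assumptions based on the people detection sample and should be adjusted to match your setup; the C API mode options (`--service-kind`, `--triton-server-directory`, `--model-repository`) are standard perf_analyzer flags.

```shell
# Hypothetical example: model name, repository path, and concurrency range
# are placeholders; adjust them to your environment.
perf_analyzer -m peoplenet \
    --service-kind=triton_c_api \
    --model-repository=$(pwd)/concurrency_and_dynamic_batching/trtis_model_repo_sample_1 \
    --triton-server-directory=/opt/tritonserver \
    --concurrency-range 1:6
```

Running perf_analyzer in C API mode exercises the model through the same shared-library path the sample application uses, so the reported latency and throughput exclude any HTTP/gRPC overhead.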