Building

The TensorRT Inference Server can be built in two ways:

  • Build using Docker and the TensorFlow and PyTorch containers from NVIDIA GPU Cloud (NGC). Before building you must install Docker and nvidia-docker and log in to the NGC registry by following the instructions in Installing Prebuilt Containers.

  • Build using CMake and the dependencies (for example, TensorFlow or TensorRT library) that you build or install yourself.

Building the Server with Docker

To build a release version of the TensorRT Inference Server container, change directory to the root of the repo and check out the release version of the branch that you want to build (or the master branch if you want to build the under-development version):

$ git checkout r19.07

Then use docker to build:

$ docker build --pull -t tensorrtserver .

Incremental Builds with Docker

For typical development you will want to run the build container with your local repo’s source files mounted so that your local changes can be incrementally built. This is done by first building the tensorrtserver_build container:

$ docker build --pull -t tensorrtserver_build --target trtserver_build .

By mounting /path/to/tensorrtserver/src into the container at /workspace/src, changes to your local repo will be reflected in the container:

$ nvidia-docker run -it --rm -v/path/to/tensorrtserver/src:/workspace/src tensorrtserver_build

Within the container you can perform an incremental server build with:

# cd /workspace/builddir
# make -j16 trtis

The first build attempt may fail with a “protobuf_generate_cpp” error. If that happens, simply run the make command again and the build should succeed. When the build completes, the binary, libraries and headers can be found in trtis/install. To overwrite the existing versions:

# cp trtis/install/bin/trtserver /opt/tensorrtserver/bin/.
# cp trtis/install/lib/libtrtserver.so /opt/tensorrtserver/lib/.

You can reconfigure the build by running cmake as described in Building the Server with CMake.

Building the Server with CMake

To build a release version of the TensorRT Inference Server with CMake, change directory to the root of the repo and check out the release version of the branch that you want to build (or the master branch if you want to build the under-development version):

$ git checkout r19.07

Next you must build or install each framework backend you want to enable in the inference server, configure the inference server to enable the desired features, and finally build the server.

Dependencies

To include GPU support in the inference server you must install the necessary CUDA libraries. Similarly, to include support for a particular framework backend, you must build the appropriate libraries for that framework and make them available to the inference server build. In general, the Dockerfile build steps show how each of these frameworks can be built for use in the inference server.

CUDA, cuBLAS, cuDNN

For the inference server to support NVIDIA GPUs you must install CUDA, cuBLAS and cuDNN. These libraries must be installed on system include and library paths so that they are available for the CMake build. The version of the libraries used in the Dockerfile build can be found in the Framework Containers Support Matrix.
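
As a quick sanity check that the libraries are visible on the system paths (this sketch assumes nvcc is installed and ldconfig is available), you can run something like:

$ nvcc --version
$ ldconfig -p | grep -e libcublas -e libcudnn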

For a given version of the inference server you can attempt to build with non-supported versions of the libraries but you may have build or execution issues since non-supported versions are not tested.

Once you have CUDA, cuBLAS and cuDNN installed you can enable GPUs with the CMake option -DTRTIS_ENABLE_GPU=ON as described below.

TensorRT

The TensorRT headers and libraries must be installed on system include and library paths so that they are available for the CMake build. The version of TensorRT used in the Dockerfile build can be found in the Framework Containers Support Matrix.

For a given version of the inference server you can attempt to build with non-supported versions of TensorRT but you may have build or execution issues since non-supported versions are not tested.

Once you have TensorRT installed you can enable the TensorRT backend in the inference server with the CMake option -DTRTIS_ENABLE_TENSORRT=ON as described below. You must also specify -DTRTIS_ENABLE_GPU=ON because TensorRT requires GPU support.
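
For example, from the builddir created as described in Configure Inference Server below, a sketch of a configuration that enables the TensorRT backend might look like:

$ cmake -DTRTIS_ENABLE_GPU=ON -DTRTIS_ENABLE_TENSORRT=ON ../build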

TensorFlow

The version of TensorFlow used in the Dockerfile build can be found in the Framework Containers Support Matrix. The trtserver_tf section of the Dockerfile shows how to build the required TensorFlow library from the NGC TensorFlow container.

You can build and install a different version of the TensorFlow library but you must build with the equivalent options indicated by the patches used in the Dockerfile, and you must include tensorflow_backend_tf.cc and tensorflow_backend_tf.h. The patch to tensorflow/BUILD and the build options shown in nvbuildopts cause the TensorFlow backend to be built into a single library, libtensorflow_cc.so, that includes all the functionality required by the inference server.

Once you have the TensorFlow library built and installed you can enable the TensorFlow backend in the inference server with the CMake option -DTRTIS_ENABLE_TENSORFLOW=ON as described below.

You can install the TensorFlow library in a system library path or you can specify the path with the CMake option TRTIS_EXTRA_LIB_PATHS. Multiple paths can be specified by separating them with a semicolon, for example, -DTRTIS_EXTRA_LIB_PATHS="/path/a;/path/b".
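
For example, assuming libtensorflow_cc.so was installed into a hypothetical /opt/tensorflow/lib directory, the configuration might look like:

$ cmake -DTRTIS_ENABLE_TENSORFLOW=ON \
        -DTRTIS_EXTRA_LIB_PATHS="/opt/tensorflow/lib" ../build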

ONNX Runtime

The version of the ONNX Runtime used in the Dockerfile build can be found in the trtserver_onnx section of the Dockerfile. That section also details the steps that can be used to build the backend. You can attempt to build a different version of the ONNX Runtime or use a different build process but you may have build or execution issues.

Your build should produce the ONNX Runtime library, libonnxruntime.so. You can enable the ONNX Runtime backend in the inference server with the CMake option -DTRTIS_ENABLE_ONNXRUNTIME=ON as described below.

You can install the library in a system library path or you can specify the path with the CMake option TRTIS_EXTRA_LIB_PATHS. Multiple paths can be specified by separating them with a semicolon, for example, -DTRTIS_EXTRA_LIB_PATHS="/path/a;/path/b".

You must also provide the path to the ONNX Runtime headers using the -DTRTIS_ONNXRUNTIME_INCLUDE_PATHS option. Multiple paths can be specified by separating them with a semicolon.
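
For example, assuming libonnxruntime.so and the ONNX Runtime headers were installed under a hypothetical /opt/onnxruntime prefix, the configuration might look like:

$ cmake -DTRTIS_ENABLE_ONNXRUNTIME=ON \
        -DTRTIS_EXTRA_LIB_PATHS="/opt/onnxruntime/lib" \
        -DTRTIS_ONNXRUNTIME_INCLUDE_PATHS="/opt/onnxruntime/include" ../build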

PyTorch and Caffe2

The version of PyTorch and Caffe2 used in the Dockerfile build can be found in the Framework Containers Support Matrix. The trtserver_caffe2 section of the Dockerfile shows how to build the required PyTorch and Caffe2 libraries from the NGC PyTorch container.

You can build and install a different version of the libraries but if you want to enable the Caffe2 backend you must include netdef_backend_c2.cc and netdef_backend_c2.h in the build, as shown in the Dockerfile.

Once you have the libraries built and installed you can enable the PyTorch backend in the inference server with the CMake option -DTRTIS_ENABLE_PYTORCH=ON and the Caffe2 backend with -DTRTIS_ENABLE_CAFFE2=ON as described below.

You can install the PyTorch library, libtorch.so, and all the required Caffe2 libraries (see Dockerfile) in a system library path or you can specify the path with the CMake option TRTIS_EXTRA_LIB_PATHS. Multiple paths can be specified by separating them with a semicolon, for example, -DTRTIS_EXTRA_LIB_PATHS="/path/a;/path/b".

For the PyTorch backend you must also provide the path to the PyTorch headers using the -DTRTIS_PYTORCH_INCLUDE_PATHS option. Multiple paths can be specified by separating them with a semicolon.
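
For example, assuming libtorch.so, the required Caffe2 libraries, and the PyTorch headers were installed under a hypothetical /opt/pytorch prefix, the configuration might look like:

$ cmake -DTRTIS_ENABLE_PYTORCH=ON -DTRTIS_ENABLE_CAFFE2=ON \
        -DTRTIS_EXTRA_LIB_PATHS="/opt/pytorch/lib" \
        -DTRTIS_PYTORCH_INCLUDE_PATHS="/opt/pytorch/include" ../build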

Configure Inference Server

Use cmake to configure the TensorRT Inference Server:

$ mkdir builddir
$ cd builddir
$ cmake -D<option0> ... -D<optionn> ../build

The following options are used to enable and disable the different backends. To enable a backend set the corresponding option to ON, for example -DTRTIS_ENABLE_TENSORRT=ON. To disable a backend set the corresponding option to OFF, for example -DTRTIS_ENABLE_TENSORRT=OFF. By default no backends are enabled. See the section on dependencies for information on additional requirements for enabling a backend.

  • TRTIS_ENABLE_TENSORRT: Use -DTRTIS_ENABLE_TENSORRT=ON to enable the TensorRT backend. The TensorRT libraries must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

  • TRTIS_ENABLE_TENSORFLOW: Use -DTRTIS_ENABLE_TENSORFLOW=ON to enable the TensorFlow backend. The TensorFlow library libtensorflow_cc.so must be built as described above and must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

  • TRTIS_ENABLE_ONNXRUNTIME: Use -DTRTIS_ENABLE_ONNXRUNTIME=ON to enable the ONNX Runtime backend. The library libonnxruntime.so must be built as described above and must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

  • TRTIS_ENABLE_PYTORCH: Use -DTRTIS_ENABLE_PYTORCH=ON to enable the PyTorch backend. The library libtorch.so must be built as described above and must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

  • TRTIS_ENABLE_CAFFE2: Use -DTRTIS_ENABLE_CAFFE2=ON to enable the Caffe2 backend. The library libcaffe2.so and all the other required libraries must be built as described above and must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

  • TRTIS_ENABLE_CUSTOM: Use -DTRTIS_ENABLE_CUSTOM=ON to enable support for custom backends. See Building A Custom Backend for information on how to build a custom backend.

These additional options may also be specified; a combined configuration example follows the list:

  • TRTIS_ENABLE_METRICS: By default the inference server reports Prometheus metrics on an HTTP endpoint. Use -DTRTIS_ENABLE_METRICS=OFF to disable.

  • TRTIS_ENABLE_GPU: By default the inference server supports NVIDIA GPUs. Use -DTRTIS_ENABLE_GPU=OFF to disable GPU support. When GPU support is disabled, the inference server will run models on the CPU when possible.
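
Putting the options together, a sketch of a configuration that enables the TensorRT and TensorFlow backends with GPU support and disables metrics might look like the following (the extra library path is illustrative):

$ cmake -DTRTIS_ENABLE_GPU=ON \
        -DTRTIS_ENABLE_TENSORRT=ON \
        -DTRTIS_ENABLE_TENSORFLOW=ON \
        -DTRTIS_EXTRA_LIB_PATHS="/opt/tensorflow/lib" \
        -DTRTIS_ENABLE_METRICS=OFF ../build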

Build Inference Server

After configuring, build the inference server with make:

$ cd builddir
$ make -j16 trtis

The first build attempt may fail with a “protobuf_generate_cpp” error. If that happens, simply run the make command again and the build should succeed. When the build completes, the binary, libraries and headers can be found in trtis/install.

Building A Custom Backend

The source repository contains several example custom backends in the src/custom directory. These custom backends are built using CMake:

$ mkdir builddir
$ cd builddir
$ cmake ../build
$ make -j16 trtis-custom-backends

The first build attempt may fail with a “protobuf_generate_cpp” error. If that happens, simply run the make command again and the build should succeed. When the build completes, the custom backend libraries can be found in trtis-custom-backends/install.

A custom backend is not built into the inference server. Instead, it is built as a separate shared library that the inference server dynamically loads when the model repository contains a model that uses that custom backend. There are a couple of ways you can build your custom backend into a shared library, as described in the following sections.

Build Using CMake

One way to build your own custom backend is to use the inference server’s CMake build. Simply copy and modify one of the existing example custom backends and then build your backend using CMake. You can then use the resulting shared library in your model repository as described in Custom Backends.

Build Using Custom Backend SDK

The custom backend SDK includes all the header files you need to build your custom backend as well as a static library which provides all the model configuration and protobuf utility functions you will need. You can either build the custom backend SDK yourself using Dockerfile.custombackend:

$ docker build -t tensorrtserver_cbe -f Dockerfile.custombackend .

Or you can download a pre-built version of the SDK from the GitHub release page corresponding to the release you are interested in. The custom backend SDK is found in the “Assets” section of the release page in a tar file named after the version of the release and the OS, for example, v1.2.0_ubuntu1604.custombackend.tar.gz.
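
For example, assuming the downloaded archive extracts into the custom-backend-sdk directory referenced in the compile command below:

$ tar xzf v1.2.0_ubuntu1604.custombackend.tar.gz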

Once you have the SDK you can use the include/ directory and static library when you compile your custom backend source code. For example, the SDK includes the source for the param custom backend in src/param.cc. You can create a custom backend from that source using the following command:

$ g++ -fpic -shared -std=c++11 -o libparam.so custom-backend-sdk/src/param.cc -Icustom-backend-sdk/include custom-backend-sdk/lib/libcustombackend.a

Building the Client Libraries and Examples

The provided Dockerfile.client and CMake support can be used to build the client libraries and examples.

Build Using Dockerfile

To build the libraries and examples using Docker, first change directory to the root of the repo and check out the release version of the branch that you want to build (or the master branch if you want to build the under-development version). The branch you use for the client build should match the version of the inference server you are using:

$ git checkout r19.07

Then, issue the following command to build the C++ client library, C++ and Python examples, and a Python wheel file for the Python client library:

$ docker build -t tensorrtserver_client -f Dockerfile.client .

You can optionally add --build-arg "UBUNTU_VERSION=<ver>" to set the Ubuntu version that you want the client library built for. Supported values for <ver> are 16.04 and 18.04, with 16.04 being the default.
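
For example, to build the client library and examples for Ubuntu 18.04:

$ docker build -t tensorrtserver_client --build-arg "UBUNTU_VERSION=18.04" -f Dockerfile.client .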

After the build completes the tensorrtserver_client docker image will contain the built client libraries and examples, and will also be configured with all the dependencies required to run those examples within the container. The easiest way to try the examples described in the following sections is to run the client image with --net=host so that the client examples can access the inference server running in its own container (see Running The Inference Server for more information about running the inference server):

$ docker run -it --rm --net=host tensorrtserver_client

In the tensorrtserver_client image you can find the example executables in /workspace/build/trtis-clients/install/bin, and the Python examples in /workspace/build/trtis-clients/install/python. A tar file containing all the library and example binaries and Python scripts is at /workspace/v<version>.clients.tar.gz.

Build Using CMake

The client build is performed using CMake. The build dependencies and requirements are shown in Dockerfile.client. To build without Docker you must first install those dependencies. This section describes the client build for Ubuntu 16.04, Ubuntu 18.04, and Windows 10 systems. The CMake build can also be targeted for other OSes and platforms. We welcome any updates that expand the build functionality and allow the clients to be built on additional platforms.

To build the libraries and examples using CMake, first change directory to the root of the repo and check out the release version of the branch that you want to build (or the master branch if you want to build the under-development version):

$ git checkout r19.07

When following the build steps below, the first build attempt may fail with a “protobuf_generate_cpp” error. If that happens simply run the make/build command again and the build should succeed.

Ubuntu 16.04 / Ubuntu 18.04

To build on Ubuntu, change to the build/ directory and run the following to configure and build:

$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release .
$ make -j8 trtis-clients

When the build completes, the libraries and examples can be found in trtis-clients/install.

Windows 10

If you are building the client for Windows, there is no default build system available, so you will need to specify a CMake generator that matches the build system you are using. For instance, if you are using Microsoft Visual Studio, you should do the following:

> cd build
> cmake -G"Visual Studio 16 2019" -DCMAKE_BUILD_TYPE=Release .
> MSBuild.exe trtis-clients.vcxproj -p:Configuration=Release

When the build completes, the libraries and examples can be found in trtis-clients/install. The C++ client examples will not be generated as those examples have not yet been ported to Windows. However, you can use the Python examples to verify that the build is successful.

Building the Documentation

The inference server documentation is found in the docs/ directory and is based on Sphinx. Doxygen integrated with Exhale is used for C++ API documentation.

To build the docs install the required dependencies:

$ apt-get update
$ apt-get install -y --no-install-recommends doxygen
$ pip install --upgrade sphinx sphinx-rtd-theme nbsphinx exhale

To get the Python client library API docs the TensorRT Inference Server Python package must be installed:

$ pip install --upgrade tensorrtserver-*.whl

Then use Sphinx to build the documentation into the build/html directory:

$ cd docs
$ make clean html