The TensorRT Inference Server is built using Docker and the TensorFlow and PyTorch containers from NVIDIA GPU Cloud (NGC). Before building, you must install Docker and nvidia-docker and log in to the NGC registry by following the instructions in Installing Prebuilt Containers.
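Before starting a long build, it can help to confirm that the required command-line tools are on your PATH. The following is a minimal sketch, not part of the repo: missing_tools is a hypothetical helper name, and the check covers only tool availability, not whether you are logged in to the NGC registry.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): print each named tool that
# is not found on PATH, one per line. Returns non-zero if any is missing.
missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the tool names from the prerequisites above:
#   missing_tools docker nvidia-docker git || echo "install the tools listed above"
```

The function prints nothing and returns zero when every tool is found, so it can gate a build script with a simple `||`.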

Building the Server

To build a release version of the TensorRT Inference Server container, change directory to the root of the repo and check out the release branch that you want to build (or the master branch if you want to build the under-development version):

$ git checkout r19.04

Then use docker to build:

$ docker build --pull -t tensorrtserver .

Incremental Builds

For typical development you will want to run the build container with your local repo’s source files mounted so that your local changes can be incrementally built. This is done by first building the tensorrtserver_build container:

$ docker build --pull -t tensorrtserver_build --target trtserver_build .

By mounting /path/to/tensorrtserver/src into the container at /workspace/src, changes to your local repo will be reflected in the container:

$ nvidia-docker run -it --rm -v/path/to/tensorrtserver/src:/workspace/src tensorrtserver_build

Within the container you can perform an incremental server build with:

# cd /workspace
# bazel build -c opt src/servers/trtserver
# cp /workspace/bazel-bin/src/servers/trtserver /opt/tensorrtserver/bin/trtserver

Some source changes can cause bazel to miss files that need to be rebuilt. To force bazel to rebuild all of the inference server source, without a complete rebuild of the TensorFlow and Caffe2 components, run the following before issuing the above build command:

# rm -fr bazel-bin/src
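The in-container steps from this section could be wrapped in a small shell function so that the optional cleanup and the build always run in the same order. This is only a sketch: rebuild_trtserver and the DRY_RUN variable are hypothetical names introduced here, and the commands themselves are exactly the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper for use inside the tensorrtserver_build container.
# Pass "clean" as the first argument to remove bazel-bin/src before building.
# Set DRY_RUN=1 to print the commands instead of executing them.
rebuild_trtserver() {
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi

    $run cd /workspace || return 1
    if [ "${1:-}" = "clean" ]; then
        # Forces a rebuild of the inference server sources only
        $run rm -fr bazel-bin/src || return 1
    fi
    $run bazel build -c opt src/servers/trtserver || return 1
    $run cp /workspace/bazel-bin/src/servers/trtserver /opt/tensorrtserver/bin/trtserver
}
```

Running `DRY_RUN=1 rebuild_trtserver clean` prints the four commands without touching the build tree, which is a cheap way to check the sequence before a real run.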

Building the Client Libraries and Examples

The provided Makefile.client and Dockerfile.client can be used to build the client libraries and examples.

Build Using Dockerfile

To build the libraries and examples, first change directory to the root of the repo and check out the release branch that you want to build (or the master branch if you want to build the under-development version). The branch you use for the client build should match the version of the inference server you are using:

$ git checkout r19.04

Then, issue the following command to build the C++ client library, C++ and Python examples, and a Python wheel file for the Python client library:

$ docker build -t tensorrtserver_client -f Dockerfile.client .

You can optionally add --build-arg "PYVER=<ver>" to set the Python version that you want the Python client library built for. Supported values for <ver> are 2.7 and 3.5, with 3.5 being the default.
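As an illustration of the PYVER build argument, the build command can be wrapped in a function that takes the Python version as an optional parameter. This is a sketch: build_client_image is a hypothetical name, and the function does not validate the version it is given.

```shell
#!/bin/sh
# Hypothetical wrapper around the client build command shown above.
# The first argument selects the Python version; it falls back to 3.5,
# the documented default.
build_client_image() {
    pyver="${1:-3.5}"
    docker build -t tensorrtserver_client -f Dockerfile.client \
        --build-arg "PYVER=${pyver}" .
}

# Example call, run from the root of the repo:
#   build_client_image 3.5
```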

After the build completes, the tensorrtserver_client Docker image contains the built client libraries and examples and is configured with all the dependencies required to run those examples within the container. The easiest way to try the examples described in the following sections is to run the client image with --net=host so that the client examples can access the inference server running in its own container (see Running The Inference Server for more information about running the inference server):

$ docker run -it --rm --net=host tensorrtserver_client

In the tensorrtserver_client image you can find the C++ library and example executables in /workspace/build, and the Python examples in /workspace/src/clients/python. A tar file containing all the library and example binaries and Python scripts is at /workspace/v<version>.clients.tar.gz.
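If you want the tar file on the host rather than inside a running container, one possible approach is to create a stopped container from the image and copy the file out with docker cp. The sketch below assumes the image was tagged tensorrtserver_client as above; copy_client_tarball is a hypothetical name, and the version string must be supplied by you because it depends on the branch that was built.

```shell
#!/bin/sh
# Hypothetical helper: copy the client tar file out of the image.
#   $1 = version string used in the tar file name (required)
#   $2 = destination directory on the host (defaults to the current directory)
copy_client_tarball() {
    version="$1"
    dest="${2:-.}"
    if [ -z "$version" ]; then
        echo "usage: copy_client_tarball <version> [dest]" >&2
        return 2
    fi

    # Create a stopped container so that its filesystem can be read
    cid="$(docker create tensorrtserver_client)" || return 1
    docker cp "${cid}:/workspace/v${version}.clients.tar.gz" "$dest"
    rc=$?

    # Remove the temporary container whether or not the copy succeeded
    docker rm "$cid" >/dev/null
    return "$rc"
}
```

The temporary container is never started, so no client code runs during the copy.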

Build Using Makefile

The actual client build is performed by Makefile.client. The build dependencies and requirements are shown in Dockerfile.client. To build without Docker, you must first install those dependencies. The Makefile can also be targeted for other OSes and platforms. We welcome any updates that expand the Makefile's functionality and allow the clients to be built on additional platforms.

Building the Documentation

The inference server documentation is found in the docs/ directory and is based on Sphinx. Doxygen integrated with Exhale is used for C++ API documentation.

To build the docs, first install the required dependencies:

$ apt-get update
$ apt-get install -y --no-install-recommends doxygen
$ pip install --upgrade sphinx sphinx-rtd-theme nbsphinx exhale

To include the Python client library API docs, the TensorRT Inference Server Python package must also be installed:

$ pip install --upgrade tensorrtserver-*.whl

Then use Sphinx to build the documentation into the build/html directory:

$ cd docs
$ make clean html