Building

The TensorRT Inference Server is built using Docker and the TensorFlow and PyTorch containers from NVIDIA GPU Cloud (NGC). Before building you must install Docker and nvidia-docker and log in to the NGC registry by following the instructions in Installing Prebuilt Containers.
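
For reference, logging in to the NGC registry typically looks like the following, using $oauthtoken as the username and your NGC API key as the password. This is only a sketch; follow Installing Prebuilt Containers for the authoritative steps:

$ docker login nvcr.io
Username: $oauthtoken
Password: <your NGC API key>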

Building the Server

To build a release version of the TensorRT Inference Server container, change directory to the root of the repo and issue the following command:

$ docker build --pull -t tensorrtserver .
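
Once built, the tensorrtserver image can be run in the same way as the prebuilt NGC container. As a sketch only, assuming a model repository at /path/to/model/repository (a placeholder) and the --model-store option provided by this release (see Running The Inference Server for the complete set of options):

$ nvidia-docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/path/to/model/repository:/models tensorrtserver trtserver --model-store=/models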

Incremental Builds

For typical development you will want to run the build container with your local repo’s source files mounted so that your local changes can be incrementally built. This is done by first building the tensorrtserver_build container:

$ docker build --pull -t tensorrtserver_build --target trtserver_build .

By mounting /path/to/tensorrtserver/src into the container at /workspace/src, changes to your local repo will be reflected in the container:

$ nvidia-docker run -it --rm -v/path/to/tensorrtserver/src:/workspace/src tensorrtserver_build

Within the container you can perform an incremental server build with:

# cd /workspace
# bazel build -c opt --config=cuda src/servers/trtserver
# cp /workspace/bazel-bin/src/servers/trtserver /opt/tensorrtserver/bin/trtserver

Similarly, within the container you can perform an incremental build of the C++ and Python client libraries and example executables with:

# cd /workspace
# bazel build -c opt --config=cuda src/clients/...
# mkdir -p /opt/tensorrtserver/bin
# cp bazel-bin/src/clients/c++/image_client /opt/tensorrtserver/bin/.
# cp bazel-bin/src/clients/c++/perf_client /opt/tensorrtserver/bin/.
# cp bazel-bin/src/clients/c++/simple_client /opt/tensorrtserver/bin/.
# mkdir -p /opt/tensorrtserver/lib
# cp bazel-bin/src/clients/c++/librequest.so /opt/tensorrtserver/lib/.
# cp bazel-bin/src/clients/c++/librequest.a /opt/tensorrtserver/lib/.
# mkdir -p /opt/tensorrtserver/pip
# bazel-bin/src/clients/python/build_pip /opt/tensorrtserver/pip/.

Some source changes seem to cause bazel to get confused and not correctly rebuild all required sources. You can force bazel to rebuild all of the inference server source without requiring a complete rebuild of the TensorFlow and Caffe2 components by doing the following before issuing the above build command:

# rm -fr bazel-bin/src
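
If you instead want a completely clean build tree, including forcing the TensorFlow and Caffe2 components to rebuild (which takes much longer), the standard bazel clean command can be used from /workspace:

# cd /workspace
# bazel clean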

Building the Client Libraries and Examples

The provided Dockerfile can be used to build just the client libraries and examples. Issue the following command to build the C++ client library, C++ and Python examples, and a Python wheel file for the Python client library:

$ docker build -t tensorrtserver_clients --target trtserver_build --build-arg "BUILD_CLIENTS_ONLY=1" .

You can optionally add --build-arg "PYVER=<ver>" to set the Python version that you want the Python client library built for. Supported values for <ver> are 2.7 and 3.5, with 3.5 being the default.
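
For example, to build the clients for Python 2.7 instead of the default:

$ docker build -t tensorrtserver_clients --target trtserver_build --build-arg "BUILD_CLIENTS_ONLY=1" --build-arg "PYVER=2.7" .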

After the build completes, the tensorrtserver_clients Docker image contains the built client libraries and examples. The easiest way to try the examples described in the following sections is to run the client image with --net=host so that the client examples can access an inference server running in its own container (see Running The Inference Server for more information):

$ docker run -it --rm --net=host tensorrtserver_clients

In the client image you can find the example executables in /opt/tensorrtserver/bin, and the Python wheel in /opt/tensorrtserver/pip.
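
For example, assuming pip3 is available in the image, the Python client library can be installed inside the container from the generated wheel with:

# pip3 install --user --upgrade /opt/tensorrtserver/pip/tensorrtserver-*.whl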

If your host system is Ubuntu-16.04, an alternative to running the examples within the tensorrtserver_clients container is to download the client libraries and examples from the GitHub release page corresponding to the release you are interested in:

$ mkdir tensorrtserver_clients
$ cd tensorrtserver_clients
$ wget https://github.com/NVIDIA/tensorrt-inference-server/archive/v0.11.0.clients.tar.gz
$ tar xzf v0.11.0.clients.tar.gz

You can now find the client example binaries in bin/, the C++ client libraries in lib/, and the Python client examples and wheel file in python/.

To run the C++ examples you must install some dependencies on your Ubuntu-16.04 host system:

$ apt-get install curl libcurl3-dev libopencv-dev libopencv-core-dev

To run the Python examples you will need to additionally install the wheel file and some other dependencies:

$ apt-get install python3 python3-pip
$ pip3 install --user --upgrade tensorrtserver-*.whl numpy pillow
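
With an inference server running (see Running The Inference Server) you can then try the examples. The following is only an illustrative sketch; the model name (resnet50_netdef), image path, and exact flags are assumptions that depend on the models loaded in your model repository:

$ bin/image_client -m resnet50_netdef -s INCEPTION /path/to/image.jpg
$ python3 python/image_client.py -m resnet50_netdef -s INCEPTION /path/to/image.jpg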

Building the Documentation

The inference server documentation is found in the docs/ directory and is based on Sphinx. Doxygen integrated with Exhale is used for the C++ API documentation.

To build the docs, install the required dependencies:

$ apt-get update
$ apt-get install -y --no-install-recommends doxygen
$ pip install --upgrade sphinx sphinx-rtd-theme nbsphinx exhale

To get the Python client library API docs, the TensorRT Inference Server Python package must be installed:

$ pip install --upgrade tensorrtserver-*.whl

Then use Sphinx to build the documentation into the build/html directory:

$ cd docs
$ make clean html
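
To preview the result locally you can, for example, serve the generated HTML with Python's built-in HTTP server and open http://localhost:8000 in a browser:

$ cd build/html
$ python3 -m http.server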