Client Libraries

The inference server client libraries make it easy to communicate with the TensorRT Inference Server from your C++ or Python application. Using these libraries you can send either HTTP or GRPC requests to the server to check status or health and to make inference requests. These libraries also support using shared memory for passing inputs to and receiving outputs from the inference server. Client Examples describes examples that show the use of both the C++ and Python libraries.

You can also communicate with the inference server by using the protoc compiler to generate the GRPC client stub in a large number of programming languages. The grpc_image_client example in Client Examples illustrates how to use the GRPC client stub.

This section shows how to get the client libraries by either building or downloading, and also describes how to build your own client using these libraries.

Getting the Client Libraries

The provided Dockerfile.client and CMake support can be used to build the client libraries. As an alternative to building, you can download the pre-built client libraries from GitHub.

Build Using Dockerfile

To build the libraries using Docker, first change directory to the root of the repo and check out the release branch that you want to build (or the master branch if you want to build the under-development version). The branch you use for the client build should match the version of the inference server you are using:

$ git checkout r19.09

Then, issue the following command to build the C++ client library and a Python wheel file for the Python client library:

$ docker build -t tensorrtserver_client -f Dockerfile.client .

You can optionally add --build-arg "UBUNTU_VERSION=<ver>" to set the Ubuntu version that the client library is built for. Supported values for <ver> are 16.04 and 18.04, with 16.04 being the default.
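As a minimal sketch of the command and its optional build argument, the following Python helper assembles the docker build invocation described above and rejects Ubuntu versions that are not listed as supported. The function name docker_build_command and the constant SUPPORTED_UBUNTU_VERSIONS are hypothetical names chosen for this illustration; they are not part of the client libraries.

```python
# Ubuntu versions listed as supported for the client build.
SUPPORTED_UBUNTU_VERSIONS = ("16.04", "18.04")


def docker_build_command(ubuntu_version=None):
    """Return the docker build command as an argument list.

    docker_build_command is a hypothetical helper for illustration only.
    When ubuntu_version is None, no --build-arg is added and the
    Dockerfile's own default applies.
    """
    cmd = ["docker", "build", "-t", "tensorrtserver_client",
           "-f", "Dockerfile.client"]
    if ubuntu_version is not None:
        if ubuntu_version not in SUPPORTED_UBUNTU_VERSIONS:
            raise ValueError(f"unsupported Ubuntu version: {ubuntu_version}")
        cmd += ["--build-arg", f"UBUNTU_VERSION={ubuntu_version}"]
    cmd.append(".")  # build context: the root of the repo
    return cmd


print(" ".join(docker_build_command("18.04")))
```

From the root of the repo, the returned list could be passed to subprocess.run(..., check=True) to start the build.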

The generated Python wheel file works with both Python2 and Python3, but you can control which version of Python (and pip) is used to generate the wheel file by editing PYVER in Dockerfile.client. The default is Python3 and pip3.

After the build completes, the tensorrtserver_client docker image will contain the built client libraries in /workspace/install/lib, the corresponding headers in /workspace/install/include, and the Python wheel file in /workspace/install/python. The image will also contain the built client examples that you can learn more about in Client Examples.

Build Using CMake

The client library build is performed using CMake. The build dependencies and requirements are shown in Dockerfile.client. To build without Docker you must first install those dependencies. This section describes the client build for Ubuntu 16.04, Ubuntu 18.04, and Windows 10 systems. The CMake build can also be targeted for other OSes and platforms. We welcome any updates that expand the build functionality and allow the clients to be built on additional platforms.

To build the libraries using CMake, first change directory to the root of the repo and check out the release branch that you want to build (or the master branch if you want to build the under-development version):

$ git checkout r19.09

Ubuntu 16.04 / Ubuntu 18.04

For Ubuntu, the dependencies and how to install them can be found in Dockerfile.client. Note that package names may differ between Ubuntu versions.

To build on Ubuntu, change to the build/ directory and run the following to configure and build:

$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release
$ make -j8 trtis-clients

When the build completes, the libraries can be found in trtis-clients/install/lib, the corresponding headers in trtis-clients/install/include, and the Python wheel file in trtis-clients/install/python. The trtis-clients/install directory will also contain the built client examples that you can learn more about in Client Examples.

Windows 10

For Windows, the dependencies can be installed using pip and vcpkg, a C++ library management tool for Windows. The following commands show one way to install the dependencies; you can also install them in any other way you prefer:

> .\vcpkg.exe install curl[openssl]:x64-windows
> pip install grpcio-tools wheel

The vcpkg step above installs curl and openssl. The ":x64-windows" suffix specifies the target and is optional. The path to the installed libraries should be added to the PATH environment variable; by default it is \path\to\vcpkg\installed\<target>\bin.
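The default library location follows a fixed pattern, so it can be derived from the vcpkg root and the target. The sketch below shows one way to compute that directory and place it at the front of a PATH string. The helper names vcpkg_bin_dir and prepend_to_path are hypothetical, and the default target of x64-windows is an assumption taken from the install command above.

```python
from pathlib import PureWindowsPath


def vcpkg_bin_dir(vcpkg_root, target="x64-windows"):
    """Return <vcpkg_root>\\installed\\<target>\\bin.

    Hypothetical helper; the default target mirrors the example
    install command and is an assumption of this sketch.
    """
    return PureWindowsPath(vcpkg_root) / "installed" / target / "bin"


def prepend_to_path(path_value, directory):
    """Return a PATH string with directory first, unless already listed."""
    entries = [entry for entry in path_value.split(";") if entry]
    if str(directory) in entries:
        return ";".join(entries)
    return ";".join([str(directory)] + entries)


# Example with a made-up vcpkg location.
print(prepend_to_path(r"C:\Windows\System32", vcpkg_bin_dir(r"C:\tools\vcpkg")))
```

PureWindowsPath is used so that the sketch produces Windows-style paths regardless of the platform it runs on.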

Because there is no default build system available on Windows, you must specify the CMake generator that matches the build system you are using. For instance, if you are using Microsoft Visual Studio, you should do the following:

> cd build
> cmake -G"Visual Studio 16 2019" -DCMAKE_BUILD_TYPE=Release
> MSBuild.exe trtis-clients.vcxproj -p:Configuration=Release

When the build completes, the libraries can be found in trtis-clients\install\lib, the corresponding headers in trtis-clients\install\include, and the Python wheel file in trtis-clients\install\python. The trtis-clients\install directory will also contain the built client Python examples that you can learn more about in Client Examples. At this time the Windows build does not include the C++ examples.

Download From GitHub

An alternative to building the client library is to download the pre-built client libraries from the GitHub release page corresponding to the release you are interested in. The client libraries are found in the “Assets” section of the release page in a tar file named after the version of the release and the OS, for example, v1.2.0_ubuntu1604.clients.tar.gz.

The pre-built libraries can be used on the corresponding host system (for example Ubuntu-16.04 or Ubuntu-18.04) or you can install them into the TensorRT Inference Server container to have both the clients and server in the same container:

$ mkdir clients
$ cd clients
$ wget https://github.com/NVIDIA/tensorrt-inference-server/releases/download/<tarfile_path>
$ tar xzf <tarfile_name>

After extracting the tar file, the libraries can be found in lib/, the corresponding headers in include/, and the Python wheel file in python/. The bin/ and python/ directories contain the built examples that you can learn more about in Client Examples.

Building Your Own Client

No matter how you get the client libraries (Dockerfile, CMake or download), using them to build your own client application is the same. The install directory contains all the libraries and includes needed for your client.

For Python you just need to install the wheel from the python/ directory. The wheel contains everything you need to communicate with the inference server from your Python application, as shown in Client Examples.
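Since the install directory has the same layout however it was produced, a short script can confirm that layout and locate the wheel before installing it. The following is a minimal sketch; find_client_artifacts is a hypothetical helper, and it matches wheel files by extension only because the exact wheel file name depends on the release.

```python
from pathlib import Path


def find_client_artifacts(install_dir):
    """Locate lib/, include/ and any wheel files under an install directory.

    Hypothetical helper for illustration. Raises FileNotFoundError if one
    of the expected subdirectories is missing.
    """
    root = Path(install_dir)
    expected = {name: root / name for name in ("lib", "include", "python")}
    missing = [name for name, path in expected.items() if not path.is_dir()]
    if missing:
        raise FileNotFoundError(f"missing under {root}: {', '.join(missing)}")
    # The wheel file name depends on the release, so match by extension only.
    wheels = sorted(expected["python"].glob("*.whl"))
    return {
        "lib": expected["lib"],
        "include": expected["include"],
        "wheels": wheels,
    }
```

Each path returned in "wheels" can then be passed to pip install.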

For C++ the lib/ directory contains both shared and static libraries and the include/ directory contains the corresponding headers. The src/ directory contains an example application and CMake file to show how you can build your C++ application to use the libraries and includes. To build the example you must first install dependencies appropriate for your platform. For example, for Ubuntu 18.04:

$ apt-get update
$ apt-get install software-properties-common build-essential curl git zlib1g zlib1g-dev libssl-dev libcurl4-openssl-dev

Then you can build the example application:

$ cd src/cmake
$ cmake .
$ make -j8 trtis-clients

The example CMake file that illustrates how to build is in src/cmake/trtis-clients/CMakeLists.txt. The build produces both a statically and a dynamically linked version of the example application in src/cmake/trtis-clients/install/bin.

Client API

The C++ client API exposes a class-based interface for querying server and model status and for performing inference. The commented interface is available at src/core/request.h and in the API Reference.

The Python client API provides capabilities similar to those of the C++ API. The commented interface is available at src/clients/python/__init__.py and in the API Reference.

A simple C++ example application at src/clients/c++/simple_client.cc and a Python version at src/clients/python/simple_client.py demonstrate basic client API usage.

To run the C++ version of the simple example, first build or download it as described in Getting the Client Examples and then:

$ simple_client
0 + 1 = 1
0 - 1 = -1
1 + 1 = 2
1 - 1 = 0
2 + 1 = 3
2 - 1 = 1
...
14 - 1 = 13
15 + 1 = 16
15 - 1 = 14
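The output above is consistent with a model that takes two 16-element integer inputs and returns their element-wise sum and difference. The sketch below reproduces the same lines locally without contacting the server, which can help when checking a client's output. The input values (0 through 15 for the first input, all ones for the second) are inferred from the printed lines and are an assumption of this sketch, as is the function name simple_model_output.

```python
def simple_model_output(size=16):
    """Return the lines a sum/difference model would print.

    Assumes input0 is 0..size-1 and input1 is all ones, as inferred
    from the example output; no server is contacted.
    """
    input0 = list(range(size))
    input1 = [1] * size
    lines = []
    for a, b in zip(input0, input1):
        lines.append(f"{a} + {b} = {a + b}")
        lines.append(f"{a} - {b} = {a - b}")
    return lines


# Print the first few lines for comparison with the client output.
for line in simple_model_output()[:6]:
    print(line)
```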

To run the Python version of the simple example, first build or download it as described in Getting the Client Examples and install the tensorrtserver wheel, then:

$ python simple_client.py

Shared Memory

A simple C++ example application using shared memory at src/clients/c++/simple_shm_client.cc and a Python version at src/clients/python/simple_shm_client.py demonstrate the usage of shared memory with the client API.

To run the C++ version of the simple shared memory example, first build or download it as described in Getting the Client Examples and then:

$ simple_shm_client
0 + 1 = 1
0 - 1 = -1
1 + 1 = 2
1 - 1 = 0
2 + 1 = 3
2 - 1 = 1
...
14 - 1 = 13
15 + 1 = 16
15 - 1 = 14

To run the Python version of the simple shared memory example, first build or download it as described in Getting the Client Examples and install the tensorrtserver wheel, then:

$ python simple_shm_client.py

String Datatype

Some frameworks support tensors where each element in the tensor is a string (see Datatypes for information on supported datatypes). For the most part, the Client API is identical for string and non-string tensors. One exception is that in the C++ API a string input tensor must be initialized with SetFromString() instead of SetRaw().

String tensors are demonstrated in the C++ example application at src/clients/c++/simple_string_client.cc and a Python version at src/clients/python/simple_string_client.py.

Client API for Stateful Models

When performing inference using a stateful model, a client must identify which inference requests belong to the same sequence and also when a sequence starts and ends.

Each sequence is identified with a correlation ID that is provided when the inference context is created (in either the Python or C++ APIs). It is up to the clients to create a unique correlation ID. For each sequence the first inference request should be marked as the start of the sequence and the last inference request should be marked as the end of the sequence. Start and end are marked using the flags provided with the RunOptions in the C++ API and the run() and async_run() methods in the Python API.
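The bookkeeping described above can be sketched independently of the client library. The following example generates correlation IDs and computes which flag each request in a sequence would carry. The flag constants and the functions new_correlation_id and sequence_flags are hypothetical stand-ins defined here for illustration; they are not the client library's own names or values, which are documented in the API Reference.

```python
import itertools

# Hypothetical flag values for this sketch only; the real flags are
# defined by the client API.
FLAG_NONE = 0
FLAG_SEQUENCE_START = 1
FLAG_SEQUENCE_END = 2

# Process-local counter; starting at 1 is an arbitrary choice.
_id_counter = itertools.count(1)


def new_correlation_id():
    """Return an ID that is unique within this process."""
    return next(_id_counter)


def sequence_flags(num_requests):
    """Return one flag value per request in a sequence.

    The first request carries the start flag and the last carries the
    end flag; a single-request sequence carries both.
    """
    if num_requests < 1:
        raise ValueError("a sequence needs at least one request")
    flags = []
    for index in range(num_requests):
        flag = FLAG_NONE
        if index == 0:
            flag |= FLAG_SEQUENCE_START
        if index == num_requests - 1:
            flag |= FLAG_SEQUENCE_END
        flags.append(flag)
    return flags


print(new_correlation_id(), sequence_flags(4))
```

A process-local counter only guarantees uniqueness within one client process. When several clients share a server, a scheme that also distinguishes the clients would be needed.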

The use of the correlation ID and the start and end flags is demonstrated in the C++ example application at src/clients/c++/simple_sequence_client.cc and a Python version at src/clients/python/simple_sequence_client.py.