In-Process Triton Server API
The Triton Inference Server provides a backwards-compatible C API, with Python and Java bindings, that allows Triton to be linked directly into a C/C++, Java, or Python application. This API is called the "Triton Server API" or just "Server API" for short. The API is implemented in the Triton shared library, which is built from source contained in the core repository. On Linux this library is libtritonserver.so and on Windows it is tritonserver.dll. In the Triton Docker image the shared library is found in /opt/tritonserver/lib. The header file that defines and documents the Server API is tritonserver.h. The Java bindings for the in-process Triton Server API are built on top of tritonserver.h and can be used by Java applications that need to run Triton in-process.
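To illustrate the shape of the API, the following is a minimal sketch of the server lifecycle using functions declared in tritonserver.h. It assumes libtritonserver.so and its header are installed; the "/models" repository path is a placeholder, and real applications would go on to create TRITONSERVER_InferenceRequest objects and run inference:

```c
// Minimal sketch of the in-process Server API lifecycle.
// Assumes libtritonserver and tritonserver.h are available;
// "/models" is a placeholder model-repository path.
#include <stdio.h>
#include <stdlib.h>
#include "tritonserver.h"

// Abort on any TRITONSERVER_Error, printing its message first.
static void check(TRITONSERVER_Error* err)
{
  if (err != NULL) {
    fprintf(stderr, "error: %s\n", TRITONSERVER_ErrorMessage(err));
    TRITONSERVER_ErrorDelete(err);
    exit(1);
  }
}

int main(void)
{
  // Configure the server; at minimum a model repository is required.
  TRITONSERVER_ServerOptions* options = NULL;
  check(TRITONSERVER_ServerOptionsNew(&options));
  check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(options, "/models"));

  // Create the in-process server; this loads the models.
  TRITONSERVER_Server* server = NULL;
  check(TRITONSERVER_ServerNew(&server, options));
  check(TRITONSERVER_ServerOptionsDelete(options));

  // ... build TRITONSERVER_InferenceRequest objects and run inference ...

  // Shut down and release the server.
  check(TRITONSERVER_ServerStop(server));
  check(TRITONSERVER_ServerDelete(server));
  return 0;
}
```

Note that every Server API call returns a TRITONSERVER_Error* (NULL on success), so applications typically wrap each call in an error check as shown.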
All capabilities of Triton server are encapsulated in the shared library and are exposed via the Server API. The tritonserver executable implements HTTP/REST and GRPC endpoints and uses the Server API to communicate with core Triton logic. The primary source files for the endpoints are grpc_server.cc and http_server.cc. In these source files you can see the Server API being used.
You can use the Server API in your own application as well. A simple example using the Server API can be found in simple.cc.
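Building such an application amounts to compiling against tritonserver.h and linking against the shared library. The command below is a hedged sketch: the include and library paths are those of the Triton Docker image and may differ in your installation, and the `-r` model-repository flag is the one accepted by the simple.cc example:

```shell
# Compile and link an application against the Triton shared library.
# Paths are those of the Triton Docker image; adjust for your install.
g++ -o simple simple.cc \
    -I/opt/tritonserver/include \
    -L/opt/tritonserver/lib -ltritonserver

# Ensure libtritonserver.so is found at run time, then point the
# example at a model repository.
LD_LIBRARY_PATH=/opt/tritonserver/lib ./simple -r /path/to/model/repository
```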