BLS Triton Backend#
The BLS backend demonstrates using the Triton in-process C API to execute inference requests within the backend. It serves as an example for backend developers implementing their own custom pipelines in C++. For Python use cases, please refer to the Business Logic Scripting section of the Python backend.
The source code for the bls backend is contained in src.
backend.cc contains the main backend implementation. The content of this file is not BLS specific; it only includes the required Triton backend functions that are standard for any backend implementation. The BLS logic is triggered in TRITONBACKEND_ModelInstanceExecute by the line bls_executor.Execute(requests[r], &responses[r]); (sketched below).
bls.h is where the BLS logic (class BLSExecutor) of this example is located. You can refer to this file to see how to interact with the Triton in-process C API to build a custom execution pipeline.
bls_utils.h is where all of the utilities that are not BLS dependent are located.
The source code contains extensive documentation describing the operation of the backend and the use of the Triton Backend API and the Triton Server API. Before reading the source code, make sure you understand the concepts associated with the Triton backend abstractions TRITONBACKEND_Backend, TRITONBACKEND_Model, and TRITONBACKEND_ModelInstance.
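For orientation, each of these abstractions has associated entry points that a backend exports. The simplified declarations below (taken from triton/core/tritonbackend.h, with the matching finalize functions and export macros omitted) indicate where each object enters the backend's lifecycle; per-instance execution then goes through TRITONBACKEND_ModelInstanceExecute as sketched above.

```cpp
#include "triton/core/tritonbackend.h"

extern "C" {

// TRITONBACKEND_Backend: initialized once when the backend shared library
// is loaded by Triton.
TRITONSERVER_Error* TRITONBACKEND_Initialize(TRITONBACKEND_Backend* backend);

// TRITONBACKEND_Model: initialized once for each model that uses this
// backend.
TRITONSERVER_Error* TRITONBACKEND_ModelInitialize(TRITONBACKEND_Model* model);

// TRITONBACKEND_ModelInstance: initialized once for each execution instance
// of each model.
TRITONSERVER_Error* TRITONBACKEND_ModelInstanceInitialize(
    TRITONBACKEND_ModelInstance* instance);

}  // extern "C"
```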
The bls backend sends two requests, one to the ‘addsub_python’ model and one to the ‘addsub_onnx’ model. After both inference requests complete, the backend extracts OUTPUT0 from the ‘addsub_python’ response and OUTPUT1 from the ‘addsub_onnx’ response, and constructs the final inference response object from these two tensors.
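The executor builds those nested requests with the Triton Server (in-process) C API. The following is a simplified sketch, not the example's actual bls.h code: the function name SendBLSRequest, the single FP32 input, and the hard-coded tensor names are illustrative assumptions, and the response allocator, completion callback, release callback, and error handling needed by a real pipeline are left as placeholders or omitted.

```cpp
// Simplified sketch of issuing one in-process inference request with the
// Triton Server C API; every TRITONSERVER_* return value is an error object
// that a real implementation must check.
#include <cstdint>

#include "triton/core/tritonserver.h"

void
SendBLSRequest(
    TRITONSERVER_Server* server, const char* model_name,
    const float* input_data, size_t element_count,
    TRITONSERVER_ResponseAllocator* allocator,
    TRITONSERVER_InferenceResponseCompleteFn_t response_complete, void* userp)
{
  TRITONSERVER_InferenceRequest* irequest = nullptr;
  // Version -1 selects the latest version of the model.
  TRITONSERVER_InferenceRequestNew(
      &irequest, server, model_name, -1 /* model_version */);

  // Describe INPUT0 and attach the caller's CPU buffer to it.
  const int64_t shape[1] = {static_cast<int64_t>(element_count)};
  TRITONSERVER_InferenceRequestAddInput(
      irequest, "INPUT0", TRITONSERVER_TYPE_FP32, shape, 1 /* dim_count */);
  TRITONSERVER_InferenceRequestAppendInputData(
      irequest, "INPUT0", input_data, element_count * sizeof(float),
      TRITONSERVER_MEMORY_CPU, 0 /* memory_type_id */);

  // Ask only for the output tensor this pipeline cares about.
  TRITONSERVER_InferenceRequestAddRequestedOutput(irequest, "OUTPUT0");

  // The completion callback receives the response; a real implementation
  // would wait on it and copy the output tensor out. A release callback
  // (TRITONSERVER_InferenceRequestSetReleaseCallback) is also required in
  // practice but is omitted from this sketch.
  TRITONSERVER_InferenceRequestSetResponseCallback(
      irequest, allocator, nullptr /* allocator userp */, response_complete,
      userp);

  // Submit the request for execution inside the same Triton process.
  TRITONSERVER_ServerInferAsync(server, irequest, nullptr /* trace */);
}
```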
Some limitations are self-imposed to keep this example simple:
This backend does not support batching.
This backend does not support decoupled models.
This backend does not support GPU tensors.
The model configuration must exactly match what is described in the comments in backend.cc.
A custom backend that you implement does not need to share any of these limitations.
Building the BLS Backend#
backends/bls/CMakeLists.txt shows the recommended build and install script for a Triton backend. Building and installing is the same as described in Building the Minimal Backend.
Running Triton with the BLS Backend#
After adding the bls backend to the Triton server as described in Backend Shared Library, you can run Triton and have it load the models in model_repos/bls_models. Assuming you have created a tritonserver Docker image by adding the bls backend to Triton, the following command will run Triton:
$ docker run --rm -it --net=host -v/path/to/model_repos/bls_models:/models tritonserver --model-repository=/models
The console output will be similar to the following, indicating that the bls_fp32, addsub_python, and addsub_onnx models from the bls_models repository have loaded correctly.
I0616 09:34:47.767433 19214 server.cc:629]
+---------------+---------+--------+
| Model | Version | Status |
+---------------+---------+--------+
| addsub_python | 1 | READY |
| addsub_onnx | 1 | READY |
| bls_fp32 | 1 | READY |
+---------------+---------+--------+
Testing the BLS Backend#
The clients directory holds example clients. The bls_client Python script demonstrates sending an inference request to the bls backend. With Triton running as described in Running Triton with the BLS Backend, execute the client:
$ clients/bls_client
You should see output similar to the following:
INPUT0 ([0.42935285 0.51512766 0.43625894 ... 0.6670954 0.17747518 0.7976901 ]) + INPUT1 ([6.7752063e-01 2.4223252e-01 6.7743927e-01 ... 4.1531715e-01 2.5451833e-01 7.9097062e-01]) = OUTPUT0 ([1.1068735 0.75736016 1.1136982 ... 1.0824126 0.4319935 1.5886607 ])
INPUT0 ([0.42935285 0.51512766 0.43625894 ... 0.6670954 0.17747518 0.7976901 ]) - INPUT1 ([6.7752063e-01 2.4223252e-01 6.7743927e-01 ... 4.1531715e-01 2.5451833e-01 7.9097062e-01]) = OUTPUT1 ([-0.24816778 0.27289516 -0.24118033 ... 0.25177827 -0.07704315 0.00671947])
PASS