BLS Example#

In this section we demonstrate an end-to-end example for BLS in Python backend. The model repository should contain pytorch, addsub. The pytorch and addsub models calculate the sum and difference of the INPUT0 and INPUT1 and put the results in OUTPUT0 and OUTPUT1 respectively. This example is broken into two sections. The first section demonstrates how to perform synchronous BLS requests and the second section shows how to execute asynchronous BLS requests.

Synchronous BLS Requests#

The goal of sync BLS model is the same as pytorch and addsub models but the difference is that the BLS model will not calculate the sum and difference by itself. The sync BLS model will pass the input tensors to the pytorch or addsub models and return the responses of that model as the final response. The additional parameter MODEL_NAME determines which model will be used for calculating the final outputs.

  1. Create the model repository:

mkdir -p models/add_sub/1
mkdir -p models/bls_sync/1
mkdir -p models/pytorch/1

# Copy the Python models
cp examples/add_sub/model.py models/add_sub/1/
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
cp examples/bls/sync_model.py models/bls_sync/1/model.py
cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
cp examples/pytorch/model.py models/pytorch/1/
cp examples/pytorch/config.pbtxt models/pytorch/
  1. Start the tritonserver:

tritonserver --model-repository `pwd`/models
  1. Send inference requests to server:

python3 examples/bls/sync_client.py

You should see an output similar to the output below:

=========='add_sub' model result==========
INPUT0 ([0.34984654 0.6808792  0.6509772  0.6211422 ]) + INPUT1 ([0.37917137 0.9080451  0.60789365 0.33425143]) = OUTPUT0 ([0.7290179 1.5889243 1.2588708 0.9553937])
INPUT0 ([0.34984654 0.6808792  0.6509772  0.6211422 ]) - INPUT1 ([0.37917137 0.9080451  0.60789365 0.33425143]) = OUTPUT1 ([-0.02932483 -0.22716594  0.04308355  0.28689077])


=========='pytorch' model result==========
INPUT0 ([0.34984654 0.6808792  0.6509772  0.6211422 ]) + INPUT1 ([0.37917137 0.9080451  0.60789365 0.33425143]) = OUTPUT0 ([0.7290179 1.5889243 1.2588708 0.9553937])
INPUT0 ([0.34984654 0.6808792  0.6509772  0.6211422 ]) - INPUT1 ([0.37917137 0.9080451  0.60789365 0.33425143]) = OUTPUT1 ([-0.02932483 -0.22716594  0.04308355  0.28689077])


=========='undefined' model result==========
Failed to process the request(s) for model instance 'bls_0', message: TritonModelException: Failed for execute the inference request. Model 'undefined_model' is not ready.

At:
  /tmp/python_backend/models/bls/1/model.py(110): execute

The sync_model.py model file is heavily commented with explanations about each of the function calls.

Explanation of the Client Output#

The client.py sends three inference requests to the ‘bls_sync’ model with different values for the “MODEL_NAME” input. As explained earlier, “MODEL_NAME” determines the model name that the “bls” model will use for calculating the final outputs. In the first request, it will use the “add_sub” model and in the second request it will use the “pytorch” model. The third request uses an incorrect model name to demonstrate error handling during the inference request execution.

Asynchronous BLS Requests#

In this section we explain how to send multiple BLS requests without waiting for their response. Asynchronous execution of BLS requests will not block your model execution and can lead to speedups under certain conditions.

The bls_async model will perform two async BLS requests on the pytorch and addsub models. Then, it will wait until the inference requests on these models is completed. It will extract OUTPUT0 from the pytorch and OUTPUT1 from the addsub model to construct the final inference response object using these tensors.

  1. Create the model repository:

mkdir -p models/add_sub/1
mkdir -p models/bls_async/1
mkdir -p models/pytorch/1

# Copy the Python models
cp examples/add_sub/model.py models/add_sub/1/
cp examples/add_sub/config.pbtxt models/add_sub/
cp examples/bls/async_model.py models/bls_async/1/model.py
cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
cp examples/pytorch/model.py models/pytorch/1/
cp examples/pytorch/config.pbtxt models/pytorch/
  1. Start the tritonserver:

tritonserver --model-repository `pwd`/models
  1. Send inference requests to server:

python3 examples/bls/async_client.py

You should see an output similar to the output below:

INPUT0 ([0.72394824 0.45873794 0.4307444  0.07681174]) + INPUT1 ([0.34224355 0.8271524  0.5831284  0.904624  ]) = OUTPUT0 ([1.0661918 1.2858903 1.0138729 0.9814357])
INPUT0 ([0.72394824 0.45873794 0.4307444  0.07681174]) - INPUT1 ([0.34224355 0.8271524  0.5831284  0.904624  ]) = OUTPUT1 ([ 0.3817047  -0.36841443 -0.15238398 -0.82781225])