Auto-Complete Example

This example shows how to implement the auto_complete_config function in the Python backend to provide the max_batch_size, input, and output properties. These properties allow Triton to load the Python model with a Minimal Model Configuration when no configuration file is present.

The model repository should contain the nobatch_auto_complete and batch_auto_complete models. The max_batch_size of the nobatch_auto_complete model is set to zero, whereas the max_batch_size of the batch_auto_complete model is set to 4. For models with a non-zero max_batch_size, the configuration can specify a different value of max_batch_size as long as it does not exceed the value set in the model file.

Both the nobatch_auto_complete and batch_auto_complete models calculate the sum and the difference of INPUT0 and INPUT1, and put the results in OUTPUT0 and OUTPUT1 respectively.
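A minimal sketch of how a no-batch model file might declare these properties follows. The auto_complete_config, set_max_batch_size, add_input, and add_output names come from this document; the TYPE_FP32 data type and the [4] dimensions are assumptions chosen for illustration, and RecordingConfig is a hypothetical stand-in for the object Triton passes in, included only so the snippet runs outside the server.

```python
class RecordingConfig:
    """Hypothetical stand-in for the config object Triton passes to the model."""

    def __init__(self):
        self.max_batch_size = None
        self.inputs = []
        self.outputs = []

    def set_max_batch_size(self, value):
        self.max_batch_size = value

    def add_input(self, spec):
        self.inputs.append(spec)

    def add_output(self, spec):
        self.outputs.append(spec)


class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Data type and dims are assumed here; the real example may differ.
        for name in ("INPUT0", "INPUT1"):
            auto_complete_model_config.add_input(
                {"name": name, "data_type": "TYPE_FP32", "dims": [4]}
            )
        for name in ("OUTPUT0", "OUTPUT1"):
            auto_complete_model_config.add_output(
                {"name": name, "data_type": "TYPE_FP32", "dims": [4]}
            )
        # Zero disables batching; a batching model would pass 4 instead.
        auto_complete_model_config.set_max_batch_size(0)
        return auto_complete_model_config


# Exercise the function with the stand-in object.
config = TritonPythonModel.auto_complete_config(RecordingConfig())
print(config.max_batch_size, [spec["name"] for spec in config.inputs])
```

Inside Triton, the server supplies the real configuration object, so only the TritonPythonModel class would live in the model file.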

Deploying the Auto-Complete Models

  1. Create the model repository:

mkdir -p models/nobatch_auto_complete/1/
mkdir -p models/batch_auto_complete/1/

# Copy the Python models
cp examples/auto_complete/ models/nobatch_auto_complete/1/
cp examples/auto_complete/ models/batch_auto_complete/1/

Note that no model configuration file is needed, since Triton will use the auto-complete model configuration provided in the Python model.

  2. Start tritonserver:

tritonserver --model-repository `pwd`/models

Running Inference on the Nobatch and Batch Models

Send inference requests using the client:

python3 examples/auto_complete/

You should see output similar to the following:

'nobatch_auto_complete' configuration matches the expected auto complete configuration

'batch_auto_complete' configuration matches the expected auto complete configuration

PASS: auto_complete

The nobatch_auto_complete and batch_auto_complete model files are heavily commented, with explanations of how to use the set_max_batch_size, add_input, and add_output functions to set the max_batch_size, input, and output properties of the model.

Explanation of the Client Output

For each model, the client first requests the model configuration from Triton to validate that the configuration has been registered as expected. The client then sends an inference request to verify that inference runs properly and that the result is correct.
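The two checks can be sketched without a running server as follows. The helper names (config_matches, expected_outputs) and the shape of the configuration dictionary are hypothetical; a real client would obtain the configuration and the inference results from Triton rather than from the hard-coded values used here.

```python
def config_matches(actual, expected):
    """Return True when max_batch_size and the input/output names agree."""
    return (
        actual.get("max_batch_size", 0) == expected["max_batch_size"]
        and sorted(i["name"] for i in actual["input"]) == sorted(expected["inputs"])
        and sorted(o["name"] for o in actual["output"]) == sorted(expected["outputs"])
    )


def expected_outputs(input0, input1):
    """OUTPUT0 is the element-wise sum, OUTPUT1 the element-wise difference."""
    output0 = [a + b for a, b in zip(input0, input1)]
    output1 = [a - b for a, b in zip(input0, input1)]
    return output0, output1


# Hard-coded stand-in for the configuration a server might report.
reported = {
    "name": "batch_auto_complete",
    "max_batch_size": 4,
    "input": [{"name": "INPUT0"}, {"name": "INPUT1"}],
    "output": [{"name": "OUTPUT0"}, {"name": "OUTPUT1"}],
}
expected = {
    "max_batch_size": 4,
    "inputs": ["INPUT0", "INPUT1"],
    "outputs": ["OUTPUT0", "OUTPUT1"],
}

print(config_matches(reported, expected))
print(expected_outputs([1.0, 2.0], [0.5, 0.5]))
```

Comparing names and max_batch_size separately from the numeric results mirrors the two-stage check described above: a configuration mismatch is reported before any inference result is inspected.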