Python#

Triton Inference Server In-Process Python API [BETA]#

Starting with release 24.01, Triton Inference Server includes a Python package that enables developers to embed Triton Inference Server instances in their Python applications. The in-process Python API is designed to match the functionality of the in-process C API while providing a higher-level abstraction. At its core, the API relies on a 1:1 Python binding of the C API, providing all the flexibility and power of the C API with a simpler-to-use interface.

[!Note] As the API is in BETA, please expect some changes as we test out different features and get feedback. All feedback is welcome and we look forward to hearing from you!

Requirements#

The following instructions require a Linux system with Docker installed. For CUDA support, make sure your CUDA driver meets the requirements in the “NVIDIA Driver” section of the Deep Learning Frameworks support matrix: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html

Installation#

The tutorial and Python API package are designed to be installed and run within the nvcr.io/nvidia/tritonserver:24.01-py3 docker image.

A set of convenience scripts is provided to create a Docker image based on the nvcr.io/nvidia/tritonserver:24.01-py3 image with the Python API installed, plus the additional dependencies required for the examples.

Triton Inference Server 24.01 + Python API#

Clone Repository#
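
A minimal sketch of the clone step, assuming the Python API tutorial lives in a Triton_Inference_Server_Python_API directory of the triton-inference-server/tutorials repository (the directory name is an assumption):

```bash
# Clone the tutorials repository and enter the Python API tutorial directory.
# The directory name is an assumption; adjust to match where this tutorial lives.
git clone https://github.com/triton-inference-server/tutorials.git
cd tutorials/Triton_Inference_Server_Python_API
```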
Build triton-python-api:r24.01 Image#
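
Assuming the convenience scripts mentioned above include a build script at the repository root, building the image is a single command (the script name is an assumption):

```bash
# Build a triton-python-api:r24.01 image on top of
# nvcr.io/nvidia/tritonserver:24.01-py3 with the Python API
# and the example dependencies installed.
./build.sh
```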
Supported Backends#

The built image includes all the backends shipped by default in the tritonserver nvcr.io/nvidia/tritonserver:24.01-py3 container.

dali  fil  identity  onnxruntime  openvino  python  pytorch  repeat  square  tensorflow  tensorrt
Included Models#

The default build includes an identity model that can be used for exercising basic operations including sending input tensors of different data types. The identity model copies provided inputs of shape [-1, -1] to outputs of shape [-1, -1]. Inputs are named data_type_input and outputs are named data_type_output (e.g. string_input, string_output, fp16_input, fp16_output).

Hello World#

Start triton-python-api:r24.01 Container#

The following command starts a container and volume mounts the current directory as the workspace.
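
A sketch of the docker invocation, assuming GPU access is wanted and that the image built above is tagged triton-python-api:r24.01 (the provided convenience scripts may wrap an equivalent command):

```bash
# Start an interactive container with GPU access and mount the
# current directory into the container as the workspace.
docker run --gpus all -it --rm \
  -v ${PWD}:/workspace \
  -w /workspace \
  triton-python-api:r24.01
```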

Enter Python Shell#
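
From inside the container, start an interactive Python interpreter:

```bash
python3
```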

Create and Start a Server Instance#
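
A minimal sketch, assuming the package is importable as tritonserver, that the Server constructor accepts a model_repository option, and that the included identity model is installed under /workspace/identity-models (the path is an assumption):

```python
import tritonserver

# Create a server instance pointing at a model repository and start it.
server = tritonserver.Server(model_repository="/workspace/identity-models")
server.start()
```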

List Models#

server.models()
Example Output#

server.models() returns a dictionary of the available models with their current state.
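
With only the identity model loaded, the returned dictionary has roughly this shape (illustrative only; the exact keys and fields may differ):

```python
{('identity', 1): {'name': 'identity', 'version': 1, 'state': 'READY'}}
```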

Send an Inference Request#
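
A sketch using the identity model and the string_input / string_output naming described earlier; the exact infer() keyword arguments are assumptions:

```python
# Look up the identity model and send a single string tensor.
model = server.model("identity")
responses = model.infer(inputs={"string_input": [["hello world!"]]})
```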

Iterate through Responses#

model.infer() returns an iterator that can be used to process the results of an inference request.
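
For example, the string sent above can be read back from the matching string_output tensor (to_string_array() is assumed here as a convenience accessor; converting the output tensor to a numpy array is an alternative):

```python
# Each response carries the output tensors for one result.
for response in responses:
    print(response.outputs["string_output"].to_string_array())
```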

Example Output#
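
Because the identity model copies its inputs to its outputs, the loop above prints something like the following (illustrative):

```python
[['hello world!']]
```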

Stable Diffusion#

This example is based on the Popular_Models_Guide/StableDiffusion tutorial.

Please note that the following build command can take many minutes to complete, depending on your hardware configuration and network connection.
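
Assuming the same convenience build script accepts an option selecting the diffusion example and exporting its models, the build looks roughly like this (the script name and flags are assumptions; check the repository for the exact invocation):

```bash
# Build an image with the Stable Diffusion dependencies and export the
# pipeline models. Flags are assumptions.
./build.sh --framework diffusion --build-models
```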

The built image includes all the backends shipped by default in the tritonserver nvcr.io/nvidia/tritonserver:24.01-py3 container.

dali  fil  identity  onnxruntime  openvino  python  pytorch  repeat  square  tensorflow  tensorrt

The diffusion build includes a stable_diffusion pipeline that takes a text prompt and returns a generated image. For more details on the models and pipeline please see the Popular_Models_Guide/StableDiffusion tutorial.

Start Container#

The following command starts a container and volume mounts the current directory as the workspace.
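
As in the Hello World example, a sketch of the docker invocation; the image tag for the diffusion build is an assumption:

```bash
# Start an interactive container with GPU access and mount the
# current directory into the container as the workspace.
docker run --gpus all -it --rm \
  -v ${PWD}:/workspace \
  -w /workspace \
  triton-python-api:r24.01-diffusion
```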

Enter Python Shell#
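
From inside the container, start an interactive Python interpreter:

```bash
python3
```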

Create and Start a Server Instance#
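
A sketch mirroring the Hello World example, assuming the exported Stable Diffusion models are installed under /workspace/diffusion-models (the path is an assumption):

```python
import tritonserver

# Point the server at the model repository containing the diffusion pipeline.
server = tritonserver.Server(model_repository="/workspace/diffusion-models")
server.start()
```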

List Models#

server.models()
Example Output#

Send an Inference Request#
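
A sketch assuming the pipeline is exposed as a model named stable_diffusion with a text input named prompt (both names are assumptions; see the StableDiffusion tutorial for the actual names):

```python
model = server.model("stable_diffusion")
responses = model.infer(inputs={"prompt": [["butterfly in new york, realistic, 4k, photograph"]]})
```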

Iterate through Responses and save image#
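
A sketch that converts each generated image tensor to a numpy array and writes it to disk with Pillow; the output tensor name, layout, and dtype are assumptions:

```python
import numpy
from PIL import Image

for index, response in enumerate(responses):
    # Convert the output tensor to a numpy array via DLPack.
    generated = numpy.from_dlpack(response.outputs["generated_image"])
    # Drop any batch dimension and convert to 8-bit pixels before saving.
    image = Image.fromarray(generated.squeeze().astype(numpy.uint8))
    image.save(f"sample_generated_image_{index}.jpg")
```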

Example Output#
Fig. 1: sample_generated_image