Deploying Stable Diffusion Models with Triton and TensorRT#

This example demonstrates how to deploy Stable Diffusion models in Triton by leveraging the TensorRT demo pipeline and utilities.

Using the TensorRT demo as a base, this example provides a reusable Python-based backend, /backend/diffusion/model.py, suitable for deploying multiple versions and configurations of diffusion models.
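
For orientation, every Triton Python backend model implements the same small interface (initialize, execute, finalize). The sketch below illustrates that interface only; it is not the contents of the example's model.py, and the tensor names used are placeholders rather than the backend's real configuration.

```python
# Minimal sketch of the Triton Python backend interface that model.py implements.
# The real diffusion backend additionally builds/loads the TensorRT demo pipeline;
# the bodies below are placeholders, not the example's actual logic.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] carries the config.pbtxt contents as a JSON string;
        # a real implementation would load its engines and pipeline here.
        self.model_config = args["model_config"]

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read an input tensor and produce an output tensor (names are placeholders).
            prompt = pb_utils.get_input_tensor_by_name(request, "prompt")
            # ... run the diffusion pipeline (omitted) ...
            output = pb_utils.Tensor("generated_image", prompt.as_numpy())  # placeholder
            responses.append(pb_utils.InferenceResponse(output_tensors=[output]))
        return responses

    def finalize(self):
        # Release engines and other resources when the model is unloaded.
        pass
```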

For more information on Stable Diffusion, please visit stable-diffusion-v1-5 and stable-diffusion-xl. For more information on the TensorRT implementation, please see the TensorRT demo.

[!Note] This example is given as sample code and should be reviewed before use in production settings.

| Requirements | Building Server Image | Stable Diffusion v1.5 | Stable Diffusion XL | Sending an Inference Request | Model Configuration | Sample Client | Known Issues and Limitations |

Requirements#

The following instructions require a Linux system with Docker installed. For CUDA support, make sure your CUDA driver meets the requirements in the “NVIDIA Driver” section of the Deep Learning Framework support matrix.
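
As a quick way to confirm the installed driver version against that matrix, you can query nvidia-smi; any equivalent check works, the small script below simply shells out to it.

```python
import subprocess

# Query the installed NVIDIA driver version (requires nvidia-smi on PATH);
# compare the result against the support matrix for your Triton release.
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()
print("NVIDIA driver version:", driver)
```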

Building the Triton Inference Server Image#

The example is designed based on the nvcr.io/nvidia/tritonserver:24.08-py3 docker image and TensorRT OSS v10.4.

A set of convenience scripts is provided to create a Docker image based on the nvcr.io/nvidia/tritonserver:24.08-py3 image with the dependencies for the TensorRT Stable Diffusion demo installed.

Triton Inference Server + TensorRT OSS#

Clone Repository#

git clone https://github.com/triton-inference-server/tutorials.git --single-branch
cd tutorials/Popular_Models_Guide/StableDiffusion

Build Tritonserver Diffusion Docker Image#

./build.sh

Included Models#

The default build includes model configuration files located in the /diffusion-models folder. Example configurations are provided for stable_diffusion_1_5 and stable_diffusion_xl.

Model artifacts and engine files are not included in the image but are built into a volume mounted directory as a separate step.

Building and Running Stable Diffusion v1.5#

Start Tritonserver Diffusion Container#

The following command starts a container and volume mounts the current directory as workspace.

./run.sh

Build Stable Diffusion v1.5 Engine#

[!Note]

The stable-diffusion-v1-5 model requires logging in to Hugging Face and accepting its terms and conditions of use. Please set the environment variable HF_TOKEN accordingly.
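
As a quick sanity check before starting the lengthy engine build, you can verify the token from inside the container. This sketch assumes the huggingface_hub package is available in the image; it is not part of the official build steps.

```python
import os

from huggingface_hub import login  # assumption: huggingface_hub is installed in the image

# Fail fast if HF_TOKEN is missing or invalid before the engine build begins.
login(token=os.environ["HF_TOKEN"])
print("Hugging Face token accepted")
```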

./scripts/build_models.sh --model stable_diffusion_1_5

Expected Output#

diffusion-models
|-- stable_diffusion_1_5
|   |-- 1
|   |   |-- 1.5-engine-batch-size-1
|   |   |-- 1.5-onnx
|   |   `-- 1.5-pytorch_model
|   `-- config.pbtxt

Start a Server Instance#

[!Note] We use EXPLICIT model control mode for demonstration purposes to control which stable diffusion version is loaded. For production deployments please refer to Secure Deployment Considerations for more information on the risks associated with EXPLICIT mode.

tritonserver --model-repository diffusion-models --model-control-mode explicit --load-model stable_diffusion_1_5

Expected Output#

<SNIP>
I0229 20:15:52.125050 749 server.cc:676]
+----------------------+---------+--------+
| Model                | Version | Status |
+----------------------+---------+--------+
| stable_diffusion_1_5 | 1       | READY  |
+----------------------+---------+--------+

<SNIP>
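
Once the log shows the model as READY, you can also confirm this programmatically from another terminal. A minimal sketch using the Triton Python HTTP client, assuming the default HTTP port 8000 is reachable from where the script runs:

```python
import tritonclient.http as httpclient

# Connect to the default Triton HTTP endpoint.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Both checks return True once the server has finished loading the model.
print("server ready:", client.is_server_ready())
print("model ready:", client.is_model_ready("stable_diffusion_1_5"))
```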

Building and Running Stable Diffusion XL#

Start Tritonserver Diffusion Container#

The following command starts a container and volume mounts the current directory as workspace.

./run.sh

Build Stable Diffusion XL Engine#

./scripts/build_models.sh --model stable_diffusion_xl

Expected Output#

diffusion-models
|-- stable_diffusion_xl
|   |-- 1
|   |   |-- xl-1.0-engine-batch-size-1
|   |   |-- xl-1.0-onnx
|   |   `-- xl-1.0-pytorch_model
|   `-- config.pbtxt

Start a Server Instance#

[!Note] We use EXPLICIT model control mode for demonstration purposes to control which stable diffusion version is loaded. For production deployments please refer to Secure Deployment Considerations for more information on the risks associated with EXPLICIT mode.

tritonserver --model-repository diffusion-models --model-control-mode explicit --load-model stable_diffusion_xl

Expected Output#

<SNIP>
I0229 20:22:22.912465 1440 server.cc:676]
+---------------------+---------+--------+
| Model               | Version | Status |
+---------------------+---------+--------+
| stable_diffusion_xl | 1       | READY  |
+---------------------+---------+--------+

<SNIP>
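
As with v1.5, the server can be queried programmatically once the model is READY. Querying the model metadata is also a convenient way to list the exact input and output tensor names the backend exposes. A minimal sketch with the Python HTTP client, assuming the default port 8000:

```python
import json

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Lists the model's input and output tensors (names, datatypes, shapes).
metadata = client.get_model_metadata("stable_diffusion_xl")
print(json.dumps(metadata, indent=2))
```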

Sending an Inference Request#

We’ve provided a sample client application to make sending and receiving requests simpler.

Start Tritonserver Diffusion Container#

In a separate terminal from the server, start a new container.

The following command starts a container and volume mounts the current directory as workspace.

./run.sh

Send Prompt to Stable Diffusion 1.5#

python3 client.py --model stable_diffusion_1_5 --prompt "butterfly in new york, 4k, realistic" --save-image

Example Output#

Client: 0 Throughput: 0.7201335361144658 Avg. Latency: 1.3677194118499756
Throughput: 0.7163933558221957 Total Time: 1.395881175994873

If --save-image is given, output images will be saved as JPEG files.

client_0_generated_image_0.jpg

[sample generated image]

Send Prompt to Stable Diffusion XL#

python3 client.py --model stable_diffusion_xl --prompt "butterfly in new york, 4k, realistic" --save-image

Example Output#

Client: 0 Throughput: 0.1825067711674996 Avg. Latency: 5.465569257736206
Throughput: 0.18224859609447058 Total Time: 5.487010717391968

If --save-image is given, output images will be saved as JPEG files.

client_0_generated_image_0.jpg

[sample generated image]
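
The sample client wraps the Triton Python client library. For reference, the sketch below constructs a single request directly; the tensor names, datatype, and shape used here are assumptions for illustration, so check the model's config.pbtxt (or the metadata query shown earlier) for the actual values.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Assumed tensor names: a BYTES "prompt" input and a "generated_image" output.
prompt = np.array(["butterfly in new york, 4k, realistic"], dtype=np.object_)
inputs = [httpclient.InferInput("prompt", [1], "BYTES")]
inputs[0].set_data_from_numpy(prompt)

outputs = [httpclient.InferRequestedOutput("generated_image")]

result = client.infer("stable_diffusion_xl", inputs, outputs=outputs)
image = result.as_numpy("generated_image")
print("output shape:", image.shape)
```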

Sample Client#

The sample client application enables users to quickly test the diffusion models under different concurrency scenarios. For a full list and description of the client application’s options use:

python3 client.py --help

Sending Concurrent Requests#

To increase load and concurrency, users can use the --clients and --requests options to control the number of client processes and the number of requests sent by each client.

Example: Ten Clients Sending Ten Requests Each#

The following command starts ten clients, each sending ten requests. Each client is an independent process that sends its requests one after the other, in parallel with the other nine clients.

python3 client.py --model stable_diffusion_xl --requests 10 --clients 10
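
For comparison, the Triton Python HTTP client can generate similar concurrent load directly through its async_infer API. The sketch below only illustrates the idea behind the --clients/--requests options and is not a replacement for client.py; the "prompt" tensor name is an assumption, as in the earlier sketch.

```python
import numpy as np
import tritonclient.http as httpclient

# A connection pool of 10 allows up to ten requests to be in flight at once.
client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=10)

prompt = np.array(["butterfly in new york, 4k, realistic"], dtype=np.object_)
inp = httpclient.InferInput("prompt", [1], "BYTES")  # assumed tensor name
inp.set_data_from_numpy(prompt)

# Queue ten asynchronous requests, then wait for all of them to complete.
pending = [client.async_infer("stable_diffusion_xl", [inp]) for _ in range(10)]
results = [p.get_result() for p in pending]
print(f"completed {len(results)} requests")
```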

Known Issues and Limitations#

1. Unlike the demo it is based on, the diffusion backend does not yet support using an optional refiner model. See also demo_txt2img_xl.py.