Deploying Stable Diffusion Models with Triton and TensorRT#
This example demonstrates how to deploy Stable Diffusion models in Triton by leveraging the TensorRT demo pipeline and utilities.
Using the TensorRT demo as a base, this example contains a reusable Python-based backend, `/backend/diffusion/model.py`, suitable for deploying multiple versions and configurations of diffusion models.
For more information on Stable Diffusion, please visit stable-diffusion-v1-5 and stable-diffusion-xl. For more information on the TensorRT implementation, please see the TensorRT demo.
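The backend implements the standard Triton Python backend interface. The following is a rough sketch of that interface, not the actual contents of `model.py`; the tensor names `prompt` and `generated_image` are illustrative assumptions, so see the shipped `model.py` and `config.pbtxt` files for the real contract.

```python
# Minimal sketch of the Triton Python backend interface that a
# diffusion model.py would implement. Tensor names are assumptions.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load the TensorRT engines / diffusion pipeline here, typically
        # driven by parameters parsed from args["model_config"].
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the text prompt from the request.
            text = pb_utils.get_input_tensor_by_name(request, "prompt").as_numpy()
            # Run the diffusion pipeline on `text` (omitted) and return the
            # generated image as an output tensor; a zero-filled image
            # stands in for the real pipeline output in this sketch.
            image = np.zeros((512, 512, 3), dtype=np.uint8)
            out = pb_utils.Tensor("generated_image", image)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        # Release engines and any other resources.
        pass
```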
[!Note] This example is given as sample code and should be reviewed before use in production settings.
| Requirements | Building Server Image | Stable Diffusion v1.5 | Stable Diffusion XL | Sending an Inference Request | Model Configuration | Sample Client | Known Issues and Limitations |
Requirements#
The following instructions require a Linux system with Docker installed. For CUDA support, make sure your CUDA driver meets the requirements in the “NVIDIA Driver” section of the Deep Learning Framework support matrix.
Building the Triton Inference Server Image#
The example is designed based on the `nvcr.io/nvidia/tritonserver:24.08-py3` docker image and TensorRT OSS v10.4. A set of convenience scripts is provided to create a docker image based on the `nvcr.io/nvidia/tritonserver:24.08-py3` image with the dependencies for the TensorRT Stable Diffusion demo installed.
Triton Inference Server + TensorRT OSS#
Clone Repository#
git clone https://github.com/triton-inference-server/tutorials.git --single-branch
cd tutorials/Popular_Models_Guide/StableDiffusion
Build Tritonserver Diffusion Docker Image#
./build.sh
Included Models#
The default build includes model configuration files located in the `/diffusion-models` folder. Example configurations are provided for `stable_diffusion_1_5` and `stable_diffusion_xl`.
Model artifacts and engine files are not included in the image but are built into a volume-mounted directory as a separate step.
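Each model pairs a numbered version directory (holding the built artifacts) with a `config.pbtxt`. As a rough sketch of the shape such a configuration takes (the tensor names, types, and dimensions here are assumptions for illustration; consult the files shipped in `/diffusion-models` for the actual contract):

```
name: "stable_diffusion_1_5"
backend: "python"
max_batch_size: 1
input [
  {
    name: "prompt"          # illustrative name; see the shipped config
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "generated_image" # illustrative name; see the shipped config
    data_type: TYPE_UINT8
    dims: [ -1, -1, -1 ]
  }
]
```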
Building and Running Stable Diffusion v1.5#
Start Tritonserver Diffusion Container#
The following command starts a container and volume mounts the current directory as `workspace`.
./run.sh
Build Stable Diffusion v1.5 Engine#
[!Note]
The model stable-diffusion-v1-5 requires logging in to Hugging Face and accepting its terms and conditions of use. Please set the environment variable `HF_TOKEN` accordingly.
./scripts/build_models.sh --model stable_diffusion_1_5
Expected Output#
diffusion-models
|-- stable_diffusion_1_5
| |-- 1
| | |-- 1.5-engine-batch-size-1
| | |-- 1.5-onnx
| | |-- 1.5-pytorch_model
| `-- config.pbtxt
Start a Server Instance#
[!Note] We use `EXPLICIT` model control mode for demonstration purposes to control which Stable Diffusion version is loaded. For production deployments, please refer to Secure Deployment Considerations for more information on the risks associated with `EXPLICIT` mode.
tritonserver --model-repository diffusion-models --model-control-mode explicit --load-model stable_diffusion_1_5
Expected Output#
<SNIP>
I0229 20:15:52.125050 749 server.cc:676]
+----------------------+---------+--------+
| Model | Version | Status |
+----------------------+---------+--------+
| stable_diffusion_1_5 | 1 | READY |
+----------------------+---------+--------+
<SNIP>
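Once the log reports the model as `READY`, readiness can also be checked programmatically. A minimal sketch using the `tritonclient` package, assuming the server's default HTTP port 8000 is reachable from your client environment:

```python
# Query Triton's readiness endpoints over HTTP (default port 8000).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print("server ready:", client.is_server_ready())
print("model ready:", client.is_model_ready("stable_diffusion_1_5"))
```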
Building and Running Stable Diffusion XL#
Start Tritonserver Diffusion Container#
The following command starts a container and volume mounts the current directory as `workspace`.
./run.sh
Build Stable Diffusion XL Engine#
./scripts/build_models.sh --model stable_diffusion_xl
Expected Output#
diffusion-models
|-- stable_diffusion_xl
|-- 1
| |-- xl-1.0-engine-batch-size-1
| |-- xl-1.0-onnx
| `-- xl-1.0-pytorch_model
`-- config.pbtxt
Start a Server Instance#
[!Note] We use `EXPLICIT` model control mode for demonstration purposes to control which Stable Diffusion version is loaded. For production deployments, please refer to Secure Deployment Considerations for more information on the risks associated with `EXPLICIT` mode.
tritonserver --model-repository diffusion-models --model-control-mode explicit --load-model stable_diffusion_xl
Expected Output#
<SNIP>
I0229 20:22:22.912465 1440 server.cc:676]
+---------------------+---------+--------+
| Model | Version | Status |
+---------------------+---------+--------+
| stable_diffusion_xl | 1 | READY |
+---------------------+---------+--------+
<SNIP>
Sending an Inference Request#
We’ve provided a sample client application to make sending and receiving requests simpler.
Start Tritonserver Diffusion Container#
In a separate terminal from the server, start a new container. The following command starts a container and volume mounts the current directory as `workspace`.
./run.sh
Send Prompt to Stable Diffusion 1.5#
python3 client.py --model stable_diffusion_1_5 --prompt "butterfly in new york, 4k, realistic" --save-image
Example Output#
Client: 0 Throughput: 0.7201335361144658 Avg. Latency: 1.3677194118499756
Throughput: 0.7163933558221957 Total Time: 1.395881175994873
If `--save-image` is given, output images will be saved as JPEGs.
client_0_generated_image_0.jpg
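`client.py` is built on the standard `tritonclient` API, so requests can also be sent directly. The following sketch shows the general pattern; the tensor names `prompt` and `generated_image` and the output layout are assumptions here, so check the model's `config.pbtxt` for the actual contract:

```python
# Sketch of a direct request with tritonclient, bypassing client.py.
# Input/output tensor names below are assumptions; consult config.pbtxt.
import numpy as np
import tritonclient.http as httpclient
from PIL import Image

client = httpclient.InferenceServerClient(url="localhost:8000")

# String inputs are sent as BYTES tensors built from object arrays.
prompt = np.array([["butterfly in new york, 4k, realistic"]], dtype=object)
inp = httpclient.InferInput("prompt", prompt.shape, "BYTES")
inp.set_data_from_numpy(prompt)

result = client.infer(model_name="stable_diffusion_1_5", inputs=[inp])
image = result.as_numpy("generated_image")
Image.fromarray(np.squeeze(image).astype(np.uint8)).save("generated_image.jpg")
```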
Send Prompt to Stable Diffusion XL#
python3 client.py --model stable_diffusion_xl --prompt "butterfly in new york, 4k, realistic" --save-image
Example Output#
Client: 0 Throughput: 0.1825067711674996 Avg. Latency: 5.465569257736206
Throughput: 0.18224859609447058 Total Time: 5.487010717391968
If `--save-image` is given, output images will be saved as JPEGs.
client_0_generated_image_0.jpg
Sample Client#
The sample client application enables users to quickly test the diffusion models under different concurrency scenarios. For a full list and description of the client application’s options use:
python3 client.py --help
Sending Concurrent Requests#
To increase load and concurrency, users can use the `--clients` and `--requests` options to control the number of client processes and the number of requests sent by each client.
Example: Ten Clients Sending Ten Requests Each#
The following command launches ten clients, each sending ten requests. Each client is an independent process that sends its requests one after another, in parallel with the other nine clients.
python3 client.py --model stable_diffusion_xl --requests 10 --clients 10
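Under the hood this maps to independent processes, each holding its own client connection and sending its requests sequentially. A minimal sketch of the same pattern using `multiprocessing` and `tritonclient` (the input tensor name `prompt` is an assumption, as above):

```python
# Sketch: ten client processes, each sending ten sequential requests,
# mirroring client.py's --clients/--requests behavior. The input
# tensor name "prompt" is an assumption; see the model's config.pbtxt.
import multiprocessing as mp

import numpy as np
import tritonclient.http as httpclient


def run_client(client_id: int, num_requests: int) -> None:
    client = httpclient.InferenceServerClient(url="localhost:8000")
    prompt = np.array([["butterfly in new york, 4k, realistic"]], dtype=object)
    inp = httpclient.InferInput("prompt", prompt.shape, "BYTES")
    inp.set_data_from_numpy(prompt)
    for _ in range(num_requests):  # sequential within one client
        client.infer(model_name="stable_diffusion_xl", inputs=[inp])
    print(f"client {client_id}: done")


if __name__ == "__main__":
    procs = [mp.Process(target=run_client, args=(i, 10)) for i in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```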
Known Issues and Limitations#
Unlike the demo it is based on, the diffusion backend does not yet support an optional refiner model. See also demo_txt2img_xl.py.