Triton Inference Server Ray Serve Deployment

Using the Triton Inference Server In-Process Python API, you can integrate Triton-based models into any Python framework, including FastAPI and Ray Serve.

This directory contains an example Triton Inference Server Ray Serve deployment based on FastAPI.


Installation

The Stable Diffusion pipeline used in this example is based on the Popular_Models_Guide/StableDiffusion tutorial.

Clone Repository

git clone https://github.com/triton-inference-server/tutorials.git
cd tutorials/Triton_Inference_Server_Python_API

Build Tritonserver Image and Stable Diffusion Models

Note: the following command can take many minutes to complete, depending on your hardware configuration and network connection.

./build.sh --framework diffusion --build-models

Run Ray Serve Deployment

Start Container

The following commands start a container with the current directory volume-mounted as the workspace (`/workspace`), then change to the Ray Serve example directory.

./run.sh --framework diffusion
cd examples/rayserve

Start Local Ray Cluster

The following command starts a local Ray cluster. It also starts Prometheus and Grafana instances with the default Ray and Ray Serve metrics and dashboards enabled.

./start_ray.sh

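Run Deployment

The following command deploys the `deployment` object defined in `tritonserver_deployment.py`, using Ray Serve's `module:attribute` import-path syntax: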

serve run tritonserver_deployment:deployment
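Because `serve run` blocks the terminal it is started in, requests are sent from another terminal, and the deployment may need some time before it answers. The sketch below is a hypothetical `wait_for_deployment` helper (not part of the example scripts) that polls the `/identity` endpoint with curl until it responds; the attempt count and delay defaults are arbitrary assumptions, and a response from `/identity` only shows that the HTTP route is up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the /identity endpoint until the Ray Serve
# deployment answers, or give up after a fixed number of attempts.
# Assumes the Serve HTTP address used in this README (127.0.0.1:8000).
wait_for_deployment() {
  local url="${1:-http://127.0.0.1:8000/identity?string_input=ping}"
  local attempts="${2:-60}"   # how many times to try (assumed default)
  local delay="${3:-5}"       # seconds to sleep between tries (assumed default)
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # --fail: non-zero exit on HTTP errors (e.g. 5xx while starting up)
    # --max-time: give up on a single request after 5 seconds
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "deployment is ready (attempt $i)"
      return 0
    fi
    # Do not sleep after the final attempt.
    if (( i < attempts )); then sleep "$delay"; fi
  done
  echo "deployment not ready after $attempts attempts" >&2
  return 1
}
```

Called with no arguments, the helper polls `http://127.0.0.1:8000/identity?string_input=ping` up to 60 times, 5 seconds apart, and returns a non-zero status if the endpoint never answers.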

Send Requests to Deployment

The deployment exposes two endpoints:

`/identity`

The identity endpoint accepts a string in the `string_input` query parameter and returns it unchanged.

Example Request

curl --request GET "http://127.0.0.1:8000/identity?string_input=hello_world!"

Example Output

"hello_world!"

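Query-string values containing spaces or reserved characters such as `&`, `#` or `+` must be percent-encoded before they are placed in the URL. The following is a minimal pure-Bash sketch, assuming ASCII input; the `urlencode` function name is hypothetical and not part of the example.

```bash
#!/usr/bin/env bash
# Minimal sketch: percent-encode a string for a URL query parameter.
# Keeps RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) and encodes every
# other byte as %XX. Intended for ASCII input; the function name is hypothetical.
urlencode() {
  local LC_ALL=C            # byte-wise iteration and ASCII-only ranges
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # A leading quote makes printf use the character code of $c as the number.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example, `curl --request GET "http://127.0.0.1:8000/identity?string_input=$(urlencode 'a red car')"` sends `a%20red%20car`, which the endpoint should decode and return as `"a red car"`.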
`/generate`

The generate endpoint accepts a text prompt (`prompt`) and an output path (`filename`), generates an image from the prompt using Stable Diffusion, and saves the image to that path.

Example Request

curl --request GET "http://127.0.0.1:8000/generate?prompt=car,model-t,realistic,4k&filename=/workspace/examples/rayserve/car_sample.jpg"

Example Output

[Image: car_sample]
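For arbitrary prompts, curl can do the encoding itself: `--get` (`-G`) turns each `--data-urlencode name=value` pair into an encoded query parameter. The sketch below wraps this in hypothetical `generate_image` and `slugify` helpers that derive the output filename from the prompt; the naming scheme, the `.jpg` extension and the default directory (taken from the example request) are assumptions. The `filename` path is written by the deployment process, so it should be a path that process can reach (inside the container in this setup).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for calling the /generate endpoint with arbitrary prompts.

# Turn a prompt into a filesystem-friendly file stem: lowercase, with every run
# of characters other than a-z, 0-9, '_' and '-' collapsed into a single '_'.
slugify() {
  local s
  s=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9_-' '_')
  s="${s#_}"; s="${s%_}"    # trim a leading/trailing separator
  printf '%s\n' "$s"
}

# Ask the deployment to render PROMPT and save it under DIR (assumed default:
# the example directory inside the container). Prints the output path on success.
generate_image() {
  local prompt="$1"
  local dir="${2:-/workspace/examples/rayserve}"
  local host="${3:-http://127.0.0.1:8000}"
  local out="$dir/$(slugify "$prompt").jpg"
  # --get appends the --data-urlencode values to the URL as a query string,
  # so spaces and reserved characters in the prompt are encoded by curl.
  curl --silent --fail --get "$host/generate" \
    --data-urlencode "prompt=$prompt" \
    --data-urlencode "filename=$out" >/dev/null || return 1
  printf '%s\n' "$out"
}
```

For example, `generate_image "a red car, realistic, 4k"` would request the image be saved as `/workspace/examples/rayserve/a_red_car_realistic_4k.jpg`.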

View Ray and Ray Serve Dashboards

The Ray and Ray Serve dashboards are served on the default Ray dashboard port (8265) and can be used to visualize cluster and deployment metrics:

<IP_ADDRESS>:8265

Stop the Ray Serve Cluster

The following command stops the local Ray cluster along with the Prometheus and Grafana instances.

./stop_ray.sh
