Triton Inference Server with DALI backend#

This example demonstrates how to use the DALI (NVIDIA Data Loading Library) backend for image classification on Triton Inference Server.

Classification is performed with a ResNet50 model optimized by TensorRT.

Prerequisites#

  • Export ONNX and build TensorRT

    • nvcr.io/nvidia/pytorch:21.07-py3

  • Triton Inference Server with DALI Backend

    • nvcr.io/nvidia/tritonserver:21.07-py3

    • https://github.com/triton-inference-server/dali_backend. The DALI backend ships out-of-the-box with tritonserver starting from the 20.11 release.

  • Client

    • nvcr.io/nvidia/tritonserver:21.07-py3-sdk

Run Triton Inference Server#

The listing below shows what the model_repository should look like after proper setup:

model_repository
├── dali
│   ├── 1
│   │   └── model.dali
│   └── config.pbtxt
├── ensemble_dali_resnet50
│   ├── 1
│   └── config.pbtxt
└── resnet50_trt
    ├── 1
    │   └── model.plan
    ├── config.pbtxt
    └── labels.txt
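For reference, the dali model's config.pbtxt follows the standard DALI backend configuration format. A minimal sketch is shown below; the tensor names, max batch size, and output shape here are illustrative assumptions and must match the inputs and outputs of your serialized DALI pipeline (model.dali):

```
name: "dali"
backend: "dali"
max_batch_size: 256
input [
  {
    name: "DALI_INPUT_0"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "DALI_OUTPUT_0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
```

The `dims: [ -1 ]` input accepts a variable-length byte stream (the encoded image), which the DALI pipeline decodes and preprocesses into the fixed-shape tensor consumed by resnet50_trt.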

Run the Triton server

docker run --gpus=all --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $(pwd):/workspace/ -v $(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:21.07-py3 tritonserver --model-repository=/models

Request image classification#

Create a gRPC client pointed at the server URL

import tritonclient.grpc as tritongrpcclient

triton_client = tritongrpcclient.InferenceServerClient(url=args.url, verbose=False)

Load the raw image into a NumPy array and configure the input and output with the name, shape, and datatype.

import numpy as np

inputs = []
outputs = []
input_name = "INPUT"
output_name = "OUTPUT"
image_data = load_image(args.image)
image_data = np.expand_dims(image_data, axis=0)

inputs.append(tritongrpcclient.InferInput(input_name, image_data.shape, "UINT8"))
outputs.append(tritongrpcclient.InferRequestedOutput(output_name))

inputs[0].set_data_from_numpy(image_data)
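The load_image helper above is assumed rather than defined in this snippet. A minimal sketch that matches DALI's expectation: the client sends the undecoded file bytes, and the DALI pipeline on the server performs the actual image decoding:

```python
import numpy as np

def load_image(img_path: str) -> np.ndarray:
    """Read an encoded image file (e.g. JPEG) as a 1-D UINT8 array.

    Decoding happens server-side in the DALI pipeline, so the client
    only ships the raw byte stream.
    """
    return np.fromfile(img_path, dtype=np.uint8)
```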

Request inference and obtain the results

results = triton_client.infer(model_name=args.model_name,
                                    inputs=inputs,
                                    outputs=outputs)
output0_data = results.as_numpy(output_name)
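To turn the raw model output into a class label like the COFFEE MUG line shown below, one option is a softmax followed by argmax against the entries of labels.txt. A hypothetical sketch; the function and variable names here are illustrative, not part of client.py:

```python
import numpy as np

def top1_label(scores: np.ndarray, labels: list) -> tuple:
    """Return (probability, label) for the highest-scoring class."""
    scores = scores.reshape(-1)            # drop the batch dimension
    exp = np.exp(scores - scores.max())    # numerically stable softmax
    probs = exp / exp.sum()
    idx = int(np.argmax(probs))
    return float(probs[idx]), labels[idx]
```

Usage would look like `prob, label = top1_label(output0_data, labels)`, with `labels` read line-by-line from labels.txt.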

Run client.py, passing the path to the image via --image

wget https://raw.githubusercontent.com/triton-inference-server/server/master/qa/images/mug.jpg -O "mug.jpg"
docker run --rm --net=host -v $(pwd):/workspace/ nvcr.io/nvidia/tritonserver:21.07-py3-sdk python client.py --image mug.jpg 
0.02642226219177246ms class:COFFEE MUG