# Backend-Platform Support Matrix

Triton supports inference across a variety of platforms, such as cloud, data center, edge, and embedded devices, on NVIDIA GPUs, x86 and ARM CPUs, and AWS Inferentia, but it does so by relying on its backends, and not every Triton backend supports every platform. This document describes which compute platforms are supported by each Triton backend. GPU in this document refers to NVIDIA GPU. See the GPU, Driver, and CUDA Support Matrix to learn more about supported GPUs.

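Which backend serves a given model is selected per model in that model's `config.pbtxt`. As a minimal sketch (the model name and values here are illustrative, not taken from this document):

```
# config.pbtxt -- hypothetical model served by the ONNX Runtime backend
name: "my_onnx_model"     # illustrative model name
backend: "onnxruntime"    # selects the ONNX Runtime backend
max_batch_size: 8
```
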
## Ubuntu 22.04

The table below lists the target device(s) each backend supports for inference on each platform.

| Backend      | x86                                              | ARM-SBSA                                               |
| ------------ | ------------------------------------------------ | ------------------------------------------------------ |
| TensorRT     | :heavy_check_mark: GPU<br>:x: CPU                | :heavy_check_mark: GPU<br>:x: CPU                      |
| ONNX Runtime | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU       |
| TensorFlow   | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU       |
| PyTorch      | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU       |
| OpenVINO     | :x: GPU<br>:heavy_check_mark: CPU                | :x: GPU<br>:x: CPU                                     |
| Python[1]    | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU       |
| DALI         | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU | :heavy_check_mark: GPU[2]<br>:heavy_check_mark: CPU[2] |
| FIL          | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU | Unsupported                                            |
| TensorRT-LLM | :heavy_check_mark: GPU<br>:x: CPU                | :heavy_check_mark: GPU<br>:x: CPU                      |
| vLLM         | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU | Unsupported                                            |
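
For backends that support both GPU and CPU on a platform, the device a model runs on is chosen in the model configuration, not by the backend itself. A minimal `config.pbtxt` excerpt, assuming a backend that supports both device kinds (the instance count and GPU index are illustrative):

```
# Run two instances of the model on GPU 0; use KIND_CPU to pin it to CPU.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```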

## Windows 10

Only the TensorRT and ONNX Runtime backends are supported on Windows.

| Backend      | x86                                              | ARM-SBSA                                         |
| ------------ | ------------------------------------------------ | ------------------------------------------------ |
| TensorRT     | :heavy_check_mark: GPU<br>:x: CPU                | :heavy_check_mark: GPU<br>:x: CPU                |
| ONNX Runtime | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU |

## Jetson JetPack

The following backends are currently supported on Jetson JetPack:

| Backend      | Jetson                                           |
| ------------ | ------------------------------------------------ |
| TensorRT     | :heavy_check_mark: GPU<br>:x: CPU                |
| ONNX Runtime | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU |
| TensorFlow   | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU |
| PyTorch      | :heavy_check_mark: GPU<br>:heavy_check_mark: CPU |
| Python[1]    | :x: GPU<br>:heavy_check_mark: CPU                |

See Triton Inference Server Support for Jetson and JetPack for more details.

## AWS Inferentia

Currently, inference on AWS Inferentia is supported only through the Python backend, where the deployed Python script invokes the AWS Neuron SDK.
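
A minimal sketch of such a Python backend model follows. It assumes a TorchScript model already compiled for Inferentia with the torch-neuron tracer; the file path and tensor names are illustrative and would have to match the model's `config.pbtxt` in a real deployment:

```python
# model.py -- sketch of a Triton Python backend model that runs a
# torch-neuron-compiled TorchScript model on AWS Inferentia.
import torch
import torch_neuron  # noqa: F401  -- registers the Neuron runtime with TorchScript

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Hypothetical path to a model compiled with torch.neuron.trace().
        self.model = torch.jit.load("/models/my_model/1/model_neuron.pt")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Input/output names are illustrative; they must match config.pbtxt.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            batch = torch.from_numpy(in_tensor.as_numpy())
            out = self.model(batch)  # executes on Inferentia via the Neuron runtime
            out_tensor = pb_utils.Tensor("OUTPUT0", out.detach().numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        self.model = None
```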