Triton Inference Server Overview

Triton Inference Server simplifies the deployment of AI models by serving inference requests at scale in production. It lets teams deploy trained AI models from any framework (TensorFlow, NVIDIA® TensorRT, PyTorch, ONNX Runtime, or custom), from local storage or a cloud platform, on any GPU- or CPU-based infrastructure (cloud, data center, or edge).

Model Repository

The model repository is a directory that stores the models Triton Inference Server serves for inference. It has the structure shown below. For more information about the Triton model repository format, see the Triton Inference Server documentation.

Important

The model repository, including all the files explained below, has already been preloaded into the VM for the lab walkthrough.

<model-repository-path>/
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
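
To make the layout concrete, the following Python sketch walks a repository directory and reports which models and versions it finds. It is an illustration only, not part of Triton: the function name list_models is hypothetical, and it assumes version directories are named with plain integers, as in the 1/ directory used later in this section.

```python
import re
import tempfile
from pathlib import Path


def list_models(repo_root):
    """Map each model directory name to a sorted list of its version numbers.

    Assumption: a version directory is any subdirectory whose name consists
    only of ASCII digits. Files and other subdirectories are skipped.
    """
    models = {}
    for model_dir in sorted(Path(repo_root).iterdir()):
        if not model_dir.is_dir():
            continue  # stray files at the repository root are not models
        versions = sorted(
            int(child.name)
            for child in model_dir.iterdir()
            if child.is_dir() and re.fullmatch(r"[0-9]+", child.name)
        )
        models[model_dir.name] = versions
    return models


if __name__ == "__main__":
    # Build a throwaway repository with a hypothetical model name.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "example_model" / "1").mkdir(parents=True)
        print(list_models(tmp))
```

A check like this can be handy before starting the server, since a model directory that reports no versions most likely has a misnamed version folder.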

TensorFlow Models

TensorFlow models can be saved in one of two formats: GraphDef or SavedModel. Triton supports both formats.

A TensorFlow GraphDef is a single file that by default must be named model.graphdef. A TensorFlow SavedModel is a directory containing multiple files. By default, the directory must be named model.savedmodel. The model configuration can override these default names using the default_model_filename property.
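
The naming rule above can be summarized in a few lines of Python. This is a sketch of the rule as described in this section, not Triton's own code: the function name and the format labels "graphdef" and "savedmodel" are hypothetical, and the override argument stands in for the default_model_filename property of the model configuration.

```python
# Default names from the text above, keyed by hypothetical format labels.
DEFAULT_FILENAMES = {
    "graphdef": "model.graphdef",      # a single file
    "savedmodel": "model.savedmodel",  # a directory
}


def expected_model_filename(model_format, default_model_filename=None):
    """Return the file or directory name expected inside a version directory.

    If the configuration sets default_model_filename, that value wins;
    otherwise the per-format default applies.
    """
    if default_model_filename:
        return default_model_filename
    return DEFAULT_FILENAMES[model_format]


if __name__ == "__main__":
    print(expected_model_filename("savedmodel"))
    print(expected_model_filename("graphdef", "frozen.graphdef"))
```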

A minimal model repository for a TensorFlow SavedModel model is:

<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.savedmodel/
        <saved-model files>
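
As a rough illustration, the Python sketch below creates this minimal directory skeleton on disk. The function name and the model name in the usage example are hypothetical. The config.pbtxt it creates is an empty placeholder, and the model.savedmodel directory is left empty; a real deployment still needs a valid model configuration and the exported SavedModel files.

```python
import tempfile
from pathlib import Path


def create_savedmodel_skeleton(repo_root, model_name, version=1):
    """Create <repo_root>/<model_name>/<version>/model.savedmodel/.

    Also creates an empty placeholder config.pbtxt next to the version
    directory. Returns the path where SavedModel files should be copied.
    """
    model_dir = Path(repo_root) / model_name
    saved_model_dir = model_dir / str(version) / "model.savedmodel"
    saved_model_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").touch()  # placeholder only, must be filled in
    return saved_model_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = create_savedmodel_skeleton(tmp, "example_model")
        # Print the created layout relative to the repository root.
        for path in sorted(Path(tmp).rglob("*")):
            print(path.relative_to(tmp))
```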

© Copyright 2022-2023, NVIDIA. Last updated on Jan 10, 2023.