Triton Inference Server Overview
Triton Inference Server simplifies the deployment of AI models by serving inference requests at scale in production. It lets teams deploy trained AI models from any framework (TensorFlow, NVIDIA® TensorRT, PyTorch, ONNX Runtime, or custom), from local storage or a cloud platform, on any GPU- or CPU-based infrastructure (cloud, data center, or edge).
The model repository is a directory that stores the models Triton Inference Server serves for inference. It has the structure shown below. For more information about the format, see the Triton model repository documentation.
The model repository, including all the files explained below, has already been preloaded into the VM for the lab walkthrough.
<model-repository-path>/
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
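
To make the layout concrete, the following is a minimal sketch of how one might inspect a repository on disk. It is not part of Triton; the function name list_models is hypothetical, and it assumes that version directories are named with plain integers (such as 1 or 2) and that every top-level directory is a model.

```python
from pathlib import Path


def list_models(repo_root):
    """Map each model directory name to a sorted list of its version numbers.

    Assumption: a version directory is any subdirectory whose name is made
    up of digits only. Files such as config.pbtxt are skipped.
    """
    models = {}
    for model_dir in sorted(Path(repo_root).iterdir()):
        if not model_dir.is_dir():
            continue  # ignore stray files at the repository root
        versions = sorted(
            int(child.name)
            for child in model_dir.iterdir()
            if child.is_dir() and child.name.isdigit()
        )
        models[model_dir.name] = versions
    return models
```

Calling list_models on the repository path returns a dictionary keyed by model name, which is a quick way to check that each model has at least one version directory before starting the server.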
TensorFlow models can be saved in one of two formats: GraphDef or SavedModel. Triton supports both formats.
A TensorFlow GraphDef is a single file that by default must be named model.graphdef. A TensorFlow SavedModel is a directory containing multiple files. By default, the directory must be named model.savedmodel. The model configuration can override these default names using the default_model_filename property.
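The naming rule above can be summarized in a few lines of Python. This is only an illustrative sketch, not Triton code: the helper expected_model_filename and the keys "graphdef" and "savedmodel" are hypothetical labels chosen for this example, while the two default names and the default_model_filename override come from the description above.

```python
# Default names described above for the two TensorFlow formats.
DEFAULT_FILENAMES = {
    "graphdef": "model.graphdef",      # a single file
    "savedmodel": "model.savedmodel",  # a directory
}


def expected_model_filename(model_format, default_model_filename=None):
    """Return the file or directory name expected inside a version directory.

    If the model configuration sets default_model_filename, that value
    takes precedence over the built-in default for the format.
    """
    if default_model_filename:
        return default_model_filename
    return DEFAULT_FILENAMES[model_format]
```

For example, expected_model_filename("savedmodel") gives the default directory name, while passing default_model_filename="my_model" returns the overriding name instead.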
A minimal model repository for a TensorFlow SavedModel model is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.savedmodel/
        <saved-model files>
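
As a rough sketch, the skeleton of such a minimal repository could be created with a short Python script. The function name create_minimal_savedmodel_repo is hypothetical, and the two lines written to config.pbtxt (name and platform) are a placeholder assumption rather than a complete configuration; the SavedModel files themselves still have to be exported from TensorFlow into the returned directory.

```python
from pathlib import Path


def create_minimal_savedmodel_repo(repo_root, model_name, version=1):
    """Create <repo_root>/<model_name>/<version>/model.savedmodel/.

    Returns the path of the (empty) model.savedmodel directory, where the
    exported SavedModel files should be placed.
    """
    model_dir = Path(repo_root) / model_name
    saved_model_dir = model_dir / str(version) / "model.savedmodel"
    saved_model_dir.mkdir(parents=True, exist_ok=True)

    config_path = model_dir / "config.pbtxt"
    if not config_path.exists():
        # Placeholder content (assumption): a real configuration usually
        # also describes the model's inputs and outputs.
        config_path.write_text(
            f'name: "{model_name}"\nplatform: "tensorflow_savedmodel"\n'
        )
    return saved_model_dir
```

The existing config.pbtxt is left untouched if one is already present, so the script can be rerun to add a new version directory without overwriting a hand-written configuration.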