Now that you have successfully trained and saved the model, the next step is to deploy it to Triton Inference Server. In this section of the lab, you will become familiar with the key elements of deploying trained models to Triton Inference Server for image classification. This lab uses the same VM both to train the model and to run Triton Inference Server.

Triton Inference Server

Triton Inference Server simplifies the deployment of AI models by serving inference requests at scale in production. It lets teams deploy trained AI models from any framework (TensorFlow, NVIDIA® TensorRT, PyTorch, ONNX Runtime, or custom), stored locally or on a cloud platform, on any GPU- or CPU-based infrastructure (cloud, data center, or edge).

Model Repository

The model repository is a directory that stores the models Triton Inference Server serves for inference. It has the structure shown below. For more information about the Triton model repository format, see the Triton Inference Server documentation.

Important

The model repository, including all the files explained below, has already been preloaded into the VM for the lab walkthrough.

<model-repository-path>/
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
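
As a quick way to check a repository against the layout above, the following minimal Python sketch lists each model directory, whether it contains a config.pbtxt, and which numerically named version directories it holds. The function names list_models and build_demo_repo and the model name demo_model are hypothetical, chosen only for illustration; this is not part of Triton or of the lab materials.

```python
from pathlib import Path
import tempfile


def list_models(repo_root):
    """Map each model directory to its config status and numeric versions."""
    models = {}
    for model_dir in sorted(Path(repo_root).iterdir()):
        if not model_dir.is_dir():
            continue  # stray files at the repository root are skipped
        # Treat only ASCII-numeric directory names as versions (an assumption
        # of this sketch; everything else is ignored).
        versions = sorted(
            int(p.name)
            for p in model_dir.iterdir()
            if p.is_dir() and p.name.isascii() and p.name.isdigit()
        )
        models[model_dir.name] = {
            "has_config": (model_dir / "config.pbtxt").is_file(),
            "versions": versions,
        }
    return models


def build_demo_repo(repo_root):
    """Create a throwaway repository with one model and two empty versions."""
    model_dir = Path(repo_root) / "demo_model"  # hypothetical model name
    for version in ("1", "2"):
        (model_dir / version).mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text('name: "demo_model"\n')


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_repo(tmp)
        for name, info in list_models(tmp).items():
            print(name, info)
```

On the lab VM, the same function could be pointed at the preloaded repository path instead of a temporary directory.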

TensorFlow Models

TensorFlow models can be saved in one of two formats: GraphDef or SavedModel. Triton supports both formats.

A TensorFlow GraphDef is a single file that by default must be named model.graphdef. A TensorFlow SavedModel is a directory containing multiple files. By default, the directory must be named model.savedmodel. The model configuration can override these default names using the default_model_filename property.

A minimal model repository for a TensorFlow SavedModel model is:

<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.savedmodel/
        <saved-model files>
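
The minimal layout above can be scaffolded with a few lines of Python. This is a sketch under stated assumptions: the helper scaffold_savedmodel_repo and the model name demo_classifier are hypothetical, and the generated config.pbtxt contains only the name and platform fields. A real configuration typically also declares the batch size and the input and output tensors, and the SavedModel files themselves would still need to be exported into the model.savedmodel directory.

```python
from pathlib import Path
import tempfile


def scaffold_savedmodel_repo(repo_root, model_name, version=1):
    """Create <repo_root>/<model_name>/<version>/model.savedmodel/ and a stub config."""
    model_dir = Path(repo_root) / model_name
    savedmodel_dir = model_dir / str(version) / "model.savedmodel"
    savedmodel_dir.mkdir(parents=True, exist_ok=True)

    # Stub configuration: name and platform only (illustrative, not complete).
    config_text = (
        f'name: "{model_name}"\n'
        'platform: "tensorflow_savedmodel"\n'
    )
    (model_dir / "config.pbtxt").write_text(config_text)
    return savedmodel_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "demo_classifier" is a hypothetical model name.
        scaffold_savedmodel_repo(tmp, "demo_classifier")
        for path in sorted(Path(tmp).rglob("*")):
            print(path.relative_to(tmp))
```

Keeping the directory name model.savedmodel means the default_model_filename property does not need to be set in the configuration.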

Triton Inference Server Overview

Deploy the Trained Model on the Triton Inference Server