Triton Inference Server Overview
Now that you have successfully trained and saved the model, the next step is to deploy it to Triton Inference Server. In this section of the lab, you will become familiar with the key elements of deploying trained models to Triton Inference Server for image classification. This lab uses the same VM both to train the model and to run Triton Inference Server.
Triton Inference Server
Triton Inference Server simplifies the deployment of AI models by serving inference requests at scale in production. It lets teams deploy trained AI models from any framework (TensorFlow, NVIDIA® TensorRT, PyTorch, ONNX Runtime, or custom), from local storage or a cloud platform, on any GPU- or CPU-based infrastructure (cloud, data center, or edge).
Model Repository
The model repository is the directory that holds the models Triton Inference Server serves for inference. It has the structure shown below. For more information about the Triton model repository format, see the Triton Inference Server documentation.
<model-repository-path>/
    <model-name>/
        [config.pbtxt]
        [<output-labels-file> ...]
        <version>/
            <model-definition-file>
        <version>/
            <model-definition-file>
        ...
    <model-name>/
        [config.pbtxt]
        [<output-labels-file> ...]
        <version>/
            <model-definition-file>
        <version>/
            <model-definition-file>
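The layout above can be sketched in Python using only the standard library. This is a minimal illustration, not part of the lab: the model name image_classifier, the version numbers 1 and 2, the file name model.onnx, and both helper functions are hypothetical placeholders chosen for the example. The script only creates an empty skeleton in a temporary directory; it does not write a config.pbtxt or a real model.

```python
from pathlib import Path
import tempfile


def create_model_repository(root, model_name, versions, definition_file):
    """Create an empty model repository skeleton for one model under root.

    All arguments are illustrative; substitute the names your lab uses.
    """
    model_dir = Path(root) / model_name
    for version in versions:
        # Each version gets its own subdirectory under the model directory.
        version_dir = model_dir / str(version)
        version_dir.mkdir(parents=True, exist_ok=True)
        # Placeholder only: replace with the model file saved after training.
        (version_dir / definition_file).touch()
    return model_dir


def list_repository(root):
    """Return the sorted relative paths of everything under root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    # Hypothetical names: "image_classifier", versions 1 and 2, "model.onnx".
    with tempfile.TemporaryDirectory() as tmp:
        create_model_repository(tmp, "image_classifier", [1, 2], "model.onnx")
        for entry in list_repository(tmp):
            print(entry)
```

Listing the tree after creating it is a quick way to confirm that each model definition file sits inside a version directory, and not directly under the model directory, before pointing the server at the repository.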