Model Repository#
The Triton Inference Server serves models from one or more model repositories that are specified when the server is started. While Triton is running, the models being served can be modified as described in Model Management.
Repository Layout#
These repository paths are specified when Triton is started using the --model-repository option. The --model-repository option can be specified multiple times to include models from multiple repositories. The directories and files that compose a model repository must follow a required layout. Assuming a repository path is specified as follows:
$ tritonserver --model-repository=<model-repository-path>
The corresponding repository layout must be:
<model-repository-path>/
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  ...
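Within the top-level model repository directory there must be zero or more <model-name> sub-directories. Each <model-name> sub-directory contains the files for the corresponding model: the optional config.pbtxt and output label files, plus one <version> sub-directory per model version, as shown in the layout above.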
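The layout above can be scaffolded with a few lines of Python. This is a minimal sketch, not part of Triton: the helper name scaffold_model and the empty placeholder files are illustrative assumptions, and a real repository needs an actual model file and, where required, a valid config.pbtxt.

```python
from pathlib import Path


def scaffold_model(repo, model_name, versions, model_filename):
    """Create <repo>/<model_name>/<version>/<model_filename> placeholders.

    Hypothetical helper for illustration; the files it creates are empty.
    """
    model_dir = Path(repo) / model_name
    model_dir.mkdir(parents=True, exist_ok=True)
    # config.pbtxt sits next to the version directories, not inside them.
    (model_dir / "config.pbtxt").touch()
    for version in versions:
        version_dir = model_dir / str(version)
        version_dir.mkdir(exist_ok=True)
        (version_dir / model_filename).touch()
    return model_dir
```

For example, scaffold_model("/path/to/model/repository", "my_model", [1, 2], "model.onnx") creates the directories for two versions of a model named my_model (a made-up name).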
Model Repository Locations#
Triton can access models from one or more locally accessible file paths, from Google Cloud Storage, from Amazon S3, and from Azure Storage.
Local File System#
For a locally accessible file-system the absolute path must be specified.
$ tritonserver --model-repository=/path/to/model/repository ...
Cloud Storage with Environment variables#
Google Cloud Storage#
For a model repository residing in Google Cloud Storage, the repository path must be prefixed with gs://.
$ tritonserver --model-repository=gs://bucket/path/to/model/repository ...
S3#
For a model repository residing in Amazon S3, the path must be prefixed with s3://.
$ tritonserver --model-repository=s3://bucket/path/to/model/repository ...
For a local or private instance of S3, the prefix s3:// must be followed by the host and port (separated by a colon) and subsequently the bucket path.
$ tritonserver --model-repository=s3://host:port/bucket/path/to/model/repository ...
By default, Triton uses HTTP to communicate with your instance of S3. If your instance of S3 supports HTTPS and you want Triton to use HTTPS to communicate with it, prefix the host name with https:// in the model repository path.
$ tritonserver --model-repository=s3://https://host:port/bucket/path/to/model/repository ...
When using S3, the credentials and default region can be passed either by using the aws configure command or via the respective environment variables. If the environment variables are set, they take priority and are used by Triton instead of the credentials set using the aws configure command.
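The three S3 path forms described in this section can be summarized in a small helper. This is only a sketch: s3_repository_path is a hypothetical function written for illustration, and it does nothing more than assemble the string passed to --model-repository.

```python
def s3_repository_path(bucket_path, host=None, port=None, use_https=False):
    """Build an s3:// model repository path (hypothetical helper)."""
    bucket_path = bucket_path.lstrip("/")
    if host is None:
        # Amazon S3: s3://bucket/path/to/model/repository
        return f"s3://{bucket_path}"
    if port is None:
        raise ValueError("a local or private S3 instance needs host and port")
    # Local or private S3: host and port are separated by a colon.
    # HTTPS is requested by prefixing the host name with https://.
    scheme = "https://" if use_https else ""
    return f"s3://{scheme}{host}:{port}/{bucket_path}"
```

The returned string is what follows --model-repository= on the tritonserver command line.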
Azure Storage#
For a model repository residing in Azure Storage, the repository path must be prefixed with as://.
$ tritonserver --model-repository=as://account_name/container_name/path/to/model/repository ...
When using Azure Storage, you must set the AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY environment variables to an account that has access to the Azure Storage repository.
If you don’t know your AZURE_STORAGE_KEY and have your Azure CLI correctly configured, here’s an example of how to find a key corresponding to your AZURE_STORAGE_ACCOUNT:
$ export AZURE_STORAGE_ACCOUNT="account_name"
$ export AZURE_STORAGE_KEY=$(az storage account keys list -n "$AZURE_STORAGE_ACCOUNT" --query "[0].value" -o tsv)
Cloud Storage with Credential file (Beta)#
This feature is currently in beta and may be subject to change.
To group the credentials into a single file for Triton, you may set the TRITON_CLOUD_CREDENTIAL_PATH environment variable to a path pointing to a JSON file of the following format, residing in the local file system.
export TRITON_CLOUD_CREDENTIAL_PATH="cloud_credential.json"
“cloud_credential.json”:
{
  "gs": {
    "": "PATH_TO_GOOGLE_APPLICATION_CREDENTIALS",
    "gs://gcs-bucket-002": "PATH_TO_GOOGLE_APPLICATION_CREDENTIALS_2"
  },
  "s3": {
    "": {
      "secret_key": "AWS_SECRET_ACCESS_KEY",
      "key_id": "AWS_ACCESS_KEY_ID",
      "region": "AWS_DEFAULT_REGION",
      "session_token": "",
      "profile": ""
    },
    "s3://s3-bucket-002": {
      "secret_key": "AWS_SECRET_ACCESS_KEY_2",
      "key_id": "AWS_ACCESS_KEY_ID_2",
      "region": "AWS_DEFAULT_REGION_2",
      "session_token": "AWS_SESSION_TOKEN_2",
      "profile": "AWS_PROFILE_2"
    }
  },
  "as": {
    "": {
      "account_str": "AZURE_STORAGE_ACCOUNT",
      "account_key": "AZURE_STORAGE_KEY"
    },
    "as://Account-002/Container": {
      "account_str": "",
      "account_key": ""
    }
  }
}
For a given path, Triton uses the credential whose name is the longest match against the start of that path. For example, gs://gcs-bucket-002/model_repository will match the “gs://gcs-bucket-002” GCS credential, and gs://any-other-gcs-bucket will match the “” GCS credential.
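The matching rule can be sketched as a longest-prefix lookup. The function below is an illustration of the rule as described here, not Triton's implementation; match_credential is a hypothetical name, and the sketch assumes a plain string-prefix comparison.

```python
def match_credential(path, credentials):
    """Return the name of the longest credential entry that prefixes `path`.

    `credentials` is one provider section of the JSON file, for example the
    "gs" object. The empty name "" prefixes every path, so it is the fallback.
    Returns None when no entry matches.
    """
    candidates = [name for name in credentials if path.startswith(name)]
    if not candidates:
        return None
    return max(candidates, key=len)


# The "gs" section from the example credential file above.
gs_credentials = {
    "": "PATH_TO_GOOGLE_APPLICATION_CREDENTIALS",
    "gs://gcs-bucket-002": "PATH_TO_GOOGLE_APPLICATION_CREDENTIALS_2",
}
```

With this data, match_credential("gs://gcs-bucket-002/model_repository", gs_credentials) selects the "gs://gcs-bucket-002" entry, while any other bucket falls back to the "" entry.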
This feature is intended for use cases in which multiple credentials are needed for each cloud storage provider. Be sure to replace the placeholder credential paths and keys in the example above with your actual paths and keys.
If the TRITON_CLOUD_CREDENTIAL_PATH environment variable is not set, the credentials described in Cloud Storage with Environment variables will be used.
Model Versions#
Each model can have one or more versions available in the model repository. Each version is stored in its own numerically named subdirectory, where the name of the subdirectory is the version number of the model. Subdirectories that are not numerically named, or whose names start with zero (0), are ignored. Each model configuration specifies a version policy that controls which of the versions in the model repository are made available by Triton at any given time.
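The naming rule for version subdirectories can be expressed as a short check. This sketch mirrors the rule as stated above and is not Triton's own code; the function names are hypothetical, and the version policy from the model configuration is not applied here.

```python
from pathlib import Path


def is_version_dir_name(name):
    """True for purely numeric names that do not start with zero."""
    return name.isascii() and name.isdigit() and not name.startswith("0")


def list_versions(model_dir):
    """Return the version numbers found under a <model-name> directory."""
    return sorted(
        int(entry.name)
        for entry in Path(model_dir).iterdir()
        # Files and non-conforming directory names are skipped.
        if entry.is_dir() and is_version_dir_name(entry.name)
    )
```

Under this reading, directories named 1, 2, and 10 count as versions, while 01, 0, and latest are skipped.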
Model Files#
The contents of each model version sub-directory are determined by the type of the model and the requirements of the backend that supports the model.
TensorRT Models#
A TensorRT model definition is called a Plan. A TensorRT Plan is a single file that by default must be named model.plan. This default name can be overridden using the default_model_filename property in the model configuration.
A TensorRT Plan is specific to a GPU’s CUDA Compute Capability. As a result, TensorRT models will need to set the cc_model_filenames property in the model configuration to associate each Plan file with the corresponding Compute Capability.
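Conceptually, cc_model_filenames is a lookup from Compute Capability to Plan file name. The sketch below illustrates that idea only. It makes several assumptions that are not taken from this page: the key format ("7.5", "8.0"), the example file names, and the fallback to the default name when no entry matches.

```python
def select_plan_filename(cc_model_filenames, compute_capability,
                         default_filename="model.plan"):
    """Pick the Plan file for a GPU's Compute Capability.

    Hypothetical helper: falls back to `default_filename` when the mapping
    has no entry for `compute_capability` (an assumption of this sketch).
    """
    return cc_model_filenames.get(compute_capability, default_filename)


# Made-up mapping; keys and file names are placeholders.
example_mapping = {
    "7.5": "model_cc75.plan",
    "8.0": "model_cc80.plan",
}
```

Each file named in such a mapping would live in the model's version sub-directory alongside, or instead of, model.plan.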
A minimal model repository for a TensorRT model is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.plan
ONNX Models#
An ONNX model is a single file or a directory containing multiple files. By default the file or directory must be named model.onnx. This default name can be overridden using the default_model_filename property in the model configuration.
Triton supports all ONNX models that are supported by the version of ONNX Runtime being used by Triton. Models will not be supported if they use a stale ONNX opset version or contain operators with unsupported types.
A minimal model repository for an ONNX model contained in a single file is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.onnx
An ONNX model composed of multiple files must be contained in a directory. By default this directory must be named model.onnx, but this name can be overridden using the default_model_filename property in the model configuration. The main model file within this directory must be named model.onnx. A minimal model repository for an ONNX model contained in a directory is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.onnx/
        model.onnx
        <other model files>
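The two ONNX layouts differ only in whether model.onnx is a file or a directory. The following sketch shows how a script could locate the main model file in either case; onnx_main_file is a hypothetical helper, not a Triton API, and it assumes the default name unless another is passed in.

```python
from pathlib import Path


def onnx_main_file(version_dir, model_filename="model.onnx"):
    """Return the path of the main ONNX file inside a version directory."""
    entry = Path(version_dir) / model_filename
    if entry.is_dir():
        # Multi-file model: the main file inside the directory
        # must be named model.onnx.
        return entry / "model.onnx"
    # Single-file model: the entry itself is the model.
    return entry
```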
TorchScript Models#
A TorchScript model is a single file that by default must be named model.pt. This default name can be overridden using the default_model_filename property in the model configuration. Some models traced with different versions of PyTorch may not be supported by Triton due to changes in the underlying opset.
A minimal model repository for a TorchScript model is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.pt
TensorFlow Models#
TensorFlow saves models in one of two formats: GraphDef or SavedModel. Triton supports both formats.
A TensorFlow GraphDef is a single file that by default must be named model.graphdef. A TensorFlow SavedModel is a directory containing multiple files. By default the directory must be named model.savedmodel. These default names can be overridden using the default_model_filename property in the model configuration.
A minimal model repository for a TensorFlow GraphDef model is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.graphdef
A minimal model repository for a TensorFlow SavedModel model is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.savedmodel/
        <saved-model files>
OpenVINO Models#
An OpenVINO model is represented by two files, a *.xml and *.bin file. By default the *.xml file must be named model.xml. This default name can be overridden using the default_model_filename property in the model configuration.
A minimal model repository for an OpenVINO model is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.xml
      model.bin
Python Models#
The Python backend allows you to run Python code as a model within Triton. By default the Python script must be named model.py but this default name can be overridden using the default_model_filename property in the model configuration.
A minimal model repository for a Python model is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.py
DALI Models#
The DALI backend allows you to run a DALI pipeline as a model within Triton. To use this backend, you need to generate a file, by default named model.dali, and include it in your model repository. Please refer to the DALI backend documentation for a description of how to generate model.dali. The default model file name can be overridden using the default_model_filename property in the model configuration.
A minimal model repository for a DALI model is:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.dali
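The default file names listed in the Model Files sections can be used for a quick sanity check of a local repository before starting Triton. The script below is a rough sketch and not a Triton tool: versions_missing_default_file is a hypothetical helper, it only looks at local file systems, and it will also report models that legitimately override the name through default_model_filename.

```python
from pathlib import Path

# Default names taken from the sections above.
DEFAULT_MODEL_FILENAMES = {
    "model.plan", "model.onnx", "model.pt", "model.graphdef",
    "model.savedmodel", "model.xml", "model.py", "model.dali",
}


def versions_missing_default_file(repo):
    """Return version directories that contain none of the default names."""
    missing = []
    for model_dir in sorted(Path(repo).iterdir()):
        if not model_dir.is_dir():
            continue
        for version_dir in sorted(model_dir.iterdir()):
            name = version_dir.name
            # Only numeric names that do not start with zero are versions.
            if not (version_dir.is_dir() and name.isascii()
                    and name.isdigit() and not name.startswith("0")):
                continue
            entries = {entry.name for entry in version_dir.iterdir()}
            if not entries & DEFAULT_MODEL_FILENAMES:
                missing.append(version_dir)
    return missing
```

An empty result means every version directory holds at least one file or directory with a default name; it does not mean the model itself is valid.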