Model repositories hold the model artifacts to be loaded into and served by the deployed Triton Inference Servers. Model repositories for Triton Management Service are similar in structure and content to Triton Inference Server model repositories, but there are different options and configurations for the available locations.
Typically, you configure a model repository by specifying its remote location and a Repository Name when you deploy TMS. How you specify the location depends on the repository type. TMS operations that reference the model repository (that is, lease creation requests) use the configured Repository Names. Several different types of model repositories are available.
TMS Configuration
HTTPS model repositories are not required to be pre-specified in the TMS values.yaml file. However, you can associate a Kubernetes Secret with a particular HTTP URL in the values.yaml file, in which case TMS provides the contents of the secret in the Authorization request header:
# values.yaml
server:
  modelRepositories:
    https:
      # Name of the Kubernetes secret to read and provide as an Authorization header for download requests.
      - secretName:
        # URL of the remote web server in <domain_label_or_ip_address>/<path> format, used to determine whether the secret applies to a model request.
        targetUri:
The default values.yaml file contains an example secret named “ngc-model-pull”. The targetUri is used to determine which secret is best suited to download a given model, based on the model’s URN.
URN matching is broken up into two parts:

1. Match the DNS label right to left, or an absolute match of an IP address. For example, models.company.com would match cdn.models.company.com, but would not match models.cdn.company.com.
2. Match the path portion of the URN left to right. For example, internal-cdn/repository would match internal-cdn/repository/ai_models, but would not match internal-cdn/ai_models/repository.
To create a model-pull secret, use the following syntax:
kubectl create secret generic <secret-name> --from-file <secret-name>
Then add your <secret-name> to the values.yaml#server.modelRepositories.https list with the corresponding targetUri value.
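For instance, the example ngc-model-pull secret could be created from a local file of the same name that holds the header value to send; the file and secret names here are only illustrative:

# Illustrative only: the file ngc-model-pull holds the value TMS should send
# in the Authorization header when downloading models from the matching targetUri.
kubectl create secret generic ngc-model-pull --from-file ngc-model-pull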
Setting Up the Repository
Models in HTTPS repositories must be zipped versions of the directories in a Triton Model Repository. They must be served by a web server and accessible through HTTP GET requests.
For example, if the Triton model repository is structured as follows:
model_repository/
└── my_model
├── 1
│ └── model.onnx
└── config.pbtxt
Then you must serve a file, my_model.zip, that contains one of the following file layouts:
$ unzip -l my_model.zip
Archive:  my_model.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2022-04-28 22:27   1/
      356  2022-04-28 22:27   1/model.onnx
       59  2022-06-01 21:12   config.pbtxt
---------                     -------
      415                     3 files

$ unzip -l my_model.zip
Archive:  my_model.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2022-07-08 00:23   my_model/
       59  2022-06-01 21:12   my_model/config.pbtxt
        0  2022-04-28 22:27   my_model/1/
      356  2022-04-28 22:27   my_model/1/model.onnx
---------                     -------
      415                     4 files
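Either layout can be produced with the zip utility. Assuming the model_repository directory shown above, one of the following illustrative commands creates a matching archive:

# First layout: archive rooted inside the model directory, so the version folder and config.pbtxt sit at the archive root.
(cd model_repository/my_model && zip -r ../../my_model.zip .)

# Second layout: archive rooted at the repository, so the model directory itself sits at the archive root.
(cd model_repository && zip -r ../my_model.zip my_model)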
The my_model.zip file, and any other zip files with a similar structure, can be served by a wide variety of web servers. One approach is to use the http.server module in the Python standard library. In a directory containing the zip file, execute the following command:
python -m http.server --directory .
This serves the model at the URI http://localhost:8000/my_model.zip.
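Before creating a lease, you can confirm that the archive is reachable; for example, with curl against the default port used above:

# Sends a HEAD request; a 200 response means the archive is downloadable.
curl -I http://localhost:8000/my_model.zip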
Model URI
To refer to a model in an HTTPS repository, use the full URL of the server. For example:
tmsctl lease create -t ${tms_address} -m "name=my_model,uri=http://www.example.com/models/my_model.zip"
TMS Configuration
TMS administrators can provide model repositories to requested Triton instances from Kubernetes Persistent Volume Claims.
To enable requested Triton instances to load models from a Persistent Volume Claim, provide the name of that Kubernetes Persistent Volume Claim in an entry under values.yaml#server.modelRepositories.volumes, along with a valid name for the repository. The Persistent Volume Claim is then mounted as a volume onto any Triton pod launched by TMS.
# values.yaml
server:
  modelRepositories:
    volumes:
      # Name used to reference this model repository as part of lease acquisition.
      # May contain only lowercase alphanumeric characters; spaces are not allowed, but hyphens `-` are permitted.
      - repositoryName: volume-models
        # Kubernetes Persistent Volume Claim (PVC) used to fetch models.
        volumeClaimName: example-volume-claim
Setting Up the Repository
Persistent Volumes in Kubernetes are cluster resources that can be consumed. A Persistent Volume Claim (PVC) is a request to use such a resource. Because model repositories in TMS are used by multiple Triton instances, you must create a PVC for your repository that can be mounted onto multiple pods.
One way to set up the repository is to create the model repository outside of Kubernetes, in storage that can be consumed as a Persistent Volume. Define that Persistent Volume, and then attach a Persistent Volume Claim to it so that Kubernetes pods can consume it. For an example, see the NFS Model Repository path in the quickstart guide.
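As a rough sketch only, an NFS-backed Persistent Volume and a matching claim might look like the following. The server address, export path, storage size, and volume name are placeholders; the claim name matches the example-volume-claim value from the configuration above and must be created in the namespace where TMS launches Triton pods:

kubectl apply -f - <<'EOF'
# Hypothetical NFS-backed volume holding the Triton model repository.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-model-volume
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany          # NFS supports multi-pod access; ReadOnlyMany also works for read-only serving
  nfs:
    server: 10.0.0.10        # placeholder NFS server address
    path: /exports/model_repository
---
# Claim referenced by values.yaml#server.modelRepositories.volumes.volumeClaimName.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-volume-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # bind to the statically provisioned volume above
  volumeName: example-model-volume
  resources:
    requests:
      storage: 10Gi
EOF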
Typically, Persistent Volume Claims are exposed directly as file systems, so to create a model repository you can use the same structure as a Triton Inference Server model repository. For example:
model_repository/
└── my_model
├── 1
│ └── model.onnx
└── config.pbtxt
See the following resources for creating Persistent Volumes and Claims backed by various types of storage:
NFS: TMS Quickstart Guide
AWS Elastic Block Storage: AWS Documentation. Only supported on Amazon EKS clusters.
Azure Blob Storage: Azure Documentation. Only supported on Azure Kubernetes Service clusters.
Azure Files: Azure Documentation. Only supported on Azure Kubernetes Service clusters.
Model URI
To refer to a model in a PVC repository, prefix the model name with model:// and use the name of the model repository that is configured in the values.yaml file. For example:
tmsctl lease create -t ${tms_address} -m "name=my_model,uri=model://volume-models/my_model"
TMS Configuration
To configure access to an S3-compatible object store, you must specify a Repository Name, a Bucket Name, and an S3 service endpoint.
# values.yaml
server:
  modelRepositories:
    s3:
      # Name used to reference this model repository as part of lease acquisition.
      # May contain only lowercase alphanumeric characters without spaces. Hyphens `-` are permitted.
      - repositoryName: repo0
        # Name of the S3 bucket used to fetch models.
        bucketName: tms-models
        # Service URL of the S3 bucket.
        # If both 'endpoint' and 'awsRegion' fields are specified, TMS defaults to using the value from 'endpoint'.
        # Must be a valid URL designating an existing endpoint (for example, "https://s3.us-west-2.amazonaws.com" or "https://play.min.io:9000").
        # To learn more, see: https://docs.aws.amazon.com/general/latest/gr/s3.html#amazon_s3_website_endpoints.
        endpoint: "https://s3.us-west-2.amazonaws.com"
If your S3 Object Store is an actual AWS S3 bucket, you can provide the AWS Region of your bucket instead of the explicit endpoint. For example:
# values.yaml
server:
  modelRepositories:
    s3:
      - repositoryName: repo0
        bucketName: tms-models
        # Service region code of the AWS S3 bucket.
        # This field is for S3 buckets deployed exclusively through AWS.
        # Non-AWS S3 buckets must be configured through the `endpoint` field.
        # Must be a valid code designating an existing AWS region (for example, "us-west-2").
        # To learn more, see: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html.
        awsRegion: "us-west-2"
If your model repository is in a private S3 bucket that requires access credentials, you have two options.
You can create one Kubernetes Secret containing an access key ID and another containing a secret access key, which together represent the authority to list and retrieve the objects in the bucket. Then, specify those secrets in the values.yaml file, for example:
# values.yaml
server:
  modelRepositories:
    s3:
      - repositoryName: repo0
        bucketName: tms-models
        endpoint: "https://s3.us-west-2.amazonaws.com"
        # Name of the Kubernetes secret to read and provide as the access key ID to download objects from the S3 bucket.
        # Optional; needed when IAM roles or default AWS environment variables are not used to authorize TMS to read from the S3 bucket.
        accessKey: "access-key-secret-name"
        # Name of the Kubernetes secret containing the secret access key used to read from the S3 bucket.
        # Optional; needed when IAM roles or default AWS environment variables are not used to authorize TMS to read from the S3 bucket.
        secretKey: "secret-key-secret-name"
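As an illustration, the two secrets referenced above could be created from local files holding the corresponding credential values, following the same pattern as the model-pull secret earlier in this section. The secret and file names are examples only, and the exact secret layout TMS expects may differ from this sketch:

# Hypothetical: access-key-secret-name holds the AWS access key ID,
# secret-key-secret-name holds the AWS secret access key.
kubectl create secret generic access-key-secret-name --from-file access-key-secret-name
kubectl create secret generic secret-key-secret-name --from-file secret-key-secret-name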
If you are using AWS S3 buckets and TMS is to be deployed on EKS, you can associate an AWS IAM role that has s3:ListBucket and s3:GetObject permissions for that bucket with the TMS Kubernetes service account. You can do this by providing the Amazon Resource Name of that IAM role in the values.yaml file.
# values.yaml
server:
  security:
    aws:
      # AWS IAM role used to read models from the S3 buckets configured in `modelRepositories.s3`.
      role: arn:aws:iam::00000000:role/Tms-s3-role
You should also ensure that the role you provide here has a trust policy that allows the tms-triton service account to assume that role. For example, you can create this IAM role with the following eksctl command:
eksctl create iamserviceaccount --cluster tms-cluster --name=tms-server --attach-policy-arn=arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess --role-only --role-name=Tms-s3-role --approve
You do not have to provide the key ID and secret key when you use this option.
See the documentation on Configuring a Kubernetes service account to assume an IAM role to learn more.
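If the role was created by other means, you can inspect its trust policy with the AWS CLI to confirm that the TMS service account is allowed to assume it (this assumes the CLI is configured with permission to read IAM roles):

# Prints the role's trust (assume-role) policy document.
aws iam get-role --role-name Tms-s3-role --query "Role.AssumeRolePolicyDocument"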
Setting Up the Repository
S3 model repositories must be organized into folders similar to the following structure:
tms-models #bucket name
└── my_model #S3 folder
├── 1
│ └── model.onnx
└── config.pbtxt
All model folders (like the my_model folder above) must be at the top level of your bucket, or contained in a single parent directory. If your model repository is not in the top-level folder of your bucket, you must include the full path when referring to the model in lease commands.
You must also ensure that you have an IAM role available that has access to the bucket (and folder) containing the models, or that the bucket is publicly accessible.
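For example, assuming the AWS CLI is configured with credentials that can write to the tms-models bucket, the local model_repository directory shown above could be uploaded so that each model folder lands at the top level of the bucket:

# Copies my_model/config.pbtxt and my_model/1/model.onnx into s3://tms-models/my_model/...
aws s3 sync model_repository/ s3://tms-models/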
Model URI
To refer to a model in an S3 repository, prefix the model name with model:// and the name of the model repository that is configured in the values.yaml file. Internally, TMS resolves this to the correct S3 URL. For example:
tmsctl lease create -t ${tms_address} -m "name=my_model,uri=model://aws-models/my_model"