Deployment Guide (1.1.0)

Model Repositories

Model repositories hold the model artifacts to be loaded into and served by the deployed Triton Inference Servers. Model repositories for Triton Management Service are similar in structure and content to Triton Inference Server model repositories, but there are different options and configurations for the available locations.

Typically, model repositories are configured by specifying the remote location of the repository and a Repository Name when you deploy TMS. The method of specifying the location of the repository is dependent on its type. TMS operations requiring references to the model repository (that is, lease creation requests) use the configured Repository Names. Several different types of model repositories are available.

HTTPS Model Repositories

TMS Configuration

HTTPS model repositories are not required to be pre-specified in the TMS values.yaml file. However, you can associate a Kubernetes Secret with a particular HTTP URL in the values.yaml file, in which case TMS provides the contents of the secret in the Authorization request header:

# values.yaml
server:
  modelRepositories:
    https:
        # Name of the Kubernetes secret to read and provide as an Authorization header for download requests.
      - secretName:
        # URL of the remote web server in <domain_label_or_ip_address>/<path> format,
        # used to determine whether the secret applies to a given model request.
        targetUri:

The default values.yaml file contains an example secret named “ngc-model-pull”.

The targetUri is used to determine the secret best suited for use to download a given model based on the model’s URN. URN matching is broken up into two parts:

  1. Match the DNS labels right to left; an IP address must match exactly. For example, models.company.com would match cdn.models.company.com, but would not match models.cdn.company.com.

  2. Match the path portion of the URN left to right. For example, internal-cdn/repository would match internal-cdn/repository/ai_models, but would not match internal-cdn/ai_models/repository.

To create a model-pull secret, use the following syntax:

kubectl create secret generic <secret-name> --from-file <secret-name>
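
For illustration, a minimal sketch with hypothetical names, assuming the file's contents are exactly the header value TMS should send (for example a bearer token):

# "https-model-pull" and the token value are placeholders; the file name becomes the key inside the secret.
echo -n "Bearer <token>" > https-model-pull
kubectl create secret generic https-model-pull --from-file https-model-pull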

Then add your <secret-name> to the values.yaml#server.modelRepositories.https list with the corresponding targetUri value.

Setting Up the Repository

Models in HTTPS repositories must be zipped versions of the directories in a Triton Model Repository. They must be served by a web server and accessible through HTTP GET requests.

For example, if the Triton model repository is structured as follows:

model_repository/
└── my_model
    ├── 1
    │   └── model.onnx
    └── config.pbtxt

Then you must serve a file, my_model.zip, that contains one of the following file layouts:

  • $ unzip -l my_model.zip
    Archive:  my_model.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            0  2022-04-28 22:27   1/
          356  2022-04-28 22:27   1/model.onnx
           59  2022-06-01 21:12   config.pbtxt
    ---------                     -------
          415                     3 files

  • $ unzip -l my_model.zip
    Archive:  my_model.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            0  2022-07-08 00:23   my_model/
           59  2022-06-01 21:12   my_model/config.pbtxt
            0  2022-04-28 22:27   my_model/1/
          356  2022-04-28 22:27   my_model/1/model.onnx
    ---------                     -------
          415                     4 files
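
One way to produce archives with these layouts is the zip command-line utility. A sketch, assuming the model_repository directory shown earlier and run from its parent directory:

# Layout with the version folder and config.pbtxt at the archive root:
(cd model_repository/my_model && zip -r ../../my_model.zip .)

# Layout with my_model/ itself at the archive root:
(cd model_repository && zip -r ../my_model.zip my_model)

Either archive satisfies one of the layouts listed above; pick one and serve it as described next.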

The my_model.zip file, and any other zip files with a similar structure, can be served by a wide variety of web servers. One approach is to use the http.server module in the Python standard library. In a directory containing the zip file, execute the following command:

python -m http.server --directory .

This serves the model at the URI http://localhost:8000/my_model.zip.
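
As a quick check that the archive is reachable before referencing it from TMS (assuming the default port 8000):

curl -I http://localhost:8000/my_model.zip   # expect a 200 response with the archive's Content-Length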

Model URI

To refer to a model in an HTTPS repository, use the full URL of the server. For example:

tmsctl lease create -t ${tms_address} -m "name=my_model,uri=http://www.example.com/models/my_model.zip"

Persistent Volume Claim Model Repositories

TMS Configuration

TMS enables administrators to provide model repositories to requested Triton instances from Kubernetes Persistent Volume Claims.

To enable requested Triton instances to load models from a persistent volume claim, provide the name of the particular Kubernetes persistent volume claim in an entry under values.yaml#server.modelRepositories.volumes, along with a valid name for the repository. The Persistent Volume Claim is then mounted as a volume onto any Triton pod launched by TMS.

# values.yaml
server:
  modelRepositories:
    volumes:
        # Name used to reference this model repository as part of lease acquisition.
        # May contain only lowercase alphanumeric characters and hyphens `-`; spaces are not permitted.
      - repositoryName: volume-models
        # Kubernetes persistent volume claim (PVC) used to fetch models.
        volumeClaimName: example-volume-claim

Setting Up the Repository

Persistent Volumes in Kubernetes are cluster resources that can be consumed. A Persistent Volume Claim (PVC) is a particular request to use that resource. Because model repositories in TMS are used by multiple Triton instances, you must create a specific PVC for your repository that can then be mounted onto multiple pods.

One way to set up the repository is to create the model repository outside of Kubernetes, in storage that can be consumed as a Persistent Volume. Define that Persistent Volume, and then attach a Persistent Volume Claim to it so that Kubernetes pods can consume it. For an example, see the NFS Model Repository path in the quickstart guide.
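
For illustration, a minimal sketch of an NFS-backed Persistent Volume and a claim bound to it. The server address, export path, capacity, and names are placeholders; adapt the access mode and sizing to your storage and to how many Triton pods will mount the repository:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-model-volume
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany                  # allow many Triton pods to mount the repository read-only
  nfs:
    server: nfs.example.com         # hypothetical NFS server
    path: /exports/model_repository # hypothetical export holding the Triton-style repository
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-volume-claim        # matches volumeClaimName in the values.yaml snippet above
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""              # bind to the pre-created PV rather than provisioning dynamically
  volumeName: example-model-volume
  resources:
    requests:
      storage: 10Gi
EOF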

Typically, Persistent Volume Claims are exposed directly as file systems, so to create a model repository you can use the same structure as a Triton Inference Server model repository. For example:

model_repository/
└── my_model
    ├── 1
    │   └── model.onnx
    └── config.pbtxt
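
How the models get into that storage depends on the backing medium. For the hypothetical NFS export sketched above, copying from a workstation with SSH access to the NFS server might look like this:

# Copy the local Triton-style repository into the export backing the Persistent Volume (placeholder host and path).
rsync -av model_repository/ nfs.example.com:/exports/model_repository/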

See the Kubernetes documentation on Persistent Volumes and Persistent Volume Claims for guidance on creating them backed by various types of storage.

Model URI

To refer to a model in a PVC repository, prefix the model name with model:// and use the name of the model repository that is configured in the values.yaml file. For example:

tmsctl lease create -t ${tms_address} -m "name=my_model,uri=model://volume-models/my_model"

S3 Model Repositories

TMS Configuration

To configure access to an S3 compatible object store, you must specify a Repository Name, a Bucket Name, and an S3 service Endpoint.

# values.yaml
server:
  modelRepositories:
    s3:
        # Name used to reference this model repository as part of lease acquisition.
        # May contain only lowercase alphanumeric characters and hyphens `-`; spaces are not permitted.
      - repositoryName: repo0
        # Name of the S3 bucket used to fetch models.
        bucketName: tms-models
        # Service URL of the S3 bucket.
        # If both 'endpoint' and 'awsRegion' fields are specified, TMS defaults to using the value from 'endpoint'.
        # Must be a valid URL designating an existing endpoint (e.g. "https://s3.us-west-2.amazonaws.com" or "https://play.min.io:9000").
        # To learn more, see: https://docs.aws.amazon.com/general/latest/gr/s3.html#amazon_s3_website_endpoints.
        endpoint: "https://s3.us-west-2.amazonaws.com"

If your S3 Object Store is an actual AWS S3 bucket, you can provide the AWS Region of your bucket instead of the explicit endpoint. For example:

# values.yaml
server:
  modelRepositories:
    s3:
      - repositoryName: repo0
        bucketName: tms-models
        # Service region code of the AWS S3 bucket.
        # This field is for S3 buckets deployed through AWS exclusively.
        # Non-AWS S3 buckets must be configured through the `endpoint` field.
        # Must be a valid code designating an existing AWS region (e.g. "us-west-2").
        # To learn more, see: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
        awsRegion: "us-west-2"

If your model repository is in a private S3 bucket that requires access credentials, you have two options.

  • You can create one Kubernetes Secret containing an access key ID and another containing a secret access key, which together grant the authority to list and retrieve the objects in the bucket. Then, specify those secrets in the values.yaml file. For example:

# values.yaml
server:
  modelRepositories:
    s3:
      - repositoryName: repo0
        bucketName: tms-models
        endpoint: "https://s3.us-west-2.amazonaws.com"
        # Name of the Kubernetes secret to read and provide as the access key ID to download objects from the S3 bucket.
        # Optional; used when IAM or the default AWS environment variables do not authorize TMS to read from the S3 bucket.
        accessKey: "access-key-secret-name"
        # Name of the Kubernetes secret containing the secret access key used to read from the S3 bucket.
        # Optional; used when IAM or the default AWS environment variables do not authorize TMS to read from the S3 bucket.
        secretKey: "secret-key-secret-name"
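
A sketch of creating those two secrets with kubectl. The secret names match the values.yaml entries above, but the data key inside each secret is a hypothetical placeholder; check the TMS chart documentation for the exact key it expects:

kubectl create secret generic access-key-secret-name \
  --from-literal=AWS_ACCESS_KEY_ID="<your-access-key-id>"         # hypothetical key name
kubectl create secret generic secret-key-secret-name \
  --from-literal=AWS_SECRET_ACCESS_KEY="<your-secret-access-key>" # hypothetical key name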

  • If you are using AWS S3 buckets and TMS is to be deployed on EKS, you can associate an AWS IAM role, which has s3:ListBucket and s3:GetObject permissions for that bucket, with the TMS Kubernetes service account. You can do this by providing the Amazon Resource Name of that IAM role in the values.yaml file.

# values.yaml
server:
  security:
    aws:
      # AWS IAM role used to read models from the S3 buckets configured in `modelRepositories.s3`.
      role: arn:aws:iam::00000000:role/Tms-s3-role

You should also ensure that the role you provide here has a trust policy that allows the tms-triton service account to assume that role. For example, you can create this IAM role with the following eksctl command:

eksctl create iamserviceaccount \
  --cluster tms-cluster \
  --name=tms-server \
  --attach-policy-arn=arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --role-only \
  --role-name=Tms-s3-role \
  --approve

Note

You do not have to provide the key ID and secret key when you use this option.

See the documentation on Configuring a Kubernetes service account to assume an IAM role to learn more.
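
If the role was created some other way, you can inspect its trust policy with the AWS CLI to confirm that the TMS service account is allowed to assume it (role name taken from the example above):

aws iam get-role --role-name Tms-s3-role --query 'Role.AssumeRolePolicyDocument'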

Setting Up the Repository

S3 model repositories must be organized into folders similar to the following structure:

tms-models                 # bucket name
└── my_model               # S3 folder
    ├── 1
    │   └── model.onnx
    └── config.pbtxt

All model folders (like the my_model folder above) must be at the top level of your bucket, or contained in a single parent directory. If your model repository is not at the top level folder of your bucket, you must include the full path when referring to the model in lease commands.
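
For example, if the model folders sit under a hypothetical parent folder named ai_models rather than at the top of the bucket, the lease request would include that path after the repository name:

tmsctl lease create -t ${tms_address} -m "name=my_model,uri=model://repo0/ai_models/my_model"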

You must also ensure that you have an IAM role available that has access to the bucket (and folder) containing the models, or that the bucket is publicly accessible.
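
One way to populate the bucket with the structure shown above, assuming the AWS CLI is installed and configured with credentials that can write to the tms-models bucket:

# Upload the local Triton-style model directory to the top level of the bucket.
aws s3 cp model_repository/my_model s3://tms-models/my_model --recursive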

Model URI

To refer to a model in an S3 repository, prefix the model name with model:// and the name of the model repository that is configured in the values.yaml file. Internally TMS resolves this to the correct S3 URL. For example:

tmsctl lease create -t ${tms_address} -m "name=my_model,uri=model://repo0/my_model"

© Copyright 2023, NVIDIA. Last updated on Dec 11, 2023.