Step #1: Set Up The Model Repository
Before we install TMS, we first have to configure the necessary cluster resources to allow TMS to access our model repository. This lab comes with a range of AI models pre-loaded, which are located in the
model_repository directory under the following folders:
These are NLP models for sensitive information detection (SID) used with the NVIDIA Morpheus Framework. The purpose of these models is to identify potentially sensitive information in network packet data as quickly as possible, so that exposure can be limited and corrective action taken. Sensitive information is a broad term, but it can be generalized to any data that should be guarded from unauthorized access. Credit card numbers, passwords, authorization keys, and emails are all examples of sensitive information.
This folder contains a Triton ensemble model made up of three components. Together, these three components comprise a deployable Hugging Face transformer model used for generating text embeddings.
This folder contains a sample image recognition model from the Triton Inference Server example model repository.
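Each of these folders follows the standard Triton model repository layout: one directory per model, typically holding a config.pbtxt and at least one numbered version subdirectory containing the model artifacts. As a minimal sketch of that layout (using a made-up model name, not one of the lab's actual folders):

```shell
# Illustrative only: build a throwaway directory showing the layout Triton
# expects ("example_model" is a made-up name, not one of the lab's models).
repo=$(mktemp -d)
mkdir -p "$repo/example_model/1"           # numeric version subdirectory
touch "$repo/example_model/config.pbtxt"   # per-model Triton configuration
find "$repo" | sort                        # print the resulting tree
```

When Triton (or TMS on its behalf) scans the repository, it treats each top-level directory name as the model name and each numeric subdirectory as a loadable version of that model.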
TMS supports a range of model repository backends, such as NFS servers, AWS S3 storage, and more. You can find information about model repository options here. For this lab, we will store our models in a persistent volume on our cluster, backed by the disk of the local lab machine.
This model repository setup is not a recommended configuration for multi-node production environments: a hostPath persistent volume only sees the local disk, so the models would have to be stored on each node separately for the persistent volume to always have access to them.
Open the SSH Console from the left pane, and create the manifests for the persistent volume and persistent volume claim for our model repository:
cat << 'EOF' >> pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local
  labels:
    type: local
spec:
  storageClassName: local
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/home/nvidia/model_repository"
EOF
cat << 'EOF' >> pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hostpath-pvc
spec:
  storageClassName: local
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  volumeName: pv-local
  selector:
    matchLabels:
      type: local
EOF
Deploy the persistent volume and persistent volume claim to the cluster:
kubectl create -f pv.yaml && kubectl create -f pvc.yaml
Confirm that our persistent volume and persistent volume claim have both been created:
$ kubectl get pv
NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
pv-local   5Gi        RWX            Retain           Bound    default/hostpath-pvc   local                   19s
tms-docs   3Gi        RWO            Recycle          Bound    tms-lp-lab/tms-docs    manual                  75m

$ kubectl get pvc
NAME           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hostpath-pvc   Bound    pv-local   5Gi        RWX            local          46s
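Once the claim shows a Bound status, any pod on the cluster can mount the model repository by referencing the claim name — this is how TMS will reach the models when we install it. The fragment below is only an illustrative sketch of how a pod consumes hostpath-pvc (the pod name, container name, and mount path are made up for this example, not part of the lab):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: repo-inspector            # hypothetical name, for illustration only
spec:
  containers:
    - name: inspector
      image: busybox
      command: ["ls", "/models"]  # list the mounted model repository
      volumeMounts:
        - name: model-repo
          mountPath: /models
  volumes:
    - name: model-repo
      persistentVolumeClaim:
        claimName: hostpath-pvc   # the claim created above
```

Because the underlying volume is a hostPath, this only works when the pod is scheduled on the node that actually holds /home/nvidia/model_repository — the same limitation noted above for multi-node environments.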