Before installing TMS, we first need to configure the cluster resources that allow TMS to access our model repository. This lab comes with a range of AI models pre-loaded, located in the model_repository directory under the following folders:
- sid: NLP models for sensitive information detection (SID), used with the NVIDIA Morpheus framework. These models identify potentially sensitive information in network packet data as quickly as possible so that exposure can be limited and corrective action taken. Sensitive information is a broad term, but it can be generalized to any data that should be guarded from unauthorized access; credit card numbers, passwords, authorization keys, and emails are all examples.
- text-embeddings: a Triton ensemble model made up of three components, which together form a deployable Hugging Face transformer model for generating text embeddings (see the sketch after this list).
- image-rec: a sample image recognition model from the Triton Inference Server example model repository.
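For orientation, the sketch below shows how a Triton ensemble is typically laid out on disk: every model, including the ensemble definition itself (whose config.pbtxt declares platform: "ensemble"), gets its own directory with a config.pbtxt and numbered version subdirectories. The component directory names and their exact count here are hypothetical, not the actual contents of the text-embeddings folder:

text-embeddings/
├── ensemble/          # ensemble definition (platform: "ensemble" in config.pbtxt)
│   ├── config.pbtxt
│   └── 1/
├── preprocessing/     # e.g. tokenization step (hypothetical name)
│   ├── config.pbtxt
│   └── 1/
└── transformer/       # e.g. the Hugging Face model weights (hypothetical name)
    ├── config.pbtxt
    └── 1/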
TMS supports several kinds of model repositories, including NFS servers, AWS S3 storage, and more; you can find information about model repository options here. For this lab, we will store our models in a persistent volume on our cluster, backed by the local disk of the lab machine.
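As a quick sanity check of that layout, you can list the repository directory on the lab machine (this path matches the hostPath used in the manifests below); you should see the three model folders:

$ ls /home/nvidia/model_repository
image-rec  sid  text-embeddings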
This model repository setup is not recommended for multi-node production environments, since the models would need to be stored separately on every node for the persistent volume to always have access to them.
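For comparison, a multi-node-friendly setup would back the PersistentVolume with shared storage such as NFS rather than a hostPath. This is only a sketch; the server address and export path below are placeholders, not part of this lab:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  storageClassName: nfs
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.10                # placeholder NFS server address
    path: /exports/model_repository  # placeholder export path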
Open the SSH Console from the left pane, and create the manifests for the persistent volume and persistent volume claim for our model repository:
cat << 'EOF' > pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local
  labels:
    type: local
spec:
  storageClassName: local
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/home/nvidia/model_repository"
EOF
cat << 'EOF' > pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hostpath-pvc
spec:
  storageClassName: local
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  volumeName: pv-local
  selector:
    matchLabels:
      type: local
EOF
Deploy the persistent volume and persistent volume claim to the cluster:
kubectl create -f pv.yaml && kubectl create -f pvc.yaml
Confirm that both the persistent volume and the persistent volume claim have been created:
$ kubectl get pv
NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
pv-local   5Gi        RWX            Retain           Bound    default/hostpath-pvc   local                   19s
tms-docs   3Gi        RWO            Recycle          Bound    tms-lp-lab/tms-docs    manual                  75m
$ kubectl get pvc
NAME           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hostpath-pvc   Bound    pv-local   5Gi        RWX            local          46s
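As an optional final check, you can launch a short-lived pod that mounts the claim and lists the model folders. The pod name and busybox image below are our own choices for this sketch, not part of the lab:

cat << 'EOF' > pvc-check.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-check
spec:
  restartPolicy: Never
  containers:
  - name: ls
    image: busybox
    command: ["ls", "/models"]   # list the mounted model repository
    volumeMounts:
    - name: models
      mountPath: /models
  volumes:
  - name: models
    persistentVolumeClaim:
      claimName: hostpath-pvc    # the claim created above
EOF
kubectl create -f pvc-check.yaml

Once the pod has completed, its logs should list the three model folders, and the pod can then be removed:

$ kubectl logs pvc-check
image-rec
sid
text-embeddings
$ kubectl delete pod pvc-check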