Open the SSH Console from the left pane, and navigate to the TMS helm chart directory.
cd tms-helm
If you would like to understand the configuration options we are setting for this TMS deployment, you can examine the values.yaml file using cat or your preferred text editor such as vim or nano. If you don't want to examine the configuration options, you may proceed to the next step.
Below is an overview of the settings you would have to specify by editing values.yaml. In this lab, these have already been set for you, so you may proceed to the next step to install TMS when you are ready.
images.secrets[0] should be set to the name of a Kubernetes secret containing your NGC API key; in our case, that is "ngc-container-pull".
Note: The ngc-container-pull Kubernetes secret has already been created for you. Outside of this lab, you would have to create a secret using your NGC API key.
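For reference, such a pull secret is typically created with kubectl create secret docker-registry. The secret name and registry server below match this lab, while the API key itself comes from your own NGC account:
# Create an image pull secret for nvcr.io (run outside this lab; requires your own NGC API key)
kubectl create secret docker-registry ngc-container-pull \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<your NGC API key>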
server.modelRepositories.volumes[0].repositoryName can be set to anything; it is the name TMS will assign to our model repository. Here we are using volume-models.
server.modelRepositories.volumes[0].volumeClaimName should be set to the name of the Kubernetes persistent volume claim storing our models. The one we created earlier is called hostpath-pvc.
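To make these entries concrete, here is a rough sketch of how they might look inside values.yaml; the exact nesting is defined by the TMS chart itself, so treat this as illustrative rather than a literal excerpt:
# Illustrative sketch only -- confirm the key layout against the chart's own values.yaml
images:
  secrets:
    - ngc-container-pull              # Kubernetes secret holding the NGC API key
server:
  modelRepositories:
    volumes:
      - repositoryName: volume-models   # name TMS assigns to this model repository
        volumeClaimName: hostpath-pvc   # PVC created earlier that stores our models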
The server.autoscaling.replicas section should be set according to your cluster setup. Since we are using two GPU instances, we want to deploy a maximum of 2 Triton instances; a sketch of such a setting follows below.
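The field names inside this section are defined by the TMS chart's values.yaml, so check the file itself for the exact keys; as a rough illustration, capping Triton at two instances might look like this:
# Illustrative sketch -- key names inside `replicas` follow the TMS chart's values.yaml
server:
  autoscaling:
    replicas:
      minimum: 1   # always keep at least one Triton instance running
      maximum: 2   # at most one Triton instance per GPU in our two-GPU cluster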
The autoscaling metrics settings specify which metrics you will be able to auto scale on. In this lab, we will only auto scale based on query queue time, as sketched below.
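As a similarly hedged illustration (the real metric names and structure come from the chart's values.yaml), enabling only a queue-time metric might look roughly like this:
# Illustrative sketch -- the actual metric keys are defined by the TMS chart
server:
  autoscaling:
    metrics:
      queueTime:
        enable: true   # auto scale on query queue time only in this lab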
The Prometheus values server.autoscaling.prometheus.podMonitorLabels and server.autoscaling.prometheus.ruleLabels should be set to match our Prometheus deployment.
Outside of this lab, you will have to determine the values to set for server.autoscaling.prometheus.podMonitorLabels and server.autoscaling.prometheus.ruleLabels, which you can do by running the following command to examine your Prometheus deployment:
kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus -o yaml
Scan the output YAML for the property spec.podMonitorSelector.matchLabels; this is the value to set for server.autoscaling.prometheus.podMonitorLabels in values.yaml.
Scan the output YAML for the property spec.ruleSelector.matchLabels; this is the value to set for server.autoscaling.prometheus.ruleLabels in values.yaml.
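For example, with a default kube-prometheus-stack installation, the relevant part of the output often looks like the snippet below; the release: prometheus label value is an assumption here and will differ per deployment:
# Hypothetical excerpt from the Prometheus object's YAML -- your label values may differ
spec:
  podMonitorSelector:
    matchLabels:
      release: prometheus
  ruleSelector:
    matchLabels:
      release: prometheus
In that case, you would set both server.autoscaling.prometheus.podMonitorLabels and server.autoscaling.prometheus.ruleLabels to release: prometheus in values.yaml.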
Deploy TMS using the values file we just edited and the Helm chart TGZ file, which is located in the tms-helm directory.
helm install tms -f values.yaml <TMS TGZ file>
Confirm that the TMS pod is running:
$ kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
tms-7b986fc97-pv28g   2/2     Running   0          2m
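If the pod does not show Running yet, you can watch it come up or inspect its events with standard kubectl commands; the pod name below is a placeholder for whatever name appears in your listing:
kubectl get pods -w                    # watch until STATUS shows Running
kubectl describe pod <tms pod name>    # inspect events if the pod is stuck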
Throughout this lab, we are going to be using inferencing and performance analysis tools that are included in the NVIDIA Triton SDK container. Since we will need access to these tools repeatedly, we will deploy a triton-client pod now in preparation.
First, return to our home directory
cd ~
Next, create the manifest for our triton-client pod
cat << 'EOF' >> triton-client.yaml
apiVersion: v1
kind: Pod
metadata:
  name: triton-client
  namespace: default
spec:
  containers:
  - args:
    - sleep infinity
    command:
    - bash
    - -c
    image: nvcr.io/nvidia/tritonserver:23.03-py3-sdk
    name: triton-client
  imagePullSecrets:
  - name: ngc-container-pull
EOF
Deploy the pod to the cluster
kubectl create -f triton-client.yaml
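Once the triton-client pod reports Running, you can open a shell inside it to reach the SDK tools used later in the lab; this is standard kubectl exec usage:
kubectl exec -it triton-client -- bash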