Step #3: Install TMS

AI Model Orchestration with Triton Management Service (Latest)
  1. Open the SSH Console from the left pane and navigate to the TMS Helm chart directory.


    cd tms-helm

  2. If you would like to understand the configuration options set for this TMS deployment, you can examine the values.yaml file using cat or your preferred text editor, such as vim or nano. If you don't want to examine the configuration options, you may proceed to the next step.

    Below is an overview of the settings you would have to specify by editing values.yaml. In this lab, these have already been set for you, so you may proceed to the next step to install TMS when you are ready.

    • images.secrets[0] should be set to a Kubernetes secret containing your NGC API Key; in our case, that is "ngc-container-pull".


      The ngc-container-pull Kubernetes secret has already been created for you. Outside of this lab, you would have to create a secret using your NGC API key.
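
      Outside the lab, such a secret can be created with kubectl's docker-registry secret type. The registry server and username below are the standard values for NGC image pulls; the API-key placeholder is yours to fill in:

```shell
# Create an image-pull secret for nvcr.io using an NGC API key.
# The username for NGC API-key authentication is the literal string "$oauthtoken".
kubectl create secret docker-registry ngc-container-pull \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<your-NGC-API-key>
```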


    • server.modelRepositories.volumes[0].repositoryName can be set to anything. This is the name TMS will assign to our model repository; here we are using volume-models.

    • server.modelRepositories.volumes[0].volumeClaimName should be set to the name of the Kubernetes persistent volume claim storing our models. The one we created earlier is called hostpath-pvc.


    • The server.autoscaling.replicas section should be set according to your cluster setup. Since we are using two GPU instances, we want to deploy a maximum of two Triton instances. Here is how ours is set up:
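
      As an illustrative sketch of that section (the exact key names may differ between chart versions, so check your own values.yaml):

```yaml
server:
  autoscaling:
    replicas:
      # One Triton instance per GPU; our cluster has two GPU instances.
      minimum: 1
      maximum: 2
```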


    • The autoscaling metrics settings specify which metrics you will be able to autoscale on. In this lab, we will only autoscale based on query queue time:
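
      A sketch of what such a setting can look like; the key names here are assumptions for illustration, not taken from the chart, so consult the values.yaml in the lab for the real ones:

```yaml
server:
  autoscaling:
    metrics:
      # Illustrative keys: enable scaling on per-request queue time only.
      queueTime:
        enable: true
```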


    • The Prometheus values server.autoscaling.prometheus.podMonitorLabels and server.autoscaling.prometheus.ruleLabels should be set to match the label selectors of our Prometheus deployment.


      Outside of this lab, you will have to determine the values to set for server.autoscaling.prometheus.podMonitorLabels and server.autoscaling.prometheus.ruleLabels, which you can do by running the following commands to examine your Prometheus deployment:


      kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus -o yaml

      • Scan the output YAML for the property spec.podMonitorSelector.matchLabels; this is the value to set for server.autoscaling.prometheus.podMonitorLabels in values.yaml.

      • Scan the output YAML for the property spec.ruleSelector.matchLabels; this is the value to set for server.autoscaling.prometheus.ruleLabels in values.yaml.
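
      For example, with a kube-prometheus-stack release named prometheus, the relevant portion of the output typically looks like the excerpt below; the release label value depends on your own Helm release name, so treat it as an assumption:

```yaml
# Excerpt from the output of the kubectl get prometheus command above:
spec:
  podMonitorSelector:
    matchLabels:
      release: prometheus
  ruleSelector:
    matchLabels:
      release: prometheus
```

      Those matchLabels entries are what you would copy into server.autoscaling.prometheus.podMonitorLabels and server.autoscaling.prometheus.ruleLabels in values.yaml.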


  3. Deploy TMS using the values file we just examined and the Helm chart TGZ file, which is located in the tms-helm directory.


    helm install tms -f values.yaml <TMS TGZ file>

  4. Confirm that the TMS pod is running:


    $ kubectl get pods
    NAME                  READY   STATUS    RESTARTS   AGE
    tms-7b986fc97-pv28g   2/2     Running   0          2m

Throughout this lab, we are going to be utilizing some inferencing and performance analysis tools that are included in the NVIDIA Triton SDK container. We are going to need to access these tools throughout the lab, so in preparation, we will deploy a triton-client pod.

First, return to our home directory:


cd ~

Next, create the manifest for our triton-client pod:


cat << 'EOF' >> triton-client.yaml
apiVersion: v1
kind: Pod
metadata:
  name: triton-client
  namespace: default
spec:
  containers:
  - args:
    - sleep infinity
    command:
    - bash
    - -c
    image:
    name: triton-client
  imagePullSecrets:
  - name: ngc-container-pull
EOF

Deploy the pod to the cluster:


kubectl create -f triton-client.yaml
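
Optionally, you can wait until the client pod reports Ready before moving on; kubectl's wait command blocks until the condition is met or the timeout expires:

```shell
# Block until the triton-client pod is Ready (up to 2 minutes).
kubectl wait --for=condition=Ready pod/triton-client --timeout=120s

# Then confirm its status.
kubectl get pod triton-client
```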

© Copyright 2022-2024, NVIDIA. Last updated on May 2, 2024.