TMS Basics Tutorial

Deployment Guide (1.1.0)

This tutorial walks you through the basics of creating a lease, running inference against it, and releasing the lease. This guide assumes the following:

  • You are familiar with the basics of leases.

  • You already have a TMS cluster up and running, with the appropriate secrets configured to get containers from NGC. If you do not, see the deployment guide.

  • You have tmsctl, the TMS CLI tool, already installed. This can be downloaded from NGC.

  • You have a model repository configured that is hosting your models.

  • You can run kubectl commands to communicate with your cluster. kubectl is required to run kubectl port-forward commands, which open ports to the Triton server where your models are hosted.

    You can skip the kubectl port-forward commands if you run the tutorial in a pod inside the same cluster that is running TMS. This option requires modifying the commands to refer to the correct service rather than localhost.
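    If you do run inside the cluster, the in-cluster target address can be built from the service name and namespace. The following is a minimal sketch; the namespace yournamespace is a hypothetical placeholder for wherever TMS is installed:

    ```shell
    # Hypothetical namespace and port; replace with your installation's values.
    TMS_NAMESPACE=yournamespace
    TMS_PORT=30345
    # Inside the cluster, address the tms service directly instead of localhost:
    TMS_ADDRESS="http://tms.${TMS_NAMESPACE}.svc.cluster.local:${TMS_PORT}"
    echo "$TMS_ADDRESS"
    # From a pod in the cluster you could then run, for example:
    #   tmsctl lease list -t "$TMS_ADDRESS"
    ```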

You must validate that you can communicate with the TMS server. This tutorial assumes that you are running the steps outlined here outside the Kubernetes cluster hosting TMS and need to open a port to connect to it.

To do this, use the kubectl port-forward command.

  1. Locate the TMS service. By default, it should be named tms and be running on port 30345. The rest of this tutorial assumes this is the case for your installation. If not, you must modify some commands accordingly.

  2. To check where TMS is running, run kubectl get svc:

    % kubectl get svc
    NAME   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
    tms    ClusterIP   10.98.225.223   <none>        30345/TCP   7s

With the name of the service and the port number on which it is listening, you can run a kubectl port-forward command to communicate with the service from your local machine. You must leave this command running, so open a separate terminal from the one where you run the other commands.

  3. Open a separate terminal window and run the following:

    % kubectl port-forward svc/tms 30345:30345
    Forwarding from 127.0.0.1:30345 -> 9345
    Forwarding from [::1]:30345 -> 9345

  4. To validate communication with TMS, run a tmsctl lease list command:

    % tmsctl lease list -t http://localhost:30345
    Lease  State  Expires  Triton
    Count: 0

  5. To avoid having to specify the address of the TMS using the -t flag each time, run the tmsctl target add command to set a default target.

    % tmsctl target add --set-default test-target http://localhost:30345

Note

The rest of this tutorial assumes you have set this and does not specify the -t option on each command.

  6. (Optional) To inspect your set of named targets and see the default, run tmsctl target list:

    % tmsctl target list
    * test-target -> http://localhost:30345

  7. Run tmsctl without specifying the -t option:

    % tmsctl lease list
    Lease  State  Expires  Triton
    Count: 0

Creating a lease requires two things:

  • The URI from which to fetch the model.

  • The name of the model for use by Triton.

Your TMS installation must be configured with model repositories, such as one hosted in an S3 bucket, or in a Kubernetes persistent volume. If this has not already been done, see the model repository documentation. You can also pull a model from an HTTP server.

The following steps assume that you have set:

  • $MODEL_URI to the URI from which to fetch the model. For example, if the model is in a repository named my_repo, and is in a folder named my_model, you would set MODEL_URI=model://my_repo/my_model. If instead your model is hosted on an HTTP server, you would set MODEL_URI=https://www.example.com/my_model.zip.

  • $MODEL_NAME to the name that your model has in Triton. This is used as part of the path for inference requests.
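The bullets above can be sketched as follows; the repository, folder, and model names here are hypothetical placeholders that you must substitute with your own:

    ```shell
    # Hypothetical repository and model names; substitute your own.
    export MODEL_URI=model://my_repo/my_model
    # Or, for a model hosted on an HTTP server:
    # export MODEL_URI=https://www.example.com/my_model.zip

    # The name Triton serves the model under (part of the inference URL path):
    export MODEL_NAME=my_model
    ```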

To create a lease:

  1. Run the tmsctl lease create command.

  2. Specify a duration of at least 30 minutes so that you have enough time to run the tutorial, but not so long that you hold resources unnecessarily if you forget to release the lease. For example: --duration 30m or --duration 1h.

    % tmsctl lease create -m name=$MODEL_NAME,uri=$MODEL_URI --duration 30m
    Lease 9fd209b1f45f424c914ebc2967a3b591
      State: Valid
      Expires: 2023-10-12T23:27:19Z
      Triton: triton-de245ce9.yournamespace.svc.cluster.local <nvcr.io/nvidia/tritonserver:23.09-py3>
      Models:
        Name         Url         Status
        $MODEL_NAME  $MODEL_URI  Ready

  3. Validate that you see output similar to the above.

Note

A few important things to note:

  • The lease ID is listed first. This is how you refer to the lease for operations like getting its status, renewing it, or releasing it. In the example above, this is 9fd209b1f45f424c914ebc2967a3b591. The rest of this tutorial refers to this as $LEASE_ID.

  • The line starting with Triton: gives the URL of the Triton server hosting your lease (with all its models). Above, it is triton-de245ce9.yournamespace.svc.cluster.local. The first component of this (triton-de245ce9) is referred to as $TRITON_SERVER in the rest of this tutorial.
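For convenience in the following steps, you can capture both values in environment variables, copying them from your own tmsctl lease create output. The values below are the example values from above, not real identifiers:

    ```shell
    # Example lease ID from the lease-create output above; use your own.
    export LEASE_ID=9fd209b1f45f424c914ebc2967a3b591
    # First component of the Triton hostname reported for the lease:
    export TRITON_SERVER=triton-de245ce9
    ```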

With your lease ready, you can run inference against any of its models (just one in this example). The details of the parameters vary widely depending on your particular model. The following examples show the overall idea, but you must adjust them for your model. Typically, an application would make the inference requests rather than you doing it manually; this section simply demonstrates how to go from creating a lease to running inference.

  1. The Kubernetes services associated with the leases are only available inside the cluster. To reach them externally, you need to run the kubectl port-forward command:

    % kubectl port-forward svc/$TRITON_SERVER 8000:8000

  2. Run inference against the server, replacing the parameters specific to your model:

    % curl -X POST -H "Content-Type: application/json" http://localhost:8000/v2/models/$MODEL_NAME/infer \
        --data '{ "inputs": [ {"name": "INPUT", "shape": [1], "datatype": "FP32", "data": [10] }]}'

  3. Validate that you see output similar to the following:

    {"model_name":"$MODEL_NAME","model_version":"1","outputs":[{"name":"OUTPUT","datatype":"FP32","shape":[1],"data":[10.0]}]}
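As a sketch, assuming jq is installed, you could pull the output tensor out of a response like the one above (the response string here is a hypothetical sample, not live output):

    ```shell
    # Hypothetical sample response from the inference request above.
    RESPONSE='{"model_name":"my_model","model_version":"1","outputs":[{"name":"OUTPUT","datatype":"FP32","shape":[1],"data":[10.0]}]}'
    # Extract the data array of the first output tensor.
    echo "$RESPONSE" | jq '.outputs[0].data'
    ```

In practice you would pipe the curl output directly into jq instead of storing it in a variable.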

After you have a lease, you can perform many different operations on it. The following sections list some of the basic operations.

List Leases

You can always run tmsctl lease list to see the state of leases in your TMS installation.

    % tmsctl lease list
    Lease                             State  Expires              Triton
    9fd209b1f45f424c914ebc2967a3b591  Valid  2023-10-12T23:39:28  triton-de245ce9.yournamespace.svc.cluster.local
    Count: 1

Lease Status

To get detailed information about a lease, run tmsctl lease status.

    % tmsctl lease status $LEASE_ID
    Lease 9fd209b1f45f424c914ebc2967a3b591
      State: Valid
      Expires: 2023-10-12T23:39:28Z
      Triton: triton-de245ce9.yournamespace.svc.cluster.local <nvcr.io/nvidia/tritonserver:23.09-py3>
      Readied: 2023-10-12T23:17:19Z
      Models:
        Name         Url         Status
        $MODEL_NAME  $MODEL_URI  Ready
      Events:
        Type    Source          Age  Message
        Status  Triton Manager  0s   Creating Triton deployment.
        Status  Triton Manager  4s   Triton deployment ready.
        Status  Triton Sidecar  5s   identity cached; model size: 1930.
        Status  Triton Sidecar  7s   identity is ready.
        Status  Lease Provider  9s   Lease ready.
        Status  Lease Service   8m   Lease renewed by request.
        Status  Lease Service   12m  Lease renewed by request.

Renew

Your TMS installation is configured with a default duration for leases. After that time elapses, TMS automatically releases the lease. If you still need it, you can run tmsctl lease renew to renew the lease. For example:

    % tmsctl lease renew $LEASE_ID
    Renewed lease 9fd209b1f45f424c914ebc2967a3b591 [Valid]
      Expires: 2023-10-12T23:34:44

Create a Custom Lease Name

You can create additional names for a lease and use them to run inference. This feature lets you give more meaningful names to your Triton instances, and lets you move a name from one lease to another so that you can update your models without changing the URL that your application uses.

To be able to use the name myname to refer to your lease:

  1. Run the following:

    % tmsctl lease name create myname $LEASE_ID
    Lease name "myname".
      Target lease: $LEASE_ID

  2. Run inference using the myname hostname. You can still use the name of the Triton server as well.

  3. To test the new name, stop your previous kubectl port-forward command and run a new one. This time, use myname instead of the name previously provided by TMS. For example:

    % kubectl port-forward svc/myname 8000:8000
    Forwarding from 127.0.0.1:8000 -> 8000
    Forwarding from [::1]:8000 -> 8000

    % curl -X POST -H "Content-Type: application/json" http://localhost:8000/v2/models/$MODEL_NAME/infer \
        --data '{ "inputs": [ {"name": "INPUT", "shape": [1], "datatype": "FP32", "data": [10] }]}'

When you are done using a lease, you can release all resources associated with it by running tmsctl lease release.

    % tmsctl lease release $LEASE_ID
    Lease $LEASE_ID
      State: Released

© Copyright 2023, NVIDIA. Last updated on Dec 11, 2023.