This tutorial walks you through the basics of creating a lease, running inference against it, and releasing the lease. This guide assumes the following:
- You are familiar with the basics of leases.
- You already have a TMS cluster up and running, with the appropriate secrets configured to pull containers from NGC. If you do not, see the deployment guide.
- You have tmsctl, the TMS CLI tool, installed. It can be downloaded from NGC.
- You have a model repository configured that is hosting your models.
- You can run kubectl commands to communicate with your cluster. kubectl is required to run kubectl port-forward commands, which open ports to the Triton server where your models are hosted.
- You can skip the kubectl port-forward commands if you run the tutorial in a pod inside the same cluster that is running TMS. This option requires modifying the commands to refer to the correct service rather than localhost.
You must validate that you can communicate with the TMS server. This tutorial assumes that you are running the steps outlined here outside the Kubernetes cluster hosting TMS and need to open a port to connect to it.
To do this, use the kubectl port-forward command.
Locate the TMS service. By default, it is named tms and runs on port 30345. The rest of this tutorial assumes this is the case for your installation; if not, you must modify some commands accordingly. To check where TMS is running, run kubectl get svc:
% kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tms ClusterIP 10.98.225.223 <none> 30345/TCP 7s
With the name of the service and the port number on which it is listening, you can run a kubectl port-forward command to communicate with the service from your local machine. You must leave this command running, so open a separate terminal from the one where you run the other commands:
% kubectl port-forward svc/tms 30345:30345
Forwarding from 127.0.0.1:30345 -> 9345
Forwarding from [::1]:30345 -> 9345
To validate communication with TMS, run a tmsctl lease list command:
% tmsctl lease list -t http://localhost:30345
Lease State Expires Triton
Count: 0
To avoid having to specify the address of the TMS server with the -t flag each time, run the tmsctl target add command to set a default target.
% tmsctl target add --set-default test-target http://localhost:30345
The rest of this tutorial assumes you have set this default and does not specify the -t option on each command.
(Optional) To inspect your set of named targets and see the default, run tmsctl target list:
% tmsctl target list
* test-target -> http://localhost:30345
Run tmsctl without specifying the -t option:
% tmsctl lease list
Lease State Expires Triton
Count: 0
Creating a lease requires two things:
The URI from which to fetch the model.
The name of the model for use by Triton.
Your TMS installation must be configured with model repositories, such as one hosted in an S3 bucket or in a Kubernetes persistent volume. If this has not already been done, see the model repository documentation. You can also pull a model from an HTTP server.
The following steps assume that you have set:
- $MODEL_URI to the URI from which to fetch the model. For example, if the model is in a repository named my_repo, in a folder named my_model, you would set MODEL_URI=model://my_repo/my_model. If instead your model is hosted on an HTTP server, you would set MODEL_URI=https://www.example.com/my_model.zip.
- $MODEL_NAME to the name that your model has in Triton. This name is used as part of the path for inference requests.
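As a concrete sketch, the variables could be set as follows. The repository name my_repo and model name my_model are the hypothetical examples from above; substitute your own values:

```shell
# Hypothetical values based on the examples above; replace with your own.
MODEL_URI="model://my_repo/my_model"   # or an HTTP URL such as https://www.example.com/my_model.zip
MODEL_NAME="my_model"                  # the name Triton uses for the model

echo "$MODEL_URI"
echo "$MODEL_NAME"
```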
To create a lease:
Run the tmsctl lease create command. Specify a duration long enough to run the tutorial (at least 30 minutes), but not so long that you hold the resources unnecessarily if you forget to release the lease. For example, --duration 30m or --duration 1h.
% tmsctl lease create -m name=$MODEL_NAME,uri=$MODEL_URI --duration 30m
Lease 9fd209b1f45f424c914ebc2967a3b591
State: Valid
Expires: 2023-10-12T23:27:19Z
Triton: triton-de245ce9.yournamespace.svc.cluster.local
<nvcr.io/nvidia/tritonserver:23.09-py3>
Models:
Name Url Status
$MODEL_NAME $MODEL_URI Ready
Validate that you see output similar to the above.
Note the following important details:
- The lease ID is listed first. This is how you refer to the lease for operations such as getting its status, renewing it, or releasing it. In the example above, it is 9fd209b1f45f424c914ebc2967a3b591. The rest of this tutorial refers to this value as $LEASE_ID.
- The line starting with Triton: gives the URL of the Triton server hosting your lease (with all its models). Above, it is triton-de245ce9.yournamespace.svc.cluster.local. The first component of this URL (triton-de245ce9) is referred to as $TRITON_SERVER in the rest of the tutorial.
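In a script, both values can be captured into shell variables by parsing the tmsctl output. The following is a minimal sketch, assuming the output format shown above; it parses a saved copy of the sample output rather than calling tmsctl:

```shell
# Sample `tmsctl lease create` output, as shown above. In practice you would
# capture the real output, e.g. OUTPUT=$(tmsctl lease create ...).
OUTPUT='Lease 9fd209b1f45f424c914ebc2967a3b591
  State: Valid
  Triton: triton-de245ce9.yournamespace.svc.cluster.local'

# The lease ID is the second field of the line starting with "Lease".
LEASE_ID=$(printf '%s\n' "$OUTPUT" | awk '/^Lease /{print $2}')

# The Triton service name is the first dot-separated component of the
# hostname on the "Triton:" line.
TRITON_SERVER=$(printf '%s\n' "$OUTPUT" | awk '/Triton:/{print $2}' | cut -d. -f1)

echo "$LEASE_ID"       # 9fd209b1f45f424c914ebc2967a3b591
echo "$TRITON_SERVER"  # triton-de245ce9
```

If the output format differs in your TMS version, adjust the awk patterns accordingly.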
With your lease ready, you can run inference against any of its models (just one in this example). The details of the request parameters vary widely depending on your particular model. The following examples show the overall idea, but you must adjust them for your model. Typically, an application makes the inference requests rather than a person doing it manually; this section demonstrates how to go from creating a lease to running inference.
The Kubernetes services associated with leases are only available inside the cluster. To reach them externally, you need to run the kubectl port-forward command:
% kubectl port-forward svc/$TRITON_SERVER 8000:8000
Run inference against the server, replacing the parameters specific to your model:
% curl -X POST -H "Content-Type: application/json" http://localhost:8000/v2/models/$MODEL_NAME/infer \
--data '{ "inputs": [ {"name": "INPUT", "shape": [1], "datatype": "FP32", "data": [10] }]}'
Validate that you see output similar to the following:
{"model_name":"$MODEL_NAME","model_version":"1","outputs":[{"name":"OUTPUT","datatype":"FP32","shape":[1],"data":[10.0]}]}
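If you want to pull just the output values out of the JSON response in a script, the following is a minimal sketch. It assumes the single-output response shape shown above (with the hypothetical model name my_model) and uses sed for brevity; a real client should use a proper JSON parser:

```shell
# Sample response matching the shape shown above (hypothetical model name).
RESPONSE='{"model_name":"my_model","model_version":"1","outputs":[{"name":"OUTPUT","datatype":"FP32","shape":[1],"data":[10.0]}]}'

# Extract the contents of the "data" array. This assumes exactly one
# output tensor appears in the response.
DATA=$(printf '%s' "$RESPONSE" | sed -n 's/.*"data":\[\([^]]*\)\].*/\1/p')
echo "$DATA"   # 10.0
```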
After you have a lease, you can perform many different operations on it. The following sections list some of the basic operations.
List Leases
You can always run tmsctl lease list
to see the state of leases in your TMS installation.
% tmsctl lease list
Lease State Expires Triton
9fd209b1f45f424c914ebc2967a3b591 Valid 2023-10-12T23:39:28 triton-de245ce9.yournamespace.svc.cluster.local
Count: 1
Lease Status
To get detailed information about a lease, run tmsctl lease status.
% tmsctl lease status $LEASE_ID
Lease 9fd209b1f45f424c914ebc2967a3b591
State: Valid
Expires: 2023-10-12T23:39:28Z
Triton: triton-de245ce9.yournamespace.svc.cluster.local
<nvcr.io/nvidia/tritonserver:23.09-py3>
Readied: 2023-10-12T23:17:19Z
Models:
Name Url Status
$MODEL_NAME $MODEL_URI Ready
Events:
Type Source Age Message
Status Triton Manager 0s Creating Triton deployment.
Status Triton Manager 4s Triton deployment ready.
Status Triton Sidecar 5s identity cached; model size: 1930.
Status Triton Sidecar 7s identity is ready.
Status Lease Provider 9s Lease ready.
Status Lease Service 8m Lease renewed by request.
Status Lease Service 12m Lease renewed by request.
Renew
Your TMS installation is configured with a default duration for leases. After that time elapses, TMS
automatically releases the lease. If you still need it, you can run tmsctl lease renew
to renew the lease. For example:
% tmsctl lease renew $LEASE_ID
Renewed lease 9fd209b1f45f424c914ebc2967a3b591 [Valid]
Expires: 2023-10-12T23:34:44
Create a Custom Lease Name
You can create additional names for a lease, which you can use to run inference. You can use this feature to provide more meaningful names to your Triton instances, and move the name from one lease to another so that you can update your models without changing the URL that your application uses.
To use the name myname to refer to your lease:
Run the following:
% tmsctl lease name create myname $LEASE_ID
Lease name "myname".
Target lease: $LEASE_ID
Run inference using the myname hostname. You can still use the name of the Triton server as well. To test the new name, stop your previous kubectl port-forward command and run a new one, this time using myname instead of the name previously provided by TMS. For example:
% kubectl port-forward svc/myname 8000:8000
Forwarding from 127.0.0.1:8000 -> 8000
Forwarding from [::1]:8000 -> 8000
% curl -X POST -H "Content-Type: application/json" http://localhost:8000/v2/models/$MODEL_NAME/infer \
--data '{ "inputs": [ {"name": "INPUT", "shape": [1], "datatype": "FP32", "data": [10] }]}'
When you are done using a lease, you can release all resources associated with it by running tmsctl lease release.
% tmsctl lease release $LEASE_ID
Lease $LEASE_ID
State: Released