We are now ready to use TMS to create a lease. You can think of a lease as TMS’s way of telling Triton to serve specific models for a specific duration of time. The benefit of this is that TMS will serve your models for however long you specify, and then automatically unload them from your GPUs. TMS also recognizes when models are actively being used, and can automatically renew a model’s lifetime until it is no longer in use. We will be creating and managing leases using tmsctl, TMS’s command line tool. You can read the tmsctl documentation here.
You can also use the TMS API to create leases and interact with TMS, but in this lab we will only use tmsctl.
Navigate to the tmsctl directory through the SSH console.
cd ~/tmsctl
Unzip the tmsctl executable.
unzip tms_ctl_v1.0.0/tmsctl.zip
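If you want to confirm the extraction, the binary should now be in the current directory (a quick optional check; the chmod is only needed if the archive did not preserve the executable bit):
ls -l tmsctl
chmod +x tmsctl   # only if the file is not already executable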
Note: Outside of this lab, you would have to pull tms_ctl_v1.0.0 from NGC.
Since our SSH console is outside of the cluster, we need to start a port-forwarding process to the TMS service, which uses port 30345 by default. You can see the TMS service by listing the services on the cluster:
$ kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP     2d
tms          ClusterIP   10.104.242.223   <none>        30345/TCP   5m
Start the port-forward process in the background.
kubectl port-forward svc/tms 30345:30345 &
Note: If you leave and return to the SSH console, this background process might be killed. If you see errors while trying to use tmsctl, it is likely because this port-forwarding process isn’t running.
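If you need to check whether the forward is still running (a quick sketch, assuming pgrep is available in the console), look for the process and restart it if it is gone:
pgrep -af "kubectl port-forward"              # lists the port-forward process, if any
kubectl port-forward svc/tms 30345:30345 &    # restart it if nothing was listed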
Finally, we have to set tmsctl’s target, which is our local cluster running TMS.
./tmsctl target add --force --set-default tms http://127.0.0.1:30345
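Optionally, you can confirm the forwarded port answers before going further (a bash-only TCP check, so no extra tools are required):
timeout 2 bash -c '</dev/tcp/127.0.0.1/30345' && echo "TMS port 30345 is reachable"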
We are now ready to use tmsctl to interact with TMS.
We’ll start by creating a basic lease for image recognition. There are many lease creation options supported by tmsctl, some of which we will take advantage of to deploy this model (see the annotated example after this list):
--duration: tells TMS how long to serve the model for
--auto-renew: tells TMS to automatically renew the lease (serve the model for longer) if it has received an inference request recently
--renewal-duration: tells TMS how long to renew the lease for when automatically renewing
--auto-renew-activity-window: tells TMS how recently the model needs to have received a request in order for the lease to auto-renew before expiring
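For reference, here is the same lease-create command you will run in the next step, with each flag annotated (the values are just the ones used in this lab):
# Flag-by-flag view of this lab's lease (shown for reference; run it in the next step):
#   --duration 30m                     initial lease term: serve the model for 30 minutes
#   --auto-renew                       renew the lease instead of letting it expire while the model is in use
#   --renewal-duration 10m             each automatic renewal extends the lease by 10 minutes
#   --auto-renew-activity-window 5m    "in use" means a request arrived within the last 5 minutes
./tmsctl lease create \
  -m "name=densenet_onnx,uri=model://volume-models/image-rec/densenet_onnx/" \
  --duration 30m --auto-renew --renewal-duration 10m --auto-renew-activity-window 5m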
Create the lease using tmsctl. This may take a few minutes, so be patient.
./tmsctl lease create -m "name=densenet_onnx,uri=model://volume-models/image-rec/densenet_onnx/" --duration 30m --auto-renew --renewal-duration 10m --auto-renew-activity-window 5m
Be sure to make note of the Triton URL output from the lease creation command. The URL will look something like triton-3eaa065c.default.svc.cluster.local and will appear in the command output.
Creating a lease deploys a Triton pod to serve our model, which you can see running on our cluster:
$ kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
tms-5444f8df5b-5lkws              2/2     Running   0          20m
triton-03923a23-bd6f8cc86-5kz24   2/2     Running   0          2m
triton-client                     1/1     Running   0          20m
You can see the lease we just created, as well as the associated Triton URL, by running the following command:
./tmsctl lease list
To remove leases with tmsctl, use the ./tmsctl lease release command. For now, you can leave this lease active, since we are going to send an inference request to it.
Once the lease is created, the model is being served and you should see the new Triton service on the cluster.
$ kubectl get svc
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                               AGE
kubernetes        ClusterIP   10.96.0.1       <none>        443/TCP                               168m
tms               ClusterIP   10.106.62.75    <none>        30345/TCP                             6m54s
triton-8dd12f38   ClusterIP   10.107.235.96   <none>        9345/TCP,8001/TCP,8000/TCP,8002/TCP   2m51s
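The 8000, 8001, and 8002 ports are Triton’s standard HTTP, gRPC, and metrics ports. As an optional check (a sketch, assuming metrics are enabled, which is Triton’s default, and substituting the triton-XXXXXXXX service name from your own output), you can port-forward the metrics port and look at the Prometheus metrics:
kubectl port-forward svc/triton-8dd12f38 8002:8002 &
curl -s localhost:8002/metrics | head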
Earlier we deployed a Triton SDK client that contains tools to help us interact with our models. Start a shell in the triton-client pod.
kubectl exec -it triton-client -- bash
From the Triton SDK pod, send an inference request to the model using the example image of a coffee mug at /workspace/images/mug.jpg:
Note: Be sure to supply the Triton URL from lease creation in the inference command, and do not forget to add :8001 after the URL to specify the gRPC port.
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION -i grpc /workspace/images/mug.jpg -u <triton URL from lease deployment>:8001
Note: You can also send inference requests from outside of the cluster by port-forwarding to the Triton service, or by configuring a secure external URL for your cluster/service.
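As a minimal sketch of that port-forwarding approach (run from the SSH console rather than the client pod, and substituting your own triton-XXXXXXXX service name), you can forward Triton’s HTTP and gRPC ports and hit its standard KServe v2 REST endpoints:
kubectl port-forward svc/triton-8dd12f38 8000:8000 8001:8001 &

# Standard Triton HTTP/REST endpoints:
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready   # prints 200 when the server is ready
curl -s localhost:8000/v2/models/densenet_onnx                            # model metadata as JSON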
You should see output like the following from image_client:
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
    15.349565 (504) = COFFEE MUG
    13.227464 (968) = CUP
    10.424888 (505) = COFFEEPOT
Once you are done sending the inference request, you can exit out of the Triton client pod:
exit
Once out of the triton-client pod, you can unload the model from the GPU. In TMS terms, you release the lease.
./tmsctl lease release <lease ID>
The lease ID was output from the lease creation command, and can also be retrieved with ./tmsctl lease list.
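To confirm the release took effect (an optional check; the exact output and pod names will differ in your environment), you can list the leases again and watch the lease’s Triton pod shut down:
./tmsctl lease list   # the released lease should no longer show as active
kubectl get pods      # the triton-XXXXXXXX pod for the lease should terminate shortly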
We’ve seen how to load and unload models on the GPU through the TMS lease system, which can automatically unload your models, freeing up GPU memory and reducing idle resource usage. Now let’s dive into more of the features and special capabilities of TMS.