Step #4: Creating Leases and Performing Inference

We are now ready to use TMS to create a lease. You can think of a lease as TMS’s way of telling Triton to serve specific models for a specific duration of time. The benefit of this is that TMS will serve your models for however long you specify, and then automatically unload them from your GPUs. TMS also recognizes when models are actively being used, and can automatically renew a lease until the model is no longer in use. We will create and manage leases using tmsctl, TMS’s command-line tool. You can read the tmsctl documentation here.

Note

You can also use the TMS API to create leases and interact with TMS, but in this lab we will only use tmsctl.

  1. Navigate to the tmsctl directory through the SSH console.

    cd ~/tmsctl


  2. Unzip the tmsctl executable.

    unzip tms_ctl_v1.0.0/tmsctl.zip

    Note

Outside of this lab, you would have to pull tms_ctl_v1.0.0 from NGC.
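
    To confirm the executable unpacked correctly, you can print its built-in help (a quick sanity check assuming tmsctl follows the conventional --help flag; this is not one of the original lab steps):

    # Make sure the binary is executable, then print its help text.
    chmod +x ./tmsctl
    ./tmsctl --help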


  3. Since our SSH console is outside of the cluster, we need to start a port-forwarding process to the TMS service on the cluster. You can see the TMS service, which uses port 30345 by default, in the cluster’s service listing:

    $ kubectl get svc
    NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
    kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP     2d
    tms          ClusterIP   10.104.242.223   <none>        30345/TCP   5m

    Start the port-forward process in the background.

    kubectl port-forward svc/tms 30345:30345 &

    Note

    If you leave and return to the SSH console, this background process might be killed. If you are seeing errors while trying to use tmsctl, it is likely because this port-forwarding process isn’t running.
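
    If you suspect the tunnel has died, you can check for it and restart it in one step (a minimal sketch using standard tools, not part of the lab instructions):

    # Restart the port-forward only if no such process is already running.
    if ! pgrep -f "kubectl port-forward svc/tms" > /dev/null; then
        kubectl port-forward svc/tms 30345:30345 &
    fi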


  4. Finally, we have to set tmsctl’s target, which is our local cluster running TMS.

    ./tmsctl target add --force --set-default tms http://127.0.0.1:30345


We are now ready to use tmsctl to interact with TMS.

We’ll start by creating a basic lease for image recognition. tmsctl supports many lease creation options, some of which we will take advantage of to deploy this model (an illustrative timeline follows the list):

  • --duration tells TMS how long to serve the model

  • --auto-renew tells TMS to automatically renew the lease (serve the model for longer) if the model has received an inference request recently

  • --renewal-duration tells TMS how long to renew the lease for when automatically renewing

  • --auto-renew-activity-window tells TMS how recently the model needs to have received a request in order for the lease to auto-renew before expiring
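
For a concrete picture of how these flags interact, here is one possible timeline for the lease created in the next step (an illustration based on the flag descriptions above, not actual TMS output):

    # Illustrative timeline (assumed semantics) for:
    #   --duration 30m --auto-renew --renewal-duration 10m --auto-renew-activity-window 5m
    #
    #   t=0     lease created; scheduled to expire at t=30m
    #   t=27m   inference request arrives (inside the 5m activity window)
    #   t=30m   lease auto-renews; new expiry at t=40m
    #   t=40m   no request arrived in the last 5m, so the lease expires and the model unloads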

  1. Create the lease using tmsctl. This may take a few minutes, so be patient.

    ./tmsctl lease create -m "name=densenet_onnx,uri=model://volume-models/image-rec/densenet_onnx/" --duration 30m --auto-renew --renewal-duration 10m --auto-renew-activity-window 5m

    Be sure to make note of the Triton URL output by the lease creation command. The URL will look something like triton-3eaa065c.default.svc.cluster.local and appears in the output as shown here (a scripted alternative for capturing it appears at the end of this step):

    [screenshot: lease-url.png]

    Creating a lease deploys a Triton pod to serve our model, which you can see running on the cluster:

    $ kubectl get pods
    NAME                              READY   STATUS    RESTARTS   AGE
    tms-5444f8df5b-5lkws              2/2     Running   0          20m
    triton-03923a23-bd6f8cc86-5kz24   2/2     Running   0          2m
    triton-client                     1/1     Running   0          20m
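
    If you prefer to script against the lease rather than copy the URL by hand, one way to capture it is to grep it out of the lease listing (a hedged sketch; the exact lease list output format may differ from what this pattern assumes):

    # Extract the first in-cluster Triton URL from the lease listing.
    TRITON_URL=$(./tmsctl lease list | grep -oE 'triton-[0-9a-f]+\.default\.svc\.cluster\.local' | head -n 1)
    echo "${TRITON_URL}"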


  2. You can see the lease we just created, as well as the associated Triton URL, by running the following command:

    ./tmsctl lease list

    To remove leases with tmsctl, use the ./tmsctl lease release command. For now, you can leave this lease active since we are going to send an inference request to it.

  3. Once the lease is created, the model is served and you should see a new Triton service on the cluster:

    $ kubectl get svc
    NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                               AGE
    kubernetes        ClusterIP   10.96.0.1       <none>        443/TCP                               168m
    tms               ClusterIP   10.106.62.75    <none>        30345/TCP                             6m54s
    triton-8dd12f38   ClusterIP   10.107.235.96   <none>        9345/TCP,8001/TCP,8000/TCP,8002/TCP   2m51s
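
    Ports 8000, 8001, and 8002 are Triton’s standard HTTP, gRPC, and metrics ports (9345 is presumably used by TMS itself). As a quick readiness check, you can hit Triton’s standard HTTP health endpoint from inside the cluster, for example from the triton-client pod used in the next step (a sketch that assumes curl is available in the pod; substitute your own service name):

    # An HTTP 200 from this endpoint means the Triton server is ready to serve.
    curl -sf http://triton-8dd12f38.default.svc.cluster.local:8000/v2/health/ready \
        && echo "Triton is ready"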


  4. Earlier we deployed a Triton SDK client pod that contains tools to help us interact with our models. Start a shell in the triton-client pod:

    kubectl exec -it triton-client -- bash


  5. From the Triton SDK pod, send the model an inference request containing the image shown below:

    Note

    Be sure to supply the Triton URL from lease creation in the inference command, and do not forget to add :8001 after the URL to specify the gRPC port.

    /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION -i grpc /workspace/images/mug.jpg -u <triton URL from lease deployment>:8001

    [image: mug.jpg]
    Note

    You can also send inference requests from outside of the cluster by port-forwarding to the Triton service, or by configuring a secure external URL for your cluster/service.
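
    For example, a port-forward from your SSH console plus a query against Triton’s standard HTTP API might look like this (a sketch; substitute your own Triton service name):

    # Forward Triton's HTTP port locally, then fetch the model's metadata.
    kubectl port-forward svc/triton-8dd12f38 8000:8000 &
    curl -s http://127.0.0.1:8000/v2/models/densenet_onnx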


  6. You can see the output from the model; each line of the response shows a confidence score, the class index in parentheses, and the predicted label:

    Request 0, batch size 1
    Image '/workspace/images/mug.jpg':
        15.349565 (504) = COFFEE MUG
        13.227464 (968) = CUP
        10.424888 (505) = COFFEEPOT


  7. Once you are done sending the inference request, you can exit out of the Triton client pod:

    exit


  8. Once out of the triton-client pod, you can unload the model from the GPU. In TMS terms, you release the lease:

    ./tmsctl lease release <lease ID>

    The lease ID was output from the lease creation command, and can also be retrieved through ./tmsctl lease list.
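
    After releasing the lease, you can verify the teardown: the lease disappears from the listing, and the Triton pod and service are removed from the cluster (pod termination may take a few seconds):

    ./tmsctl lease list
    kubectl get pods
    kubectl get svc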

We’ve seen how to load and unload models on the GPU through the TMS lease system, which can automatically unload idle models, freeing up GPU memory and reducing idle resource usage. Now let’s dive into more of the features and special capabilities of TMS.

© Copyright 2022-2023, NVIDIA. Last updated on Sep 29, 2023.