Leases

The Lease Service exposes the following RPC endpoints: Acquire, Release, Renew, and Status. The Triton Allowlist Service exposes the following RPC endpoints: Append, List, and Remove. Each endpoint accepts a single structured request and responds with a structured response.

Note

The gRPC protocol supports streaming requests and/or responses. This means that one or both sides of the interaction can stream data to the other. Functionally, this allows the server to begin sending response data before the client has finished sending request data.
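The interleaving described in this note can be illustrated without gRPC itself. The sketch below uses plain Python generators as a stand-in for the two streams; `request_stream`, `streaming_handler`, and `events` are hypothetical names chosen for illustration and are not part of gRPC or TMS. The handler yields a response for each request chunk as it arrives, so the first response is produced before the client has sent its last chunk.

```python
from typing import Iterator, List

events: List[str] = []  # records the order in which sends and responses occur


def request_stream(chunks: List[str]) -> Iterator[str]:
    """Hypothetical client side: sends request chunks one at a time."""
    for chunk in chunks:
        events.append(f"client sent {chunk}")
        yield chunk


def streaming_handler(requests: Iterator[str]) -> Iterator[str]:
    """Hypothetical server side: responds to each chunk as soon as it arrives."""
    for chunk in requests:
        response = f"ack {chunk}"
        events.append(f"server sent {response}")
        yield response


# Drive both streams; the generators are evaluated lazily, so sends and
# responses alternate instead of all requests being sent first.
responses = list(streaming_handler(request_stream(["a", "b", "c"])))
```

After this runs, `events` shows the first server response recorded before the final client chunk. A real gRPC stream adds buffering and network transport, so the exact ordering there is not guaranteed to be this regular.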

The expected order of operations for the Lease Service is as follows:

Lease/Acquire to create a new lease with a specified set of AI models.

Assuming the request is successful, the response will include a unique identifier and an expiration date for the new lease.

All models in a lease acquire request are considered bundled. They cannot be loaded or unloaded separately. Additionally, all models in a lease will be loaded into the same instance of Triton Inference Server. If this is not possible (e.g., because of insufficient memory), the lease will be marked as invalid, and any models that were successfully loaded will be unloaded after the first model load failure is detected. TMS does not support partially loaded leases.

A lease can be created as part of a Triton Pool or using a bespoke Triton instance. This is determined by the triton_options value in the gRPC API.

This RPC begins streaming a response once the request has been received. The server will send a series of model status updates to the caller to show continued progress as the lease’s models are deployed. Model status updates will be sparse: an individual update does not necessarily include the status of every model.

The final response from the server will include status for every model in the request as well as data for the lease itself.
Lease/Renew to extend the lease’s duration. Once a lease is renewed, it is assigned a new expiration date.

Once a lease has expired, it is no longer valid and any associated models will be unloaded and become unavailable. Any resources consumed by the lease are returned to the hosting Triton Inference Server to be used by future leases. In the case that a Triton Inference Server instance becomes unnecessary, it will be deleted and its resources returned to the cluster.
Lease/Status to get the current status of a specific lease.

Requesting the status of an expired or released lease is a valid operation.
Lease/Release to terminate a lease before its expiration is reached.

Once a lease has been released, it is no longer valid and any associated models will be unloaded and become unavailable. Any resources consumed by the lease are returned to the hosting Triton Inference Server to be used by future leases. In the case that a Triton Inference Server instance becomes unnecessary, it will be deleted and its resources returned to the cluster.
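The lifecycle above can be summarized as a small state model. The following is a minimal in-memory sketch, not the TMS gRPC API: `LeaseServiceSketch`, `Lease`, the `can_load` callback, and the state names are all hypothetical, and the assumption that a renewal sets the expiration to the renewal time plus the requested duration is made here only for illustration. It shows the all-or-nothing handling of bundled models on acquire, renewal, status queries on ended leases, and unloading on expiry or release.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class Lease:
    lease_id: str
    models: List[str]
    expires: datetime
    state: str = "valid"  # one of: valid, invalid, expired, released


class LeaseServiceSketch:
    """In-memory stand-in for the Lease Service (hypothetical, not the TMS API)."""

    def __init__(self, can_load: Callable[[str], bool]) -> None:
        self.can_load = can_load  # assumed check: True if the model loads
        self.leases: Dict[str, Lease] = {}
        self.loaded: List[str] = []  # models currently loaded

    def _unload(self, models: List[str]) -> None:
        for model in models:
            self.loaded.remove(model)

    def _expire_if_due(self, lease: Lease, now: datetime) -> None:
        if lease.state == "valid" and now >= lease.expires:
            self._unload(lease.models)
            lease.state = "expired"

    def acquire(self, models: List[str], now: datetime, duration: timedelta) -> Lease:
        lease = Lease(uuid.uuid4().hex, list(models), now + duration)
        self.leases[lease.lease_id] = lease
        for index, model in enumerate(lease.models):
            if not self.can_load(model):
                # First load failure: unload the models loaded so far and mark
                # the lease invalid. Partially loaded leases are not kept.
                self._unload(lease.models[:index])
                lease.state = "invalid"
                return lease
            self.loaded.append(model)
        return lease

    def renew(self, lease_id: str, now: datetime, duration: timedelta) -> Lease:
        lease = self.leases[lease_id]
        self._expire_if_due(lease, now)
        if lease.state != "valid":
            raise ValueError(f"lease is {lease.state} and cannot be renewed")
        lease.expires = now + duration  # assumption: new expiry counts from now
        return lease

    def status(self, lease_id: str, now: datetime) -> str:
        # Asking about an expired or released lease is a valid operation.
        lease = self.leases[lease_id]
        self._expire_if_due(lease, now)
        return lease.state

    def release(self, lease_id: str) -> Lease:
        lease = self.leases[lease_id]
        if lease.state == "valid":
            self._unload(lease.models)
            lease.state = "released"
        return lease


# Usage: "too_big" is a made-up model name that the fake loader rejects.
t0 = datetime(2024, 1, 1)
svc = LeaseServiceSketch(can_load=lambda name: name != "too_big")
ok = svc.acquire(["model_a", "model_b"], t0, timedelta(minutes=10))
bad = svc.acquire(["model_c", "too_big"], t0, timedelta(minutes=10))
svc.renew(ok.lease_id, t0 + timedelta(minutes=5), timedelta(minutes=10))
```

In this sketch the second acquire ends up invalid with none of its models loaded, while the first lease stays valid past its original expiration because of the renewal. The real service additionally streams model status updates during acquire and manages Triton instances, which this model omits.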