Triton Management Service Deployment Guide (1.1.0)

Triton Management Service (TMS) is an application that helps you manage and orchestrate a fleet of Triton Inference Servers in a Kubernetes cluster. Key features of TMS include:

  • Easily creating and deleting Triton instances on demand.

  • Securely loading models from remote storage locations.

  • Autoscaling Triton instances to meet dynamic workloads while minimizing resource utilization during times of low demand.

  • Loading models into pooled Triton instances, allowing you to use fewer resources while maintaining quality-of-service metrics.

One of the main organizational units in TMS is a lease. A lease is a description of a model, or an ensemble of models, along with a description of the hardware needed to run them (for example, the number of GPUs and the amount of memory). To learn more about leases, see the description of leases.
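As a rough mental model only, a lease can be pictured as a small record that pairs the models to serve with the hardware they need. The sketch below is illustrative Python, not the TMS API or its lease schema: the class names (ModelSpec, HardwareSpec, Lease), their fields, the default values, and the example URI are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelSpec:
    """Hypothetical description of one model to load (not a TMS type)."""
    name: str
    uri: str  # remote storage location of the model (placeholder value below)


@dataclass
class HardwareSpec:
    """Hypothetical description of the hardware a lease asks for."""
    gpus: int = 0
    memory_gib: int = 4


@dataclass
class Lease:
    """Conceptual lease: one or more models plus the hardware to run them."""
    models: List[ModelSpec]
    hardware: HardwareSpec = field(default_factory=HardwareSpec)

    def describe(self) -> str:
        # Summarize the lease as a single human-readable line.
        names = ", ".join(m.name for m in self.models)
        return (
            f"models=[{names}] "
            f"gpus={self.hardware.gpus} "
            f"memory={self.hardware.memory_gib}GiB"
        )


# Example: a single-model lease that requests one GPU and 8 GiB of memory.
lease = Lease(
    models=[ModelSpec(name="example-model",
                      uri="https://example.com/models/example-model.zip")],
    hardware=HardwareSpec(gpus=1, memory_gib=8),
)
print(lease.describe())
```

An ensemble would simply list several ModelSpec entries in the same lease. For the actual lease fields and how to create one, rely on the description of leases in this guide.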

To get started with TMS, see the deployment guide to learn how to install and configure TMS. After you have TMS running, see the basic operations tutorial to learn how to create Triton instances and load models into them. If you don’t already have a Kubernetes cluster but want to try out TMS, see the minikube quickstart guide for an example of how to set up a test environment and install TMS in it.

Contents

Getting Started

Key Concepts

Reference

Release Notes

© Copyright 2023, NVIDIA. Last updated on Dec 11, 2023.