Triton Management Service Deployment Guide (1.3.0)

Triton Management Service (TMS) is an application that helps users manage and orchestrate a fleet of Triton Inference Servers in a Kubernetes cluster. Key features of TMS include:

  • Easily creating and deleting Triton instances on demand.

  • Securely loading models from remote storage locations.

  • Autoscaling Triton instances to meet dynamic workloads while minimizing resource utilization during times of low demand.

  • Loading models into pooled Triton instances, allowing you to use fewer resources while maintaining quality-of-service metrics.

One of the main organizational units in TMS is the lease. A lease is a description of a model (or ensemble of models), along with a description of the hardware needed to run it (e.g., number of GPUs, amount of memory). To learn more, see the description of leases.
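
To make the idea concrete, a lease can be pictured as a small record that pairs a list of models with a hardware request. The sketch below is purely illustrative: the class names, field names, and default values are hypothetical, and it is not the TMS API or its actual lease format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HardwareRequest:
    """Hypothetical description of the hardware a lease asks for."""
    gpus: int = 1          # number of GPUs (illustrative default)
    memory_gib: int = 8    # amount of memory in GiB (illustrative default)


@dataclass
class Lease:
    """Hypothetical lease: one or more models plus the hardware to run them."""
    models: List[str]
    hardware: HardwareRequest = field(default_factory=HardwareRequest)

    def model_count(self) -> int:
        # Number of models this lease describes.
        return len(self.models)


# Example: a lease for two models that asks for 2 GPUs and 16 GiB of memory.
lease = Lease(
    models=["detector", "classifier"],
    hardware=HardwareRequest(gpus=2, memory_gib=16),
)
print(lease.model_count(), lease.hardware.gpus, lease.hardware.memory_gib)
```

The real lease definition and the options it supports are covered in the description of leases.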

To get started with TMS, first see the deployment guide to learn how to install and configure TMS. Once you have TMS running, see the basic operations tutorial to learn how to create Triton instances and load models into them. If you don’t already have a Kubernetes cluster but want to try out TMS, see the minikube quickstart guide for an example of how to set up a test environment and install TMS in it.

Contents

Getting Started

Key Concepts

Reference

Release Notes

© Copyright 2023, NVIDIA. Last updated on Dec 19, 2023.