Triton Management Service (TMS) is a Kubernetes microservice and expects to be deployed into a Kubernetes-managed cluster. To simplify deployment into your cluster, TMS provides a Helm chart.
To deploy TMS, the helm tool (download) and the TMS Helm chart (download) must be installed on the local system. Additionally, the local user will require cluster-administrator privileges.
Obtaining TMS Helm Chart
The TMS Helm chart can be downloaded from NVIDIA NGC. To do so, use the following command:
helm fetch https://helm.ngc.nvidia.com/nvaie/charts/triton-management-service-1.0.0.tgz --username='$oauthtoken' --password=<YOUR API KEY>
Extract the values.yaml file from the downloaded chart's TAR file with the following command:
helm show values triton-management-service-1.0.0.tgz > values.yaml
This will create a values.yaml file in the current directory, which can be modified to meet deployment needs.
See Helm Chart Values for a listing of the configurable values.
Kubernetes Secrets
Setting up secrets in Kubernetes for TMS is fairly straightforward, and we’ll cover the basics here.
Note that creating Kubernetes secrets requires sufficient cluster privileges; if you lack them, a cluster administrator may need to create the secrets on your behalf.
Container Pull Secrets
The TMS Helm chart will include any secrets listed under values.yaml#images.secrets. The default values.yaml file contains an example secret named “ngc-container-pull”.
To create an image-pull secret, use:
kubectl create secret docker-registry <secret-name> --docker-server=<docker-server-urn> --docker-username=<username> --docker-password=<password>
Then add whichever value was chosen for <secret-name> to the values.yaml#images.secrets list.
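For example, to create a pull secret for NVIDIA NGC under the name used by the default values.yaml (nvcr.io is NGC's container registry, and $oauthtoken is the literal username NGC expects when authenticating with an API key):

kubectl create secret docker-registry ngc-container-pull --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=<YOUR API KEY>

Then list the secret in values.yaml. The exact layout sketched here (a list of secret names under images.secrets) is an assumption based on the key path above; check the default values.yaml for the precise structure:

images:
  secrets:
    - ngc-container-pull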
Configuring Model Repositories
To connect to a model repository, see the model repository page.
Configuring Autoscaling
To enable and configure autoscaling, see the separate autoscaling configuration guide.
Configuring Triton Containers
TMS allows the TMS administrator to configure some aspects of the containers that will be created for Triton instances. These can be configured via the top-level triton object in values.yaml.
Currently, only resource constraints are specified in this section. These are all listed under resources. TMS admins may specify both the requestDefault resources that Triton containers will get, as well as the requestMaximum values that users may request on a per-lease basis.
A sample configuration is shown below.
triton:
  resources:
    requestDefault:
      cpu: 2
      gpu: 1
      memory: 4Gi
      sharedMemory: 256Mi
    requestMaximum:
      cpu: 4
      gpu: 2
      memory: 8Gi
      sharedMemory: 512Mi
The fields in both the requestDefault and requestMaximum sections are defined as follows. For each value, the requestMaximum value must be at least as large as the requestDefault value.
cpu: The number of whole or fractional CPUs assigned to Triton. Can be specified either as a number of cores (e.g. 4), or as a number followed by m, which represents milli-CPUs (e.g. 1500m). Minimum value: 1 (or 1000m). Default: 2

gpu: The number of whole GPUs assigned to Triton. Must be a whole number; GPUs cannot be fractionally assigned. Minimum value: 0. Default: 1

memory: The amount of system memory, as a number plus units (e.g. 4Gi). Units allowed: Ki, Mi, Gi, Ti. Minimum value: 256Mi, and at least 128Mi more than sharedMemory. Default: 4Gi

sharedMemory: The amount of shared memory, as a number plus units (same units as memory). Minimum value: 32Mi. Default: 256Mi
Note: Some backends (e.g. PyTorch) allow the user to use shared memory to allocate tensors. If you plan on using this, make sure you set a higher sharedMemory value.
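For example, a minimal sketch (the numbers are illustrative) that raises the shared-memory defaults for a PyTorch-heavy deployment while keeping memory at least 128Mi larger than sharedMemory in each section:

triton:
  resources:
    requestDefault:
      cpu: 2
      gpu: 1
      memory: 4Gi
      sharedMemory: 2Gi
    requestMaximum:
      cpu: 4
      gpu: 2
      memory: 8Gi
      sharedMemory: 4Gi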
Configuring Persisted Database
To enable and configure TMS to persist database contents, a volume claim bound to a sufficiently sized Kubernetes persistent volume must be provided via values.yaml#server.databaseStorage.volumeClaimName.
In the case of server failure or restart, TMS will be able to reload the contents of the database from this volume.
Note that server performance can be affected by slow or unreliable storage solutions used for the persistent volume.
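As a sketch, you might first create a claim (the claim name, access mode, and size below are illustrative and should be adapted to your cluster):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tms-database-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

and then reference it in values.yaml:

server:
  databaseStorage:
    volumeClaimName: tms-database-storage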
Assuming you've followed the steps above (downloaded the TMS Helm chart, exported its values.yaml file, and modified it as necessary), use the following command to install, or deploy, TMS:
helm install <name-of-tms-installation> -f values.yaml triton-management-service-1.0.0.tgz
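Once the command returns, you can check on the release and its pods with standard tooling (add -n <namespace> to each command if TMS was installed outside the default namespace):

helm status <name-of-tms-installation>
kubectl get pods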
The Kubernetes cluster where TMS is installed should be properly secured according to best practices and the security posture of your organization.
Any additional, optional services connected to TMS, such as Prometheus and the Prometheus adapter, should also be secured. We recommend that the cluster administrator properly secure access to any S3 or other external model repositories that TMS will utilize. We recommend leveraging encryption in transit and at rest, scoping access to cluster resources following the principle of least privilege, and configuring audit logging for your cluster.
The TMS default configuration does not allow connections from outside of the Kubernetes cluster. The user assumes responsibility for securing any external connections when changing the default configuration values.
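As one illustrative hardening measure (not part of the TMS Helm chart, and requiring a network plugin that enforces NetworkPolicy), a Kubernetes NetworkPolicy can restrict ingress to TMS pods to traffic originating within the same namespace. The app: tms label below is hypothetical and must be replaced with the labels your TMS pods actually carry:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tms-restrict-ingress
spec:
  # Hypothetical selector; match the labels on your TMS pods.
  podSelector:
    matchLabels:
      app: tms
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allow traffic only from pods in the same namespace.
        - podSelector: {}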