Kubernetes¶
Included in the NGC Helm Repository is a chart designed to automate push-button deployment to a Kubernetes cluster.
The Jarvis AI Services Helm Chart can be used to deploy ASR, NLP, and TTS services automatically. The Helm chart performs a number of functions:

- Pulls Docker images from NGC for the Jarvis Speech Server, and utility containers for downloading and converting models.
- Downloads requested model artifacts from NGC as configured in the values.yaml file.
- Generates the Triton Inference Server model repository.
- Starts Jarvis Speech as configured in a Kubernetes pod.
- Exposes the Jarvis Speech Server as configured services.
Example pre-trained models are released with Jarvis for each of the services. The Helm chart comes pre-configured for downloading and deploying all of these models.
Note: The Helm chart configuration can be modified for your use case by editing the values.yaml file. In this file, you can change the settings related to which models to deploy, where to store them, and how to expose the services.
Validate Kubernetes with NVIDIA GPU support.
Kubernetes with GPU support is well supported by NVIDIA. Consult the instructions at Install Kubernetes to ensure your environment is properly set up.
If using an NVIDIA A100 GPU with Multi-Instance GPU (MIG) support, refer to MIG Support in Kubernetes.
Download and modify the Helm chart for your use. Fetch it from NGC:

export NGC_API_KEY=<your_api_key>
helm fetch https://helm.ngc.nvidia.com/nvidia/jarvis/charts/jarvis-api-v1.0.0-b.1.tgz \
    --username=\$oauthtoken --password=$NGC_API_KEY --untar
The result of the above operation is a new directory called jarvis-api in your current working directory. Within that directory is a values.yaml file which can be modified to suit your use case (see Kubernetes Secrets and Jarvis Settings). After the values.yaml file has been updated to reflect the deployment requirements, Jarvis can be deployed to the Kubernetes cluster:

helm install jarvis-api jarvis-api
Alternatively, use the --set option to install without modifying the values.yaml file. Make sure to set the NGC API key, email, and model_key_string to the appropriate values. By default, model_key_string is tlt_encode.

helm install jarvis-api \
    --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` \
    --set ngcCredentials.email=your_email@your_domain.com \
    --set modelRepoGenerator.modelDeployKey=`echo -n model_key_string | base64 -w0`
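Note that each --set value above is base64-encoded before being passed to Helm. As a sketch of that encoding step (the key values here are placeholders, not real credentials):

```shell
# Placeholder values for illustration; substitute your real NGC API key.
NGC_API_KEY="example-api-key"
MODEL_KEY_STRING="tlt_encode"   # the documented default model deploy key

# base64 -w0 disables line wrapping so the encoded string stays on one line;
# echo -n avoids encoding a trailing newline into the secret.
ENCODED_PASSWORD=$(echo -n "$NGC_API_KEY" | base64 -w0)
ENCODED_DEPLOY_KEY=$(echo -n "$MODEL_KEY_STRING" | base64 -w0)

echo "$ENCODED_PASSWORD"
echo "$ENCODED_DEPLOY_KEY"
```

Omitting -n in echo is a common mistake: the embedded newline produces a different encoded value and the credentials silently fail to authenticate.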
Helm configuration. The following sections point out a few key areas of the values.yaml file and considerations for deployment. Consult the individual service documentation for more details, as well as the Helm chart's values.yaml file, which contains inline comments explaining the configuration options.
Required Software¶
To deploy Jarvis, a functioning Kubernetes environment with a GPU (Volta or later) is required. This can be on premises, in a cloud provider, or within a managed Kubernetes environment, so long as the environment has GPU support enabled.
Kubernetes Secrets¶
The Helm deployment uses multiple Kubernetes secrets for obtaining access to NGC: one for Docker images, another for model artifacts, and one for encrypted models. By default, these are named imagepullsecret, modelpullsecret, and jarvis-model-deploy-key, respectively. The names of the secrets can be modified in the values.yaml file; however, if you are deploying into an EGX or Fleet Command managed environment, your environment has support for imagepullsecret and modelpullsecret today. These secrets are managed by the chart and can be manipulated by setting the respective values within the ngcCredentials section of values.yaml.
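The same values that the --set flags above target can equally be set directly in values.yaml. A sketch, using only keys named in this document (the exact schema is documented by the chart's inline comments; both values must be base64-encoded):

```yaml
# Sketch of a values.yaml credentials fragment, assuming the key paths
# used by the --set example (ngcCredentials.*, modelRepoGenerator.*).
ngcCredentials:
  password: "<base64-encoded NGC API key>"
  email: your_email@your_domain.com

modelRepoGenerator:
  modelDeployKey: "dGx0X2VuY29kZQ=="   # base64 of the default tlt_encode
```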
Jarvis Settings¶
The values.yaml for Jarvis is intended to provide maximum flexibility in deployment configurations.

The replicaCount field configures the number of identical instances (or pods) of the services that are deployed. When load-balanced appropriately, increasing this number (as resources permit) enables horizontal scaling for increased load.

Individual speech services (ASR, NLP, or TTS) may be disabled by changing the jarvis.speechServices.[asr|nlp|tts] key to false.

Prebuilt models not required for your deployment can be deleted from the list in modelRepoGenerator.ngcModelConfigs. NVIDIA recommends removing models and disabling services that are not used, to reduce deployment time and GPU memory usage.
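Putting the keys above together, a trimmed-down deployment might look like the following sketch (the entry under ngcModelConfigs is a hypothetical placeholder, and the exact shape of that list should be taken from the chart's own values.yaml):

```yaml
# Illustrative fragment built from the keys named above.
replicaCount: 1           # number of identical service pods

jarvis:
  speechServices:
    asr: true
    nlp: true
    tts: false            # disable services you do not need

modelRepoGenerator:
  ngcModelConfigs:
    - asr_example_model   # hypothetical placeholder; delete unused models
```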
By default, models are downloaded from NGC, optimized for TensorRT (if necessary) before the service starts, and stored in a short-lived location. When the pod terminates, these model artifacts are deleted and the storage is freed for other workloads. This behavior is controlled by the modelDeployVolume field and its default value emptyDir: {}. See the Kubernetes Volumes documentation for alternative options that can be used for persistent storage.
Note: Persistent storage should only be used in homogeneous deployments where GPU models are identical. Currently provided models nearly fill the memory of a T4 (16 GB). We recommend running a subset of models/services if using a single GPU.
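For example, to keep the generated model repository across pod restarts, modelDeployVolume can be pointed at persistent storage instead of emptyDir. A sketch using a hostPath volume (the path is illustrative; any Kubernetes volume type could be substituted):

```yaml
# Default: model artifacts are deleted when the pod terminates.
# modelDeployVolume:
#   emptyDir: {}

# Alternative sketch: persist artifacts on the node. Per the note above,
# only appropriate when all GPUs in the cluster are identical.
modelDeployVolume:
  hostPath:
    path: /data/jarvis-models   # illustrative path
    type: DirectoryOrCreate
```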
Ingress Controller¶
There is a base configuration for a simple ingress controller using Traefik. This can be configured through values.yaml, or can be replaced with any controller supporting http2 and grpc.

Ingress controllers are used in both on-premises and cloud-based deployments. For ingress to work correctly, you must have functional name resolution using whatever mechanism is available (DNS, /etc/hosts files, etc.).

For any sort of multi-pod scaling, you must have a correctly configured ingress controller performing http2/grpc load balancing, including name resolution.

Further details can be found in the ingress: section of the values.yaml file.
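As a rough sketch, the ingress section typically carries an enable flag, a controller class, and a resolvable hostname. The field names below are hypothetical; the authoritative schema is in the chart's values.yaml inline comments:

```yaml
# Illustrative only -- hypothetical field names.
ingress:
  useIngress: true
  class: traefik                 # any http2/grpc-capable controller works
  hostname: jarvis.example.com   # must resolve via DNS or /etc/hosts
```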
Load Balancer¶
For L2 load balancing, a barebones configuration using MetalLB has been supplied. This is useful in on-premises deployments; however, cloud-based deployments will need to use the appropriate service from their provider, as the networking is generally not exposed at this layer. More details can be found in the loadbalancer: section of the values.yaml file.
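A MetalLB L2 setup needs, at minimum, a pool of addresses on the local subnet that it may assign to Services. A sketch with hypothetical field names (consult the loadbalancer: comments in values.yaml for the real schema):

```yaml
# Illustrative only -- hypothetical field names; the address range is an
# example and must come from your own unallocated local subnet space.
loadbalancer:
  useMetalLB: true
  ipRange: 192.168.1.240-192.168.1.250
```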