.. _helm_deploy:

Kubernetes
==========

Included in the NGC Helm Repository is a chart designed to automate push-button deployment to a Kubernetes cluster. The `Jarvis AI Services Helm Chart `_ can be used to deploy ASR, NLP, and TTS services automatically.

The Helm chart performs a number of functions:

- Pulls Docker images from NGC for the Jarvis Speech Server, as well as utility containers for downloading and converting models.
- Downloads requested model artifacts from NGC as configured in the ``values.yaml`` file.
- Generates the Triton Inference Server model repository.
- Starts Jarvis Speech as configured in a Kubernetes pod.
- Exposes the Jarvis Speech Server as configured services.

Example pre-trained models are released with Jarvis for each of the services. The Helm chart comes pre-configured to download and deploy all of these models.

**Note:** The Helm chart configuration can be modified for your use case by editing the ``values.yaml`` file. In this file, you can change the settings related to which models to deploy, where to store them, and how to expose the services.

1. Validate Kubernetes with NVIDIA GPU support.

   Kubernetes with GPUs is well supported by NVIDIA. Consult the instructions at `Install Kubernetes `_ to ensure your environment is set up properly. If using an NVIDIA A100 GPU with Multi-Instance GPU (MIG) support, refer to `MIG Support in Kubernetes `_.

2. Download and modify the Helm chart for your use. Fetch it from NGC:

   .. prompt:: bash
      :substitutions:

      export NGC_API_KEY=
      helm fetch https://helm.ngc.nvidia.com/|NgcOrgTeam|/charts/jarvis-api-v|VersionNum|.tgz \
          --username=\$oauthtoken --password=$NGC_API_KEY --untar

   The result of the above operation is a new directory called ``jarvis-api`` in your current working directory. Within that directory is a ``values.yaml`` file which can be modified to suit your use case (see :ref:`kubernetes_secrets` and :ref:`jarvis_settings`).
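
The install command shown below passes the NGC credentials and model deploy key to the chart base64-encoded. As a minimal sketch of preparing those values (the key contents here are placeholders, not real credentials):

```shell
# Placeholder values -- substitute your real NGC API key.
NGC_API_KEY=my_ngc_api_key
MODEL_KEY=tlt_encode   # the default model deploy key

# The chart expects base64 with no trailing newline (echo -n)
# and no line wrapping (base64 -w0):
echo -n "$MODEL_KEY" | base64 -w0   # -> dGx0X2VuY29kZQ==
```

The same pattern applies to ``ngcCredentials.password``; omitting ``-n`` or ``-w0`` produces a value Kubernetes cannot decode correctly.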

3. After the ``values.yaml`` file has been updated to reflect the deployment requirements, deploy Jarvis to the Kubernetes cluster:

   .. prompt:: bash
      :substitutions:

      helm install jarvis-api jarvis-api

   Alternatively, use the ``--set`` option to install without modifying the ``values.yaml`` file. Make sure to set the NGC API key, email, and ``model_key_string`` to the appropriate values. By default, ``model_key_string`` is ``tlt_encode``.

   .. prompt:: bash
      :substitutions:

      helm install jarvis-api \
          --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` \
          --set ngcCredentials.email=your_email@your_domain.com \
          --set modelRepoGenerator.modelDeployKey=`echo -n model_key_string | base64 -w0`

4. Review the Helm configuration.

   The following sections point out a few key areas of the ``values.yaml`` file and considerations for deployment. Consult the individual service documentation for more details, as well as the Helm chart's ``values.yaml`` file, which contains inline comments explaining the configuration options.

.. _required_software:

Required Software
-----------------

To deploy Jarvis, a functioning Kubernetes environment with a GPU (Volta or later) is required. This can be on premises, in a cloud provider, or within a managed Kubernetes environment, so long as the environment has GPU support enabled.

.. _installing_minikube:

.. only:: internal

   Installing Minikube
   -------------------

   For our internal testing, we used minikube from https://minikube.sigs.k8s.io/docs/start/. To install minikube, run:

   .. prompt:: bash

      curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
      sudo install minikube-linux-amd64 /usr/local/bin/minikube
      sudo minikube start --driver=none

   You must use ``sudo`` above to start minikube. There may be additional dependencies depending on your environment; minikube will tell you what is needed. Follow those instructions.

   You can either install ``kubectl`` or use the version built into minikube (``minikube kubectl``).

   Install ``helm3`` from https://helm.sh/docs/intro/install/:

   .. prompt:: bash

      curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
      chmod 700 get_helm.sh
      ./get_helm.sh

   Now that the Kubernetes cluster has come up and we have the needed tools, we need to enable GPU support. Using https://github.com/NVIDIA/k8s-device-plugin, run:

   .. prompt:: bash

      helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
      helm repo update
      helm install \
          --generate-name \
          nvdp/nvidia-device-plugin

   Add ``traefik`` as the ``ingress`` controller:

   .. prompt:: bash

      helm repo add traefik https://containous.github.io/traefik-helm-chart
      helm repo update
      helm install traefik traefik/traefik

   or:

   .. prompt:: bash

      helm install traefik traefik/traefik --set dashboard.enabled=true,serviceType=NodePort,dashboard.domain=dashboard.traefik,rbac.enabled=true --namespace kube-system

   If using L2 load balancing (this requires additional IP addresses; check with your network admin):

   .. prompt:: bash

      kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
      kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
      # On first install only
      kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"

   We used ``ifconfig`` to create virtual IPs on our local network. If you are on a network with DHCP-assigned addresses, check with your network admin for a static IP allocation.

   .. prompt:: bash

      sudo ifconfig interface:$VIRTUALNUM $NEWIP

   For example, changing the following to match your network:

   .. prompt:: bash

      sudo ifconfig interface:1 10.42.0.191

   In order to properly load balance L7 requests, name services will need to be working. ``/etc/hosts`` is easy to set up.
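
   As an illustrative sketch, a minimal ``/etc/hosts`` entry maps a stable name to the virtual IP created above (the hostname ``jarvis.example.com`` is a placeholder; substitute your own name and address):

   ```text
   # /etc/hosts -- map a service name to the virtual IP created above
   10.42.0.191   jarvis.example.com
   ```

   Every machine that needs to reach the service by name must carry the same entry, which is why DNS is preferable for anything beyond a small test setup.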

   If you need a more permanent solution and are on the NVIDIA corporate network, this tool will also work: https://sc-itss-02.nvidia.com/users/dns/hostrecord. If you are in a CSP or other environment, refer to their documentation as needed.

   At this point, the environment should be ready for Jarvis installation.

.. _kubernetes_secrets:

Kubernetes Secrets
------------------

The Helm deployment uses multiple Kubernetes secrets for obtaining access to NGC: one for Docker images, one for model artifacts, and one for encrypted models. By default, these are named ``imagepullsecret``, ``modelpullsecret``, and ``jarvis-model-deploy-key``, respectively. The names of the secrets can be modified in the ``values.yaml`` file; however, if you are deploying into an EGX or Fleet Command managed environment, your environment has support for ``imagepullsecret`` and ``modelpullsecret`` today. These secrets are managed by the chart and can be manipulated by setting the respective values within the ``ngcCredentials`` section of ``values.yaml``.

.. _jarvis_settings:

Jarvis Settings
---------------

The ``values.yaml`` file for Jarvis is intended to provide maximum flexibility in deployment configurations.

The ``replicaCount`` field configures the number of identical instances (pods) of the services that are deployed. When load-balanced appropriately, increasing this number (as resources permit) enables horizontal scaling for increased load.

Individual speech services (ASR, NLP, or TTS) may be disabled by setting the ``jarvis.speechServices.[asr|nlp|tts]`` key to ``false``. Prebuilt models not required for your deployment can be deleted from the list in ``modelRepoGenerator.ngcModelConfigs``. NVIDIA recommends removing models and disabling services that are not used, to reduce deployment time and GPU memory usage.
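
As a sketch of the settings described above, a trimmed ``values.yaml`` might contain the following. The nesting is inferred from the dotted key names (``jarvis.speechServices.tts`` and so on); confirm the exact layout against the ``values.yaml`` shipped with your chart version:

```yaml
replicaCount: 1        # number of identical pods; increase to scale horizontally

jarvis:
  speechServices:
    asr: true
    nlp: true
    tts: false         # hypothetical choice: disable TTS to reduce GPU memory usage
```

Entries under ``modelRepoGenerator.ngcModelConfigs`` that your deployment does not need can be deleted from the list in the same file.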

By default, models are downloaded from NGC, optimized for TensorRT (if necessary) before the service starts, and stored in a short-lived location. When the pod terminates, these model artifacts are deleted and the storage is freed for other workloads. This behavior is controlled by the ``modelDeployVolume`` field and its default value, ``emptyDir: {}``. See the `Kubernetes Volumes documentation `_ for alternative options that can be used for persistent storage.

**Note:**

- Persistent storage should only be used in homogeneous deployments where the GPU models are identical.
- The currently provided models nearly fill a T4's memory (16 GB). We recommend running a subset of models/services if using a single GPU.

Ingress Controller
------------------

A base configuration for a simple ingress controller using Traefik is included. This can be configured through ``values.yaml``, or replaced with any controller supporting ``http2`` and ``grpc``. Ingress controllers are found in both on-premises and cloud-based deployments. For this to work correctly, you must have functional name resolution using whatever mechanism is available (DNS, ``/etc/hosts`` files, etc.). For any sort of multi-pod scaling, you must have a correctly configured ingress controller performing HTTP/2 and gRPC load balancing, including name resolution. Further details can be found in the ``ingress:`` section of the ``values.yaml`` file.

Load Balancer
-------------

For L2 load balancing, a bare-bones configuration using MetalLB is supplied in the ``loadbalancer:`` section of the ``values.yaml`` file. This is useful in on-premises deployments; however, cloud-based deployments need to use the appropriate service from their provider, as the networking is generally not exposed at this layer. More details can be found in the ``loadbalancer:`` section of the ``values.yaml`` file.
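
For reference, and independent of the chart's ``loadbalancer:`` section, the MetalLB v0.9.x manifests used earlier are configured through a ConfigMap of roughly this shape. The address range shown is a placeholder; use addresses allocated by your network admin:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.42.0.190-10.42.0.199   # placeholder range on the local L2 segment
```

With this in place, Kubernetes services of type ``LoadBalancer`` receive an external IP from the pool via layer-2 (ARP) announcement.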