.. _helm_deploy:

Kubernetes
==========

Included in the NGC Helm Repository is a chart designed to automate push-button deployment to a Kubernetes cluster. The `Jarvis AI Services Helm Chart `_ can be used to deploy ASR, NLP, and TTS services automatically.

The Helm chart performs a number of functions:

- Pulls Docker images from NGC for the Jarvis Speech Server, as well as utility containers for downloading and converting models.
- Downloads requested model artifacts from NGC as configured in the ``values.yaml`` file.
- Generates the Triton Inference Server model repository.
- Starts Jarvis Speech as configured in a Kubernetes pod.
- Exposes the Jarvis Speech Server as configured services.

Example pre-trained models are released with Jarvis for each of the services. The Helm chart comes pre-configured to download and deploy all of these models.

**Note:** The Helm chart configuration can be modified for your use case by editing the ``values.yaml`` file. In this file, you can change the settings related to which models to deploy, where to store them, and how to expose the services.

1. Validate Kubernetes with NVIDIA GPU support.

   Kubernetes with GPUs is well supported by NVIDIA. Consult the instructions at `Install Kubernetes `_ to ensure your environment is set up properly. If using an NVIDIA A100 GPU with Multi-Instance GPU (MIG) support, refer to `MIG Support in Kubernetes `_.

2. Download and modify the Helm chart for your use. Fetch it from NGC:

   .. prompt:: bash
      :substitutions:

      export NGC_API_KEY=
      helm fetch https://helm.ngc.nvidia.com/|NgcOrgTeam|/charts/jarvis-api-v|VersionNum|.tgz \
          --username=\$oauthtoken --password=$NGC_API_KEY --untar

   The result of the above operation is a new directory called ``jarvis-api`` in your current working directory. Within that directory is a ``values.yaml`` file which can be modified to suit your use case (see :ref:`kubernetes_secrets` and :ref:`jarvis_settings`).
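
The install command shown below passes the NGC credentials and model deploy key to the chart base64-encoded. As a minimal sketch of preparing those values (the key contents here are placeholders, not real credentials):

```shell
# Placeholder values -- substitute your real NGC API key.
NGC_API_KEY=my_ngc_api_key
MODEL_KEY=tlt_encode   # the default model deploy key

# The chart expects base64 with no trailing newline (echo -n)
# and no line wrapping (base64 -w0):
echo -n "$MODEL_KEY" | base64 -w0   # -> dGx0X2VuY29kZQ==
```

The same pattern applies to ``ngcCredentials.password``; omitting ``-n`` or ``-w0`` produces a value Kubernetes cannot decode correctly.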

3. After the ``values.yaml`` file has been updated to reflect the deployment requirements, deploy Jarvis to the Kubernetes cluster:

   .. prompt:: bash
      :substitutions:

      helm install jarvis-api jarvis-api

   Alternatively, use the ``--set`` option to install without modifying the ``values.yaml`` file. Make sure to set the NGC API key, email, and ``model_key_string`` to the appropriate values. By default, ``model_key_string`` is ``tlt_encode``.

   .. prompt:: bash
      :substitutions:

      helm install jarvis-api \
          --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` \
          --set ngcCredentials.email=your_email@your_domain.com \
          --set modelRepoGenerator.modelDeployKey=`echo -n model_key_string | base64 -w0`

4. Review the Helm configuration.

   The following sections point out a few key areas of the ``values.yaml`` file and considerations for deployment. Consult the individual service documentation for more details, as well as the Helm chart's ``values.yaml`` file, which contains inline comments explaining the configuration options.

.. _required_software:

Required Software
-----------------

To deploy Jarvis, a functioning Kubernetes environment with a GPU (Volta or later) is required. This can be on premises, in a cloud provider, or within a managed Kubernetes environment, so long as the environment has GPU support enabled.

.. _installing_minikube:

.. only:: internal

   Installing Minikube
   -------------------

   For our internal testing, we used minikube from https://minikube.sigs.k8s.io/docs/start/. To install minikube, run:

   .. prompt:: bash

      curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
      sudo install minikube-linux-amd64 /usr/local/bin/minikube
      sudo minikube start --driver=none

   You must use ``sudo`` above to start minikube. There may be additional dependencies depending on your environment; minikube will tell you what is needed. Follow those instructions.

   You can either install ``kubectl`` or use the version built into minikube (``minikube kubectl``).

   Install ``helm3`` from https://helm.sh/docs/intro/install/:

   .. prompt:: bash

      curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
      chmod 700 get_helm.sh
      ./get_helm.sh

   Now that the Kubernetes cluster has come up and we have the needed tools, we need to enable GPU support. Using https://github.com/NVIDIA/k8s-device-plugin, run:

   .. prompt:: bash

      helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
      helm repo update
      helm install \
          --generate-name \
          nvdp/nvidia-device-plugin

   Add ``traefik`` as the ``ingress`` controller:

   .. prompt:: bash

      helm repo add traefik https://containous.github.io/traefik-helm-chart
      helm repo update
      helm install traefik traefik/traefik

   or:

   .. prompt:: bash

      helm install traefik traefik/traefik --set dashboard.enabled=true,serviceType=NodePort,dashboard.domain=dashboard.traefik,rbac.enabled=true --namespace kube-system

   If using L2 load balancing (this requires additional IP addresses; check with your network admin):

   .. prompt:: bash

      kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
      kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
      # On first install only
      kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"

   We used ``ifconfig`` to create virtual IPs on our local network. If you are on a network with DHCP-assigned addresses, check with your network admin for a static IP allocation.

   .. prompt:: bash

      sudo ifconfig interface:$VIRTUALNUM $NEWIP

   For example, changing the following to match your network:

   .. prompt:: bash

      sudo ifconfig interface:1 10.42.0.191

   In order to properly load balance L7 requests, name services will need to be working. ``/etc/hosts`` is easy to set up.
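
   As an illustrative sketch, a minimal ``/etc/hosts`` entry maps a stable name to the virtual IP created above (the hostname ``jarvis.example.com`` is a placeholder; substitute your own name and address):

   ```text
   # /etc/hosts -- map a service name to the virtual IP created above
   10.42.0.191   jarvis.example.com
   ```

   Every machine that needs to reach the service by name must carry the same entry, which is why DNS is preferable for anything beyond a small test setup.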

   If you need a more permanent solution and are on the NVIDIA corporate network, this tool will also work: https://sc-itss-02.nvidia.com/users/dns/hostrecord. If you are in a CSP or other environment, refer to their documentation as needed.

   At this point, the environment should be ready for Jarvis installation.

.. _kubernetes_secrets:

Kubernetes Secrets
------------------

The Helm deployment uses multiple Kubernetes secrets for obtaining access to NGC: one for Docker images, one for model artifacts, and one for encrypted models. By default, these are named ``imagepullsecret``, ``modelpullsecret``, and ``jarvis-model-deploy-key``, respectively. The names of the secrets can be modified in the ``values.yaml`` file; however, if you are deploying into an EGX or Fleet Command managed environment, your environment has support for ``imagepullsecret`` and ``modelpullsecret`` today. These secrets are managed by the chart and can be manipulated by setting the respective values within the ``ngcCredentials`` section of ``values.yaml``.

.. _jarvis_settings:

Jarvis Settings
---------------

The ``values.yaml`` file for Jarvis is intended to provide maximum flexibility in deployment configurations.

The ``replicaCount`` field configures the number of identical instances (pods) of the services that are deployed. When load-balanced appropriately, increasing this number (as resources permit) enables horizontal scaling for increased load.

Individual speech services (ASR, NLP, or TTS) may be disabled by setting the ``jarvis.speechServices.[asr|nlp|tts]`` key to ``false``. Prebuilt models not required for your deployment can be deleted from the list in ``modelRepoGenerator.ngcModelConfigs``. NVIDIA recommends removing models and disabling services that are not used, to reduce deployment time and GPU memory usage.
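
As a sketch of the settings described above, a trimmed ``values.yaml`` might contain the following. The nesting is inferred from the dotted key names (``jarvis.speechServices.tts`` and so on); confirm the exact layout against the ``values.yaml`` shipped with your chart version:

```yaml
replicaCount: 1        # number of identical pods; increase to scale horizontally

jarvis:
  speechServices:
    asr: true
    nlp: true
    tts: false         # hypothetical choice: disable TTS to reduce GPU memory usage
```

Entries under ``modelRepoGenerator.ngcModelConfigs`` that your deployment does not need can be deleted from the list in the same file.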

By default, models are downloaded from NGC, optimized for TensorRT (if necessary) before the service starts, and stored in a short-lived location. When the pod terminates, these model artifacts are deleted and the storage is freed for other workloads. This behavior is controlled by the ``modelDeployVolume`` field and its default value, ``emptyDir: {}``. See the `Kubernetes Volumes documentation `_ for alternative options that can be used for persistent storage.

**Note:**

- Persistent storage should only be used in homogeneous deployments where the GPU models are identical.
- The currently provided models nearly fill a T4's memory (16 GB). We recommend running a subset of models/services if using a single GPU.

Ingress Controller
------------------

A base configuration for a simple ingress controller using Traefik is included. This can be configured through ``values.yaml``, or replaced with any controller supporting ``http2`` and ``grpc``. Ingress controllers are found in both on-premises and cloud-based deployments. For this to work correctly, you must have functional name resolution using whatever mechanism is available (DNS, ``/etc/hosts`` files, etc.). For any sort of multi-pod scaling, you must have a correctly configured ingress controller performing HTTP/2 and gRPC load balancing, including name resolution. Further details can be found in the ``ingress:`` section of the ``values.yaml`` file.

Load Balancer
-------------

For L2 load balancing, a bare-bones configuration using MetalLB is supplied in the ``loadbalancer:`` section of the ``values.yaml`` file. This is useful in on-premises deployments; however, cloud-based deployments need to use the appropriate service from their provider, as the networking is generally not exposed at this layer. More details can be found in the ``loadbalancer:`` section of the ``values.yaml`` file.
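
For reference, and independent of the chart's ``loadbalancer:`` section, the MetalLB v0.9.x manifests used earlier are configured through a ConfigMap of roughly this shape. The address range shown is a placeholder; use addresses allocated by your network admin:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.42.0.190-10.42.0.199   # placeholder range on the local L2 segment
```

With this in place, Kubernetes services of type ``LoadBalancer`` receive an external IP from the pool via layer-2 (ARP) announcement.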