Deploying Jarvis ASR Service on AWS EKS
=======================================

This is a how-to sample for deploying and scaling the Jarvis ASR Service on Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) with Traefik-based load balancing. It includes the following steps:

#. Downloading and modifying the Jarvis API Helm Chart to add a node selector and make it a headless service.
#. Downloading and modifying the Traefik Helm Chart to add a node selector and expose it on a cluster-internal IP.
#. Defining the client service and ingress route.
#. Defining the cluster config and launching the cluster via ``eksctl``.
#. Running the benchmarks.
#. Scaling the cluster and the Jarvis ASR Service.

This sample assumes that:

- The system has access to Jarvis via NGC (check by logging in to ``nvcr.io`` with Docker).
- The system has ``eksctl``, ``helm``, and ``kubectl`` pre-installed.

This sample has been tested with ``eksctl`` (0.35.0), ``helm`` (v3.4.1), ``kubectl`` (v1.17.0), and ``traefik`` (2.2.8, API version v2).

Note: Most steps include validation, debug, cleanup, and monitoring commands in case you'd like to redo that step.

Downloading and Modifying the Jarvis API Helm Chart
---------------------------------------------------

#. Download and untar the Jarvis API Helm Chart.

   .. code-block:: bash
      :substitutions:

      $ export NGC_API_KEY=<>
      $ helm fetch https://helm.ngc.nvidia.com/ea-2-jarvis/charts/jarvis-api-0.2.1-ea.tgz --username='$oauthtoken' --password=$NGC_API_KEY
      $ tar -xvzf jarvis-api-0.2.1-ea.tgz

#. Within the ``jarvis-nvidia`` folder, modify the following files:

   - ``values.yaml``

     - Set all the services except ASR (vision, TTS, NLP) to ``false``.
     - Remove the ``trtPlugins``, since we are not using the vision or NLP services in this example.
     - For ``ngcModelConfigs``, keep only one model relevant to ASR (``ea-2-jarvis::jarvis_asr_jasper_english_base:config_jasper_asr_trt_ensemble_streaming_throughput.yaml:ea2``) and comment out the rest.

   - ``templates/deployment.yaml``

     - Within ``spec.template.spec``, add:

       .. code-block:: yaml
          :substitutions:

          nodeSelector:
            eks.amazonaws.com/nodegroup: gpu-linux-workers
          tolerations:
          - key: gpu-type
            operator: Equal
            value: v100
            effect: NoSchedule

     - This tells the service to deploy on the node-group named ``gpu-linux-workers`` and also restricts it to the V100 GPU type.

   - ``templates/service.yaml``

     - Within ``spec``, replace ``type: {{ .Values.service.type }}`` with ``clusterIP: None``.
     - This makes it a headless service, overriding ``.Values.service.type`` (originally set to ``LoadBalancer`` in ``values.yaml``).

Downloading and Modifying the Traefik Helm Chart
------------------------------------------------

#. Download and untar the Traefik Helm Chart.

   .. code-block:: bash
      :substitutions:

      $ helm repo add traefik https://helm.traefik.io/traefik
      $ helm repo update
      $ helm fetch traefik/traefik
      $ tar -zxvf traefik-9.1.1.tgz

#. Within the ``traefik`` folder, modify ``values.yaml``:

   - Change ``service.type`` from ``LoadBalancer`` to ``ClusterIP``. This exposes the service on a cluster-internal IP.
   - Set ``nodeSelector`` to ``{ eks.amazonaws.com/nodegroup: cpu-linux-lb }``. Similar to what we did for the Jarvis API Service, this tells the Traefik Service to run on the ``cpu-linux-lb`` node-group.

Defining the Client Service and Ingress Route
---------------------------------------------

#. Pull the ``jarvis-api-client`` container from Jarvis NGC (a quick verification sketch follows this list).
#. Deploy the client service on the ``cpu-linux-clients`` node-group.
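If you'd like to verify access to the client image before deploying (the cluster itself pulls it through the ``jarvis-ea-regcred`` image pull secret created later), a minimal local check might look like the following sketch. The image and tag are taken from the deployment below, and ``NGC_API_KEY`` is the key exported earlier.

.. code-block:: bash
   :substitutions:

   # log in to NGC's registry; the username is the literal string $oauthtoken
   $ docker login nvcr.io --username '$oauthtoken' --password $NGC_API_KEY

   # pull the client image referenced by the deployment below
   $ docker pull nvcr.io/ea-2-jarvis/jarvis-api-client:ea2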
The client ``deployment.yaml`` looks like the following:

.. code-block:: yaml
   :substitutions:

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: ss-client
     labels:
       app: "jarvisasrclient"
     namespace: jarvis
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: "jarvisasrclient"
     template:
       metadata:
         labels:
           app: "jarvisasrclient"
       spec:
         nodeSelector:
           eks.amazonaws.com/nodegroup: cpu-linux-clients
         imagePullSecrets:
         - name: jarvis-ea-regcred
         containers:
         - name: jarvis-client
           image: "nvcr.io/ea-2-jarvis/jarvis-api-client:ea2"
           command: ["/bin/bash"]
           args: ["-c", "while true; do sleep 5; done"]

With all the individual services ready to go, we need to define an ingress route that lets the Traefik load balancer spread incoming requests across multiple ``jarvis-api`` services. Here is how ``jarvis-ingress.yaml`` is defined:

.. code-block:: yaml
   :substitutions:

   apiVersion: traefik.containo.us/v1alpha1
   kind: IngressRoute
   metadata:
     name: jarvis-ingressroute
     namespace: jarvis
   spec:
     entryPoints:
       - web
     routes:
       - match: Host(`jarvis.nvda`)
         kind: Rule
         services:
           - name: jarvis-jarvis-api
             port: 50051
             scheme: h2c

Defining and Launching the EKS Cluster
--------------------------------------

So far, we've talked about three node-groups: ``cpu-linux-clients``, ``cpu-linux-lb``, and ``gpu-linux-workers``.

#. Decide on the instance types for each of these node-groups:

   - ``cpu-linux-clients`` uses ``m5.2xlarge`` (general purpose) instances, with a minimum size of 1 and a maximum size of 4.
   - ``cpu-linux-lb`` uses one ``c5.24xlarge`` (compute-optimized) instance.
   - ``gpu-linux-workers`` uses ``p3.2xlarge`` (single V100 GPU) instances, with a minimum size of 1 and a maximum size of 8.

#. Build a launch configuration, ``eks_launch_conf.yaml``, that defines each of these node-groups:

   .. code-block:: yaml
      :substitutions:

      apiVersion: eksctl.io/v1alpha5
      kind: ClusterConfig
      metadata:
        name: jarvis-cluster
        region: us-west-2
        version: "1.17"
      managedNodeGroups:
        - name: gpu-linux-workers
          labels: { role: workers }
          instanceType: p3.2xlarge
          minSize: 1
          maxSize: 8
          volumeSize: 100
          privateNetworking: true
          ssh:
            allow: true
        - name: cpu-linux-clients
          labels: { role: clients }
          instanceType: m5.2xlarge
          minSize: 1
          maxSize: 4
          volumeSize: 100
          privateNetworking: true
          ssh:
            allow: true
        - name: cpu-linux-lb
          labels: { role: loadbalancers }
          instanceType: c5.24xlarge
          desiredCapacity: 1
          volumeSize: 100
          privateNetworking: true
          ssh:
            allow: true

#. Launch the cluster with the above config:

   .. code-block:: bash
      :substitutions:

      $ eksctl create cluster -f eks_launch_conf.yaml

   As a result of this command, you should see changes in your default Kubernetes configuration file, and the nodes should start showing up in Kubernetes. Here is how to check:

   .. code-block:: bash
      :substitutions:

      $ cat ~/.kube/config
      $ kubectl get pods -A
      $ kubectl get nodes --show-labels
      $ kubectl get nodes --selector role=workers
      $ kubectl get nodes --selector role=clients
      $ kubectl get nodes --selector role=loadbalancers
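   In addition to the ``kubectl`` checks above, ``eksctl`` itself can list the node-groups it created. This is an optional sketch that reuses the cluster name and region from the launch config; the command is standard ``eksctl`` and not specific to this guide.

   .. code-block:: bash
      :substitutions:

      # list the cluster's node-groups along with their sizes and instance types
      $ eksctl get nodegroup --cluster=jarvis-cluster --region=us-west-2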
#. After the cluster is up-and-running, launch the services.

   .. code-block:: bash
      :substitutions:

      # set up the namespace
      $ kubectl create namespace jarvis

      # NGC API key and secrets setup, if not already set by the Helm chart
      $ export NGC_API_KEY=<>
      $ kubectl create secret generic jarvis-ea-regcred --from-file=.dockerconfigjson=/home/dgxuser/.docker/config.json --type=kubernetes.io/dockerconfigjson -n jarvis
      $ kubectl create secret generic jarvis-ngc-read --from-literal=key=$NGC_API_KEY -n jarvis

      # install the NVIDIA device plugin
      $ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
      $ helm repo update
      $ helm install \
          --version=0.7.3 \
          --generate-name \
          --set failOnInitError=false \
          nvdp/nvidia-device-plugin

      # cleanup for the device plugin
      $ helm list
      $ helm del nvidia-device-plugin-1609972038

      # install jarvis
      $ cd jarvis-api-nvidia
      $ helm install --namespace jarvis jarvis .
      $ cd ..

      # debug
      $ kubectl describe pod -n jarvis jarvis-jarvis-api-5d8f5c7dd6-vkd49

      # watch logs
      $ kubectl logs -n jarvis -f jarvis-jarvis-api-5d8f5c7dd6-vkd49 -c jarvis-speech-api

      # cleanup jarvis
      $ helm del jarvis -n jarvis

      # install traefik
      $ cd traefik/
      $ helm install traefik traefik -n jarvis
      $ cd ..

      # remove traefik
      $ kubectl delete deployment traefik -n jarvis

      # install the client
      $ cd traefik/
      $ kubectl apply -f deployment.yaml -n jarvis
      $ cd ..

      # remove the client
      $ kubectl delete deployment ss-client -n jarvis

      # apply the ingress route
      $ cd traefik/
      $ kubectl apply -f jarvis-ingress.yaml -n jarvis
      $ cd ..

Running the Benchmarks
----------------------

After all the services are up-and-running, we can benchmark by stepping into the client container and sending requests to the load balancer. Here is what the services look like:

.. code-block:: bash
   :substitutions:

   $ kubectl get svc -A
   NAMESPACE     NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                          AGE
   default       kubernetes          ClusterIP   10.100.0.1     <none>        443/TCP                                          53m
   jarvis        jarvis-jarvis-api   ClusterIP   None           <none>        8000/TCP,8001/TCP,8002/TCP,50051/TCP,60051/TCP   91s
   jarvis        traefik             ClusterIP   10.100.182.7   <none>        80/TCP,443/TCP                                   68s
   kube-system   kube-dns            ClusterIP   10.100.0.10    <none>        53/UDP,53/TCP

And here are the pods:

.. code-block:: bash
   :substitutions:

   $ kubectl get pods -A
   NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
   jarvis        jarvis-jarvis-api-5d8f5c7dd6-vkd49      2/2     Running   0          6m33s
   jarvis        ss-client-7ff77cbb76-djt5q              1/1     Running   0          6m2s
   jarvis        traefik-5fb6c8bb47-mlxsg                1/1     Running   0          6m10s
   kube-system   aws-node-fgm52                          1/1     Running   0          51m
   kube-system   aws-node-hbwfn                          1/1     Running   0          50m
   kube-system   aws-node-xltx6                          1/1     Running   0          51m
   kube-system   coredns-5946c5d67c-5w8bv                1/1     Running   0          57m
   kube-system   coredns-5946c5d67c-f728c                1/1     Running   0          57m
   kube-system   kube-proxy-hpp6p                        1/1     Running   0          50m
   kube-system   kube-proxy-t4dvb                        1/1     Running   0          51m
   kube-system   kube-proxy-v2ttk                        1/1     Running   0          51m
   kube-system   nvidia-device-plugin-1611946093-vgg2f   1/1     Running   0          6m46s
   kube-system   nvidia-device-plugin-1611946093-w6969   1/1     Running   0          6m46s
   kube-system   nvidia-device-plugin-1611946093-w7sw4   1/1     Running   0          6m46s
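The Traefik ``ClusterIP`` shown above (``10.100.182.7`` in this run) is the address the client maps ``jarvis.nvda`` to in the next step. If you'd rather grab it programmatically than read it off the table, here is a small sketch using standard ``kubectl`` JSONPath output (the ``TRAEFIK_IP`` variable is just for illustration):

.. code-block:: bash
   :substitutions:

   # capture the cluster-internal IP of the Traefik service
   $ TRAEFIK_IP=$(kubectl get svc traefik -n jarvis -o jsonpath='{.spec.clusterIP}')
   $ echo $TRAEFIK_IP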
#. Run the benchmarks.

   .. code-block:: bash
      :substitutions:

      # exec into the client
      $ kubectl exec --stdin --tty ss-client-7ff77cbb76-djt5q /bin/bash -n jarvis

      # set up the FQDN inside the ss-client container with the Traefik svc IP
      $ kubectl get svc -A
      $ echo '10.100.182.7 jarvis.nvda' >> /etc/hosts

      # test connectivity: exec into the client and run the following
      $ jarvis_streaming_asr_client --audio_file=/work/wav/vad_test_files/2094-142345-0010.wav --automatic_punctuation=false --jarvis_uri=jarvis.nvda:80

      # run the benchmark
      $ for i in `seq 5`; do /usr/local/bin/jarvis_streaming_asr_client --num_parallel_requests=512 --num_iterations=2048 --audio_file=/work/wav/test/1272-135031-0000.wav --interim_results=false --automatic_punctuation=false --print_transcripts=false --chunk_duration_ms=800 --jarvis_uri=jarvis.nvda:80; done | tee output_config1_max_throughput

#. Monitor the GPU usage. In a separate terminal, step into any of the ``jarvis-api`` pods (``jarvis-trtis`` container).

   .. code-block:: bash
      :substitutions:

      # monitor GPU usage
      $ kubectl exec --stdin --tty jarvis-jarvis-api-5d8f5c7dd6-vkd49 /bin/bash -n jarvis -c jarvis-trtis
      $ watch -n0.1 nvidia-smi

Scaling and Deleting the Cluster
--------------------------------

The cluster and services can be scaled using the following commands:

.. code-block:: bash
   :substitutions:

   # scale the node-group (or use the EKS UI)
   $ eksctl scale nodegroup --name=gpu-linux-workers --cluster=jarvis-cluster --nodes=8 --region=us-west-2

   # now scale the jarvis api deployment
   $ kubectl scale deployments/jarvis-jarvis-api --replicas=8 -n jarvis

To delete the cluster, use:

.. code-block:: bash
   :substitutions:

   $ eksctl delete cluster jarvis-cluster --region=us-west-2
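Deleting the cluster tears down all three node-groups and their EC2 instances. As an optional final check that nothing is left behind in the region, you can list the remaining clusters; this is a standard ``eksctl`` command, not something specific to this guide:

.. code-block:: bash
   :substitutions:

   # verify the cluster is gone (the output should no longer include jarvis-cluster)
   $ eksctl get cluster --region=us-west-2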