Deploying Jarvis ASR Service on AWS EKS
=======================================

This is a how-to sample for deploying and scaling the Jarvis ASR Service on Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) with Traefik-based load balancing. It includes the following steps:

#. Downloading and modifying the Jarvis API Helm Chart to add a node selector and make it a headless service.
#. Downloading and modifying the Traefik Helm Chart to add a node selector and expose it on a cluster-internal IP.
#. Defining the client service and ingress route.
#. Defining the cluster config and launching the cluster via ``eksctl``.
#. Running the benchmarks.
#. Scaling the cluster and the Jarvis ASR Service.

This sample assumes that:

- The system has access to Jarvis via NGC (check by logging in to ``nvcr.io`` with Docker).
- The system has ``eksctl``, ``helm``, and ``kubectl`` pre-installed.

This sample has been tested with ``eksctl`` (0.35.0), ``helm`` (v3.4.1), ``kubectl`` (v1.17.0), and ``traefik`` (2.2.8, API version v2).

Note: Most steps include validation, debug, cleanup, and monitoring commands in case you'd like to redo that step.

Downloading and Modifying the Jarvis API Helm Chart
---------------------------------------------------

#. Download and untar the Jarvis API Helm Chart.

   .. code-block:: bash
      :substitutions:

      $ export NGC_API_KEY=<>
      $ helm fetch https://helm.ngc.nvidia.com/ea-2-jarvis/charts/jarvis-api-0.2.1-ea.tgz --username='$oauthtoken' --password=$NGC_API_KEY
      $ tar -xvzf jarvis-api-0.2.1-ea.tgz

#. Within the ``jarvis-nvidia`` folder, modify the following files:

   - ``values.yaml``

     - Set all the services except ASR (vision, TTS, NLP) to ``false``.
     - Remove the ``trtPlugins``, since we are not using the vision or NLP services in this example.
     - For ``ngcModelConfigs``, keep only one model relevant to ASR (``ea-2-jarvis::jarvis_asr_jasper_english_base:config_jasper_asr_trt_ensemble_streaming_throughput.yaml:ea2``) and comment out the rest.

   - ``templates/deployment.yaml``

     - Within ``spec.template.spec``, add:

       .. code-block:: yaml
          :substitutions:

          nodeSelector:
            eks.amazonaws.com/nodegroup: gpu-linux-workers
          tolerations:
          - key: gpu-type
            operator: Equal
            value: v100
            effect: NoSchedule

     - This tells the service to deploy on the node-group named ``gpu-linux-workers`` and also restricts it to the V100 GPU type.

   - ``templates/service.yaml``

     - Within ``spec``, replace ``type: {{ .Values.service.type }}`` with ``clusterIP: None``.
     - This makes it a headless service, overriding ``.Values.service.type`` (originally set to ``LoadBalancer`` in ``values.yaml``).

Downloading and Modifying the Traefik Helm Chart
------------------------------------------------

#. Download and untar the Traefik Helm Chart.

   .. code-block:: bash
      :substitutions:

      $ helm repo add traefik https://helm.traefik.io/traefik
      $ helm repo update
      $ helm fetch traefik/traefik
      $ tar -zxvf traefik-9.1.1.tgz

#. Within the ``traefik`` folder, modify ``values.yaml``:

   - Change ``service.type`` from ``LoadBalancer`` to ``ClusterIP``. This exposes the service on a cluster-internal IP.
   - Set ``nodeSelector`` to ``{ eks.amazonaws.com/nodegroup: cpu-linux-lb }``. Similar to what we did for the Jarvis API Service, this tells the Traefik Service to run on the ``cpu-linux-lb`` node-group.

Defining the Client Service and Ingress Route
---------------------------------------------

#. Pull the ``jarvis-api-client`` container from Jarvis NGC (a quick verification sketch follows this list).
#. Deploy the client service on the ``cpu-linux-clients`` node-group.
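If you'd like to verify access to the client image before deploying (the cluster itself pulls it through the ``jarvis-ea-regcred`` image pull secret created later), a minimal local check might look like the following sketch. The image and tag are taken from the deployment below, and ``NGC_API_KEY`` is the key exported earlier.

.. code-block:: bash
   :substitutions:

   # log in to NGC's registry; the username is the literal string $oauthtoken
   $ docker login nvcr.io --username '$oauthtoken' --password $NGC_API_KEY

   # pull the client image referenced by the deployment below
   $ docker pull nvcr.io/ea-2-jarvis/jarvis-api-client:ea2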
The client ``deployment.yaml`` looks like the following:

.. code-block:: yaml
   :substitutions:

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: ss-client
     labels:
       app: "jarvisasrclient"
     namespace: jarvis
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: "jarvisasrclient"
     template:
       metadata:
         labels:
           app: "jarvisasrclient"
       spec:
         nodeSelector:
           eks.amazonaws.com/nodegroup: cpu-linux-clients
         imagePullSecrets:
         - name: jarvis-ea-regcred
         containers:
         - name: jarvis-client
           image: "nvcr.io/ea-2-jarvis/jarvis-api-client:ea2"
           command: ["/bin/bash"]
           args: ["-c", "while true; do sleep 5; done"]

With all the individual services ready to go, we need to define an ingress route that lets the Traefik load balancer spread incoming requests across multiple ``jarvis-api`` services. Here is how ``jarvis-ingress.yaml`` is defined:

.. code-block:: yaml
   :substitutions:

   apiVersion: traefik.containo.us/v1alpha1
   kind: IngressRoute
   metadata:
     name: jarvis-ingressroute
     namespace: jarvis
   spec:
     entryPoints:
       - web
     routes:
       - match: Host(`jarvis.nvda`)
         kind: Rule
         services:
           - name: jarvis-jarvis-api
             port: 50051
             scheme: h2c

Defining and Launching the EKS Cluster
--------------------------------------

So far, we've talked about three node-groups: ``cpu-linux-clients``, ``cpu-linux-lb``, and ``gpu-linux-workers``.

#. Decide on the instance types for each of these node-groups:

   - ``cpu-linux-clients`` uses ``m5.2xlarge`` (general purpose) instances, with a minimum size of 1 and a maximum size of 4.
   - ``cpu-linux-lb`` uses one ``c5.24xlarge`` (compute-optimized) instance.
   - ``gpu-linux-workers`` uses ``p3.2xlarge`` (single V100 GPU) instances, with a minimum size of 1 and a maximum size of 8.

#. Build a launch configuration, ``eks_launch_conf.yaml``, that defines each of these node-groups:

   .. code-block:: yaml
      :substitutions:

      apiVersion: eksctl.io/v1alpha5
      kind: ClusterConfig
      metadata:
        name: jarvis-cluster
        region: us-west-2
        version: "1.17"
      managedNodeGroups:
        - name: gpu-linux-workers
          labels: { role: workers }
          instanceType: p3.2xlarge
          minSize: 1
          maxSize: 8
          volumeSize: 100
          privateNetworking: true
          ssh:
            allow: true
        - name: cpu-linux-clients
          labels: { role: clients }
          instanceType: m5.2xlarge
          minSize: 1
          maxSize: 4
          volumeSize: 100
          privateNetworking: true
          ssh:
            allow: true
        - name: cpu-linux-lb
          labels: { role: loadbalancers }
          instanceType: c5.24xlarge
          desiredCapacity: 1
          volumeSize: 100
          privateNetworking: true
          ssh:
            allow: true

#. Launch the cluster with the above config:

   .. code-block:: bash
      :substitutions:

      $ eksctl create cluster -f eks_launch_conf.yaml

   As a result of this command, you should see changes in your default Kubernetes configuration file, and the nodes should start showing up in Kubernetes. Here is how to check:

   .. code-block:: bash
      :substitutions:

      $ cat ~/.kube/config
      $ kubectl get pods -A
      $ kubectl get nodes --show-labels
      $ kubectl get nodes --selector role=workers
      $ kubectl get nodes --selector role=clients
      $ kubectl get nodes --selector role=loadbalancers
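   In addition to the ``kubectl`` checks above, ``eksctl`` itself can list the node-groups it created. This is an optional sketch that reuses the cluster name and region from the launch config; the command is standard ``eksctl`` and not specific to this guide.

   .. code-block:: bash
      :substitutions:

      # list the cluster's node-groups along with their sizes and instance types
      $ eksctl get nodegroup --cluster=jarvis-cluster --region=us-west-2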
#. After the cluster is up-and-running, launch the services.

   .. code-block:: bash
      :substitutions:

      # set up the namespace
      $ kubectl create namespace jarvis

      # NGC API key and secrets setup, if not already set by the Helm chart
      $ export NGC_API_KEY=<>
      $ kubectl create secret generic jarvis-ea-regcred --from-file=.dockerconfigjson=/home/dgxuser/.docker/config.json --type=kubernetes.io/dockerconfigjson -n jarvis
      $ kubectl create secret generic jarvis-ngc-read --from-literal=key=$NGC_API_KEY -n jarvis

      # install the NVIDIA device plugin
      $ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
      $ helm repo update
      $ helm install \
          --version=0.7.3 \
          --generate-name \
          --set failOnInitError=false \
          nvdp/nvidia-device-plugin

      # cleanup for the device plugin
      $ helm list
      $ helm del nvidia-device-plugin-1609972038

      # install jarvis
      $ cd jarvis-api-nvidia
      $ helm install --namespace jarvis jarvis .
      $ cd ..

      # debug
      $ kubectl describe pod -n jarvis jarvis-jarvis-api-5d8f5c7dd6-vkd49

      # watch logs
      $ kubectl logs -n jarvis -f jarvis-jarvis-api-5d8f5c7dd6-vkd49 -c jarvis-speech-api

      # cleanup jarvis
      $ helm del jarvis -n jarvis

      # install traefik
      $ cd traefik/
      $ helm install traefik traefik -n jarvis
      $ cd ..

      # remove traefik
      $ kubectl delete deployment traefik -n jarvis

      # install the client
      $ cd traefik/
      $ kubectl apply -f deployment.yaml -n jarvis
      $ cd ..

      # remove the client
      $ kubectl delete deployment ss-client -n jarvis

      # apply the ingress route
      $ cd traefik/
      $ kubectl apply -f jarvis-ingress.yaml -n jarvis
      $ cd ..

Running the Benchmarks
----------------------

After all the services are up-and-running, we can benchmark by stepping into the client container and sending requests to the load balancer. Here is what the services look like:

.. code-block:: bash
   :substitutions:

   $ kubectl get svc -A
   NAMESPACE     NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                          AGE
   default       kubernetes          ClusterIP   10.100.0.1     <none>        443/TCP                                          53m
   jarvis        jarvis-jarvis-api   ClusterIP   None           <none>        8000/TCP,8001/TCP,8002/TCP,50051/TCP,60051/TCP   91s
   jarvis        traefik             ClusterIP   10.100.182.7   <none>        80/TCP,443/TCP                                   68s
   kube-system   kube-dns            ClusterIP   10.100.0.10    <none>        53/UDP,53/TCP

And here are the pods:

.. code-block:: bash
   :substitutions:

   $ kubectl get pods -A
   NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
   jarvis        jarvis-jarvis-api-5d8f5c7dd6-vkd49      2/2     Running   0          6m33s
   jarvis        ss-client-7ff77cbb76-djt5q              1/1     Running   0          6m2s
   jarvis        traefik-5fb6c8bb47-mlxsg                1/1     Running   0          6m10s
   kube-system   aws-node-fgm52                          1/1     Running   0          51m
   kube-system   aws-node-hbwfn                          1/1     Running   0          50m
   kube-system   aws-node-xltx6                          1/1     Running   0          51m
   kube-system   coredns-5946c5d67c-5w8bv                1/1     Running   0          57m
   kube-system   coredns-5946c5d67c-f728c                1/1     Running   0          57m
   kube-system   kube-proxy-hpp6p                        1/1     Running   0          50m
   kube-system   kube-proxy-t4dvb                        1/1     Running   0          51m
   kube-system   kube-proxy-v2ttk                        1/1     Running   0          51m
   kube-system   nvidia-device-plugin-1611946093-vgg2f   1/1     Running   0          6m46s
   kube-system   nvidia-device-plugin-1611946093-w6969   1/1     Running   0          6m46s
   kube-system   nvidia-device-plugin-1611946093-w7sw4   1/1     Running   0          6m46s
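The Traefik ``ClusterIP`` shown above (``10.100.182.7`` in this run) is the address the client maps ``jarvis.nvda`` to in the next step. If you'd rather grab it programmatically than read it off the table, here is a small sketch using standard ``kubectl`` JSONPath output (the ``TRAEFIK_IP`` variable is just for illustration):

.. code-block:: bash
   :substitutions:

   # capture the cluster-internal IP of the Traefik service
   $ TRAEFIK_IP=$(kubectl get svc traefik -n jarvis -o jsonpath='{.spec.clusterIP}')
   $ echo $TRAEFIK_IP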
#. Run the benchmarks.

   .. code-block:: bash
      :substitutions:

      # exec into the client
      $ kubectl exec --stdin --tty ss-client-7ff77cbb76-djt5q /bin/bash -n jarvis

      # set up the FQDN inside the ss-client container with the Traefik svc IP
      $ kubectl get svc -A
      $ echo '10.100.182.7 jarvis.nvda' >> /etc/hosts

      # test connectivity: exec into the client and run the following
      $ jarvis_streaming_asr_client --audio_file=/work/wav/vad_test_files/2094-142345-0010.wav --automatic_punctuation=false --jarvis_uri=jarvis.nvda:80

      # run the benchmark
      $ for i in `seq 5`; do /usr/local/bin/jarvis_streaming_asr_client --num_parallel_requests=512 --num_iterations=2048 --audio_file=/work/wav/test/1272-135031-0000.wav --interim_results=false --automatic_punctuation=false --print_transcripts=false --chunk_duration_ms=800 --jarvis_uri=jarvis.nvda:80; done | tee output_config1_max_throughput

#. Monitor the GPU usage. In a separate terminal, step into any of the ``jarvis-api`` pods (``jarvis-trtis`` container).

   .. code-block:: bash
      :substitutions:

      # monitor GPU usage
      $ kubectl exec --stdin --tty jarvis-jarvis-api-5d8f5c7dd6-vkd49 /bin/bash -n jarvis -c jarvis-trtis
      $ watch -n0.1 nvidia-smi

Scaling and Deleting the Cluster
--------------------------------

The cluster and services can be scaled using the following commands:

.. code-block:: bash
   :substitutions:

   # scale the node-group (or use the EKS UI)
   $ eksctl scale nodegroup --name=gpu-linux-workers --cluster=jarvis-cluster --nodes=8 --region=us-west-2

   # now scale the jarvis api deployment
   $ kubectl scale deployments/jarvis-jarvis-api --replicas=8 -n jarvis

To delete the cluster, use:

.. code-block:: bash
   :substitutions:

   $ eksctl delete cluster jarvis-cluster --region=us-west-2
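Deleting the cluster tears down all three node-groups and their EC2 instances. As an optional final check that nothing is left behind in the region, you can list the remaining clusters; this is a standard ``eksctl`` command, not something specific to this guide:

.. code-block:: bash
   :substitutions:

   # verify the cluster is gone (the output should no longer include jarvis-cluster)
   $ eksctl get cluster --region=us-west-2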