Deploying Jarvis ASR Service on AWS EKS¶
This is a how-to sample for deploying and scaling the Jarvis ASR Service on Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) with Traefik-based load balancing. It includes the following steps:
- Downloading and modifying the Jarvis API Helm Chart to add a node selector and to make it a headless service.
- Downloading and modifying the Traefik Helm Chart to add a node selector and expose it on a cluster-internal IP.
- Defining the client service and ingress route.
- Defining the cluster config and launching the cluster via eksctl.
- Benchmarking.
- Scaling the cluster and the Jarvis ASR Service.
This sample assumes that:
- The system has access to Jarvis via NGC (verify with docker login nvcr.io).
- The system has eksctl, helm, and kubectl pre-installed.
This sample has been tested with: eksctl (0.35.0), helm (v3.4.1), kubectl (v1.17.0), and traefik (2.2.8, API version v2).
Note: Each step may have associated validation, debug, cleanup, and monitoring commands in case you'd like to re-do that step.
Downloading and Modifying the Jarvis API Helm Chart¶
Download and untar the Jarvis API Helm Chart.
$ export NGC_API_KEY=<<NGC_API_KEY>>
$ helm fetch https://helm.ngc.nvidia.com/ea-2-jarvis/charts/jarvis-api-0.2.1-ea.tgz --username='$oauthtoken' --password=$NGC_API_KEY
$ tar -xvzf jarvis-api-0.2.1-ea.tgz
Within the jarvis-nvidia folder, modify the following files:

values.yaml
- Set all the services except the ASR service to false (that is, disable vision, TTS, and NLP).
- Remove the trtPlugins, since we are not using the vision or NLP services in this example.
- For ngcModelConfigs, keep only one model relevant to ASR (ea-2-jarvis::jarvis_asr_jasper_english_base:config_jasper_asr_trt_ensemble_streaming_throughput.yaml:ea2) and comment out the rest.
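After these edits, the service toggles in values.yaml would look roughly like the following. This is only a sketch: the exact key names and nesting come from the chart you downloaded, so check them against your copy before editing.

```yaml
# Sketch of the edited service toggles in values.yaml
# (key names assumed; verify against the downloaded chart).
asr: true     # keep the ASR service enabled
nlp: false    # disabled for this example
tts: false    # disabled for this example
vision: false # disabled for this example
```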
templates/deployment.yaml
- Within spec.template.spec, add a nodeSelector. This tells the service to deploy on the node-group named gpu-linux-workers and also restricts it to the V100 GPU type.
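The added snippet might look like the following. This is a sketch: the node-group label matches the eksctl config defined later in this sample, and pinning to the p3.2xlarge instance type is one way to restrict pods to V100 nodes; adjust the labels to your cluster.

```yaml
# Hypothetical nodeSelector under spec.template.spec.
# eks.amazonaws.com/nodegroup matches the managed node-group name;
# the instance-type label pins pods to p3.2xlarge (single V100) nodes.
nodeSelector:
  eks.amazonaws.com/nodegroup: gpu-linux-workers
  node.kubernetes.io/instance-type: p3.2xlarge
```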
templates/service.yaml
- Within spec, replace type: {{ .Values.service.type }} with clusterIP: None. This makes it a headless service (overriding the .Values.service.type originally set to LoadBalancer in values.yaml).
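After the edit, the service spec would look roughly like this sketch. With clusterIP: None, Kubernetes DNS resolves the service name to the individual pod IPs instead of a single virtual IP, which lets Traefik balance requests directly across the jarvis-api pods.

```yaml
# templates/service.yaml after the edit (sketch; keep the chart's
# selector and port list as they are).
spec:
  clusterIP: None   # headless: DNS returns pod IPs directly
  # selector and ports stay as the chart defines them
```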
Downloading and Modifying the Traefik Helm Chart¶
Download and untar the Traefik Helm Chart.
$ helm repo add traefik https://helm.traefik.io/traefik
$ helm repo update
$ helm fetch traefik/traefik
$ tar -zxvf traefik-9.1.1.tgz
Within the traefik folder, modify the following files:

values.yaml
- Change service.type from LoadBalancer to ClusterIP. This exposes the service on a cluster-internal IP.
- Set nodeSelector to { eks.amazonaws.com/nodegroup: cpu-linux-lb }. Similar to what we did for the Jarvis API Service, this tells the Traefik Service to run on the cpu-linux-lb node-group.
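The two edits to the Traefik values.yaml amount to something like the following sketch; only the changed keys are shown, and the rest of the file stays as shipped in the chart.

```yaml
# Changed keys in traefik/values.yaml (sketch).
service:
  type: ClusterIP   # was LoadBalancer; now reachable only inside the cluster
nodeSelector:
  eks.amazonaws.com/nodegroup: cpu-linux-lb   # pin Traefik to the LB node-group
```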
Defining the Client Service and Ingress Route¶
Pull the jarvis-api-client container from Jarvis NGC. The service deploys on the cpu-linux-clients node-group. The client deployment.yaml looks like the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ss-client
  labels:
    app: "jarvisasrclient"
  namespace: jarvis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "jarvisasrclient"
  template:
    metadata:
      labels:
        app: "jarvisasrclient"
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: cpu-linux-clients
      imagePullSecrets:
      - name: jarvis-ea-regcred
      containers:
      - name: jarvis-client
        image: "nvcr.io/ea-2-jarvis/jarvis-api-client:ea2"
        command: ["/bin/bash"]
        args: ["-c", "while true; do sleep 5; done"]
With all the individual services ready to go, we need to define an ingress route that enables the Traefik load balancer to spread incoming requests across multiple jarvis-api services. Here is how jarvis-ingress.yaml is defined:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: jarvis-ingressroute
  namespace: jarvis
spec:
  entryPoints:
  - web
  routes:
  - match: Host(`jarvis.nvda`)
    kind: Rule
    services:
    - name: jarvis-jarvis-api
      port: 50051
      scheme: h2c
Defining and Launching the EKS Cluster¶
So far, we've talked about three node-groups: cpu-linux-clients, cpu-linux-lb, and gpu-linux-workers.
Set each of these node-groups in the cluster:
- cpu-linux-clients: m5.2xlarge (general purpose) instances, minimum size 1 and maximum size 4.
- cpu-linux-lb: one c5.24xlarge (compute intensive) instance.
- gpu-linux-workers: p3.2xlarge (single V100 GPU) instances, minimum size 1 and maximum size 8.
Build a launch configuration that defines each of these node-groups in eks_launch_conf.yaml.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: jarvis-cluster
  region: us-west-2
  version: "1.17"
managedNodeGroups:
- name: gpu-linux-workers
  labels: { role: workers }
  instanceType: p3.2xlarge
  minSize: 1
  maxSize: 8
  volumeSize: 100
  privateNetworking: true
  ssh:
    allow: true
- name: cpu-linux-clients
  labels: { role: clients }
  instanceType: m5.2xlarge
  minSize: 1
  maxSize: 4
  volumeSize: 100
  privateNetworking: true
  ssh:
    allow: true
- name: cpu-linux-lb
  labels: { role: loadbalancers }
  instanceType: c5.24xlarge
  desiredCapacity: 1
  volumeSize: 100
  privateNetworking: true
  ssh:
    allow: true
Launch the cluster with the above config.
$ eksctl create cluster -f eks_launch_conf.yaml
As a result of this command, you should see some changes in your default Kubernetes configuration file, and the nodes should start showing up in Kubernetes. Here is how to check:
$ cat ~/.kube/config
$ kubectl get pods -A
$ kubectl get nodes --show-labels
$ kubectl get nodes --selector role=workers
$ kubectl get nodes --selector role=clients
$ kubectl get nodes --selector role=loadbalancers
After the cluster is up-and-running, it is time to launch the services.
# setup namespaces
$ kubectl create namespace jarvis

# NGC API key and secrets setup, if not already set by the helm chart
$ export NGC_API_KEY=<<NGC_API_KEY>>
$ kubectl create secret generic jarvis-ea-regcred --from-file=.dockerconfigjson=/home/dgxuser/.docker/config.json --type=kubernetes.io/dockerconfigjson -n jarvis
$ kubectl create secret generic jarvis-ngc-read --from-literal=key=$NGC_API_KEY -n jarvis

# install the NVIDIA device plugin
$ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
$ helm repo update
$ helm install \
    --version=0.7.3 \
    --generate-name \
    --set failOnInitError=false \
    nvdp/nvidia-device-plugin

# cleanup for the device plugin
$ helm list
$ helm del nvidia-device-plugin-1609972038

# install jarvis
$ cd jarvis-api-nvidia
$ helm install --namespace jarvis jarvis .
$ cd ..

# debug
$ kubectl describe pod -n jarvis jarvis-jarvis-api-5d8f5c7dd6-vkd49

# watch logs
$ kubectl logs -n jarvis -f jarvis-jarvis-api-5d8f5c7dd6-vkd49 -c jarvis-speech-api

# cleanup jarvis
$ helm del jarvis -n jarvis

# install traefik
$ cd traefik/
$ helm install traefik traefik -n jarvis
$ cd ..

# remove traefik
$ kubectl delete deployment traefik -n jarvis

# install client
$ cd traefik/
$ kubectl apply -f deployment.yaml -n jarvis
$ cd ..

# remove client
$ kubectl delete deployment ss-client -n jarvis

# apply the ingress route
$ cd traefik/
$ kubectl apply -f jarvis-ingress.yaml -n jarvis
$ cd ..
Running the Benchmarks¶
After all the services are up-and-running, we can benchmark by stepping into the client container and sending requests to the load balancer.
Here is what the services look like:
$ kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 53m
jarvis jarvis-jarvis-api ClusterIP None <none> 8000/TCP,8001/TCP,8002/TCP,50051/TCP,60051/TCP 91s
jarvis traefik ClusterIP 10.100.182.7 <none> 80/TCP,443/TCP 68s
kube-system kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP
And here are the pods:
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
jarvis jarvis-jarvis-api-5d8f5c7dd6-vkd49 2/2 Running 0 6m33s
jarvis ss-client-7ff77cbb76-djt5q 1/1 Running 0 6m2s
jarvis traefik-5fb6c8bb47-mlxsg 1/1 Running 0 6m10s
kube-system aws-node-fgm52 1/1 Running 0 51m
kube-system aws-node-hbwfn 1/1 Running 0 50m
kube-system aws-node-xltx6 1/1 Running 0 51m
kube-system coredns-5946c5d67c-5w8bv 1/1 Running 0 57m
kube-system coredns-5946c5d67c-f728c 1/1 Running 0 57m
kube-system kube-proxy-hpp6p 1/1 Running 0 50m
kube-system kube-proxy-t4dvb 1/1 Running 0 51m
kube-system kube-proxy-v2ttk 1/1 Running 0 51m
kube-system nvidia-device-plugin-1611946093-vgg2f 1/1 Running 0 6m46s
kube-system nvidia-device-plugin-1611946093-w6969 1/1 Running 0 6m46s
kube-system nvidia-device-plugin-1611946093-w7sw4 1/1 Running 0 6m46s
Run the benchmarks.
# exec into the client
$ kubectl exec --stdin --tty ss-client-7ff77cbb76-djt5q /bin/bash -n jarvis

# setup fqdn inside the ss-client container with the Traefik svc IP
$ kubectl get svc -A
$ echo '10.100.182.7 jarvis.nvda' >> /etc/hosts

# test connectivity: exec into the client and run the following
$ jarvis_streaming_asr_client --audio_file=/work/wav/vad_test_files/2094-142345-0010.wav --automatic_punctuation=false --jarvis_uri=jarvis.nvda:80

# run the benchmark
$ for i in `seq 5`; do /usr/local/bin/jarvis_streaming_asr_client --num_parallel_requests=512 --num_iterations=2048 --audio_file=/work/wav/test/1272-135031-0000.wav --interim_results=false --automatic_punctuation=false --print_transcripts=false --chunk_duration_ms=800 --jarvis_uri=jarvis.nvda:80; done | tee output_config1_max_throughtput
Monitor the GPU usage. Step into any of the jarvis-api pods (jarvis-trtis container) in a separate terminal.

# monitor GPU usage
$ kubectl exec --stdin --tty jarvis-jarvis-api-5d8f5c7dd6-vkd49 /bin/bash -n jarvis -c jarvis-trtis
$ watch -n0.1 nvidia-smi
Scaling and Deleting the Cluster¶
The cluster and services can be scaled using the following commands:
# scaling the nodegroups
$ eksctl scale nodegroup --name=gpu-linux-workers --cluster=jarvis-cluster --nodes=8 --region=us-west-2 # or use the EKS UI
# now scale the jarvis api
$ kubectl scale deployments/jarvis-jarvis-api --replicas=8 -n jarvis
To delete the cluster, use:
$ eksctl delete cluster jarvis-cluster --region=us-west-2