Deploying Riva ASR Service on AWS EKS
This is a sample for deploying and scaling the Riva ASR service on Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) with Traefik-based load balancing. It includes the following steps:
- Downloading and modifying the Riva API Helm Chart to add a node selector and make it a headless service.
- Downloading and modifying the Traefik Helm Chart to add a node selector and expose it on a cluster-internal IP.
- Defining the client service and ingress route.
- Defining the cluster config and launching the cluster via eksctl.
- Benchmarking.
- Scaling the cluster and the Riva ASR service.
This sample assumes that:
- The system has access to Riva via NGC (check by running docker login nvcr.io).
- The system has eksctl, helm, and kubectl pre-installed.
This sample has been tested on: eksctl (0.35.0), helm (v3.4.1), kubectl (v1.17.0), and traefik (2.2.8, API version v2).
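Before starting, the tool prerequisites can be checked in one go. This is a minimal sketch, not part of the original sample: require_tools is a hypothetical helper name, and it only confirms that each CLI is on PATH; it does not check the tested versions listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every required CLI that is missing from PATH.
# Returns 1 if at least one tool is missing, 0 otherwise.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (tools named in the prerequisites above):
#   require_tools eksctl helm kubectl docker
# NGC registry access can be checked separately with:
#   echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```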
Note
Each step may have associated validation, debug, cleanup, and monitoring commands, in case you need to redo that step.
Downloading and Modifying the Riva API Helm Chart
- Download and untar the Riva API Helm Chart. Replace VERSION_TAG with the specific version needed.
  $ export NGC_API_KEY=<<replace with your NGC_API_KEY>>
  $ export VERSION_TAG="1.2.1-beta"
  $ helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-${VERSION_TAG}.tgz --username='$oauthtoken' --password=$NGC_API_KEY
  $ tar -xvzf riva-api-${VERSION_TAG}.tgz
- Within the riva-api folder, modify the following files:
  - values.yaml
    - Set all services except ASR (that is, TTS and NLP) to false.
    - For ngcModelConfigs.asr, select the model to use for ASR; optionally comment out the rest.
  - templates/deployment.yaml
    - Within spec.template.spec, add:
      nodeSelector:
        eks.amazonaws.com/nodegroup: gpu-linux-workers
      tolerations:
      - key: gpu-type
        operator: Equal
        value: v100
        effect: NoSchedule
    - The nodeSelector tells the service to deploy on the node-group named gpu-linux-workers (p3.2xlarge V100 instances in this sample). The toleration additionally allows the pods to be scheduled on nodes tainted with gpu-type=v100:NoSchedule, if you add such a taint; on its own it does not restrict where the pods run.
  - templates/service.yaml
    - Within spec, replace type: {{ .Values.service.type }} with clusterIP: None.
    - This makes it a headless service, overriding .Values.service.type, which values.yaml originally sets to LoadBalancer.
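The service.yaml change above is a single text substitution, so it can be scripted. This is a minimal sketch, not part of the original sample: make_headless is a hypothetical helper name, and it assumes the template line reads exactly type: {{ .Values.service.type }}. The nodeSelector and tolerations additions are easier to make by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the Riva API service into a headless service by
# replacing "type: {{ .Values.service.type }}" with "clusterIP: None",
# keeping the original indentation. It writes through a temp file so it
# behaves the same with GNU and BSD sed (no reliance on sed -i).
make_headless() {
  local file="$1"
  sed 's/type: {{ \.Values\.service\.type }}/clusterIP: None/' "$file" > "$file.tmp" \
    && mv "$file.tmp" "$file"
  # Fail if the substitution did not happen.
  grep -q 'clusterIP: None' "$file"
}

# Usage, from the directory where the chart was untarred:
#   make_headless riva-api/templates/service.yaml
```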
 
 
Downloading and Modifying the Traefik Helm Chart
- Download and untar the Traefik Helm Chart. Pin the chart version so that the tarball name matches the tar command.
  $ helm repo add traefik https://helm.traefik.io/traefik
  $ helm repo update
  $ helm fetch traefik/traefik --version 9.1.1
  $ tar -zxvf traefik-9.1.1.tgz
- Within the traefik folder, modify the following files:
  - values.yaml
    - Change service.type from LoadBalancer to ClusterIP. This exposes the service on a cluster-internal IP.
    - Set nodeSelector to { eks.amazonaws.com/nodegroup: cpu-linux-lb }. As with the Riva API service, this tells the Traefik service to run on the cpu-linux-lb node-group.
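As an alternative to editing traefik/values.yaml by hand, the same two overrides can in principle be passed on the command line with Helm's --set flag, which accepts a backslash to escape the dots inside a label key. This is a sketch that assumes the chart keys are service.type and nodeSelector, as described above; the array name TRAEFIK_SET_ARGS is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical override list equivalent to the values.yaml edits above.
# The backslashes stop Helm from treating the dots in the label key
# "eks.amazonaws.com/nodegroup" as nested-key separators.
TRAEFIK_SET_ARGS=(
  --set service.type=ClusterIP
  --set 'nodeSelector.eks\.amazonaws\.com/nodegroup=cpu-linux-lb'
)

# Usage, from the directory containing the untarred chart:
#   helm template traefik ./traefik -n riva "${TRAEFIK_SET_ARGS[@]}"   # inspect first
#   helm install traefik ./traefik -n riva "${TRAEFIK_SET_ARGS[@]}"
```

Rendering with helm template first is a cheap way to confirm that the Service type and the Deployment nodeSelector come out as intended before installing.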
 
 
Defining the Client Service and Ingress Route
- Pull the riva-api-client container from Riva NGC.
- Deploy the client service on the cpu-linux-clients node-group. The client deployment.yaml looks like the following:
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: ss-client
    labels:
      app: "rivaasrclient"
    namespace: riva
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: "rivaasrclient"
    template:
      metadata:
        labels:
          app: "rivaasrclient"
      spec:
        nodeSelector:
          eks.amazonaws.com/nodegroup: cpu-linux-clients
        imagePullSecrets:
        - name: riva-ea-regcred
        containers:
        - name: riva-client
          image: "nvcr.io/nvidia/riva/riva-speech-client:1.5.0-beta"
          command: ["/bin/bash"]
          args: ["-c", "while true; do sleep 5; done"]
- With all the individual services ready to go, define an ingress route that enables the Traefik load balancer to balance incoming requests across multiple riva-api services. The code below shows how riva-ingress.yaml is defined. It relies on a DNS entry matching the Host clause, which currently looks for riva.nvda; replace it, or add a DNS resolution suitable for your deployment environment.
  apiVersion: traefik.containo.us/v1alpha1
  kind: IngressRoute
  metadata:
    name: riva-ingressroute
    namespace: riva
  spec:
    entryPoints:
      - web
    routes:
      - match: Host(`riva.nvda`)
        kind: Rule
        services:
          - name: riva-riva-api
            port: 50051
            scheme: h2c
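Since the Host clause is the one value most deployments need to change, the IngressRoute can be generated with the host as a parameter instead of edited by hand. This is a minimal sketch, not part of the original sample; render_ingress is a hypothetical helper, and riva.example.internal in the usage line is a placeholder host name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IngressRoute above with a configurable host
# (default riva.nvda). Inside the unquoted here-document, \` emits a literal
# backtick while ${host} is expanded.
render_ingress() {
  local host="${1:-riva.nvda}"
  cat <<EOF
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: riva-ingressroute
  namespace: riva
spec:
  entryPoints:
    - web
  routes:
    - match: Host(\`${host}\`)
      kind: Rule
      services:
        - name: riva-riva-api
          port: 50051
          scheme: h2c
EOF
}

# Usage (placeholder host name):
#   render_ingress riva.example.internal | kubectl apply -n riva -f -
```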
Defining and Launching the EKS Cluster
So far, we've referred to three node-groups: cpu-linux-clients, cpu-linux-lb, and gpu-linux-workers.
- Set up each of these node-groups in the cluster:
  - cpu-linux-clients: m5.2xlarge (general purpose) instances, with minimum size 1 and maximum size 4.
  - cpu-linux-lb: one c5.24xlarge (compute intensive) instance.
  - gpu-linux-workers: p3.2xlarge (single V100 GPU) instances, with minimum size 1 and maximum size 8.
 
- Create a configuration that defines each of these node-groups and save it to a file called eks_launch_conf.yaml.
  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: riva-cluster
    region: us-west-2
    version: "1.17"
  managedNodeGroups:
    - name: gpu-linux-workers
      labels: { role: workers }
      instanceType: p3.2xlarge
      minSize: 1
      maxSize: 8
      volumeSize: 100
      privateNetworking: true
      ssh:
        allow: true
    - name: cpu-linux-clients
      labels: { role: clients }
      instanceType: m5.2xlarge
      minSize: 1
      maxSize: 4
      volumeSize: 100
      privateNetworking: true
      ssh:
        allow: true
    - name: cpu-linux-lb
      labels: { role: loadbalancers }
      instanceType: c5.24xlarge
      desiredCapacity: 1
      volumeSize: 100
      privateNetworking: true
      ssh:
        allow: true
- Launch the cluster with the above config.
  $ eksctl create cluster -f eks_launch_conf.yaml
  As a result of this command, you should see changes in your default Kubernetes configuration file, and the nodes should start showing up in Kubernetes. Here is how to check:
  $ cat ~/.kube/config
  $ kubectl get pods -A
  $ kubectl get nodes --show-labels
  $ kubectl get nodes --selector role=workers
  $ kubectl get nodes --selector role=clients
  $ kubectl get nodes --selector role=loadbalancers
- After the cluster is up and running, launch the services.
  # set up namespaces
  $ kubectl create namespace riva
  # NGC API key and secrets setup, if not already set by the Helm chart
  $ export NGC_API_KEY=<<NGC_API_KEY>>
  # install the NVIDIA device plugin
  $ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
  $ helm repo update
  $ helm install \
      --generate-name \
      --set failOnInitError=false \
      nvdp/nvidia-device-plugin
  # install Riva (ASR only, consistent with the values.yaml changes above)
  $ cd riva-api
  $ helm install --namespace riva riva . --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` --set modelRepoGenerator.modelDeployKey=`echo -n tlt_encode | base64 -w0` --set riva.speechServices.asr=true --set riva.speechServices.tts=false --set riva.speechServices.nlp=false
  $ cd ..
  # debug
  $ export pod=`kubectl get pods -n riva | cut -d " " -f 1 | grep riva`
  $ kubectl describe pod -n riva $pod
  # watch logs
  $ kubectl logs -n riva -f $pod -c riva-speech-api
  # install Traefik
  $ cd traefik/
  $ helm install traefik . -n riva
  $ cd ..
  # install client
  $ cd traefik/
  $ kubectl apply -f deployment.yaml -n riva
  $ cd ..
  # apply the ingress route
  $ cd traefik/
  $ kubectl apply -f riva-ingress.yaml -n riva
  $ cd ..
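Several steps above wait for resources to come up and then locate pods with cut and grep. The helpers below are a minimal sketch of a more explicit approach, not part of the original sample: the function names are hypothetical, the role labels come from eks_launch_conf.yaml, and the deployment name riva-riva-api is taken from the sample output later in this guide, so confirm it with kubectl get deployments -n riva in your own cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the launch steps above.

# Block until every node with the given role label (workers, clients or
# loadbalancers in eks_launch_conf.yaml) reports Ready, or time out.
wait_for_role() {
  local role="$1" timeout="${2:-15m}"
  kubectl wait --for=condition=Ready nodes --selector "role=${role}" --timeout="${timeout}"
}

# Block until the Riva API deployment has finished rolling out.
wait_for_riva() {
  kubectl rollout status deployment/riva-riva-api -n riva --timeout="${1:-20m}"
}

# Print the first pod in namespace $2 (default: riva) whose name starts with
# prefix $1; return 1 if none matches. Matching on a prefix avoids picking
# up unrelated pods that merely contain "riva" somewhere in their name.
first_pod_with_prefix() {
  local prefix="$1" ns="${2:-riva}" name
  for name in $(kubectl get pods -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
    case "$name" in
      "$prefix"*) echo "$name"; return 0 ;;
    esac
  done
  return 1
}

# Usage:
#   for r in workers clients loadbalancers; do wait_for_role "$r"; done
#   wait_for_riva
#   pod=$(first_pod_with_prefix riva-riva-api)
#   kubectl logs -n riva -f "$pod" -c riva-speech-api
```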
Running the Benchmarks
After all the services are up and running, we can benchmark by stepping into the client container and sending requests to the load balancer.
Here is what the services look like:
$ kubectl get svc -A
NAMESPACE     NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                          AGE
default       kubernetes          ClusterIP   10.100.0.1     <none>        443/TCP                                          53m
riva          riva-riva-api       ClusterIP   None           <none>        8000/TCP,8001/TCP,8002/TCP,50051/TCP,60051/TCP   91s
riva          traefik             ClusterIP   10.100.182.7   <none>        80/TCP,443/TCP                                   68s
kube-system   kube-dns            ClusterIP   10.100.0.10    <none>        53/UDP,53/TCP
And here are the pods:
$ kubectl get pods -A
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
riva          riva-riva-api-5d8f5c7dd6-vkd49          2/2     Running   0          6m33s
riva          ss-client-7ff77cbb76-djt5q              1/1     Running   0          6m2s
riva          traefik-5fb6c8bb47-mlxsg                1/1     Running   0          6m10s
kube-system   aws-node-fgm52                          1/1     Running   0          51m
kube-system   aws-node-hbwfn                          1/1     Running   0          50m
kube-system   aws-node-xltx6                          1/1     Running   0          51m
kube-system   coredns-5946c5d67c-5w8bv                1/1     Running   0          57m
kube-system   coredns-5946c5d67c-f728c                1/1     Running   0          57m
kube-system   kube-proxy-hpp6p                        1/1     Running   0          50m
kube-system   kube-proxy-t4dvb                        1/1     Running   0          51m
kube-system   kube-proxy-v2ttk                        1/1     Running   0          51m
kube-system   nvidia-device-plugin-1611946093-vgg2f   1/1     Running   0          6m46s
kube-system   nvidia-device-plugin-1611946093-w6969   1/1     Running   0          6m46s
kube-system   nvidia-device-plugin-1611946093-w7sw4   1/1     Running   0          6m46s
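Reading these tables by eye works for a handful of pods. As a minimal sketch (hypothetical helper name, not part of the original sample), the same check can be scripted against the default column layout of kubectl get pods -A --no-headers, printing any pod that is not Running or whose READY count is incomplete. Pods belonging to finished Jobs would also be listed, since their status is Completed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list namespace/name of every pod whose STATUS is not
# "Running" or whose READY column is incomplete (for example 1/2).
# Assumes the default column order of "kubectl get pods -A --no-headers":
# NAMESPACE NAME READY STATUS RESTARTS AGE.
pods_not_ready() {
  kubectl get pods -A --no-headers | awk '{
    split($3, r, "/")
    if ($4 != "Running" || r[1] != r[2]) print $1 "/" $2
  }'
}

# Usage: prints nothing once every pod is up.
#   pods_not_ready
```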
- Run the benchmarks.
  # find the Traefik service IP (run this outside the client container)
  $ kubectl get svc -n riva traefik
  # exec into the client
  $ export clnt=`kubectl get pods -n riva | cut -d " " -f 1 | grep ss-client`
  $ kubectl exec --stdin --tty $clnt -n riva -- /bin/bash
  # inside the ss-client container, map the FQDN to the Traefik service IP (10.100.182.7 in the output above; substitute your own)
  $ echo '10.100.182.7 riva.nvda' >> /etc/hosts
  # test connectivity
  $ riva_streaming_asr_client --audio_file=/work/wav/vad_test_files/2094-142345-0010.wav --automatic_punctuation=false --riva_uri=riva.nvda:80
  # run the benchmark
  $ for i in `seq 5`; do /usr/local/bin/riva_streaming_asr_client --num_parallel_requests=512 --num_iterations=2048 --audio_file=/work/wav/test/1272-135031-0000.wav --interim_results=false --automatic_punctuation=false --print_transcripts=false --chunk_duration_ms=800 --riva_uri=riva.nvda:80; done | tee output_config1_max_throughput
- Monitor the GPU usage. In a separate terminal, exec into any of the riva-api pods (riva-speech container), with $pod set to the pod name as in the debug step earlier.
  # to monitor GPU usage
  $ kubectl exec --stdin --tty $pod -n riva -c riva-speech -- /bin/bash
  $ watch -n0.1 nvidia-smi
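The /etc/hosts entry above hard-codes the Traefik ClusterIP from the sample output, and watch nvidia-smi needs an interactive shell. The helpers below are a minimal sketch of scripted alternatives; the function names are hypothetical, and the default container name riva-speech is taken from the step above, so check the actual container names with kubectl describe pod before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the benchmark and monitoring steps above.

# Print the /etc/hosts line that maps the ingress host name ($1, default
# riva.nvda) to the Traefik ClusterIP, read from the cluster.
traefik_hosts_entry() {
  local host="${1:-riva.nvda}" ip
  ip=$(kubectl get svc traefik -n riva -o jsonpath='{.spec.clusterIP}') || return 1
  [ -n "$ip" ] || return 1
  echo "$ip $host"
}

# Append that line to /etc/hosts inside the client pod ($1) without opening
# an interactive shell.
add_hosts_entry_in_pod() {
  local pod="$1" entry
  entry=$(traefik_hosts_entry "${2:-riva.nvda}") || return 1
  kubectl exec -n riva "$pod" -- sh -c "echo '$entry' >> /etc/hosts"
}

# Sample GPU utilization and memory use once per second, as CSV, from
# container $2 (default riva-speech) of pod $1.
gpu_usage() {
  kubectl exec -n riva "$1" -c "${2:-riva-speech}" -- \
    nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
}

# Usage:
#   add_hosts_entry_in_pod "$clnt"
#   gpu_usage "$pod"
```

The CSV output of gpu_usage can be redirected to a file alongside the benchmark output for later comparison.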
Scaling and Deleting the Cluster
The cluster and services can be scaled using the following commands:
# scaling the nodegroups
$ eksctl scale nodegroup --name=gpu-linux-workers --cluster=riva-cluster --nodes=8 --region=us-west-2 # or use the EKS UI
# now scale the riva api
$ kubectl scale deployments/riva-riva-api --replicas=8 -n riva
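Because each p3.2xlarge node carries a single V100 GPU, the two scaling commands above are normally run with the same count. A minimal sketch that keeps them in step, assuming one riva-api replica per GPU node and the maxSize of 8 from eks_launch_conf.yaml; scale_riva and MAX_GPU_NODES are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale the GPU node-group and the Riva API deployment
# to the same count. Assumes one riva-api replica per p3.2xlarge node and
# the node-group limits (minSize 1, maxSize 8) from eks_launch_conf.yaml.
MAX_GPU_NODES=8

scale_riva() {
  local n="$1"
  # Reject empty or non-numeric input.
  case "$n" in
    ''|*[!0-9]*) echo "usage: scale_riva <count>" >&2; return 2 ;;
  esac
  if [ "$n" -lt 1 ] || [ "$n" -gt "$MAX_GPU_NODES" ]; then
    echo "count must be between 1 and $MAX_GPU_NODES" >&2
    return 2
  fi
  # Scale the nodes first; only scale the deployment if that succeeded.
  eksctl scale nodegroup --name=gpu-linux-workers --cluster=riva-cluster \
    --nodes="$n" --region=us-west-2 &&
    kubectl scale deployments/riva-riva-api --replicas="$n" -n riva
}

# Usage:
#   scale_riva 8
```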
To delete the cluster, run:
$ eksctl delete cluster --name=riva-cluster --region=us-west-2
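If you prefer to remove the workloads explicitly before deleting the cluster, the teardown can be grouped into one function. This is a minimal sketch with a hypothetical name; the release names riva and traefik match the helm install commands used earlier, and the device-plugin release, installed with --generate-name, is removed together with the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical teardown helper. Each step runs even if an earlier one fails
# (for example, a release that was never installed), so the cluster
# deletion is always attempted.
teardown_riva() {
  helm uninstall riva -n riva
  helm uninstall traefik -n riva
  kubectl delete namespace riva
  eksctl delete cluster --name=riva-cluster --region=us-west-2
}

# Usage:
#   teardown_riva
```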