Kubernetes (Amazon EKS)¶
This section is not applicable to embedded platforms.
This is a sample for deploying and scaling the Riva ASR Service on Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) with Traefik-based load balancing. It includes the following steps:

- Downloading and modifying the Riva API Helm chart to add a node selector and make it a headless service.
- Downloading and modifying the Traefik Helm chart to add a node selector and expose it on a cluster-internal IP.
- Defining the client service and ingress route.
- Defining the cluster configuration and launching the cluster via eksctl.
- Benchmarking.
- Scaling the cluster and the Riva ASR Service.
This sample assumes that:

- The system has access to Riva via NGC (check by running docker login nvcr.io).
- The system has eksctl, helm, and kubectl pre-installed.

This sample has been tested on: eksctl (0.35.0), helm (v3.4.1), kubectl (v1.17.0), and traefik (2.2.8, API version v2).
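To confirm NGC access before starting, you can log in to the NVIDIA registry with your NGC API key as the password; the username is the literal string $oauthtoken:

```bash
# verify access to nvcr.io; the username is literally $oauthtoken, the password is your NGC API key
docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"
```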
Note
Each step can include associated validation, debugging, cleanup, and monitoring commands in case you need to redo that step.
Downloading and Modifying the Riva API Helm Chart¶
Download and untar the Riva API Helm chart. Replace VERSION_TAG with the specific version needed.

```bash
export NGC_API_KEY=<<replace with your NGC_API_KEY>>
export VERSION_TAG="2.0.0"

helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-${VERSION_TAG}.tgz --username='$oauthtoken' --password=$NGC_API_KEY
tar -xvzf riva-api-${VERSION_TAG}.tgz
```
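Optionally, before editing the chart, you can dump its default values to see the fields referenced below (a quick check, not part of the original steps):

```bash
# review the chart's default configuration before modifying it
helm show values ./riva-api | less
```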
Within the riva-api folder, modify the following files:

- values.yaml
  - Set all services except ASR to false (that is, disable TTS and NLP).
  - Under ngcModelConfigs.asr, select the model to use for ASR; optionally, comment out the rest. (A sketch of these changes follows this list.)
- templates/deployment.yaml
  - Within spec.template.spec, add:

    ```yaml
    nodeSelector:
      eks.amazonaws.com/nodegroup: gpu-linux-workers
    tolerations:
    - key: gpu-type
      operator: Equal
      value: v100
      effect: NoSchedule
    ```

    This tells the service to deploy on a node group named gpu-linux-workers and restricts it to nodes with the V100 GPU type.
- templates/service.yaml
  - Within spec, replace type: {{ .Values.service.type }} with clusterIP: None. You have now made this a headless service (overriding .Values.service.type, which is originally set to LoadBalancer in values.yaml).
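As a rough sketch (not verbatim from the chart), the edited portion of values.yaml could look like the following; the exact key layout depends on the chart version, and the rmir model path shown is a placeholder for whichever ASR model you select from NGC:

```yaml
# values.yaml (riva-api) -- only the fields relevant to this ASR-only sample
riva:
  speechServices:
    asr: true     # keep ASR enabled
    nlp: false    # disable NLP
    tts: false    # disable TTS

modelRepoGenerator:
  ngcModelConfigs:
    asr:
      # keep only the ASR model you intend to deploy; comment out the rest
      - <NGC rmir path of the selected ASR model>
```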
Downloading and Modifying the Traefik Helm Chart¶
Download and untar the Traefik Helm Chart.
```bash
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
helm fetch traefik/traefik
tar -zxvf traefik-*.tgz
```
Within the traefik folder, modify values.yaml as follows (see the sketch after this list):

- Change service.type from LoadBalancer to ClusterIP. This exposes the service on a cluster-internal IP.
- Set nodeSelector to { eks.amazonaws.com/nodegroup: cpu-linux-lb }. Similar to what you did for the Riva API service, this tells the Traefik service to run on the cpu-linux-lb node group.
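For reference, a minimal sketch of those two edits in the Traefik values.yaml:

```yaml
# values.yaml (traefik) -- only the fields changed for this sample
service:
  type: ClusterIP    # was LoadBalancer; expose Traefik on a cluster-internal IP only

nodeSelector: { eks.amazonaws.com/nodegroup: cpu-linux-lb }    # pin Traefik to the load-balancer node group
```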
Defining the Client Service and Ingress Route¶
The client service pulls the Riva client container from NGC and deploys on the cpu-linux-clients node group. The client deployment.yaml looks like the following:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ss-client
  labels:
    app: "rivaasrclient"
  namespace: riva
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "rivaasrclient"
  template:
    metadata:
      labels:
        app: "rivaasrclient"
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: cpu-linux-clients
      imagePullSecrets:
      - name: imagepullsecret
      containers:
      - name: riva-client
        image: "nvcr.io/nvidia/riva/riva-speech-client:2.0.0"
        command: ["/bin/bash"]
        args: ["-c", "while true; do sleep 5; done"]
```
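The deployment references an image pull secret named imagepullsecret for nvcr.io. If one does not already exist in the riva namespace, a sketch of creating it manually (using the NGC_API_KEY exported earlier):

```bash
# create the nvcr.io pull secret referenced by the client deployment (skip if it already exists)
kubectl create namespace riva
kubectl create secret docker-registry imagepullsecret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY" \
  -n riva
```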
With all the individual services ready to go, you need to define an ingress route that enables the Traefik load balancer to balance incoming requests across multiple riva-api services. The code below shows how riva-ingress.yaml is defined. This relies on a DNS entry matching the Host clause; currently, it looks for riva.nvda. Replace it, or add a DNS resolution suitable for the deployment environment.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: riva-ingressroute
  namespace: riva
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`riva.nvda`)
      kind: Rule
      services:
        - name: riva-riva-api
          port: 50051
          scheme: h2c
```
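Once the manifest has been applied (done in the launch step below) and the Traefik CRDs from the chart are installed, the route can be inspected with kubectl:

```bash
# confirm Traefik picked up the route
kubectl get ingressroute riva-ingressroute -n riva -o yaml
```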
Defining and Launching the EKS Cluster¶
As discussed in the previous sections, there are three node groups: cpu-linux-clients, cpu-linux-lb, and gpu-linux-workers. Set each of these node groups in the cluster:

- cpu-linux-clients: Use m5.2xlarge (general purpose) instances with a minimum size of 1 and a maximum size of 4.
- cpu-linux-lb: Use one c5.24xlarge (compute intensive) instance.
- gpu-linux-workers: Use p3.2xlarge (single V100 GPU) instances with a minimum size of 1 and a maximum size of 8.
Build a configuration that defines each of these node groups and save it to a file called eks_launch_conf.yaml.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: riva-cluster
  region: us-west-2
  version: "1.17"

managedNodeGroups:
  - name: gpu-linux-workers
    labels: { role: workers }
    instanceType: p3.2xlarge
    minSize: 1
    maxSize: 8
    volumeSize: 100
    privateNetworking: true
    ssh:
      allow: true
  - name: cpu-linux-clients
    labels: { role: clients }
    instanceType: m5.2xlarge
    minSize: 1
    maxSize: 4
    volumeSize: 100
    privateNetworking: true
    ssh:
      allow: true
  - name: cpu-linux-lb
    labels: { role: loadbalancers }
    instanceType: c5.24xlarge
    desiredCapacity: 1
    volumeSize: 100
    privateNetworking: true
    ssh:
      allow: true
```
Launch the cluster with the above configuration.
```bash
eksctl create cluster -f eks_launch_conf.yaml
```
As a result of this command, you should see some changes in your default Kubernetes configuration file, and the nodes should start showing up in Kubernetes. Here is how to check:
```bash
cat ~/.kube/config

kubectl get pods -A
kubectl get nodes --show-labels
kubectl get nodes --selector role=workers
kubectl get nodes --selector role=clients
kubectl get nodes --selector role=loadbalancers
```
After the cluster is up and running, launch the services.
```bash
# NGC API key and secrets setup, if not already handled by the Helm chart
export NGC_API_KEY=<<NGC_API_KEY>>

# install the NVIDIA device plugin so the GPUs are schedulable
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install \
  --generate-name \
  --set failOnInitError=false \
  nvdp/nvidia-device-plugin

# install Riva
# be sure riva-api is a subdirectory in your CWD
# $ ls
# riva-api
# only ASR is enabled, matching the values.yaml modification above
helm install riva-api riva-api \
  --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` \
  --set modelRepoGenerator.modelDeployKey=`echo -n tlt_encode | base64 -w0` \
  --set riva.speechServices.asr=true \
  --set riva.speechServices.tts=false \
  --set riva.speechServices.nlp=false

# debug
export pod=`kubectl get pods | cut -d " " -f 1 | grep riva`
kubectl describe pod $pod

# watch logs
kubectl logs -f $pod -c riva-speech-api

# install Traefik
cd traefik/
helm install traefik traefik
cd ..

# install the client
# (paths assume deployment.yaml and riva-ingress.yaml were saved under traefik/)
cd traefik/
kubectl apply -f deployment.yaml
cd ..

# apply the ingress route
cd traefik/
kubectl apply -f riva-ingress.yaml
cd ..
```
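As a quick sanity check (not part of the original steps), confirm that the device plugin is advertising GPUs and that the riva-api deployment becomes ready; the first start can take a while because the models are downloaded and deployed:

```bash
# confirm the device plugin is advertising GPUs on the worker nodes
kubectl describe nodes | grep -i "nvidia.com/gpu"

# wait for the riva-api deployment to finish rolling out
kubectl rollout status deployment/riva-riva-api
```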
Running the Benchmarks¶
After all the services are up and running, benchmark by stepping into the client container and sending requests to the load balancer.
Here is how the services look:
```
$ kubectl get svc -A
NAMESPACE     NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                          AGE
default       kubernetes   ClusterIP   10.100.0.1     <none>        443/TCP                                          53m
default       riva-api     ClusterIP   None           <none>        8000/TCP,8001/TCP,8002/TCP,50051/TCP,60051/TCP   91s
default       traefik      ClusterIP   10.100.182.7   <none>        80/TCP,443/TCP                                   68s
kube-system   kube-dns     ClusterIP   10.100.0.10    <none>        53/UDP,53/TCP
```
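The Traefik ClusterIP shown above (10.100.182.7 here) is what the benchmark step maps to the riva.nvda hostname. To fetch it without reading the table:

```bash
# look up the Traefik cluster-internal IP used for the riva.nvda host entry
kubectl get svc traefik -o jsonpath='{.spec.clusterIP}'
```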
And here are the pods:
```
$ kubectl get pods -A
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
default       riva-riva-api-5d8f5c7dd6-vkd49          2/2     Running   0          6m33s
default       ss-client-7ff77cbb76-djt5q              1/1     Running   0          6m2s
default       traefik-5fb6c8bb47-mlxsg                1/1     Running   0          6m10s
kube-system   aws-node-fgm52                          1/1     Running   0          51m
kube-system   aws-node-hbwfn                          1/1     Running   0          50m
kube-system   aws-node-xltx6                          1/1     Running   0          51m
kube-system   coredns-5946c5d67c-5w8bv                1/1     Running   0          57m
kube-system   coredns-5946c5d67c-f728c                1/1     Running   0          57m
kube-system   kube-proxy-hpp6p                        1/1     Running   0          50m
kube-system   kube-proxy-t4dvb                        1/1     Running   0          51m
kube-system   kube-proxy-v2ttk                        1/1     Running   0          51m
kube-system   nvidia-device-plugin-1611946093-vgg2f   1/1     Running   0          6m46s
kube-system   nvidia-device-plugin-1611946093-w6969   1/1     Running   0          6m46s
kube-system   nvidia-device-plugin-1611946093-w7sw4   1/1     Running   0          6m46s
```
Run the benchmarks.
```bash
# exec into the client
export clnt=`kubectl get pods | cut -d " " -f 1 | grep ss-client`
kubectl exec --stdin --tty $clnt -- /bin/bash

# inside the ss-client container: map the riva.nvda FQDN to the Traefik svc ClusterIP
# (look up the IP beforehand with: kubectl get svc -A)
echo '10.100.182.7 riva.nvda' >> /etc/hosts

# test connectivity from inside the client container
riva_streaming_asr_client --audio_file=/work/wav/vad_test_files/2094-142345-0010.wav --automatic_punctuation=false --riva_uri=riva.nvda:80

# run the benchmark
for i in `seq 5`; do /usr/local/bin/riva_streaming_asr_client --num_parallel_requests=512 --num_iterations=2048 --audio_file=/work/wav/test/1272-135031-0000.wav --interim_results=false --automatic_punctuation=false --print_transcripts=false --chunk_duration_ms=800 --riva_uri=riva.nvda:80; done | tee output_config1_max_throughput
```
Monitor the GPU usage. In a separate terminal, exec into any of the riva-api pods (riva-speech container).

```bash
# to monitor GPU usage (reuses the $pod variable exported earlier)
kubectl exec --stdin --tty $pod -c riva-speech -- /bin/bash
watch -n0.1 nvidia-smi
```
Scaling and Deleting the Cluster¶
The cluster and services can be scaled using the following commands:
```bash
# scaling the nodegroups
eksctl scale nodegroup --name=gpu-linux-workers --cluster=riva-cluster --nodes=8 --region=us-west-2   # or use the EKS UI

# now scale the riva api deployment
kubectl scale deployments/riva-riva-api --replicas=8
```
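To confirm that the additional replicas spread across the scaled-up worker nodes, a quick check:

```bash
# expect one riva-api replica per GPU node once the new nodes have joined
kubectl get nodes --selector role=workers
kubectl get pods -o wide | grep riva-riva-api
```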
For deleting the cluster, run:
```bash
eksctl delete cluster riva-cluster --region=us-west-2
```