Kubernetes (Amazon EKS)

This deployment option is not supported on embedded platforms.

This is a sample for deploying and scaling the Riva ASR service on Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) with Traefik-based load balancing. It includes the following steps:

  1. Downloading and modifying the Riva API Helm chart to add a node selector and to make it a headless service.

  2. Downloading and modifying the Traefik Helm Chart to add a node selector and expose it on the cluster-internal IP.

  3. Defining the client service and ingress route.

  4. Defining cluster config and launching the cluster via eksctl.

  5. Running the benchmarks.

  6. Scaling the cluster and the Riva ASR Service.

This sample assumes that:

  • The system has access to Riva via NGC (verify with docker login nvcr.io; see the example below).

  • The system has eksctl, helm, and kubectl pre-installed.

This sample has been tested on: eksctl (0.35.0), helm (v3.4.1), kubectl (v1.17.0), and traefik (2.2.8, API version v2).
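
As a quick check of the NGC access assumption, log in to the NGC registry with Docker. The username is literally $oauthtoken and the password is your NGC API key (the NGC_API_KEY variable below is the same one used throughout this sample):

# verify NGC registry access before fetching any Riva charts or containers
export NGC_API_KEY=<<replace with your NGC_API_KEY>>
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin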

Note

Each step has associated validation, debugging, cleanup, and monitoring commands where relevant, in case you need to redo that step.

Downloading and Modifying the Riva API Helm Chart

  1. Download and untar the Riva API Helm Chart. Replace VERSION_TAG with the specific version needed.

    export NGC_API_KEY=<<replace with your NGC_API_KEY>>
    export VERSION_TAG="2.0.0"
    helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-${VERSION_TAG}.tgz --username='$oauthtoken' --password=$NGC_API_KEY
    tar -xvzf riva-api-${VERSION_TAG}.tgz
    
  2. Within the riva-api folder, modify the following files:

    1. values.yaml

      • Disable all services except ASR by setting the TTS and NLP services to false.

      • For ngcModelConfigs.asr, select the model to use for ASR; optionally comment out the rest.

    2. templates/deployment.yaml

      • Within spec.template.spec add:

        nodeSelector:
          eks.amazonaws.com/nodegroup: gpu-linux-workers
        
        tolerations:
          - key: gpu-type
            operator: Equal
            value: v100
            effect: NoSchedule
        

      This tells the service to deploy on the node-group named gpu-linux-workers and restricts it to nodes with the V100 GPU type.

    3. templates/service.yaml

      • Within spec, replace type: {{ .Values.service.type }} with clusterIP: None.

      • This makes it a headless service, overriding .Values.service.type, which is originally set to LoadBalancer in values.yaml. A sketch of the values.yaml and service.yaml edits is shown below.
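
    For reference, here is a minimal sketch of the edits described above. The riva.speechServices keys match the --set flags used later in this sample; the exact layout of ngcModelConfigs in your chart version may differ, so treat this as a guide rather than a drop-in file.

    # values.yaml (relevant portions only)
    riva:
      speechServices:
        asr: true
        tts: false
        nlp: false
    ngcModelConfigs:
      asr:
        # keep only the ASR model(s) you need; comment out the rest

    # templates/service.yaml (within spec)
    spec:
      clusterIP: None   # replaces: type: {{ .Values.service.type }}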

Downloading and Modifying the Traefik Helm Chart

  1. Download and untar the Traefik Helm Chart.

    helm repo add traefik https://helm.traefik.io/traefik
    helm repo update
    helm fetch traefik/traefik
    tar -zxvf traefik-*.tgz
    
  2. Within the traefik folder, modify the following files:

    1. values.yaml

      • Change service.type from LoadBalancer to ClusterIP. This exposes the service on a cluster-internal IP.

      • Set nodeSelector to { eks.amazonaws.com/nodegroup: cpu-linux-lb }. As with the Riva API service, this tells the Traefik service to run on the cpu-linux-lb node-group. A sketch of these edits is shown below.
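
    For reference, a minimal sketch of the corresponding values.yaml changes; the key names (service.type, nodeSelector) follow the upstream Traefik chart, so verify them against the chart version you downloaded.

    # values.yaml (Traefik chart, relevant portions only)
    service:
      type: ClusterIP      # was LoadBalancer; expose only on a cluster-internal IP
    nodeSelector:
      eks.amazonaws.com/nodegroup: cpu-linux-lb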

Defining the Client Service and Ingress Route

  1. Pull the Riva client container from NGC (the deployment below uses nvcr.io/nvidia/riva/riva-speech-client).

  2. Deploy the client on the cpu-linux-clients node-group. The client deployment.yaml looks like the following:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ss-client
      namespace: riva
      labels:
        app: "rivaasrclient"
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: "rivaasrclient"
      template:
        metadata:
          labels:
            app: "rivaasrclient"
        spec:
          nodeSelector:
            eks.amazonaws.com/nodegroup: cpu-linux-clients
          imagePullSecrets:
            - name: imagepullsecret
          containers:
            - name: riva-client
              image: "nvcr.io/nvidia/riva/riva-speech-client:2.0.0"
              command: ["/bin/bash"]
              args: ["-c", "while true; do sleep 5; done"]
    

    With all the individual services ready, define an ingress route that enables the Traefik load balancer to spread incoming requests across the riva-api replicas. The code below shows how riva-ingress.yaml is defined. It relies on a DNS entry matching the Host clause; currently, this looks for riva.nvda. Replace it, or add a DNS entry suitable for the deployment environment.

    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRoute
    metadata:
      name: riva-ingressroute
      namespace: riva
    spec:
      entryPoints:
        - web
      routes:
        - match: Host(`riva.nvda`)
          kind: Rule
          services:
            - name: riva-riva-api
              port: 50051
              scheme: h2c
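
    Note that the client deployment above pulls its image from nvcr.io using an imagePullSecret named imagepullsecret. Depending on the chart version, the Riva Helm chart may create registry secrets from ngcCredentials in its own namespace; if no such secret exists in the namespace the client runs in (riva, as declared in the manifests above), one can be created manually along these lines:

    kubectl create namespace riva
    kubectl create secret docker-registry imagepullsecret \
        --docker-server=nvcr.io \
        --docker-username='$oauthtoken' \
        --docker-password=$NGC_API_KEY \
        -n riva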
    

Defining and Launching the EKS Cluster

As discussed in the previous sections, there are three node-groups: cpu-linux-clients, cpu-linux-lb, and gpu-linux-workers.

  1. Configure each of these node-groups in the cluster as follows:

    • cpu-linux-clients: Use m5.2xlarge (general purpose) instances with a minimum size of 1 and a maximum size of 4.

    • cpu-linux-lb: Use one c5.24xlarge (compute-intensive) instance.

    • gpu-linux-workers: Use p3.2xlarge (single V100 GPU) instances with a minimum size of 1 and a maximum size of 8 (allowing the scale-out shown later).

  2. Build a configuration that defines each of these node-groups and save it to a file called eks_launch_conf.yaml.

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    
    metadata:
      name: riva-cluster
      region: us-west-2
      version: "1.17"
    
    managedNodeGroups:
      - name: gpu-linux-workers
        labels: { role: workers }
        instanceType: p3.2xlarge
        minSize: 1
        maxSize: 8
        volumeSize: 100
        privateNetworking: true
        ssh:
          allow: true
      - name: cpu-linux-clients
        labels: { role: clients }
        instanceType: m5.2xlarge
        minSize: 1
        maxSize: 4
        volumeSize: 100
        privateNetworking: true
        ssh:
          allow: true
      - name: cpu-linux-lb
        labels: { role: loadbalancers }
        instanceType: c5.24xlarge
        desiredCapacity: 1
        volumeSize: 100
        privateNetworking: true
        ssh:
          allow: true
    
  3. Launch the cluster with the above configuration.

    eksctl create cluster -f eks_launch_conf.yaml
    

    As a result of this command, you should see some changes in your default Kubernetes configuration file, and the nodes should start showing up in Kubernetes. Here is how to check:

    cat ~/.kube/config
    kubectl get pods -A
    kubectl get nodes --show-labels
    kubectl get nodes --selector role=workers
    kubectl get nodes --selector role=clients
    kubectl get nodes --selector role=loadbalancers
    
  4. After the cluster is up-and-running, launch the services.

    # ngc api key setup and secrets setup, if not already set by the helm chart
    export NGC_API_KEY=<<NGC_API_KEY>>
    
    # install the NVIDIA device plugin so Kubernetes can schedule GPUs
    helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
    helm repo update
    helm install \
        --generate-name \
        --set failOnInitError=false \
        nvdp/nvidia-device-plugin
    
    # install riva
    # be sure riva-api is a subdirectory in your CWD
    # $ls
    # riva-api
    
    helm install riva-api riva-api \
        --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` \
        --set modelRepoGenerator.modelDeployKey=`echo -n tlt_encode | base64 -w0` \
        --set riva.speechServices.asr=true \
        --set riva.speechServices.tts=false \
        --set riva.speechServices.nlp=false
    
    # debug
    export pod=`kubectl get pods | cut -d " " -f 1 | grep riva`
    kubectl describe pod $pod
    
    # watch logs
    kubectl logs -f $pod -c riva-speech-api
    
    # install traefik
    cd traefik/
    helm install traefik .
    cd ..
    
    # install the client (run from the directory where deployment.yaml was saved)
    kubectl apply -f deployment.yaml
    
    # apply the ingress route (run from the directory where riva-ingress.yaml was saved)
    kubectl apply -f riva-ingress.yaml
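
    Before benchmarking, a few quick checks can confirm that everything is wired up as expected. The resource names below assume the Helm release names used above; adjust them if yours differ.

    # GPUs advertised by the device plugin on the worker nodes
    kubectl describe nodes -l role=workers | grep -i "nvidia.com/gpu"
    
    # the headless Riva service should list one endpoint per riva-api pod
    kubectl get endpoints riva-riva-api
    
    # the Traefik ingress route defined earlier
    kubectl get ingressroutes -n riva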
    

Running the Benchmarks

After all the services are up and running, benchmark by stepping into the client container and sending requests to the load balancer.

Here is how the services look:

$ kubectl get svc -A
NAMESPACE     NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                          AGE
default       kubernetes      ClusterIP   10.100.0.1     <none>        443/TCP                                          53m
default       riva-riva-api   ClusterIP   None           <none>        8000/TCP,8001/TCP,8002/TCP,50051/TCP,60051/TCP   91s
default       traefik         ClusterIP   10.100.182.7   <none>        80/TCP,443/TCP                                   68s
kube-system   kube-dns        ClusterIP   10.100.0.10    <none>        53/UDP,53/TCP

And here are the pods:

$ kubectl get pods -A
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
default       riva-riva-api-5d8f5c7dd6-vkd49          2/2     Running   0          6m33s
default       ss-client-7ff77cbb76-djt5q              1/1     Running   0          6m2s
default       traefik-5fb6c8bb47-mlxsg                1/1     Running   0          6m10s
kube-system   aws-node-fgm52                          1/1     Running   0          51m
kube-system   aws-node-hbwfn                          1/1     Running   0          50m
kube-system   aws-node-xltx6                          1/1     Running   0          51m
kube-system   coredns-5946c5d67c-5w8bv                1/1     Running   0          57m
kube-system   coredns-5946c5d67c-f728c                1/1     Running   0          57m
kube-system   kube-proxy-hpp6p                        1/1     Running   0          50m
kube-system   kube-proxy-t4dvb                        1/1     Running   0          51m
kube-system   kube-proxy-v2ttk                        1/1     Running   0          51m
kube-system   nvidia-device-plugin-1611946093-vgg2f   1/1     Running   0          6m46s
kube-system   nvidia-device-plugin-1611946093-w6969   1/1     Running   0          6m46s
kube-system   nvidia-device-plugin-1611946093-w7sw4   1/1     Running   0          6m46s
  1. Run the benchmarks.

    # exec into the client
    export clnt=`kubectl get pods | cut -d " " -f 1| grep ss-client`
    kubectl exec --stdin --tty $clnt -n riva -- /bin/bash
    
    # set up the FQDN inside the ss-client container, pointing at the Traefik svc ClusterIP
    # (look up the ClusterIP with "kubectl get svc -A" outside the container)
    echo '10.100.182.7 riva.nvda' >> /etc/hosts
    
    # test connectivity from inside the client container
    riva_streaming_asr_client --audio_file=/work/wav/vad_test_files/2094-142345-0010.wav --automatic_punctuation=false --riva_uri=riva.nvda:80
    
    # run benchmark
    for i in `seq 5`; do /usr/local/bin/riva_streaming_asr_client --num_parallel_requests=512 --num_iterations=2048 --audio_file=/work/wav/test/1272-135031-0000.wav --interim_results=false --automatic_punctuation=false --print_transcripts=false --chunk_duration_ms=800 --riva_uri=riva.nvda:80; done | tee output_config1_max_throughput
    
  2. Monitor the GPU usage. In a separate terminal, exec into any of the riva-api pods (riva-speech-api container).

    # to monitor GPU usage, exec into a riva-api pod
    export pod=`kubectl get pods | cut -d " " -f 1 | grep riva-riva-api`
    kubectl exec --stdin --tty $pod -c riva-speech-api -- /bin/bash
    
    watch -n0.1 nvidia-smi
    

Scaling and Deleting the Cluster

The cluster and services can be scaled using the following commands:

# scaling the nodegroups
eksctl scale nodegroup --name=gpu-linux-workers --cluster=riva-cluster --nodes=8 --region=us-west-2 # or use the EKS UI

# now scale the riva api
kubectl scale deployments/riva-riva-api --replicas=8 
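
To confirm the scale-out, check that the new GPU nodes joined the cluster and that the additional riva-api pods are running and registered behind the headless service (names assume the release naming used above):

# new worker nodes
kubectl get nodes --selector role=workers

# one riva-api pod per replica, spread across the GPU nodes
kubectl get pods -o wide | grep riva-riva-api

# the headless service should now list 8 endpoints
kubectl get endpoints riva-riva-api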

To delete the cluster, run:

eksctl delete cluster --name=riva-cluster --region=us-west-2