SpeechSquad

Local Deployment

This SpeechSquad sample application can be deployed locally with no load balancing required. The ASR, NLP, and TTS Jarvis services must be running.

Installation

  1. Install and start Jarvis locally (see Local (Docker)).

  2. Install SpeechSquad:

docker pull nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1

  3. Download the dataset:

wget https://github.com/NVIDIA/speechsquad/releases/download/v1.0.0-b.1/speechsquad_sample_public_v1.tgz
tar xzf speechsquad_sample_public_v1.tgz

Running the Test Locally

  1. Start Jarvis Speech Server (if you have not already):

    bash jarvis_start.sh
    
  2. Start SpeechSquad Server:

    docker run -it --net=host \
        nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
        speechsquad_server  \
            -tts_service_url=0.0.0.0:50051 \
            -nlp_service_url=0.0.0.0:50051 \
            -asr_service_url=0.0.0.0:50051
    
  3. Run performance test:

    docker run -it --net=host \
        -v $(pwd)/speechsquad_sample_public_v1:/work/test_files/speech_squad/ \
        nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
        speechsquad_perf_client \
            --squad_questions_json=/work/test_files/speech_squad/recorded_questions.jl \
            --squad_dataset_json=/work/test_files/speech_squad/manifest.json \
            --speech_squad_uri=0.0.0.0:1337 \
            --chunk_duration_ms=800 \
            --executor_count=1 \
            --num_iterations=1 \
            --num_parallel_requests=64 \
            --print_results=false
    

Cloud Deployment

This SpeechSquad sample application requires a fully set up Jarvis environment with working L7 load balancing and name resolution. To best validate scaling, we recommend a distinct installation for each service type. This requires three separate invocations of helm install and a minimum of 3 GPUs on the target system. Scaling up each service will require many more GPUs.

SpeechSquad may also be run against a Jarvis speech system with all three services running on a single GPU, but do not expect optimal performance.

Assuming values.yaml has been updated for all other configuration details:

  • To set up an NLP-only service:

    helm install nlp-jarvis-api jarvis-api \
        --set jarvis.speechServices.asr=false \
        --set jarvis.speechServices.tts=false
    
  • To set up an ASR-only service:

    helm install asr-jarvis-api jarvis-api \
        --set jarvis.speechServices.nlp=false \
        --set jarvis.speechServices.tts=false
    
  • To set up a TTS-only service:

    helm install tts-jarvis-api jarvis-api \
        --set jarvis.speechServices.asr=false \
        --set jarvis.speechServices.nlp=false
    

Then, each deployment can be scaled independently using kubectl scale as usual.
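For example, a single service type can be scaled without touching the others. The deployment name below assumes it follows the Helm release name used above; confirm the actual names in your cluster first:

```shell
# List deployments to confirm the actual deployment names.
kubectl get deployments

# Scale the ASR-only release to four replicas; NLP and TTS are unaffected.
kubectl scale deployment asr-jarvis-api --replicas=4
```

Note that each additional replica needs its own GPU, so the node pool must have capacity before scaling up.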

Installation

At a minimum, the SpeechSquad Server container (sss) expects to be able to route to the Jarvis Speech container. For any sort of testing at scale, name resolution will need to be functioning due to the use of layer 7 load balancers. Many issues can be resolved by validating container routing first, then name resolution between pods.

Conceptually, a successful SpeechSquad deployment will have one or more of each pod type. Each pod needs proper connectivity (routing and optionally name resolution) to the pods around it.

+---------------+     +----------------+     +---------------+
|               |     |                |     |               |
| Jarvis Speech | --- | SpeechSquad    | --- |  SpeechSquad  |
|               |     |    Server      |     |     Client    |
+---------------+     +----------------+     +---------------+

Ensuring that each pod is running correctly before moving on to the next will ease troubleshooting.

  1. Jarvis should be running and responding to client queries before SpeechSquad is installed.

  2. SpeechSquad server should come up and connect to Jarvis (either using IP:PORT or FQDN:PORT).

  3. SpeechSquad client connects to sss (either using appropriate IP:PORT or FQDN:PORT).
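These checks can be made incrementally from inside each pod. The pod names and FQDNs below are placeholders (substitute the names reported by kubectl get pods in your cluster), and getent may be absent from very minimal images:

```shell
# From the SpeechSquad server pod, verify the Jarvis FQDN resolves.
kubectl exec -it speechsquad-6974455879-spxgt -- getent hosts jarvis.nvda

# From the client pod, verify the SpeechSquad server FQDN resolves.
kubectl exec -it clnt-ss-5945877dc7-wk9fs -- getent hosts speechsquad.jarvis.nvda
```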

  1. Fetch the Helm chart that is hosted on NGC. Ensure you add authentication options, if needed.

    helm fetch https://helm.ngc.nvidia.com/nvidia/jarvis/charts/speechsquad-1.0.0-b.1.tgz
    

    Alternatively, the git repository can be pulled from github.com/nvidia/speechsquad.
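A sketch of the GitHub route, which yields the same chart sources:

```shell
# Clone the SpeechSquad repository instead of fetching the packaged chart.
git clone https://github.com/nvidia/speechsquad.git
ls speechsquad
```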

  2. Update the values.yaml to match the fully qualified domain names (FQDNs) for this installation, if using name resolution. SpeechSquad should also have an FQDN for its server (for example, speechsquad.jarvis.nvda). The SpeechSquad server expects endpoints for each of the TTS, NLP, and ASR services. For example, by default, the values.yaml contains the following:

    • nlp_uri: "jarvis.nvda"

    • asr_uri: "jarvis.nvda"

    • tts_uri: "jarvis.nvda"

If SpeechSquad is being deployed on a single system, these URIs can be replaced with the IP address.

For load balancing, the SpeechSquad server container must be able to resolve jarvis.nvda to the endpoint exposing jarvis-speech. The chart allows /etc/hosts in the container to be set at launch time by setting the .Values.sss.lb_ip: value to match what jarvis.nvda should resolve to.

In the following example, the external IP of the load balancer is 10.42.0.190. Since we want load balancing, set the .Values.sss.lb_ip in the values.yaml for SpeechSquad to match this IP.

$ kubectl get services
NAME           TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                        AGE
jarvis-api     LoadBalancer   10.97.16.15      10.42.0.191   8000:31308/TCP,8001:30646/TCP,8002:30042/TCP,50051:32669/TCP   3d19h
kubernetes     ClusterIP      10.96.0.1        <none>        443/TCP                                                        50d
traefik        LoadBalancer   10.107.200.114   10.42.0.190   80:32131/TCP,443:31802/TCP
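Putting these pieces together, the relevant fragment of the SpeechSquad values.yaml might look like the following. The sss.lb_ip and *_uri keys come from this guide; the surrounding layout is a sketch, so verify the key names against the chart's actual values.yaml:

```yaml
# Sketch only -- verify key names against the chart's own values.yaml.
sss:
  # External IP of the L7 load balancer (traefik above); injected into the
  # container's /etc/hosts so jarvis.nvda resolves to the load balancer.
  lb_ip: "10.42.0.190"

nlp_uri: "jarvis.nvda"
asr_uri: "jarvis.nvda"
tts_uri: "jarvis.nvda"
```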

  3. After the values.yaml reflects the environment correctly, install the Helm chart, optionally passing any command line options for further customization.

    helm install speechsquad speechsquad
    

    This should provide two pods in the cluster: one for the server (speechsquad) and one for the client (clnt).

    $ kubectl get pods
    
    NAME                           READY   STATUS    RESTARTS   AGE
    clnt-ss-5945877dc7-wk9fs       1/1     Running   0          5d16h
    jarvis-api-6947945c67-4f7gt    1/1     Running   0          7d12h
    speechsquad-6974455879-spxgt   1/1     Running   0          5d13h
    
  4. Validate that you have ingressroutes set up correctly.

    $ kubectl get ingressroute
    NAME                        AGE
    jarvis-ingressroute         7d12h
    speech-squad-ingressroute   5d16h
    traefik-dashboard           23d
    
  5. If using name resolution, issue kubectl describe ingressroute to ensure the correct Host() clause appears. Entry Points is the port on which the load balancer accepts traffic; here we use web, which is port 80. The services section forwards traffic that matches the Host clause to the named service on the given port using the specified protocol.

    $ kubectl describe ingressroute jarvis-ingressroute
    ...
        web
    Routes:
      Kind:   Rule
      Match:  Host(`jarvis.nvda`)
      Services:
        Name:    jarvis-api
        Port:    50051
        Scheme:  h2c
    

    and

    $ kubectl describe ingressroute speech-squad-ingressroute
    
      Entry Points:
      web
    Routes:
      Kind:   Rule
      Match:  Host(`speechsquad.jarvis.nvda`, `speechsquad.jarvis.nvda.nvidia.com`)
      Services:
        Name:    speech-squad
        Port:    1337
        Scheme:  h2c
    
  6. Services can also be validated for each ingressroute by pulling the service name from the route and checking its existence.

    kubectl get service `kubectl get ingressroute speech-squad-ingressroute -o=json | jq .spec.routes[0].services[0].name -r`
    

Containers

Pull the container with the following command:

docker pull nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1

Dataset

We provide a toy dataset with five examples that can be iterated over an arbitrary number of times for generating load. The data accompanies the official SpeechSquad GitHub Release.
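Once extracted, the dataset layout can be sanity-checked. recorded_questions.jl is a JSON-lines file, so its line count should match the five examples; the file names are the ones used by the client commands below:

```shell
# One JSON object per line, one line per recorded question.
wc -l speechsquad_sample_public_v1/recorded_questions.jl

# manifest.json describes the dataset entries used by the perf client.
ls speechsquad_sample_public_v1
```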

Running the Test

You can run the client from a Docker container or by using kubectl exec to enter the client pod in the cluster.

  1. To execute the client, ensure the dataset above is in your current working directory.

    ls
    build    client           CREDITS.md  speechsquad_sample_public_v1      LICENSE    reference
    CLA.pdf  CONTRIBUTING.md  Dockerfile  speechsquad_sample_public_v1.tgz  README.md  server
    
  2. Run the following command to perform the test from the same directory.

    docker run -it --net=host -v $(pwd):/work/test_files/speech_squad/ nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
        speechsquad_perf_client \
          --squad_questions_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/recorded_questions.jl \
          --squad_dataset_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/manifest.json \
          --speech_squad_uri=speechsquad.jarvis.nvda:80 \
          --chunk_duration_ms=800 --executor_count=1 \
          --num_iterations=1 --num_parallel_requests=64 \
          --print_results=false
    

    If using kubectl exec, the client pod must have access to a data volume containing the speechsquad_sample_public_v1 directory. The path is currently hardcoded to /sss_data, so the speechsquad_sample_public_v1 directory needs to live in /sss_data on the Kubernetes host that is running the client container pod. This is controlled in the deployment.yaml file under Volumes and VolumeMounts.

    export CLNT_POD=$(kubectl get pods | grep clnt  | awk '{print $1}')
    kubectl exec --stdin --tty $CLNT_POD -- /bin/bash
    speechsquad_perf_client \
       --squad_questions_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/recorded_questions.jl \
       --squad_dataset_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/manifest.json \
       --speech_squad_uri=speechsquad.jarvis.nvda:80 \
       --chunk_duration_ms=800 --executor_count=1 \
       --num_iterations=1 --num_parallel_requests=64 --print_results=false
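
The /sss_data wiring described above corresponds, conceptually, to a hostPath volume plus volumeMount in the chart's deployment.yaml. The fragment below is illustrative only; the field names follow the standard Kubernetes pod spec, but consult the actual chart for the authoritative version:

```yaml
# Illustrative fragment -- see the chart's deployment.yaml for the real spec.
volumes:
  - name: sss-data
    hostPath:
      path: /sss_data               # must contain speechsquad_sample_public_v1/
containers:
  - name: clnt
    volumeMounts:
      - name: sss-data
        mountPath: /work/test_files/speech_squad
```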