SpeechSquad#

Local Deployment#

This SpeechSquad sample application can be deployed locally. No load balancing is required, but the Riva ASR, NLP, and TTS services must all be running.

Installation#

Install and start Riva locally. Refer to the Local (Docker) section.

Pull the SpeechSquad container.

docker pull nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1

Download the dataset.

wget https://github.com/NVIDIA/speechsquad/releases/download/v1.0.0-b.1/speechsquad_sample_public_v1.tgz
tar xzf speechsquad_sample_public_v1.tgz
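
Before running a test, it can help to confirm that the archive extracted as expected. The following is a minimal sketch and is not part of the SpeechSquad tooling: the function name `check_dataset` is hypothetical, and it only checks for the two files that the performance client commands in this section point at.

```bash
# Hypothetical helper: verify that the extracted dataset directory contains
# the two files referenced by the performance client commands.
check_dataset() {
    dir="${1:-speechsquad_sample_public_v1}"
    status=0
    for f in recorded_questions.jl manifest.json; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage, from the directory where the archive was extracted:
#   check_dataset speechsquad_sample_public_v1 && echo "dataset looks complete"
```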

Running the Test Locally#

Start the Riva Speech AI server (if you have not already):

bash riva_start.sh

Start the SpeechSquad server:

docker run -it --net=host \
      nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
      speechsquad_server  \
         -tts_service_url=0.0.0.0:50051 \
         -nlp_service_url=0.0.0.0:50051 \
         -asr_service_url=0.0.0.0:50051

Run a performance test:

docker run -it --net=host \
      -v $(pwd)/speechsquad_sample_public_v1:/work/test_files/speech_squad/ \
      nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
      speechsquad_perf_client \
         --squad_questions_json=/work/test_files/speech_squad/recorded_questions.jl \
         --squad_dataset_json=/work/test_files/speech_squad/manifest.json \
         --speech_squad_uri=0.0.0.0:1337 \
         --chunk_duration_ms=800 \
         --executor_count=1 \
         --num_iterations=1 \
         --num_parallel_requests=64 \
         --print_results=false
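
To see how the system behaves under different loads, one option is to repeat the run at several concurrency levels. The sketch below varies only `--num_parallel_requests`; every other flag is copied from the command above. The helper name `run_perf`, the `DRY_RUN` switch, and the load levels 1, 8, and 64 are assumptions made for illustration, and `-it` is dropped so that the loop also works without a terminal attached.

```bash
IMAGE=nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1
DATA_DIR="$(pwd)/speechsquad_sample_public_v1"

# Hypothetical helper: run (or, when DRY_RUN=1, only print) the performance
# client with the given number of parallel requests.
run_perf() {
    parallel="$1"
    set -- docker run --rm --net=host \
        -v "$DATA_DIR:/work/test_files/speech_squad/" \
        "$IMAGE" \
        speechsquad_perf_client \
        --squad_questions_json=/work/test_files/speech_squad/recorded_questions.jl \
        --squad_dataset_json=/work/test_files/speech_squad/manifest.json \
        --speech_squad_uri=0.0.0.0:1337 \
        --chunk_duration_ms=800 \
        --executor_count=1 \
        --num_iterations=1 \
        --num_parallel_requests="$parallel" \
        --print_results=false
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the commands for three arbitrary load levels without running them.
# Set DRY_RUN=0 to execute them instead.
DRY_RUN=1
for n in 1 8 64; do
    run_perf "$n"
done
```

Printing the commands first makes it easy to review the volume mount and URI before any load is generated.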

Cloud Deployment#

This SpeechSquad sample application requires a fully set up Riva environment with working L7 load balancing and name resolution. To best validate scaling, we recommend a distinct installation for each service type, which takes three separate invocations of helm install and a minimum of three GPUs on the intended system. Scaling up any individual service requires additional GPUs.

SpeechSquad may also run against a Riva Speech AI system with all three services running on a single GPU, but do not expect optimal performance.

Assuming values.yaml has been updated for all other configuration details, install each service as follows.

To set up an NLP-only service, run:

helm install nlp-riva-api riva-api \
    --set riva.speechServices.asr=false \
    --set riva.speechServices.tts=false

To set up an ASR-only service, run:

helm install asr-riva-api riva-api \
    --set riva.speechServices.nlp=false \
    --set riva.speechServices.tts=false

To set up a TTS-only service, run:

helm install tts-riva-api riva-api \
    --set riva.speechServices.asr=false \
    --set riva.speechServices.nlp=false

Each deployment can then be scaled independently with kubectl scale, as usual.
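
As a rough illustration of independent scaling, the sketch below prints one `kubectl scale` command per service. It assumes each deployment is named after its Helm release (`asr-riva-api`, `nlp-riva-api`, `tts-riva-api`), which may not hold for a given chart version. The helper name `scale_cmds` and the replica counts are hypothetical.

```bash
# Print a `kubectl scale` command for each name=replicas pair.
# ASSUMPTION: each deployment is named after its Helm release; confirm the
# real names with `kubectl get deployments` before running the output.
scale_cmds() {
    for pair in "$@"; do
        name="${pair%%=*}"       # text before the first "="
        replicas="${pair#*=}"    # text after the first "="
        echo "kubectl scale deployment $name --replicas=$replicas"
    done
}

# Example: more ASR replicas than NLP or TTS (the counts are arbitrary).
scale_cmds asr-riva-api=4 nlp-riva-api=2 tts-riva-api=2
```

Each replica of a service needs its own GPU in this layout, so the total of the replica counts should not exceed the GPUs available.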

Installation#

At a minimum, the SpeechSquad server container (sss) must be able to route to the Riva Speech AI container. For any testing at scale, name resolution must also work, because the Layer 7 load balancers route requests by host name. Many issues can be resolved by validating container routing first, then name resolution between pods.

Conceptually, a successful SpeechSquad deployment has one or more of each pod type. Each pod needs proper connectivity (routing and optionally name resolution) to the pods around it.

+---------------+     +----------------+     +---------------+
|               |     |                |     |               |
|Riva Speech AI | --- | SpeechSquad    | --- |  SpeechSquad  |
|               |     |    server      |     |     client    |
+---------------+     +----------------+     +---------------+

To ease troubleshooting, check that each pod is running correctly before moving on to the next. In particular, ensure that:

- Riva is running and responding to client queries before SpeechSquad is installed.
- The SpeechSquad server comes up and connects to Riva (using either IP:PORT or FQDN:PORT).
- The SpeechSquad client connects to sss (using either the appropriate IP:PORT or FQDN:PORT).
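
The connectivity checks above can be approximated with a plain TCP probe run from inside the relevant pod or host. This is only a sketch: the helper names are hypothetical, it assumes `bash` and the coreutils `timeout` command are available where it runs, and a successful TCP connection shows routing and name resolution work, not that the gRPC service behind the port is healthy.

```bash
# Split HOST:PORT into its parts using POSIX parameter expansion.
endpoint_host() { echo "${1%:*}"; }
endpoint_port() { echo "${1##*:}"; }

# Return 0 if a TCP connection to HOST:PORT can be opened within 3 seconds.
# ASSUMPTION: `bash` (for /dev/tcp) and coreutils `timeout` are installed.
can_connect() {
    host=$(endpoint_host "$1")
    port=$(endpoint_port "$1")
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage: check each hop in order, stopping at the first failure.
#   can_connect riva.nvda:80 && can_connect speechsquad.riva.nvda:80
```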

Fetch the Helm chart that is hosted on NGC. Ensure you add authentication options, if needed.
```
helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/speechsquad-1.0.0-b.1.tgz
```

Alternatively, the git repository can be cloned from github.com/nvidia/speechsquad.

Update the values.yaml to match the fully qualified domain names (FQDNs) for this installation, if using name resolution. SpeechSquad should also have an FQDN for its server (for example, speechsquad.riva.nvda). The SpeechSquad server expects an endpoint for each of the TTS, NLP, and ASR services. For example, by default, the values.yaml contains the following:

- nlp_uri: "riva.nvda"
- asr_uri: "riva.nvda"
- tts_uri: "riva.nvda"

If SpeechSquad is being deployed on a single system, these URIs can be replaced with the IP address.

The ASR model is selected by passing --asr_model_name to the SpeechSquad server; by default, an empty string (--asr_model_name="") is passed. This can be controlled in the values.yaml file using the key sss.asr_model.

Note

If this model is not specified, the Riva Speech AI server tries to select a model to use from its model registry. If an invalid model is specified, the requests fail.

For load balancing, the SpeechSquad server container must be able to resolve riva.nvda to the endpoint exposing riva-speech. The chart allows /etc/hosts in the container to be set at launch time: set the .Values.sss.lb_ip value to the address that riva.nvda should resolve to.

In the following example, the external IP of the load balancer (traefik) is 10.42.0.190. To enable load balancing, set .Values.sss.lb_ip in the SpeechSquad values.yaml file to this IP address.

$ kubectl get services
NAME           TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                        AGE
riva-api       LoadBalancer   10.97.16.15      10.42.0.191   8000:31308/TCP,8001:30646/TCP,8002:30042/TCP,50051:32669/TCP   3d19h
kubernetes     ClusterIP      10.96.0.1        <none>        443/TCP                                                        50d
traefik        LoadBalancer   10.107.200.114   10.42.0.190   80:32131/TCP,443:31802/TCP
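
Rather than copying the address by hand, the external IP can be pulled out of the listing with `awk`. The following is a minimal sketch; the helper name `external_ip` is hypothetical, and it assumes the default column layout of `kubectl get services` shown above, where EXTERNAL-IP is the fourth column.

```bash
# Print the EXTERNAL-IP column for one service, reading the output of
# `kubectl get services` on standard input.
external_ip() {
    awk -v name="$1" '$1 == name { print $4 }'
}

# Usage against a live cluster (assumes the load balancer service is the
# `traefik` entry shown above):
#   LB_IP=$(kubectl get services | external_ip traefik)
#   echo "set sss.lb_ip to $LB_IP"
```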

After the values.yaml reflects the environment correctly, install the Helm chart and optionally pass any command-line options for further customization.

helm install speechsquad speechsquad
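
Values can also be overridden on the command line with Helm's `--set` flag instead of editing values.yaml. The sketch below only prints the resulting command. The keys `sss.lb_ip` and `sss.asr_model` are the ones named in the text above; the helper name `install_cmd` is hypothetical, and any model name passed to it is a placeholder that must match a model actually deployed in Riva.

```bash
# Build a `helm install` command that overrides the load balancer IP and,
# optionally, the ASR model. The command is printed, not executed.
install_cmd() {
    lb_ip="$1"
    asr_model="${2:-}"
    cmd="helm install speechsquad speechsquad --set sss.lb_ip=$lb_ip"
    if [ -n "$asr_model" ]; then
        cmd="$cmd --set sss.asr_model=$asr_model"
    fi
    echo "$cmd"
}

# 10.42.0.190 is the load balancer's external IP from the example above.
install_cmd 10.42.0.190
```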

This should provide two pods in the cluster, one for the server (sss) and one for the client (clnt).

kubectl get pods

NAME                           READY   STATUS    RESTARTS   AGE
clnt-ss-5945877dc7-wk9fs       1/1     Running   0          5d16h
riva-api-6947945c67-4f7gt      1/1     Running   0          7d12h
speechsquad-6974455879-spxgt   1/1     Running   0          5d13h
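
For scripted checks, the same listing can be tested automatically. The following sketch is not part of the chart; the helper name `all_pods_ready` is hypothetical, and it assumes the default `kubectl get pods` columns (NAME, READY, STATUS) shown above.

```bash
# Read `kubectl get pods` output on standard input and succeed only if every
# pod is in the Running state with all of its containers ready.
all_pods_ready() {
    awk '
        $1 == "NAME" || NF < 3 { next }      # skip the header and blank lines
        {
            split($2, ready, "/")             # READY column, for example 1/1
            if (ready[1] != ready[2] || $3 != "Running") {
                print "not ready: " $1
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Usage against a live cluster:
#   kubectl get pods | all_pods_ready && echo "all pods ready"
```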

Validate that the ingressroutes are set up correctly.

kubectl get ingressroute

NAME                        AGE
riva-ingressroute           7d12h
speech-squad-ingressroute   5d16h
traefik-dashboard           23d

If using name resolution, run kubectl describe ingressroute to ensure the correct Host() clause appears. Entry Points lists the ports on which the load balancer accepts traffic. Here, we use web, which is port 80. The Services section forwards traffic that matches the Host() clause to the named service on the given port using the specified protocol.

kubectl describe ingressroute riva-ingressroute

...
    web
Routes:
  Kind:   Rule
  Match:  Host(`riva.nvda`)
  Services:
    Name:    riva-api
    Port:    50051
    Scheme:  h2c

and

kubectl describe ingressroute speech-squad-ingressroute

Entry Points:
    web
Routes:
  Kind:   Rule
  Match:  Host(`speechsquad.riva.nvda`, `speechsquad.riva.nvda.nvidia.com`)
  Services:
    Name:    speech-squad
    Port:    1337
    Scheme:  h2c

Services can also be validated for each ingressroute by pulling the service name from the route and checking its existence.

kubectl get service "$(kubectl get ingressroute speech-squad-ingressroute -o=json | jq -r '.spec.routes[0].services[0].name')"

Containers#

Pull the container with the command below:

docker pull nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1

Dataset#

We provide a toy dataset with five examples that can be iterated over an arbitrary number of times for generating load. The data accompanies the official SpeechSquad GitHub Release.

sudo mount -t nfs 10.31.241.13:/mnt/tank/datasets/jarvis_speech_ci/ /mnt/nvdl/datasets/jarvis_speech_ci/
cd /work/test_files/speech_squad/gtc_squad2_asr_data_collection/
rsync -Phrl /mnt/nvdl/datasets/jarvis_speech_ci/gtc_squad2_asr_data_collection/ .

### Running the Test

You can run the client from a Docker container or by using `kubectl exec` to open a shell in the client pod in the cluster.

1. To execute the client, ensure the dataset above is in your current working directory.

   ```
   ls
   build    client           CREDITS.md  speechsquad_sample_public_v1      LICENSE    reference
   CLA.pdf  CONTRIBUTING.md  Dockerfile  speechsquad_sample_public_v1.tgz  README.md  server
   ```

2. Run the following command to perform the test from the same directory.

   ```bash

   docker run -it --net=host -v $(pwd):/work/test_files/speech_squad/ nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
      speechsquad_perf_client \
         --squad_questions_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/recorded_questions.jl \
         --squad_dataset_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/manifest.json \
         --speech_squad_uri=speechsquad.riva.nvda:80 \
         --chunk_duration_ms=800 --executor_count=1 \
         --num_iterations=1 --num_parallel_requests=64 \
         --print_results=false
   ```

   If using `kubectl exec`, the client pod must have access to a data volume that contains the `speechsquad_sample_public_v1` directory.
   The volume path is currently hardcoded to `/sss_data`, so the `speechsquad_sample_public_v1` directory must be in `/sss_data`
   on the Kubernetes host that runs the client pod. This is controlled in the `deployment.yaml` file under
   *Volumes* and *VolumeMounts*.

   ```bash

   export CLIENT_POD=$(kubectl get pods | grep clnt  | awk '{print $1}')
   kubectl exec --stdin --tty $CLIENT_POD -- /bin/bash
   speechsquad_perf_client \
      --squad_questions_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/recorded_questions.jl \
      --squad_dataset_json=/work/test_files/speech_squad/speechsquad_sample_public_v1/manifest.json \
      --speech_squad_uri=speechsquad.riva.nvda:80 \
      --chunk_duration_ms=800 --executor_count=1 \
      --num_iterations=1 --num_parallel_requests=64 --print_results=false
   ```

## License

For applicable licenses, refer to the License section.
