SpeechSquad¶
Cloud Deployment¶
This SpeechSquad sample application requires a fully set up Jarvis environment with working L7 load balancing and name resolution. To best validate scaling, we recommend a distinct installation for each service type; this requires three separate invocations of helm install and a minimum of 3 GPUs on the intended system. Scaling up each service will require many more GPUs.
SpeechSquad may also be run against a Jarvis speech system with all three services running on a single GPU, but do not expect optimal performance.
Assuming values.yaml has been updated for all other configuration details:
To set up an NLP-only service:

helm install nlp-jarvis-api jarvis-api \
    --set jarvis.speechServices.asr=false \
    --set jarvis.speechServices.tts=false
To set up an ASR-only service:

helm install asr-jarvis-api jarvis-api \
    --set jarvis.speechServices.nlp=false \
    --set jarvis.speechServices.tts=false
To set up a TTS-only service:

helm install tts-jarvis-api jarvis-api \
    --set jarvis.speechServices.asr=false \
    --set jarvis.speechServices.nlp=false
Then, each deployment can be scaled independently using kubectl scale as normal.
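With separate releases per service, each deployment can be sized to the GPUs available. The following sketch prints the scale commands for the three releases installed above; the replica count of 2 is arbitrary, and the deployment names are assumed to match the Helm release names, which may differ depending on the chart's naming template:

```shell
# Print a kubectl scale command for each single-service release.
# Replica counts are illustrative; each added replica needs a GPU.
for release in nlp-jarvis-api asr-jarvis-api tts-jarvis-api; do
  echo "kubectl scale deployment ${release} --replicas=2"
done
```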
Installation¶
At a minimum, the SpeechSquad Server container (sss) expects to be able to route to the Jarvis Speech container. For any sort of testing at scale, name resolution will need to be functioning due to the use of layer 7 load balancers. Many issues can be resolved by validating container routing first, then name resolution between pods.
Conceptually, a successful SpeechSquad deployment will have one or more of each pod type. Each pod needs proper connectivity (routing and optionally name resolution) to the pods around it.
+---------------+ +----------------+ +---------------+
| | | | | |
| Jarvis Speech | --- | SpeechSquad | --- | SpeechSquad |
| | | Server | | Client |
+---------------+ +----------------+ +---------------+
Ensuring that each pod is running correctly before moving on to the next will ease troubleshooting:

- Jarvis should be running and responding to client queries before SpeechSquad is installed.
- The SpeechSquad server should come up and connect to Jarvis (using either IP:PORT or FQDN:PORT).
- The SpeechSquad client connects to sss (using the appropriate IP:PORT or FQDN:PORT).
Fetch the Helm chart that is hosted on NGC. Ensure you add authentication options, if needed.
helm fetch https://helm.ngc.nvidia.com/nvidia/jarvis/charts/speechsquad-1.0.0-b.1.tgz
Alternatively, the repository can be cloned from github.com/nvidia/speechsquad.
Update the values.yaml to match the fully qualified domain names (FQDNs) for this installation, if using name resolution. SpeechSquad should also have a FQDN for its server (for example, speechsquad.jarvis.nvda). The SpeechSquad server expects endpoints for each of the TTS, NLP, and ASR services. For example, by default, the values.yaml contains the following:

nlp_uri: "jarvis.nvda"
asr_uri: "jarvis.nvda"
tts_uri: "jarvis.nvda"

If SpeechSquad is being deployed on a single system, these URIs can be replaced with the IP address.
For load balancing, the SpeechSquad server container must be able to resolve jarvis.nvda to the endpoint exposing jarvis-speech. The chart allows /etc/hosts in the container to be set at launch time: set the .Values.sss.lb_ip value to match what jarvis.nvda should resolve to.

In the following example, the external IP of the load balancer is 10.42.0.190. Since we want load balancing, set .Values.sss.lb_ip in the values.yaml for SpeechSquad to match this IP.
kubectl get services
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                        AGE
jarvis-api   LoadBalancer   10.97.16.15      10.42.0.191   8000:31308/TCP,8001:30646/TCP,8002:30042/TCP,50051:32669/TCP   3d19h
kubernetes   ClusterIP      10.96.0.1        <none>        443/TCP                                                        50d
traefik      LoadBalancer   10.107.200.114   10.42.0.190   80:32131/TCP,443:31802/TCP
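Given the traefik external IP shown above, the relevant fragment of the SpeechSquad values.yaml would look roughly like this; the exact nesting under sss is an assumption based on the .Values.sss.lb_ip reference, and only the IP itself comes from the output above:

```yaml
# Assumed values.yaml fragment: make jarvis.nvda resolve to the
# traefik load balancer's external IP inside the sss container.
sss:
  lb_ip: "10.42.0.190"
```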
After the values.yaml reflects the environment correctly, install the Helm chart, optionally passing any command-line options for further customization:

helm install speechsquad speechsquad

This should provide two pods in the cluster, one for the server (sss) and the other for the client (clnt):

kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
clnt-ss-5945877dc7-wk9fs       1/1     Running   0          5d16h
jarvis-api-6947945c67-4f7gt    1/1     Running   0          7d12h
speechsquad-6974455879-spxgt   1/1     Running   0          5d13h
Validate that the ingressroutes are set up correctly:

kubectl get ingressroute
NAME                        AGE
jarvis-ingressroute         7d12h
speech-squad-ingressroute   5d16h
traefik-dashboard           23d
If using name resolution, issue kubectl describe ingressroute to ensure the correct Host() clause appears. Entry Points is the port on which the load balancer accepts traffic; here we use web, which is port 80. The services section forwards traffic matching the host clause to the named service on the given port, using the specified protocol.

kubectl describe ingressroute jarvis-ingressroute
...
  web
Routes:
  Kind:   Rule
  Match:  Host(`jarvis.nvda`)
  Services:
    Name:    jarvis-api
    Port:    50051
    Scheme:  h2c
and

kubectl describe ingressroute speech-squad-ingressroute
Entry Points:
  web
Routes:
  Kind:   Rule
  Match:  Host(`speechsquad.jarvis.nvda`, `speechsquad.jarvis.nvda.nvidia.com`)
  Services:
    Name:    speech-squad
    Port:    1337
    Scheme:  h2c
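For reference, the speech-squad route shown above corresponds to a Traefik IngressRoute manifest along these lines. This is a reconstruction from the describe output, not taken from the chart; in particular the apiVersion and metadata are assumptions:

```yaml
# Sketch of the speech-squad IngressRoute, reconstructed from the
# `kubectl describe` output above.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: speech-squad-ingressroute
spec:
  entryPoints:
    - web                     # port 80 on the load balancer
  routes:
    - kind: Rule
      match: Host(`speechsquad.jarvis.nvda`, `speechsquad.jarvis.nvda.nvidia.com`)
      services:
        - name: speech-squad
          port: 1337
          scheme: h2c         # gRPC over cleartext HTTP/2
```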
Services can also be validated for each ingressroute by pulling the service name from the route and checking its existence:

kubectl get service `kubectl get ingressroute speech-squad-ingressroute -o=json | jq .spec.routes[0].services[0].name -r`
Containers¶
Pull the container with the following command:
docker pull nvcr.io/|NgcOrgTeam|/speech_squad:|VersionNum|
Dataset¶
TODO SAMPLE Dataset
Running the Test¶
You can run the client from a Docker container or by using kubectl exec to enter the client pod in the cluster.
To execute the client, ensure the dataset above is in your current working directory.
ls
build    client           CREDITS.md   gtc_squad2_asr_data_collection             LICENSE     reference
CLA.pdf  CONTRIBUTING.md  Dockerfile   gtc_squad2_asr_data_collection_16000.tgz   README.md   server
Run the following command to perform the test from the same directory.
docker run -it --net=host \
    -v $(pwd):/work/test_files/speech_squad/ \
    nvcr.io/|NgcOrgTeam|/speech_squad:|VersionNum| \
    speechsquad_perf_client \
    --squad_questions_json=/work/test_files/speech_squad/gtc_squad2_asr_data_collection/recorded_questions.json \
    --squad_dataset_json=/work/test_files/speech_squad/gtc_squad2_asr_data_collection/dev-v2.0.json \
    --speech_squad_uri=speechsquad.jarvis.nvda:80 \
    --chunk_duration_ms=800 \
    --executor_count=1 \
    --num_iterations=1 \
    --num_parallel_requests=64 \
    --print_results=false
If using kubectl exec, the client node must have access to the data volume containing gtc_squad2_asr_data_collection. This path is currently hardcoded to /sss_data, so the gtc_squad2_asr_data_collection directory needs to live in /sss_data on the Kubernetes host that is also running the client container pod. This is controlled in the deployment.yaml file under Volumes and VolumeMounts.

export CLNT_POD=$(kubectl get pods | grep clnt | awk '{print $1}')
kubectl exec --stdin --tty $CLNT_POD -- /bin/bash

root@clnt-ss-66c6d657b-9v5kw:/# ls /work/test_files/speech_squad
gtc_squad2_asr_data_collection

speechsquad_perf_client \
    --squad_questions_json=/work/test_files/speech_squad/gtc_squad2_asr_data_collection/recorded_questions.json \
    --squad_dataset_json=/work/test_files/speech_squad/gtc_squad2_asr_data_collection/dev-v2.0.json \
    --speech_squad_uri=speechsquad.jarvis.nvda:80 \
    --chunk_duration_ms=800 \
    --executor_count=1 \
    --num_iterations=1 \
    --num_parallel_requests=64 \
    --print_results=false
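The CLNT_POD extraction in the export line above is a plain grep/awk pipeline. Run against the sample kubectl get pods output from earlier in this section, it behaves like this:

```shell
# Sample `kubectl get pods` output copied from earlier in this section;
# in a live cluster this text would come from kubectl itself.
PODS='NAME                           READY   STATUS    RESTARTS   AGE
clnt-ss-5945877dc7-wk9fs       1/1     Running   0          5d16h
jarvis-api-6947945c67-4f7gt    1/1     Running   0          7d12h
speechsquad-6974455879-spxgt   1/1     Running   0          5d13h'

# Same pipeline as the export line: keep the client pod's row and
# print its first column (the pod name).
echo "$PODS" | grep clnt | awk '{print $1}'
# -> clnt-ss-5945877dc7-wk9fs
```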
Local Deployment¶
This SpeechSquad sample application can also be deployed locally. No load balancing is required, but the ASR, NLP, and TTS Jarvis services must be running.
Installation¶
Install and start Jarvis locally (see Local (Docker)).
Install SpeechSquad:
docker pull nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1
Download dataset:
TODO - final dataset location
Running the Test Locally¶
Start Jarvis Speech Server (if you have not already):
bash jarvis_start.sh
Start SpeechSquad Server:
docker run -it --net=host \
    -v /work/test_files/speech_squad/:/work/test_files/speech_squad/ \
    nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
    speechsquad_server \
    -tts_service_url=0.0.0.0:50051 \
    -nlp_service_url=0.0.0.0:50051 \
    -asr_service_url=0.0.0.0:50051
Run performance test:
docker run -it --net=host \
    -v /work/test_files/speech_squad/:/work/test_files/speech_squad/ \
    nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 \
    speechsquad_perf_client \
    --squad_questions_json=/work/test_files/speech_squad/gtc_squad2_asr_data_collection/recorded_questions.json \
    --squad_dataset_json=/work/test_files/speech_squad/gtc_squad2_asr_data_collection/dev-v2.0.json \
    --speech_squad_uri=0.0.0.0:1337 \
    --chunk_duration_ms=800 \
    --executor_count=1 \
    --num_iterations=1 \
    --num_parallel_requests=64 \
    --print_results=false