Command-line Clients#

Data Center

The Riva Speech Docker image contains sample command-line drivers for the Riva Speech AI services. Pull and run the container with the following commands. The clients expect a Riva server to be running with models deployed, and every command-line driver accepts an optional argument that specifies the location of the server. No GPU is required to run the sample clients.

docker pull nvcr.io/nvidia/riva/riva-speech:2.19.0
docker run -it --rm nvcr.io/nvidia/riva/riva-speech:2.19.0
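
As a rough illustration of the optional server argument, the Python sketch below appends a server-location flag to a client command line. The flag names --riva_uri (binary clients) and --server (Python clients) are taken from the translation examples later on this page. It is an assumption that every other client spells the flag the same way, so confirm with --help. The with_server helper and the riva-host name are hypothetical.

```python
import shlex

# Flag names copied from the NMT examples on this page. Assumption: other
# clients use the same spelling; check each client's --help output.
SERVER_FLAGS = {"binary": "--riva_uri", "python": "--server"}


def with_server(argv, server="localhost:50051", kind="binary"):
    """Return a copy of argv with the server-location flag appended."""
    return list(argv) + [f"{SERVER_FLAGS[kind]}={server}"]


cmd = with_server(
    ["riva_asr_client", "--audio_file", "/work/wav/test/1272-135031-0001.wav"],
    server="riva-host:50051",  # hypothetical host name
)
# Print a shell-ready command line that can be pasted into the container.
print(shlex.join(cmd))
```

Keeping the command as a list until the last moment avoids quoting mistakes when a path or text argument contains spaces.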

Embedded

The sample command-line clients are present in the Riva Speech AI server container. Refer to the Quick Start Guide for steps on how to launch the Riva Speech AI server container.

Speech Recognition#

Both binary and Python clients are included in the Docker image with options to perform ASR inference in streaming as well as offline (nonstreaming) mode.

Binary Streaming Example#

Run the ASR streaming client, where --audio_file specifies the audio file that is to be transcribed. Other options can be found in riva_streaming_asr_client --help.

Note

.wav, .opus, and .ogg (Opus-encoded) containers are currently supported.

riva_streaming_asr_client --audio_file /work/wav/test/1272-135031-0001.wav
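
The container restriction in the note can be checked before the client is launched. The following is a minimal sketch, not part of the Riva tooling: the helper names are hypothetical, and the check looks only at the file extension, so it cannot confirm that an .ogg file is actually Opus-encoded.

```python
from pathlib import Path

# Containers listed in the note above.
SUPPORTED_SUFFIXES = {".wav", ".opus", ".ogg"}


def is_supported_audio(path):
    """True if the file extension matches a supported container.

    Only the name is inspected; the codec inside the file is not verified.
    """
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def streaming_asr_command(audio_file):
    """Build the argv for riva_streaming_asr_client, or raise ValueError."""
    if not is_supported_audio(audio_file):
        raise ValueError(f"unsupported audio container: {audio_file}")
    return ["riva_streaming_asr_client", "--audio_file", str(audio_file)]
```

The returned list can be passed to subprocess.run inside the client container.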

Binary Offline/Batch (nonstreaming) Example#

Run the binary ASR offline client, where --audio_file specifies the audio file that is to be transcribed. Other options can be found in riva_asr_client --help.

riva_asr_client --audio_file /work/wav/test/1272-135031-0001.wav

Python Streaming Example#

Run the Python ASR streaming client, where --input-file specifies the audio file that is to be transcribed. Other options can be found in the riva_streaming_asr_client.py script.

python riva_streaming_asr_client.py --input-file=/work/wav/test/1272-135031-0001.wav

The transcribe_mic.py example is the preferred, cross-platform way to interact with Riva using a microphone. It is also the recommended way to stream microphone audio from a remote system, regardless of that system's operating system or hardware platform.

python transcribe_mic.py --input-device <device_id>

Where the <device_id> of your audio input device can be obtained from:

python transcribe_mic.py --list-devices

Similarly, the transcribe_file.py example transcribes audio files in streaming mode.

Python Offline Example#

Run the Python ASR offline client, where --input-file specifies the audio file that is to be transcribed. Other options can be found in the transcribe_file_offline.py script.

python transcribe_file_offline.py --input-file=/work/wav/test/1272-135031-0001.wav

HTTP Client Offline Example#

Run the following command to transcribe an audio file with the offline ASR model using the HTTP API, where file specifies the audio file to transcribe and language specifies the language code of the input audio.

curl -s http://localhost:50000/v1/audio/transcriptions -F file="@/work/wav/test/1272-135031-0000.wav" -F language=en-US
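
The curl call above posts a multipart form. A minimal Python sketch of the same request, using only the standard library, might look like the following. The helper names are hypothetical. The application/octet-stream part type is an assumption, since curl may send a different type. The response is returned as raw bytes because its format is not described here.

```python
import uuid
from urllib.request import Request, urlopen

URL = "http://localhost:50000/v1/audio/transcriptions"  # from the curl example


def build_transcription_request(audio_bytes, filename, language="en-US", url=URL):
    """Build a multipart/form-data POST with 'file' and 'language' fields."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="language"\r\n\r\n'
        f"{language}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        # Assumption: the server accepts a generic binary content type.
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        url,
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path, language="en-US"):
    """Send the file to a running server and return the raw response body."""
    with open(path, "rb") as f:
        request = build_transcription_request(
            f.read(), path.rsplit("/", 1)[-1], language
        )
    with urlopen(request) as response:
        return response.read()
```

Calling transcribe("/work/wav/test/1272-135031-0000.wav") requires a running server at the URL above; build_transcription_request can be inspected without one.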

Binary Offline Speaker Diarization Example#

Run the binary ASR offline client, where --audio_file specifies the audio file to be diarized and --speaker-diarization is the flag to enable diarization.

riva_asr_client --audio_file /opt/riva/wav/en-US_sample.wav --speaker-diarization

Python Offline Speaker Diarization Example#

Run the Python ASR offline client, where --input-file specifies the audio file to be diarized and --speaker-diarization is the flag to enable diarization.

python transcribe_file_offline.py --input-file /opt/riva/wav/en-US_sample.wav --speaker-diarization

Binary Streaming Speaker Diarization Example#

Run the binary ASR streaming client, where --audio_file specifies the audio file to be diarized and --speaker-diarization is the flag to enable diarization.

riva_streaming_asr_client --audio_file /opt/riva/wav/en-US_sample.wav --speaker-diarization

Python Streaming Speaker Diarization Example#

Run the Python ASR streaming client, where --input-file specifies the audio file to be diarized and --speaker-diarization is the flag to enable diarization.

python riva_streaming_asr_client.py --input-file /opt/riva/wav/en-US_sample.wav --speaker-diarization

Natural Language Processing#

Both binary and Python clients are included in the Docker image for supported NLP tasks.

Binary Punctuation Example#

Run the Punctuation client, where --queries specifies the file containing inputs to punctuate. Note that if you want to specify --output to write output of the punctuation model to a file, you need to also specify --parallel_requests=1.

riva_nlp_punct --queries=/work/test_files/nlu/punctuation_input.txt
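
The constraint between --output and --parallel_requests is easy to forget when the command is generated by a script. A small sketch of a guard, assuming a hypothetical punct_command helper that only assembles the argument list:

```python
def punct_command(queries, output=None, parallel_requests=None):
    """Build the argv for riva_nlp_punct.

    Writing results with --output requires --parallel_requests=1, so the
    helper sets that value itself and rejects any conflicting request.
    """
    argv = ["riva_nlp_punct", f"--queries={queries}"]
    if output is not None:
        if parallel_requests not in (None, 1):
            raise ValueError("--output requires --parallel_requests=1")
        argv += [f"--output={output}", "--parallel_requests=1"]
    elif parallel_requests is not None:
        argv.append(f"--parallel_requests={parallel_requests}")
    return argv
```

For example, punct_command("/work/test_files/nlu/punctuation_input.txt", output="punct_out.txt") adds both flags, where punct_out.txt is a placeholder file name.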

Python Punctuation Example#

Run the Python Punctuation client, where --query specifies the query string.

python punctuation_client.py --query "your query"

Speech Synthesis#

Both binary and Python clients are included in the Docker image for TTS.

Binary TTS Client Example#

Run the binary TTS client, where --text is used to specify the input text for which to synthesize audio. The output of the client is a .wav file, which can be specified by --audio_file. Other options can be found in riva_tts_client --help.

riva_tts_client --text="I had a dream yesterday." --audio_file=/opt/riva/wav/output.wav

Binary TTS Performance Client Example#

Run the binary TTS performance client, which reports latency and throughput. The --text option specifies the input text, and --text_file specifies a file containing multiple text inputs. Other options can be found in riva_tts_perf_client --help.

riva_tts_perf_client --text_file=/work/test_files/tts/ljs_audio_text_test_filelist_small.txt

Python Client Examples#

The talk.py script implements a client for offline and streaming inference. In streaming mode, audio is streamed back to the client in chunks.

python talk.py --stream --output-device <device_id>

where the <device_id> of your audio output device can be obtained from:

python talk.py --list-devices

Machine Translation#

Both binary and Python clients are supported in the Docker image for neural machine translation (NMT).

Binary Text Translation Example#

Retrieve the available models and language pairs:

riva_nmt_t2t_client --list_models
languages {
  key: "en_de_24x6"
  value {
    src_lang: "en"
    tgt_lang: "de"
  }
}
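
When the model list is consumed by a script, the key and language fields can be pulled out of the printed text. The sketch below is an assumption-laden example: it presumes the output always has the key/value shape shown above, and the parse_language_pairs name is hypothetical. It collects every src_lang and tgt_lang entry in case a model lists more than one.

```python
import re

# One "key ... value { ... }" entry, as printed by the client above.
BLOCK = re.compile(r'key:\s*"([^"]+)"\s*value\s*\{([^}]*)\}')
FIELD = re.compile(r'(src_lang|tgt_lang):\s*"([^"]+)"')


def parse_language_pairs(text):
    """Map each model name to its source and target language codes."""
    models = {}
    for name, body in BLOCK.findall(text):
        langs = {"src_lang": [], "tgt_lang": []}
        for field, code in FIELD.findall(body):
            langs[field].append(code)
        models[name] = langs
    return models


sample = '''languages {
  key: "en_de_24x6"
  value {
    src_lang: "en"
    tgt_lang: "de"
  }
}'''
print(parse_language_pairs(sample))
```

The model name returned here is the value passed to --model_name in the next step.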

Run the binary client:

riva_nmt_t2t_client --model_name=en_de_24x6 --source_language_code="en-US" --target_language_code="de-DE" --text="This will become German words." --riva_uri=0.0.0.0:50051

Where:

--source_language_code is the source language code
--target_language_code is the target language code
--text is the text you want translated
--riva_uri is the address of the server, in host:port form
--model_name is the name of the model

To translate a .txt file, ensure every sentence forms a new line, then run:

riva_nmt_t2t_client --source_language_code es-US --target_language_code en-US --text_file /raid/wmt_tests/wmt13-es-en.es --riva_uri=0.0.0.0:50051 --model_name mnmt_deesfr_en_transformer12x2 --batch_size=8

Where:

--text_file is the path to the file that you want translated
--batch_size is the size of the batch. The default is 8.
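
The one-sentence-per-line requirement can be met with a small preprocessing step. The following sketch uses a deliberately naive rule (break after ".", "!" or "?" followed by whitespace), so abbreviations such as "Dr." will be split incorrectly. The one_sentence_per_line helper is hypothetical and not part of Riva.

```python
import re


def one_sentence_per_line(text):
    """Reflow text so that each sentence sits on its own line."""
    # Collapse existing line breaks and repeated spaces first.
    flat = " ".join(text.split())
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    return "\n".join(s for s in sentences if s) + "\n"
```

Write the returned string to a .txt file and pass that path to --text_file.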

Python Client Examples#

A provided Python client implements the gRPC API.

For example, to translate the sentence "Please translate this English to German." into German, first retrieve the available models and language pairs:

python3 /opt/riva/examples/nmt.py --list-models
languages {
  key: "en_de_24x6"
  value {
    src_lang: "en"
    tgt_lang: "de"
  }
}

Run the Python client:

python3 /opt/riva/examples/nmt.py --source-language-code=en-US --target-language-code=de-DE --text="Please translate this English to German." --server=0.0.0.0:50051 --model-name=en_de_24x6

Where:

--source-language-code is the source language code
--target-language-code is the target language code
--text is the text you want translated
--server is the address of the server, in host:port form
--model-name is the name of the model

To translate a .txt file, ensure every sentence forms a new line, then run:

python3 /opt/riva/examples/nmt.py --source-language-code es-US --target-language-code en-US --text-file /raid/wmt_tests/wmt13-es-en.es --server=0.0.0.0:50051 --model-name mnmt_deesfr_en_transformer12x2 --batch-size=8

Where:

--text-file is the path to the file that you want translated
--batch-size is the size of the batch. The default is 8.

Speech-to-Speech Translation#

A binary client for the Speech-to-Speech (S2S) translation service is included in the Docker image.

Binary Speech-to-Speech Translation Client Example#

Run the binary client.

riva_nmt_streaming_s2s_client --riva_uri=0.0.0.0:50051 --audio_file=/opt/riva/wav/es-US_sample.wav --source_language_code="es-US" --target_language_code="en-US"

Where:

--audio_file is the input audio file
--source_language_code is the source language
--target_language_code is the target language
--riva_uri is the address of the server, in host:port form

Other parameter options can be found in riva_nmt_streaming_s2s_client --help.

Speech-to-Text Translation#

A binary client for the Speech-to-Text (S2T) translation service is included in the Docker image.

Binary Speech-to-Text Translation Client Example#

Run the binary client.

riva_nmt_streaming_s2t_client --riva_uri=0.0.0.0:50051 --audio_file=/opt/riva/wav/en-US_sample.wav --source_language_code="en-US" --target_language_code="de-DE"

Where:

--audio_file is the input audio file
--source_language_code is the source language
--target_language_code is the target language
--riva_uri is the address of the server, in host:port form

Other parameter options can be found in riva_nmt_streaming_s2t_client --help.
