Nemotron ASR Streaming
Nemotron ASR Streaming supports streaming speech-to-text transcription. Two model types are available:
- type=en-US: Optimized for English (US) only.
- type=multi: Multilingual model that identifies the spoken language and returns the transcript in that language. See Supported Languages for the full list.
For client installation and sample audio instructions, refer to the Deploy and Run ASR Models page.
Note
The following examples set NIM_TAGS_SELECTOR with name= and type= and omit batch_size. When you omit batch_size, the NIM defaults to batch_size=64. For the full set of profiles (type=en-US,batch_size=32, type=en-US,batch_size=64, type=multi,batch_size=32, type=multi,batch_size=64), refer to the ASR support matrix.
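The selector value is a comma-separated list of key=value tags, as in the deployment examples. The following minimal sketch generates selector strings for the four profiles listed above. The helper name build_tags_selector is hypothetical, and appending batch_size as a third tag is an assumption based on the profile names, so confirm the exact tags against the ASR support matrix.

```python
from typing import Optional

MODEL_NAME = "nemotron-asr-streaming"
MODEL_TYPES = ("en-US", "multi")
BATCH_SIZES = (32, 64)


def build_tags_selector(model_type: str, batch_size: Optional[int] = None) -> str:
    """Return a NIM_TAGS_SELECTOR value (hypothetical helper, not part of the NIM)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unsupported type: {model_type!r}")
    tags = [f"name={MODEL_NAME}", f"type={model_type}"]
    if batch_size is not None:
        if batch_size not in BATCH_SIZES:
            raise ValueError(f"unsupported batch_size: {batch_size!r}")
        # Assumption: batch_size is selected as one more key=value tag.
        tags.append(f"batch_size={batch_size}")
    # Without a batch_size tag, the NIM falls back to its default (batch_size=64).
    return ",".join(tags)
```

For example, build_tags_selector("multi", 32) returns "name=nemotron-asr-streaming,type=multi,batch_size=32", which you would export as NIM_TAGS_SELECTOR before running docker run.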
Deploy the NIM Container
English Type
export CONTAINER_ID=nemotron-asr-streaming
export NIM_TAGS_SELECTOR="name=nemotron-asr-streaming,type=en-US"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-e NIM_TAGS_SELECTOR \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
Multilingual Type
export CONTAINER_ID=nemotron-asr-streaming
export NIM_TAGS_SELECTOR="name=nemotron-asr-streaming,type=multi"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-e NIM_TAGS_SELECTOR \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
For additional profile options, refer to the ASR support matrix.
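Startup, including the first model download, can take some time, so scripts that run inference right after docker run usually wait for the service first. The following minimal sketch uses only the Python standard library to poll an HTTP readiness endpoint on the port set by NIM_HTTP_API_PORT. The path /v1/health/ready is an assumption here; check it against the NIM API reference for your release.

```python
import http.client
import time
import urllib.request

# Assumed readiness path on the HTTP port (NIM_HTTP_API_PORT=9000 above).
DEFAULT_READY_URL = "http://localhost:9000/v1/health/ready"


def wait_until_ready(
    url: str = DEFAULT_READY_URL,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> bool:
    """Poll url until it returns HTTP 200; give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # The polling interval doubles as the per-request timeout.
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts, and HTTP error statuses
            # (urllib's HTTPError is an OSError subclass) mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Call wait_until_ready() before running inference; it returns False if the service did not answer with HTTP 200 within timeout_s seconds.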
Run Inference
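Copy a sample audio file from the running NIM container, or use your own. Because docker run -it keeps the container attached to the first terminal, run the following command from a second terminal in which CONTAINER_ID is also set.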
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
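If you use your own recording, it can help to check its basic properties before streaming it. The following sketch reads an uncompressed PCM WAV header with Python's standard wave module. The WavInfo type and the inspect_wav helper are hypothetical, and this check says nothing about which audio formats the service accepts.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    sample_rate_hz: int
    channels: int
    sample_width_bytes: int
    duration_s: float


def inspect_wav(path: str) -> WavInfo:
    """Report sample rate, channel count, sample width, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(
            sample_rate_hz=rate,
            channels=wav.getnchannels(),
            sample_width_bytes=wav.getsampwidth(),
            # One frame holds one sample per channel, so frames / rate = seconds.
            duration_s=wav.getnframes() / rate,
        )
```

For example, print(inspect_wav("en-US_sample.wav")) shows the properties of the sample copied from the container.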
Streaming (gRPC)
Make sure the NIM is deployed in streaming mode. To confirm that the server is reachable, list the models it serves:
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
The client streams the input audio file to the service chunk by chunk.
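To illustrate what chunk-by-chunk streaming means, the following sketch splits a PCM WAV file into fixed-duration chunks of raw audio frames, roughly the kind of payload a streaming client sends over successive requests. It is not the transcribe_file.py implementation: the iter_wav_chunks name and the 100 ms default chunk size are assumptions for illustration, and sending the chunks over gRPC is omitted.

```python
import wave
from typing import Iterator


def iter_wav_chunks(path: str, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM frames from a WAV file in chunks of about chunk_ms milliseconds."""
    with wave.open(path, "rb") as wav:
        # Number of frames covering chunk_ms at this file's sample rate.
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break  # End of file; the last chunk may have been shorter.
            yield data
```

A generator keeps memory use flat for long recordings, because only one chunk is held at a time.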
For English (type=en-US), run the following command:
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code en-US --automatic-punctuation \
--input-file en-US_sample.wav
For multilingual transcription (type=multi), run one of the following commands. For automatic language detection, omit --language-code or set --language-code auto. To constrain decoding to a specific language, pass that code (for example, --language-code fr-FR). See Supported Languages by Model Type for the full list.
# Auto language detection (--language-code is not passed)
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--automatic-punctuation \
--input-file en-US_sample.wav
# Explicit language code (example: French)
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code fr-FR --automatic-punctuation \
--input-file fr-FR_sample.wav
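If you script these runs, a small wrapper can encode the rule above: pass --language-code only when a specific language is wanted, and leave it out for automatic detection. The build_transcribe_args helper is hypothetical; it reuses only the flags shown in these commands and treats "auto" the same as omitting the flag, which the text above describes as equivalent.

```python
from typing import List, Optional

CLIENT_SCRIPT = "python-clients/scripts/asr/transcribe_file.py"


def build_transcribe_args(
    input_file: str,
    server: str = "0.0.0.0:50051",
    language_code: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the streaming gRPC client (hypothetical wrapper)."""
    args = ["python3", CLIENT_SCRIPT, "--server", server]
    # None or "auto": omit the flag so a type=multi model detects the language.
    if language_code is not None and language_code != "auto":
        args += ["--language-code", language_code]
    args += ["--automatic-punctuation", "--input-file", input_file]
    return args
```

Passing the result to subprocess.run(..., check=True) runs the command, and shlex.join on the same list prints it as a shell command line.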
Realtime API
The Realtime API client connects to the HTTP port (9000) instead of the gRPC port:
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code en-US --automatic-punctuation \
--input-file en-US_sample.wav
Note
The Nemotron ASR Streaming model supports streaming mode only.