Parakeet CTC English (en-US)
Deploy the Parakeet CTC English (en-US) model as a NIM container and run streaming or offline transcription.
Deploy the NIM Container
The following command deploys the Parakeet CTC English (en-US) model with the mode=all inference mode, which enables both streaming and offline inference.
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="mode=all,vad=default,diarizer=disabled"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-e NIM_TAGS_SELECTOR \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
For additional profile options, refer to the ASR support matrix.
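Model loading can take a while on first start. Before sending requests, you can poll the HTTP port until the container responds. The following sketch assumes a /v1/health/ready route on the HTTP port; confirm the exact health endpoint against your NIM version.

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll a health URL until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container still starting; retry after a short wait
        time.sleep(interval)
    return False

# Example:
#   wait_until_ready("http://localhost:9000/v1/health/ready")
```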
Prepare a Sample Audio File
To list the sample audio files bundled in the container, run the following command:
docker exec $CONTAINER_ID ls /opt/riva/wav/
This should return a list of sample audio files. For example:
en-US_sample.wav
To copy a sample audio file to your local machine, run the following command:
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
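To confirm the copied file's audio format (speech services generally expect 16-bit PCM; the sample rate of the bundled file is not specified here), a quick check with Python's standard wave module works:

```python
import wave

def describe_wav(path: str) -> dict:
    """Read basic PCM parameters from a WAV file header."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / float(rate),
        }

# Example: describe_wav("en-US_sample.wav")
```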
Run Inference
Run inference on the sample audio file in streaming or offline mode. The following examples use the client scripts from NVIDIA's python-clients repository; clone that repository and install its dependencies before running them.
Streaming
Ensure that the NIM is deployed with a model that supports streaming mode. To verify, list the available models:
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
The output should include a model with streaming in its name. Run the client, which streams the input audio file to the service chunk by chunk:
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code en-US --automatic-punctuation \
--input-file en-US_sample.wav
Alternatively, run the realtime client against the HTTP port:
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code en-US --automatic-punctuation \
--input-file en-US_sample.wav
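To illustrate the chunk-by-chunk delivery that a streaming client performs, here is a minimal sketch of the chunking loop. The chunk size and the raw-bytes framing are illustrative assumptions, not the client's actual wire protocol.

```python
from typing import Iterator

def audio_chunks(path: str, chunk_bytes: int = 16000) -> Iterator[bytes]:
    """Yield a file's bytes in fixed-size chunks, as a streaming client would.
    For 16 kHz mono 16-bit audio, 16000 bytes is roughly half a second."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                return
            yield chunk

# Example:
#   for chunk in audio_chunks("en-US_sample.wav"):
#       send_to_service(chunk)  # hypothetical send step
```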
Offline
Ensure that the NIM is deployed with a model that supports offline mode. To verify, list the available models:
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
The output should include a model with offline in its name. Run the client, which sends the entire input audio file to the service in a single request:
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code en-US --automatic-punctuation \
--input-file en-US_sample.wav
Alternatively, send the file to the HTTP transcription endpoint:
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=en-US \
-F file="@en-US_sample.wav"
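The curl command posts a multipart form containing a language field and the audio file. If you prefer to stay in Python without third-party dependencies, the same request body can be assembled by hand. The endpoint path and field names below come from the curl command; the per-part Content-Type is an assumption.

```python
import urllib.request
import uuid

def build_multipart(fields: dict, files: dict) -> tuple:
    """Assemble a multipart/form-data body and its Content-Type header from
    text fields and {name: (filename, payload_bytes)} file entries."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f'{value}\r\n'
            ).encode()
        )
    for name, (filename, payload) in files.items():
        parts.append(
            (
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                f'Content-Type: audio/wav\r\n\r\n'  # content type is an assumption
            ).encode()
            + payload
            + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

# Example (requires the deployed server):
#   with open("en-US_sample.wav", "rb") as f:
#       body, ctype = build_multipart({"language": "en-US"},
#                                     {"file": ("en-US_sample.wav", f.read())})
#   req = urllib.request.Request("http://0.0.0.0:9000/v1/audio/transcriptions",
#                                data=body, headers={"Content-Type": ctype})
#   print(urllib.request.urlopen(req).read().decode())
```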