Canary

Canary is a multilingual encoder-decoder model that supports transcription and translation.

Note

The Canary model supports offline (non-streaming) mode only.

For client installation and sample audio instructions, refer to the Deploy and Run ASR Models page.

Deploy the NIM Container

For the container image, refer to the NGC catalog.

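# NGC_API_KEY must already be exported in this shell; docker run passes it through below.
export CONTAINER_ID=canary-1b
# Select the offline-mode model profile.
export NIM_TAGS_SELECTOR="mode=ofl"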

docker run -it --rm --name=$CONTAINER_ID \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -e NIM_TAGS_SELECTOR \
  nvcr.io/nim/nvidia/$CONTAINER_ID:latest

For additional profile options, refer to the ASR support matrix.
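The container may need some time after `docker run` before it accepts requests. The following is a minimal sketch of a readiness poll, assuming the NIM exposes a health endpoint at `/v1/health/ready` on the HTTP port. That path is an assumption not stated on this page, so confirm it on the Deploy and Run ASR Models page. The helper names `http_ready` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as not ready.
        return False


def wait_until_ready(check, attempts: int = 60, interval: float = 5.0) -> bool:
    """Call `check()` up to `attempts` times, sleeping `interval` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(interval)
    return False


# Example (assumed endpoint path; adjust to your deployment):
# ready = wait_until_ready(lambda: http_ready("http://0.0.0.0:9000/v1/health/ready"))
# print("ready" if ready else "not ready")
```

Passing the probe in as a callable keeps the retry loop independent of the endpoint, so the same loop can wrap a different check if the health path differs.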

Run Inference

Transcription

Canary supports transcription in the following languages: ar-AR, cs-CZ, da-DK, de-DE, en-GB, en-US, es-ES, es-US, fr-CA, fr-FR, he-IL, hi-IN, it-IT, ja-JP, ko-KR, nb-NO, nl-NL, nn-NO, pl-PL, pt-BR, pt-PT, ru-RU, sv-SE, th-TH, tr-TR, zh-CN. You must specify the input language in every request. Punctuation is enabled by default.
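Because the input language is required, it can help to check the code on the client side before sending audio. The following is a minimal sketch; the set is copied from the list above, and the function name `normalize_language` is hypothetical, not part of any client library.

```python
# Language codes copied from the transcription language list above.
TRANSCRIPTION_LANGUAGES = frozenset("""
ar-AR cs-CZ da-DK de-DE en-GB en-US es-ES es-US fr-CA fr-FR he-IL hi-IN it-IT
ja-JP ko-KR nb-NO nl-NL nn-NO pl-PL pt-BR pt-PT ru-RU sv-SE th-TH tr-TR zh-CN
""".split())


def normalize_language(code: str) -> str:
    """Return `code` in ll-CC form, or raise ValueError if it is not in the list."""
    parts = code.strip().replace("_", "-").split("-")
    if len(parts) != 2:
        raise ValueError(f"expected a code like 'en-US', got {code!r}")
    normalized = f"{parts[0].lower()}-{parts[1].upper()}"
    if normalized not in TRANSCRIPTION_LANGUAGES:
        raise ValueError(f"{normalized} is not in the transcription language list")
    return normalized
```

For example, `normalize_language("en_us")` returns `"en-US"`, while `normalize_language("bg-BG")` raises `ValueError` because bg-BG appears only as a translation target.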

Copy a sample audio file from the NIM container or use your own.

docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .

Verify that the NIM is running and has loaded the Canary model by listing the available models.

python3 python-clients/scripts/asr/transcribe_file_offline.py \
  --server 0.0.0.0:50051 \
  --list-models

gRPC

python3 python-clients/scripts/asr/transcribe_file_offline.py \
  --server 0.0.0.0:50051 \
  --language en-US --input-file en-US_sample.wav

HTTP

curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=en-US \
  -F file="@en-US_sample.wav"
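The same HTTP request can be sent from Python without extra dependencies. The following is a rough sketch using only the standard library; it mirrors the `language` and `file` form fields of the curl command above and returns the raw response body, since the response schema is not described on this page. The function names `encode_multipart` and `transcribe` are hypothetical.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        chunks.append(f"{value}\r\n".encode())
    chunks.append(f"--{boundary}\r\n".encode())
    chunks.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    chunks.append(b"Content-Type: application/octet-stream\r\n\r\n")
    chunks.append(data)
    chunks.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, language: str,
               url: str = "http://0.0.0.0:9000/v1/audio/transcriptions") -> str:
    """POST an audio file to the transcription endpoint and return the raw response text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart(
        {"language": language}, "file", os.path.basename(path), audio
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Example: print(transcribe("en-US_sample.wav", "en-US"))
```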

Translation

Canary supports translation in two directions.

From en-US to: ar-AR, bg-BG, cs-CZ, da-DK, de-DE, el-GR, en-US, et-EE, fi-FI, fr-FR, hi-IN, hr-HR, hu-HU, id-ID, it-IT, ja-JP, ko-KR, lt-LT, lv-LV, nb-NO, nl-NL, pl-PL, pt-BR, pt-PT, ro-RO, ru-RU, sk-SK, sl-SI, sv-SE, th-TH, tr-TR, uk-UA, vi-VN, zh-CN.

To en-US from: ar-AR, cs-CZ, da-DK, de-DE, es-ES, es-US, fr-CA, fr-FR, he-IL, hi-IN, it-IT, ja-JP, ko-KR, nb-NO, nl-NL, nn-NO, pl-PL, pt-BR, pt-PT, ru-RU, sv-SE, tr-TR, zh-CN.
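Every supported pair has en-US on one side, and the two directions accept different language sets. A small sketch of a pair check follows; both sets are copied from the lists above, and the function name `is_supported_pair` is hypothetical.

```python
# Targets accepted when the source audio is en-US (copied from the list above).
TARGETS_FROM_ENGLISH = frozenset("""
ar-AR bg-BG cs-CZ da-DK de-DE el-GR en-US et-EE fi-FI fr-FR hi-IN hr-HR hu-HU
id-ID it-IT ja-JP ko-KR lt-LT lv-LV nb-NO nl-NL pl-PL pt-BR pt-PT ro-RO ru-RU
sk-SK sl-SI sv-SE th-TH tr-TR uk-UA vi-VN zh-CN
""".split())

# Sources accepted when the target is en-US (copied from the list above).
SOURCES_TO_ENGLISH = frozenset("""
ar-AR cs-CZ da-DK de-DE es-ES es-US fr-CA fr-FR he-IL hi-IN it-IT ja-JP ko-KR
nb-NO nl-NL nn-NO pl-PL pt-BR pt-PT ru-RU sv-SE tr-TR zh-CN
""".split())


def is_supported_pair(source: str, target: str) -> bool:
    """Return True if (source, target) appears in the translation lists above."""
    if source == "en-US":
        return target in TARGETS_FROM_ENGLISH
    return target == "en-US" and source in SOURCES_TO_ENGLISH
```

With these sets, `is_supported_pair("fr-FR", "en-US")` and `is_supported_pair("en-US", "fr-FR")` are both true, while `is_supported_pair("fr-FR", "de-DE")` is false because neither side is en-US.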

Copy sample audio files from the NIM container or use your own.

docker cp $CONTAINER_ID:/opt/riva/wav/fr-FR_sample.wav .
docker cp $CONTAINER_ID:/opt/riva/examples/asr_lib/1272-135031-0000.wav .

Translation to English

gRPC

python3 python-clients/scripts/asr/transcribe_file_offline.py \
  --server 0.0.0.0:50051 \
  --language fr-FR --input-file fr-FR_sample.wav \
  --custom-configuration target_language:en-US,task:translate

HTTP

curl -s http://0.0.0.0:9000/v1/audio/translations -F language=fr-FR \
  -F target_language=en-US -F file="@fr-FR_sample.wav"

Translation from English

gRPC

python3 python-clients/scripts/asr/transcribe_file_offline.py \
  --server 0.0.0.0:50051 \
  --language en-US --input-file 1272-135031-0000.wav \
  --custom-configuration target_language:fr-FR,task:translate

HTTP

curl -s http://0.0.0.0:9000/v1/audio/translations -F language=en-US \
  -F target_language=fr-FR -F file="@1272-135031-0000.wav"
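The two gRPC commands above differ only in the source language, the input file, and the `target_language` value inside `--custom-configuration`. As an illustration, the argument list can be assembled in Python so that several files can be translated in a loop. The function name `build_translate_command` is hypothetical; the script path and flags are the ones used in the commands above.

```python
import subprocess


def build_translate_command(
    audio_path: str,
    source: str,
    target: str,
    server: str = "0.0.0.0:50051",
    script: str = "python-clients/scripts/asr/transcribe_file_offline.py",
) -> list:
    """Build the argv for the offline client with the translate task enabled."""
    return [
        "python3", script,
        "--server", server,
        "--language", source,
        "--input-file", audio_path,
        "--custom-configuration", f"target_language:{target},task:translate",
    ]


# Example: run both directions from this page, one after the other.
# jobs = [("fr-FR_sample.wav", "fr-FR", "en-US"),
#         ("1272-135031-0000.wav", "en-US", "fr-FR")]
# for path, src, tgt in jobs:
#     subprocess.run(build_translate_command(path, src, tgt), check=True)
```

Passing the arguments as a list avoids shell quoting issues with the comma-separated `--custom-configuration` value.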