Whisper Large V3
Whisper supports transcription in multiple languages and translation to English. Refer to Supported Languages for the available languages and their corresponding codes. Specifying the correct language is recommended because it improves accuracy and reduces latency. The Whisper model has punctuation enabled by default.
Note
The Whisper model supports offline mode only.
For client installation and sample audio instructions, refer to the Deploy and Run ASR Models page.
Deploy the NIM Container
export CONTAINER_ID=whisper-large-v3
export NIM_TAGS_SELECTOR="name=whisper-large-v3,mode=ofl"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-e NIM_TAGS_SELECTOR \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
For additional profile options, refer to the ASR support matrix.
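After starting the container, you can wait for the service to come up before sending requests. The sketch below is a minimal stdlib example that polls an HTTP readiness endpoint; it assumes the standard NIM readiness path /v1/health/ready on the mapped port 9000 (adjust the URL if you changed NIM_HTTP_API_PORT), and the function name is illustrative.

```python
# Poll the NIM HTTP port until the service reports ready.
# Assumes the /v1/health/ready endpoint on port 9000; adjust if you
# changed NIM_HTTP_API_PORT or the published port mapping.
import time
import urllib.error
import urllib.request

def wait_until_ready(url="http://localhost:9000/v1/health/ready",
                     timeout=300.0, interval=5.0):
    """Return True once the endpoint answers 200, False if the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container still starting; retry
        time.sleep(interval)
    return False
```

Model loading can take several minutes on first start, so a generous timeout is advisable.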
Run Inference
Copy a sample audio file from the NIM container or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
Ensure that the NIM with the Whisper Large V3 model is deployed by listing the available models:
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
Transcription with Known Language
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code en --input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=en \
-F file="@en-US_sample.wav"
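The curl call above can also be issued from Python with only the standard library. The following sketch builds the same multipart/form-data request; the helper names and structure are illustrative, and the endpoint and form fields mirror the curl example.

```python
# Stdlib sketch of the HTTP transcription request shown above with curl.
import json
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path, language="en",
               url="http://localhost:9000/v1/audio/transcriptions"):
    """POST an audio file to the transcriptions endpoint and return the JSON reply."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart({"language": language}, "file", path, audio)
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `transcribe("en-US_sample.wav", language="en")` sends the same request as the curl command above.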
Transcription with Auto Language Detection
When the language code is not known beforehand, pass multi as the language code. The model predicts the language of each 30-second chunk and returns it to the client.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code multi \
--input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=multi \
-F file="@en-US_sample.wav"
Translation
Whisper supports translation from multiple languages to English. Setting the input language to multi enables auto language detection; however, specifying the correct input language is recommended because it improves accuracy and reduces latency.
Copy a sample audio file from the NIM container or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/fr-FR_sample.wav .
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code fr --input-file fr-FR_sample.wav \
--custom-configuration task:translate
curl -s http://0.0.0.0:9000/v1/audio/translations -F language=fr \
-F file="@fr-FR_sample.wav"
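Note that over HTTP, translation differs from transcription only in the endpoint path: /v1/audio/translations instead of /v1/audio/transcriptions, with the same multipart form fields. A small helper (the function name here is illustrative) keeps the two calls symmetrical:

```python
# Map a task name to the corresponding NIM HTTP endpoint used in the
# curl examples above.
def endpoint_for(task, host="http://localhost:9000"):
    """Return the HTTP endpoint for "transcribe" or "translate"."""
    paths = {
        "transcribe": "/v1/audio/transcriptions",
        "translate": "/v1/audio/translations",
    }
    return host + paths[task]
```

The resulting URL can then be used with curl or any multipart-capable HTTP client, passing the language and file form fields as shown above.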