Whisper Large V3
Whisper supports transcription in multiple languages and translation to English. Refer to Supported Languages for the available languages and their corresponding codes. Specifying the correct language is recommended because it improves accuracy and reduces latency. The Whisper model has punctuation enabled by default.
Note
The Whisper model supports offline mode only.
For client installation and sample audio instructions, refer to the Deploy and Run ASR Models page.
Deploy the NIM Container
export CONTAINER_ID=whisper-large-v3
export NIM_TAGS_SELECTOR="name=whisper-large-v3,mode=ofl"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-e NIM_TAGS_SELECTOR \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
For additional profile options, refer to the ASR support matrix.
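After starting the container, you can wait for the service to come up before sending requests. The sketch below is a minimal stdlib example that polls an HTTP readiness endpoint; it assumes the standard NIM readiness path /v1/health/ready on the mapped port 9000 (adjust the URL if you changed NIM_HTTP_API_PORT), and the function name is illustrative.

```python
# Poll the NIM HTTP port until the service reports ready.
# Assumes the /v1/health/ready endpoint on port 9000; adjust if you
# changed NIM_HTTP_API_PORT or the published port mapping.
import time
import urllib.error
import urllib.request

def wait_until_ready(url="http://localhost:9000/v1/health/ready",
                     timeout=300.0, interval=5.0):
    """Return True once the endpoint answers 200, False if the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container still starting; retry
        time.sleep(interval)
    return False
```

Model loading can take several minutes on first start, so a generous timeout is advisable.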
Run Inference
Copy a sample audio file from the NIM container or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
Ensure that the NIM with the Whisper Large V3 model is deployed by listing the available models:
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
Transcription with Known Language
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code en --input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=en \
-F file="@en-US_sample.wav"
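The curl call above can also be issued from Python with only the standard library. The following sketch builds the same multipart/form-data request; the helper names and structure are illustrative, and the endpoint and form fields mirror the curl example.

```python
# Stdlib sketch of the HTTP transcription request shown above with curl.
import json
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path, language="en",
               url="http://localhost:9000/v1/audio/transcriptions"):
    """POST an audio file to the transcriptions endpoint and return the JSON reply."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart({"language": language}, "file", path, audio)
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `transcribe("en-US_sample.wav", language="en")` sends the same request as the curl command above.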
Transcription with Auto Language Detection
When the language code is not known beforehand, pass multi as the language code. The model predicts the language of each 30-second chunk and returns it to the client.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code multi \
--input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=multi \
-F file="@en-US_sample.wav"
Translation
Whisper supports translation from multiple languages to English. Setting the input language to multi enables auto language detection; however, specifying the correct input language is recommended because it improves accuracy and reduces latency.
Copy a sample audio file from the NIM container or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/fr-FR_sample.wav .
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code fr --input-file fr-FR_sample.wav \
--custom-configuration task:translate
curl -s http://0.0.0.0:9000/v1/audio/translations -F language=fr \
-F file="@fr-FR_sample.wav"
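Note that over HTTP, translation differs from transcription only in the endpoint path: /v1/audio/translations instead of /v1/audio/transcriptions, with the same multipart form fields. A small helper (the function name here is illustrative) keeps the two calls symmetrical:

```python
# Map a task name to the corresponding NIM HTTP endpoint used in the
# curl examples above.
def endpoint_for(task, host="http://localhost:9000"):
    """Return the HTTP endpoint for "transcribe" or "translate"."""
    paths = {
        "transcribe": "/v1/audio/transcriptions",
        "translate": "/v1/audio/translations",
    }
    return host + paths[task]
```

The resulting URL can then be used with curl or any multipart-capable HTTP client, passing the language and file form fields as shown above.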