Parakeet RNNT

The Parakeet 1.1b RNNT Multilingual model supports streaming speech-to-text transcription in multiple languages. The model identifies the spoken language and returns the transcript in that language.

For client installation and sample audio instructions, refer to the Deploy and Run ASR Models page.

Deploy the NIM Container

For the container image, refer to the NGC catalog.

export CONTAINER_ID=parakeet-1-1b-rnnt-multilingual
export NIM_TAGS_SELECTOR="mode=all,diarizer=disabled"

docker run -it --rm --name=$CONTAINER_ID \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -e NIM_TAGS_SELECTOR \
  nvcr.io/nim/nvidia/$CONTAINER_ID:latest

For additional profile options, refer to the ASR support matrix.

Deploying a Subset of Inference Modes on Memory-Constrained GPUs

GPUs with less than 50 GB of VRAM cannot run Parakeet 1.1b RNNT Multilingual with mode=all. For example, the NVIDIA L40S (48 GB) cannot host all modes simultaneously. To work around this, export only the modes you need, such as offline (ofl) and streaming-throughput (str-thr), into a single export bundle and serve from that bundle.

Note

Even with two modes, startup may still fail if runtime overhead pushes total GPU memory usage over the available VRAM. The ofl profile uses roughly 25.56 GB and str-thr roughly 14.75 GB before runtime overhead.
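The arithmetic behind this note can be checked before starting a container. The following is a minimal sketch and not part of the NIM tooling: the function names sum_gb and fits_in_vram and the headroom argument are hypothetical, and the profile sizes are the approximate figures quoted above.

```shell
# Sum profile footprints (in GB) passed as arguments; prints the total with 2 decimals.
sum_gb() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# Succeeds when the profiles plus an assumed runtime headroom fit in the GPU's VRAM.
# Usage: fits_in_vram <vram_gb> <headroom_gb> <profile_gb>...
fits_in_vram() {
  vram=$1
  headroom=$2
  shift 2
  needed=$(sum_gb "$@")
  awk -v n="$needed" -v h="$headroom" -v v="$vram" 'BEGIN { exit !(n + h <= v) }'
}

sum_gb 25.56 14.75    # ofl + str-thr before runtime overhead; prints 40.31
```

On a 48 GB GPU this leaves about 7.69 GB for runtime overhead. A call such as fits_in_vram 48 4 25.56 14.75 succeeds only if the headroom guess (4 GB here, an arbitrary value) is realistic for your deployment, so treat the result as a rough pre-check rather than a guarantee.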

Step 1: Export Each Mode Sequentially

Export the first mode (ofl) to NIM_EXPORT_PATH. The container downloads the model, exports it, then exits.

export LOCAL_NIM_CACHE=/tmp/nim-cache
export NIM_EXPORT_PATH=/tmp/nim_export

mkdir -p $LOCAL_NIM_CACHE $NIM_EXPORT_PATH
chmod 777 $LOCAL_NIM_CACHE $NIM_EXPORT_PATH

docker run -it --rm --name=parakeet-export-ofl \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_TAGS_SELECTOR="diarizer=sortformer,mode=ofl,type=default,vad=silero" \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -v $NIM_EXPORT_PATH:/opt/nim/export \
  -e NIM_EXPORT_PATH=/opt/nim/export \
  -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest

After the first export completes, export the second mode (str-thr) into the same NIM_EXPORT_PATH. Do not delete or recreate the export or cache directories.

docker run -it --rm --name=parakeet-export-str-thr \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_TAGS_SELECTOR="diarizer=sortformer,mode=str-thr,type=default,vad=silero" \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -v $NIM_EXPORT_PATH:/opt/nim/export \
  -e NIM_EXPORT_PATH=/opt/nim/export \
  -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest
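The two export commands differ only in the container name and the mode in NIM_TAGS_SELECTOR, so they can be wrapped in a small loop. This is a sketch under that assumption: the helper names selector_for_mode, export_mode, and export_modes and the DRY_RUN variable are hypothetical and not part of the NIM; the docker flags are copied from the commands above.

```shell
# Build the profile selector used by the export commands above for a given mode.
selector_for_mode() {
  printf 'diarizer=sortformer,mode=%s,type=default,vad=silero\n' "$1"
}

# Run one export container for a mode. With DRY_RUN=1 the command is printed
# (without shell quoting, for review only) instead of being executed.
export_mode() {
  mode=$1
  set -- docker run -it --rm --name="parakeet-export-$mode" \
    --runtime=nvidia \
    --gpus '"device=0"' \
    --shm-size=8GB \
    -e NGC_API_KEY \
    -e NIM_TAGS_SELECTOR="$(selector_for_mode "$mode")" \
    -e NIM_HTTP_API_PORT=9000 \
    -e NIM_GRPC_API_PORT=50051 \
    -p 9000:9000 \
    -p 50051:50051 \
    -v "$NIM_EXPORT_PATH:/opt/nim/export" \
    -e NIM_EXPORT_PATH=/opt/nim/export \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Export modes one at a time into the same NIM_EXPORT_PATH; stop at the first failure.
export_modes() {
  for m in "$@"; do
    export_mode "$m" || return 1
  done
}
```

With LOCAL_NIM_CACHE and NIM_EXPORT_PATH set as shown earlier, DRY_RUN=1 export_modes ofl str-thr prints both commands for review, and export_modes ofl str-thr runs them in sequence.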

Step 2: Serve from the Combined Export

Start the container with NIM_DISABLE_MODEL_DOWNLOAD=true and point it at the same export path. The NIM loads both modes from the export bundle without downloading anything from NGC.

docker run -it --rm --name=parakeet-1-1b-rnnt-multilingual \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_DISABLE_MODEL_DOWNLOAD=true \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -v $NIM_EXPORT_PATH:/opt/nim/export \
  -e NIM_EXPORT_PATH=/opt/nim/export \
  nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest
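Because startup can still fail when memory is tight, it may help to confirm from a second terminal that the service actually came up. The sketch below polls a URL with curl until it answers successfully. The function name wait_until_ready is hypothetical, and the health path used in the usage note is an assumption that this page does not confirm; substitute whatever endpoint your NIM version exposes.

```shell
# Poll a URL until it answers with a success status or the attempts run out.
# Usage: wait_until_ready <url> [attempts] [delay_seconds]
wait_until_ready() {
  url=$1
  attempts=${2:-60}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s silences progress output.
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "service at $url did not become ready" >&2
  return 1
}
```

For example, wait_until_ready http://0.0.0.0:9000/v1/health/ready 60 5 would wait up to about five minutes, assuming that path exists on the HTTP port configured above.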

Kubernetes / Helm Deployment

The Kubernetes equivalent pre-populates a PVC-backed export volume using two sequential Kubernetes Jobs, then mounts that volume into the Helm release with model downloads disabled.

Prerequisites:

A ReadWriteMany (RWX) PVC is recommended for EKS/EFS so the export Jobs and serving pod can land on different nodes. ReadWriteOnce (RWO) works for a single replica if you schedule the export Jobs and serving pod on the same node.
Use the same image tag across both export Jobs and the Helm release.
The NGC API key must be available as a Kubernetes secret named ngc-api with key NGC_API_KEY, and the registry pull secret as ngc-secret.

Step 1: Create the PVC (pvc.yaml):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: parakeet-rnnt-nim-cache
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi

Step 2: Export the ofl profile (export-ofl-job.yaml):

apiVersion: batch/v1
kind: Job
metadata:
  name: parakeet-export-ofl
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      imagePullSecrets:
        - name: ngc-secret
      containers:
        - name: export-ofl
          image: nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest
          imagePullPolicy: IfNotPresent
          env:
            - name: NGC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ngc-api
                  key: NGC_API_KEY
            - name: NIM_TAGS_SELECTOR
              value: "diarizer=sortformer,mode=ofl,type=default,vad=silero"
            - name: NIM_EXPORT_PATH
              value: /opt/nim/export
            - name: NIM_HTTP_API_PORT
              value: "9000"
            - name: NIM_GRPC_API_PORT
              value: "50051"
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: nim-models
              mountPath: /opt/nim/export
              subPath: export
            - name: nim-models
              mountPath: /opt/nim/.cache
              subPath: cache
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: nim-models
          persistentVolumeClaim:
            claimName: parakeet-rnnt-nim-cache
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 8Gi

Step 3: Export the str-thr profile (export-str-thr-job.yaml):

Reuse the Job manifest from Step 2, changing only these two values:

metadata.name: parakeet-export-str-thr
NIM_TAGS_SELECTOR value: "diarizer=sortformer,mode=str-thr,type=default,vad=silero"
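Rather than editing a copy by hand, the second manifest can be derived from the first. The following sketch assumes export-ofl-job.yaml is formatted exactly as shown in Step 2 (metadata.name indented by two spaces); the function name derive_str_thr_job is hypothetical.

```shell
# Read the ofl Job manifest on stdin and write the str-thr variant to stdout.
# Only metadata.name and the mode in NIM_TAGS_SELECTOR are changed; the
# container name (export-ofl) is left as-is.
derive_str_thr_job() {
  sed -e 's/^\(  name: \)parakeet-export-ofl$/\1parakeet-export-str-thr/' \
      -e 's/mode=ofl,/mode=str-thr,/'
}
```

Usage: derive_str_thr_job < export-ofl-job.yaml > export-str-thr-job.yaml. Review the result before applying it, since a differently indented manifest would not match the first pattern.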

Step 4: Deploy the NIM Helm release (custom-values.yaml):

image:
  repository: nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual
  tag: latest
  pullPolicy: IfNotPresent

nim:
  ngcAPISecret: ngc-api
  httpPort: 9000
  grpcPort: 50051

imagePullSecrets:
  - name: ngc-secret

envVars:
  NIM_EXPORT_PATH: /opt/nim/export
  NIM_DISABLE_MODEL_DOWNLOAD: "true"

resources:
  limits:
    nvidia.com/gpu: 1

extraVolumes:
  nim-export:
    persistentVolumeClaim:
      claimName: parakeet-rnnt-nim-cache
  dshm:
    emptyDir:
      medium: Memory
      sizeLimit: 8Gi

extraVolumeMounts:
  nim-export:
    mountPath: /opt/nim/export
    subPath: export
    readOnly: true
  dshm:
    mountPath: /dev/shm

Step 5: Apply in order:

kubectl apply -f pvc.yaml

kubectl apply -f export-ofl-job.yaml
kubectl wait --for=condition=complete job/parakeet-export-ofl --timeout=60m

kubectl apply -f export-str-thr-job.yaml
kubectl wait --for=condition=complete job/parakeet-export-str-thr --timeout=60m

helm upgrade --install riva-nim riva-nim-<version>.tgz -f custom-values.yaml

Note

Replace latest with a pinned image tag, and use the same tag in both export Jobs and the Helm release. Mismatched tags can cause runtime incompatibility.
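The apply sequence only works if each export Job finishes before the next step starts. The wrapper below is a sketch of one way to enforce that in a script; the function name run_export_job is hypothetical. Note that kubectl wait --for=condition=complete keeps waiting until the timeout when a Job fails instead of completing, so the wrapper prints recent logs in that case to make the failure easier to diagnose.

```shell
# Apply an export Job manifest and block until the Job completes.
# Usage: run_export_job <manifest.yaml> <job-name>
run_export_job() {
  manifest=$1
  job=$2
  kubectl apply -f "$manifest" || return 1
  if ! kubectl wait --for=condition=complete "job/$job" --timeout=60m; then
    echo "export job $job did not complete; recent logs:" >&2
    kubectl logs "job/$job" --tail=50 >&2 || true
    return 1
  fi
}
```

For example: run_export_job export-ofl-job.yaml parakeet-export-ofl && run_export_job export-str-thr-job.yaml parakeet-export-str-thr, followed by the helm upgrade command only if both succeed.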

Run Inference

Copy sample audio files from the NIM container or use your own.

docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
docker cp $CONTAINER_ID:/opt/riva/wav/fr-FR_sample.wav .

Streaming

Verify that the NIM is deployed with a streaming-mode model by listing the available models.

python3 python-clients/scripts/asr/transcribe_file.py \
  --server 0.0.0.0:50051 \
  --list-models

The input speech file is streamed to the service chunk by chunk.

# Transcribe English speech
python3 python-clients/scripts/asr/transcribe_file.py \
  --server 0.0.0.0:50051 \
  --language-code multi --automatic-punctuation \
  --input-file en-US_sample.wav

# Transcribe French speech
python3 python-clients/scripts/asr/transcribe_file.py \
  --server 0.0.0.0:50051 \
  --language-code multi --automatic-punctuation \
  --input-file fr-FR_sample.wav

Offline

Verify that the NIM is deployed with an offline-mode model by listing the available models.

python3 python-clients/scripts/asr/transcribe_file_offline.py \
  --server 0.0.0.0:50051 \
  --list-models

The input speech file is sent to the service in one shot.

gRPC

# Transcribe English speech
python3 python-clients/scripts/asr/transcribe_file_offline.py \
  --server 0.0.0.0:50051 \
  --language-code multi --automatic-punctuation \
  --input-file en-US_sample.wav

# Transcribe French speech
python3 python-clients/scripts/asr/transcribe_file_offline.py \
  --server 0.0.0.0:50051 \
  --language-code multi --automatic-punctuation \
  --input-file fr-FR_sample.wav

HTTP

# Transcribe English speech
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=multi \
  -F file="@en-US_sample.wav"

# Transcribe French speech
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=multi \
  -F file="@fr-FR_sample.wav"
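The curl commands print the raw response body. If only the transcript is needed, the response can be piped through a small filter. This sketch assumes the endpoint returns a JSON object with a top-level "text" field, which this page does not confirm; the function name extract_text is hypothetical, and python3 is used because the client scripts already require it.

```shell
# Print the "text" field of a JSON transcription response read from stdin.
# Assumes a JSON object with a top-level "text" key; prints an empty line if absent.
extract_text() {
  python3 -c 'import json, sys; print(json.load(sys.stdin).get("text", ""))'
}
```

Usage: append | extract_text to either curl command above. If the response schema differs in your NIM version, inspect the raw output first and adjust the key name.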

Realtime

# Transcribe English speech
python3 python-clients/scripts/asr/realtime_asr_client.py \
  --server 0.0.0.0:9000 \
  --language-code multi --automatic-punctuation \
  --input-file en-US_sample.wav

# Transcribe French speech
python3 python-clients/scripts/asr/realtime_asr_client.py \
  --server 0.0.0.0:9000 \
  --language-code multi --automatic-punctuation \
  --input-file fr-FR_sample.wav