Automatic Speech Recognition (Latest)

Getting Started

Check the Support Matrix to make sure that you have the supported hardware and software stack.

NGC Authentication

Generate an API key

An NGC API key is required to access NGC resources and a key can be generated here: https://org.ngc.nvidia.com/setup/personal-keys.

When creating an NGC API Personal key, ensure that at least “NGC Catalog” is selected from the “Services Included” dropdown. More Services can be included if this key is to be reused for other purposes.

Note

Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, refer to the NGC User Guide.

Export the API key

Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.

If you’re not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:

export NGC_API_KEY=<value>

Run one of the following commands to make the key available at startup:

# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc

Note

Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
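
For example, a minimal sketch of the file-based approach (the path used for NGC_API_KEY_FILE is your choice):

# Store the key in a file that only you can read (path is your choice)
echo "<value>" > "$NGC_API_KEY_FILE"
chmod 600 "$NGC_API_KEY_FILE"

# Load it into the environment when launching the container
export NGC_API_KEY="$(cat "$NGC_API_KEY_FILE")"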

Docker Login to NGC

To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry using the following command:

echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

Use $oauthtoken as the username and NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates that you will authenticate with an API key and not a user name and password.

The following command launches a container with the generic (non-optimized) model, which works on any of the supported GPUs. GPU-specific optimized models are available for select GPUs. To use an optimized model, refer to the table of Supported Models below and set NIM_MANIFEST_PROFILE according to your GPU.

export CONTAINER_NAME=parakeet-ctc-1.1b-asr

docker run -it --rm --name=$CONTAINER_NAME \
   --runtime=nvidia \
   --gpus '"device=0"' \
   --shm-size=8GB \
   -e NGC_API_KEY=$NGC_API_KEY \
   -e NIM_MANIFEST_PROFILE=9136dd64-4777-11ef-9f27-37cfd56fa6ee \
   -e NIM_HTTP_API_PORT=9000 \
   -e NIM_GRPC_API_PORT=50051 \
   -p 9000:9000 \
   -p 50051:50051 \
   nvcr.io/nim/nvidia/parakeet-ctc-1.1b-asr:1.0.0

Model                          | GPU     | NIM_MANIFEST_PROFILE
parakeet-ctc-riva-1-1b (en-US) | Generic | 9136dd64-4777-11ef-9f27-37cfd56fa6ee
parakeet-ctc-riva-1-1b (en-US) | H100    | 7f0287aa-35d0-11ef-9bba-57fc54315ba3
parakeet-ctc-riva-1-1b (en-US) | A100    | 32397eba-43f4-11ef-b63c-1b565d7d9a02
parakeet-ctc-riva-1-1b (en-US) | L40     | 40d7e326-43f4-11ef-87a2-239b5c506ca7

Note

Depending on your network speed, it may take up to 30 minutes from the time the container is started for it to be ready and start accepting requests.
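
You can follow the container logs to watch the model download and initialization progress:

docker logs -f $CONTAINER_NAME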

  1. Open a new terminal and run the following command to check if the service is ready to handle inference requests:

curl -X 'GET' 'http://localhost:9000/v1/health/ready'

If the service is ready, you get a response similar to the following.

{"ready":true}

  2. Install the Riva Python client.

Riva uses gRPC APIs. You can download the proto files from Riva gRPC Proto files and compile them to a target language with the protoc compiler. Prebuilt Riva clients are also available in C++ and Python; the Python client is installed below.
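
If you prefer to generate the Python gRPC stubs yourself instead of using the prebuilt client, an invocation along these lines can be used (the proto file path below is illustrative; point it at wherever you downloaded the Riva proto files):

# grpcio-tools provides the protoc plugin for Python
pip install grpcio-tools
python3 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. riva/proto/riva_asr.proto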

Install Riva Python client

sudo apt-get install python3-pip
pip install -r https://raw.githubusercontent.com/nvidia-riva/python-clients/main/requirements.txt
pip install --force-reinstall git+https://github.com/nvidia-riva/python-clients.git

Download Riva sample client

git clone https://github.com/nvidia-riva/python-clients.git

  3. Run Speech-to-Text (STT) inference.

If you do not have a speech file available, you can use the sample speech file provided in the Docker container launched in the previous section.

# Copy sample wav file from running NIM container to host machine
docker cp $CONTAINER_NAME:/opt/riva/wav/en-US_sample.wav .

Riva ASR supports mono, 16-bit audio in WAV, OPUS, and FLAC formats.
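
If your recording is in a different format (for example stereo or MP3), a tool such as ffmpeg can convert it to a compatible mono, 16-bit WAV file (assuming ffmpeg is installed; the file names are placeholders, and 16 kHz is used here as a common ASR sample rate):

# -ac 1: mono, -ar 16000: 16 kHz, pcm_s16le: 16-bit PCM WAV
ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le mono_16bit_sample.wav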

  • Streaming mode: Input speech file is streamed to the service chunk-by-chunk.

python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 --input-file en-US_sample.wav --language-code en-US

  • Offline mode: Input speech file is sent to the service in one shot.

python3 python-clients/scripts/asr/transcribe_file_offline.py --server 0.0.0.0:50051 --input-file en-US_sample.wav --language-code en-US

The above commands will generate a transcript like the one shown below.

## what is natural language processing

The sample client supports the following options for making a transcription request to the gRPC endpoint; an example combining several of them follows the list.

  • --input-file - A path to a local file to stream.

  • --language-code - Language of the input audio. Currently, only “en-US” is supported in Riva ASR NIM.

  • --automatic-punctuation - Whether the transcript should be automatically punctuated.

  • --boosted-lm-words - Words to boost when decoding. See customization for further information.

  • --boosted-lm-score - Value by which to boost words. Recommended range for the boost score is 20 to 100. Refer to the customization page for more information.
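
For example, an offline request that enables automatic punctuation and boosts a domain-specific word might look like the following (the boosted word and score are placeholders; check the script's --help output for exact flag semantics):

python3 python-clients/scripts/asr/transcribe_file_offline.py --server 0.0.0.0:50051 \
    --input-file en-US_sample.wav --language-code en-US \
    --automatic-punctuation \
    --boosted-lm-words "Parakeet" --boosted-lm-score 40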

The docker run command above uses the following flags:

  • -it - Run the container with --interactive and --tty (see the Docker docs).

  • --rm - Delete the container after it stops (see the Docker docs).

  • --name=<container-name> - Give the NIM container a name. Use any preferred value.

  • --runtime=nvidia - Ensure NVIDIA drivers are accessible in the container.

  • --gpus '"device=0"' - Expose NVIDIA GPU 0 inside the container. If you are running on a host with multiple GPUs, you need to specify which GPU to use. See GPU Enumeration for further information on mounting specific GPUs.

  • --shm-size=8GB - Allocate host memory for multi-GPU communication.

  • -e NGC_API_KEY=$NGC_API_KEY - Provide the container with the token necessary to download the appropriate models and resources from NGC. See the NGC Authentication section above.

  • -e NIM_MANIFEST_PROFILE=<profile> - Specify the model to load. See Supported Models for information about the available models.

  • -p 9000:9000 - Forward the port where the NIM HTTP server is published inside the container so it can be accessed from the host system. The left-hand side of : is the host port (9000 here); the right-hand side is the container port where the NIM HTTP server is published. The container port can be any value except 8000.

  • -p 50051:50051 - Forward the port where the NIM gRPC server is published inside the container so it can be accessed from the host system. The left-hand side of : is the host port (50051 here); the right-hand side is the container port where the NIM gRPC server is published.

On initial startup, the container will download the model from NGC. You can skip this download step on future runs by caching the model locally using a cache directory as in the example below.

# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim_asr
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 $LOCAL_NIM_CACHE

# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=$CONTAINER_NAME \
   --runtime=nvidia \
   --gpus '"device=0"' \
   --shm-size=8GB \
   -e NGC_API_KEY=$NGC_API_KEY \
   -e NIM_MANIFEST_PROFILE=9136dd64-4777-11ef-9f27-37cfd56fa6ee \
   -e NIM_HTTP_API_PORT=9000 \
   -e NIM_GRPC_API_PORT=50051 \
   -p 9000:9000 \
   -p 50051:50051 \
   -v "$LOCAL_NIM_CACHE:/home/nvs/.cache/nim" \
   nvcr.io/nim/nvidia/parakeet-ctc-1.1b-asr:1.0.0

Note

When using the model cache, if you change NIM_MANIFEST_PROFILE for any reason, clear the contents of the cache directory on the host machine before starting the NIM container. This ensures that only the requested model profile is loaded.
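
For example, assuming the LOCAL_NIM_CACHE directory from the example above:

# Remove cached model files before switching NIM_MANIFEST_PROFILE
rm -rf "$LOCAL_NIM_CACHE"/*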

The following commands stop and remove the running NIM container.

docker stop $CONTAINER_NAME
docker rm $CONTAINER_NAME
