Advanced Usage#

This section provides a detailed breakdown of the inference script for advanced users.

The Studio Voice NIM exposes gRPC endpoints. Start by importing the compiled gRPC protos used to invoke the NIM.

import os
import sys
from typing import Iterator

import grpc

sys.path.append(os.path.join(os.getcwd(), "../interfaces/studio_voice"))
# Import the gRPC compiler auto-generated Maxine Studio Voice modules
import studiovoice_pb2, studiovoice_pb2_grpc  # noqa: E402

The NIM invocation uses bidirectional gRPC streaming. To produce the request data stream, define a Python generator function: an ordinary function that yields one chunk of data per call, which gRPC then streams to the server.

def generate_request_for_inference(
    input_filepath: os.PathLike,
) -> Iterator[studiovoice_pb2.EnhanceAudioRequest]:
    """Generator to produce the request data stream

    Args:
      input_filepath: Path to input file
    """
    DATA_CHUNKS = 64 * 1024  # bytes; the WAV file is sent in 64 KB chunks
    with open(input_filepath, "rb") as fd:
        while True:
            buffer = fd.read(DATA_CHUNKS)
            if not buffer:
                break
            yield studiovoice_pb2.EnhanceAudioRequest(audio_stream_data=buffer)

Note

For NIM in streaming mode, the audio_stream_data field in each request must contain PCM 32-bit float audio data, with a chunk size of 10 ms for the low-latency model and 6 seconds for the high-quality models.
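For example, at a 48 kHz sample rate, a 10 ms chunk of 32-bit float samples is 48000 × 0.010 × 4 = 1920 bytes. The following is a minimal sketch of a streaming-mode generator; the 48 kHz mono format, the raw_pcm input buffer, and the generate_streaming_requests name are illustrative assumptions, not part of the NIM client package.

SAMPLE_RATE = 48000          # assumption: model runs at 48 kHz
BYTES_PER_SAMPLE = 4         # PCM 32-bit float
CHUNK_DURATION_S = 0.010     # 10 ms chunks for the low-latency model

# 48000 samples/s * 0.010 s * 4 bytes/sample = 1920 bytes per chunk
chunk_size = int(SAMPLE_RATE * CHUNK_DURATION_S * BYTES_PER_SAMPLE)

def generate_streaming_requests(
    raw_pcm: bytes,
) -> Iterator[studiovoice_pb2.EnhanceAudioRequest]:
    """Hypothetical helper: yield 10 ms chunks of raw float32 PCM."""
    for offset in range(0, len(raw_pcm), chunk_size):
        yield studiovoice_pb2.EnhanceAudioRequest(
            audio_stream_data=raw_pcm[offset : offset + chunk_size]
        )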

Before invoking the NIM, define a function that handles the incoming stream and writes it to an output file.

def write_output_file_from_response(
    response_iter: Iterator[studiovoice_pb2.EnhanceAudioResponse],
    output_filepath: os.PathLike
) -> None:
    """Function to write the output file from the incoming gRPC data stream.

    Args:
      response_iter: Responses from the server to write into output file
      output_filepath: Path to output file
    """
    with open(output_filepath, "wb") as fd:
        for response in response_iter:
            if response.HasField("audio_stream_data"):
                fd.write(response.audio_stream_data)

Note

For NIM in streaming mode, the audio_stream_data in each response contains PCM 32-bit float audio data of the same length as the corresponding request chunk.
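In streaming mode the response chunks are raw samples rather than a complete WAV file, so you may need to add a container header before playback. Below is a minimal sketch using the standard wave module; the 48 kHz mono format is an assumption, and NumPy is used purely for convenience when converting float32 samples to 16-bit PCM.

import wave

import numpy as np  # assumption: NumPy is available in your environment

def write_wav_from_raw_float32(raw_pcm: bytes, output_filepath: str) -> None:
    """Wrap raw float32 PCM in a WAV container as 16-bit integer samples."""
    samples = np.frombuffer(raw_pcm, dtype=np.float32)
    pcm16 = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(output_filepath, "wb") as wf:
        wf.setnchannels(1)       # assumption: mono output
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(48000)   # assumption: 48 kHz sample rate
        wf.writeframes(pcm16.tobytes())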

Now that the request generator and response writer are set up, connect to the NIM and invoke it. The input file path is stored in the variable input_filepath, and the output file is written to the location specified in the variable output_filepath. Wait for a message confirming that the function invocation has completed before checking the output file. Fill in the correct host and port for your target in the code snippet below:

import time

input_filepath = "../assets/studio_voice_48k_input.wav"
output_filepath = "studio_voice_48k_output.wav"

# To connect to a NIM without TLS, open an `insecure_channel(...)`.
# To connect to a NIM with TLS/mTLS, open a `secure_channel(...)` with the
# required root certificate, client private key, and certificate
# (see the sketch after this snippet).
with grpc.insecure_channel(target="localhost:8001") as channel:
    try:
        stub = studiovoice_pb2_grpc.MaxineStudioVoiceStub(channel)
        start_time = time.time()

        responses = stub.EnhanceAudio(
            generate_request_for_inference(input_filepath=input_filepath),
            metadata=None,
        )

        write_output_file_from_response(response_iter=responses, output_filepath=output_filepath)

        end_time = time.time()
        print(
            f"Function invocation completed in {end_time - start_time:.2f}s; "
            "the output file has been generated."
        )
    except Exception as e:
        print(f"Invocation failed: {e}")
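For a TLS/mTLS deployment, only the channel setup changes. Below is a minimal sketch using grpc.ssl_channel_credentials; the certificate and key file names are placeholders for your own files, not files shipped with the NIM:

# Load the root CA certificate and, for mTLS, the client key and certificate.
with open("ca_cert.pem", "rb") as f:       # placeholder file name
    root_certificates = f.read()
with open("client_key.pem", "rb") as f:    # placeholder file name
    private_key = f.read()
with open("client_cert.pem", "rb") as f:   # placeholder file name
    certificate_chain = f.read()

credentials = grpc.ssl_channel_credentials(
    root_certificates=root_certificates,
    private_key=private_key,               # omit for one-way TLS
    certificate_chain=certificate_chain,   # omit for one-way TLS
)

with grpc.secure_channel(target="localhost:8001", credentials=credentials) as channel:
    stub = studiovoice_pb2_grpc.MaxineStudioVoiceStub(channel)
    # ... invoke stub.EnhanceAudio(...) exactly as in the insecure example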

Compile the Protos (Optional)#

The NVIDIA Maxine NIM Clients package ships with pre-compiled protos. However, if you want to compile the protos locally, first install the required dependencies.

The compilation scripts let developers generate compiled protos that match their programming language (such as Python or C++) and version requirements, ensuring compatibility between client applications and the compiled protos in their custom implementations.
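For Python, the scripts below amount to an invocation of the protoc compiler from the grpcio-tools package (pip install grpcio-tools). A minimal sketch, assuming the proto definition is a file named studiovoice.proto in the current directory (the actual file name in the protos folder may differ):

from grpc_tools import protoc

# Generates studiovoice_pb2.py and studiovoice_pb2_grpc.py in the current
# directory; the proto file name below is an assumption for illustration.
protoc.main([
    "grpc_tools.protoc",
    "-I.",
    "--python_out=.",
    "--grpc_python_out=.",
    "studiovoice.proto",
])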

Linux#

To compile protos on Linux, run:

# Go to studio-voice/protos folder
cd studio-voice/protos

chmod +x compile_protos.sh
./compile_protos.sh

Windows#

To compile protos on Windows, run:

# Go to studio-voice/protos folder
cd studio-voice/protos

compile_protos.bat

Model Caching#

When the container starts for the first time, it downloads the required models from NGC. To avoid re-downloading the models on subsequent runs, cache them locally by using a cache directory:

# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"

# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=studio-voice \
    --runtime=nvidia \
    --gpus all \
    --shm-size=8GB \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_MODEL_PROFILE=<nim_model_profile> \
    -p 8000:8000 \
    -p 8001:8001 \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    nvcr.io/nim/nvidia/maxine-studio-voice:latest

Ensure the nim_model_profile is compatible with your GPU. For more information about nim_model_profile, refer to the NIM Model Profile Table.