Advanced Usage

This section provides a detailed breakdown of the inference script for more advanced users.

The Studio Voice NIM exposes gRPC endpoints. Import the compiled gRPC protos to invoke the NIM.

import os
import sys
import grpc

sys.path.append(os.path.join(os.getcwd(), "../interfaces/studio_voice"))
# Importing gRPC compiler auto-generated maxine studiovoice library
import studiovoice_pb2, studiovoice_pb2_grpc  # noqa: E402

The NIM invocation uses bidirectional gRPC streaming. To produce the request data stream, define a Python generator function: an ordinary function that yields one value per iteration, where each yield returns the next chunk to be streamed.

from typing import Iterator

def generate_request_for_inference(
    input_filepath: os.PathLike,
) -> Iterator[studiovoice_pb2.EnhanceAudioRequest]:
    """Generator to produce the request data stream.

    Args:
      input_filepath: Path to input file
    """
    DATA_CHUNKS = 64 * 1024  # bytes, we send the wav file in 64KB chunks
    with open(input_filepath, "rb") as fd:
        while True:
            buffer = fd.read(DATA_CHUNKS)
            if not buffer:
                break
            yield studiovoice_pb2.EnhanceAudioRequest(audio_stream_data=buffer)
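
As a quick sanity check, you can consume the generator directly and count the chunks it yields. This is an optional sketch, assuming the sample input file referenced later in this section:

# Count the 64KB request chunks produced for the sample input file.
sample_input = "../assets/studio_voice_48k_input.wav"
num_chunks = sum(1 for _ in generate_request_for_inference(sample_input))
print(f"Generator yields {num_chunks} request chunks")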

Before invoking the NIM, define a function that handles the incoming response stream and writes it to an output file; the HasField check ensures that only messages carrying audio data are written.

from typing import Iterator

def write_output_file_from_response(
    response_iter: Iterator[studiovoice_pb2.EnhanceAudioResponse],
    output_filepath: os.PathLike
) -> None:
    """Function to write the output file from the incoming gRPC data stream.

    Args:
      response_iter: Responses from the server to write into output file
      output_filepath: Path to output file
    """
    with open(output_filepath, "wb") as fd:
        for response in response_iter:
            if response.HasField("audio_stream_data"):
                fd.write(response.audio_stream_data)

Now that the request generator and response handler are set up, connect to the NIM and invoke it. The input file path is stored in the variable input_filepath, and the output file is written to the location specified in output_filepath. Wait for the message confirming that the invocation completed before checking the output file. Fill in the correct host and port for your target in the code snippet below:

import time

input_filepath = "../assets/studio_voice_48k_input.wav"
output_filepath = "studio_voice_48k_output.wav"

with grpc.insecure_channel(target="localhost:8001") as channel:
    try:
        stub = studiovoice_pb2_grpc.MaxineStudioVoiceStub(channel)
        start_time = time.time()

        responses = stub.EnhanceAudio(
            generate_request_for_inference(input_filepath=input_filepath),
            metadata=None,
        )

        write_output_file_from_response(response_iter=responses, output_filepath=output_filepath)

        end_time = time.time()
        print(
            f"Function invocation completed in {end_time-start_time:.2f}s, the output file is generated."
        )
    except grpc.RpcError as e:
        print(f"RPC failed: {e}")
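
To verify the result, you can inspect the output with Python's built-in wave module. A minimal sketch, assuming the output file generated above:

import wave

# Inspect basic properties of the enhanced output file.
with wave.open(output_filepath, "rb") as wav:
    duration = wav.getnframes() / wav.getframerate()
    print(f"channels={wav.getnchannels()}, rate={wav.getframerate()} Hz, duration={duration:.2f}s")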

Compile the Protos

The NVIDIA Maxine NIM Clients package comes with pre-compiled protos. However, to compile the protos locally, install the required dependencies and run the script for your platform.

Linux

To compile protos on Linux, run:

# Go to studio-voice/protos folder
cd studio-voice/protos

chmod +x compile_protos.sh
./compile_protos.sh

Windows

To compile protos on Windows, run:

# Go to studio-voice/protos folder
cd studio-voice/protos

compile_protos.bat
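
Both scripts typically invoke the protoc compiler bundled with the Python grpcio-tools package. As a rough sketch of the equivalent manual invocation (the proto filename studiovoice.proto and the in-place output paths are assumptions, not taken from the scripts):

from grpc_tools import protoc

# Assumed proto filename and output layout; check the protos folder for the actual names.
protoc.main([
    "grpc_tools.protoc",
    "-I.",
    "--python_out=.",
    "--grpc_python_out=.",
    "studiovoice.proto",
])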

Model Caching

When the container starts for the first time, it downloads the required models from NGC. To avoid downloading the models again on subsequent runs, cache them locally by mounting a cache directory:

# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 "$LOCAL_NIM_CACHE"

# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=maxine-studio-voice \
    --net host \
    --runtime=nvidia \
    --gpus all \
    --shm-size=8GB \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_MODEL_PROFILE=<nim_model_profile> \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    nvcr.io/nim/nvidia/maxine-studio-voice:latest

Ensure the nim_model_profile is compatible with your GPU. For more information about nim_model_profile, refer to the NIM Model Profile Table.