Getting Started

Prerequisites

Check the Support Matrix to make sure that you have the supported hardware and software stack.

NGC Authentication

Generate an API Key

An NGC API key is required to access NGC resources. You can generate one at https://org.ngc.nvidia.com/setup/personal-keys.

When creating an NGC Personal API key, ensure that at least “NGC Catalog” is selected from the “Services Included” dropdown. You can include more services if you plan to reuse the key for other purposes.

Note

Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, refer to the NGC User Guide.

Export the API Key

Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.

If you are not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:

export NGC_API_KEY=<value>

To make the key available in every new shell session, run one of the following commands to append the export to your shell startup file:

# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc

Note

Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
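
As a minimal sketch of the file-based option, assuming the key is stored on a single line in a file that only your user can read, the following helper loads the key into the environment. The function name load_ngc_api_key and the example path ~/.ngc/api_key are illustrative assumptions, not part of the NIM or NGC tooling.

```shell
# Hypothetical helper: read the NGC API key from a file and export it.
# Usage: load_ngc_api_key /path/to/key_file
load_ngc_api_key() {
    key_file="$1"
    if [ ! -r "$key_file" ]; then
        echo "Cannot read key file: $key_file" >&2
        return 1
    fi
    # Strip newlines and carriage returns so the value is exactly the key.
    NGC_API_KEY=$(tr -d '\r\n' < "$key_file")
    if [ -z "$NGC_API_KEY" ]; then
        echo "Key file is empty: $key_file" >&2
        return 1
    fi
    export NGC_API_KEY
}
```

For example, after restricting the file with chmod 600 ~/.ngc/api_key, running load_ngc_api_key ~/.ngc/api_key sets NGC_API_KEY for the current shell without writing the key into a startup file.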

Docker Login to NGC

To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:

echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

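Use $oauthtoken as the username and the value of NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates that you will authenticate with an API key and not a user name and password. The single quotes around '$oauthtoken' keep the shell from expanding it as a variable.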

Launching the NIM Container

The following command launches the Maxine Studio Voice NIM container with the gRPC service. For a description of the flags used, see Runtime Parameters for the Container.

docker run -it --rm --name=maxine-studio-voice \
    --net host \
    --runtime=nvidia \
    --gpus all \
    --shm-size=8GB \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_MODEL_PROFILE=<nim_model_profile> \
    -e FILE_SIZE_LIMIT=36700160 \
    nvcr.io/nim/nvidia/maxine-studio-voice:latest

Ensure the nim_model_profile is compatible with your GPU. For more information about NIM_MODEL_PROFILE, refer to the NIM Model Profile Table.
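
Because NGC_API_KEY and NIM_MODEL_PROFILE are both listed as required in the Environment Variables table, it can help to confirm they are set before launching. The following is a minimal sketch of such a check. It assumes you export NIM_MODEL_PROFILE in your shell and pass it as -e NIM_MODEL_PROFILE=$NIM_MODEL_PROFILE, which is a convention chosen for this example; the function name check_nim_env is hypothetical.

```shell
# Hypothetical pre-flight check: verify that the variables marked as
# required for the container are set and non-empty in the current shell.
check_nim_env() {
    missing=""
    for var in NGC_API_KEY NIM_MODEL_PROFILE; do
        # Indirect lookup of the variable named in $var (POSIX-compatible).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing required variables:$missing" >&2
        return 1
    fi
}
```

Running check_nim_env && docker run ... then starts the container only when both values are present, instead of failing later during model download.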

Note

The flag --gpus all assigns all available GPUs to the Docker container. This fails on a machine with multiple GPUs unless all of the GPUs are the same model. To assign specific GPUs to the container (for example, when the machine has several different GPUs), use --gpus '"device=0,1,2..."'.
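
On a machine with mixed GPUs, the device list can be derived from the output of nvidia-smi rather than typed by hand. The sketch below assumes the "index, name" CSV lines printed by nvidia-smi --query-gpu=index,name --format=csv,noheader are supplied on standard input; the function name gpus_matching is hypothetical.

```shell
# Hypothetical helper: build a --gpus value that selects only the GPUs
# whose name contains the given text.
# Reads "index, name" CSV lines on stdin, for example from:
#   nvidia-smi --query-gpu=index,name --format=csv,noheader
gpus_matching() {
    pattern="$1"
    # Collect the indices of matching GPUs as a comma-separated list.
    ids=$(awk -F', ' -v pat="$pattern" \
        'index($2, pat) { printf "%s%s", sep, $1; sep = "," }')
    if [ -z "$ids" ]; then
        echo "No GPU name contains: $pattern" >&2
        return 1
    fi
    # Docker expects the device list wrapped in literal double quotes.
    printf '"device=%s"\n' "$ids"
}
```

A possible use is --gpus "$(nvidia-smi --query-gpu=index,name --format=csv,noheader | gpus_matching 'RTX 4090')", which passes only the matching devices to the container.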

If the command runs successfully, you will get a response similar to the following.

+-------------------------------+---------+--------+
| Model                         | Version | Status |
+-------------------------------+---------+--------+
| maxine_nvcf_studiovoice       | 1       | READY  |
| studio_voice_high_quality-48k | 1       | READY  |
+-------------------------------+---------+--------+

I1126 09:22:21.040917 31 metrics.cc:877] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 4090"
I1126 09:22:21.046137 31 metrics.cc:770] "Collecting CPU metrics"
I1126 09:22:21.046253 31 tritonserver.cc:2598] 
+----------------------------------+------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                          |
+----------------------------------+------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                         |
| server_version                   | 2.50.0                                                                                         |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy m |
|                                  | odel_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters stati |
|                                  | stics trace logging                                                                            |
| model_repository_path[0]         | /opt/maxine/models                                                                             |
| model_control_mode               | MODE_EXPLICIT                                                                                  |
| startup_models_0                 | maxine_nvcf_studiovoice                                                                        |
| startup_models_1                 | studio_voice_high_quality-48k                                                                  |
| strict_model_config              | 0                                                                                              |
| model_config_name                |                                                                                                |
| rate_limit                       | OFF                                                                                            |
| pinned_memory_pool_byte_size     | 268435456                                                                                      |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                       |
| min_supported_compute_capability | 6.0                                                                                            |
| strict_readiness                 | 1                                                                                              |
| exit_timeout                     | 30                                                                                             |
| cache_enabled                    | 0                                                                                              |
+----------------------------------+------------------------------------------------------------------------------------------------+

I1126 09:22:21.048202 31 grpc_server.cc:2558] "Started GRPCInferenceService at 127.0.0.1:9001"
I1126 09:22:21.048377 31 http_server.cc:4704] "Started HTTPService at 127.0.0.1:9000"
I1126 09:22:21.089295 31 http_server.cc:362] "Started Metrics Service at 127.0.0.1:9002"
Maxine GRPC Service: Listening to 0.0.0.0:8001

Note

By default, the Maxine Studio Voice gRPC service is hosted on port 8001. Use this port for inference requests.
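
To confirm which port the service reports, the final line of the startup log can be parsed. The sketch below assumes the log contains a line of the form shown above ("Maxine GRPC Service: Listening to <host>:<port>") and that the log text is supplied on standard input; the function name grpc_port_from_log is hypothetical.

```shell
# Hypothetical helper: print the port from the
# "Maxine GRPC Service: Listening to <host>:<port>" log line.
# Reads log text on stdin and prints nothing if the line is absent.
grpc_port_from_log() {
    # Remove carriage returns that appear when the container has a TTY,
    # then keep the port of the most recent matching line.
    tr -d '\r' |
        sed -n 's/^Maxine GRPC Service: Listening to .*:\([0-9][0-9]*\)$/\1/p' |
        tail -n 1
}
```

With the container name used in the launch command, docker logs maxine-studio-voice 2>&1 | grpc_port_from_log prints the port once the service has started.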

Environment Variables

The following table describes the environment variables that can be passed to a NIM with the -e option of the docker run command:

ENV	Required?	Default	Notes
`NGC_API_KEY`	Yes	None	You must set this variable to the value of your personal NGC API key.
`NIM_CACHE_PATH`	No	`/opt/nim/.cache`	Location (in container) where the container caches model artifacts.
`NIM_MODEL_PROFILE`	Yes	None	You must set this variable to the model profile supported on your GPU so that the matching model is downloaded. For more information about NIM_MODEL_PROFILE, refer to the NIM Model Profile Table.
`FILE_SIZE_LIMIT`	No	36700160	Maximum size of the input audio file, in bytes. The default is 35 MB (35 × 1024 × 1024 bytes).
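
FILE_SIZE_LIMIT is given in bytes, and the default of 36700160 is 35 × 1024 × 1024. As a small sketch of how to compute a different limit, the hypothetical helper mib_to_bytes below converts a whole number of megabytes (counted as 1024 × 1024 bytes each) into bytes.

```shell
# Hypothetical helper: convert a size in MB (1024 * 1024 bytes each)
# to the byte count expected by FILE_SIZE_LIMIT.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# The documented default: 35 MB expressed in bytes.
mib_to_bytes 35   # prints 36700160
```

For example, -e FILE_SIZE_LIMIT=$(mib_to_bytes 50) would raise the limit to 50 MB, assuming the service accepts a value above the default.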

Runtime Parameters for the Container

Flags	Description
`-it`	`--interactive` + `--tty` (see Docker docs)
`--rm`	Delete the container after it stops (see Docker docs)
`--name=<container_name>`	Give a name to the NIM container. Use any preferred value.
`--runtime=nvidia`	Ensure NVIDIA drivers are accessible in the container.
`--gpus all`	Expose NVIDIA GPUs inside the container. If you are running on a host with multiple GPUs, specify which GPU or GPUs to use. See GPU Enumeration for further information on mounting specific GPUs.
`--shm-size=8GB`	Allocate host memory for multi-process communication.
`-e NGC_API_KEY=$NGC_API_KEY`	Provide the container with the token necessary to download the appropriate models and resources from NGC. See above.
`--net host`	Ports exposed by the container are directly accessible on the host without needing the -p or --publish flags.

Stopping the Container

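Use the following commands to stop and remove the container, where $CONTAINER_NAME is the name given with --name (maxine-studio-voice in the launch command above). If the container was started with --rm, it is removed automatically when it stops, and the docker rm command is not needed.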

docker stop $CONTAINER_NAME
docker rm $CONTAINER_NAME