Getting Started
Check the Support Matrix to make sure that you have the supported hardware and software stack.
NGC Authentication
Generate an API key
An NGC API key is required to access NGC resources and a key can be generated here: https://org.ngc.nvidia.com/setup/personal-keys.
When creating an NGC API Personal key, ensure that at least “NGC Catalog” is selected from the “Services Included” dropdown. More Services can be included if this key is to be reused for other purposes.
Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, please refer to the NGC User Guide.
Export the API key
Pass the value of the API key to the `docker run` command in the next section as the `NGC_API_KEY` environment variable to download the appropriate models and resources when starting the NIM.
If you’re not familiar with how to create the `NGC_API_KEY` environment variable, the simplest way is to export it in your terminal:
export NGC_API_KEY=<value>
Run one of the following commands to make the key available at startup:
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc
# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
Other, more secure options include saving the value in a file, so that you can retrieve it with `cat $NGC_API_KEY_FILE`, or using a password manager.
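For example, a minimal sketch of the file-based approach, assuming the key is stored on its own in a file whose path is held in an NGC_API_KEY_FILE variable:

# Load the API key from a file instead of hard-coding it in a shell profile
export NGC_API_KEY="$(cat "$NGC_API_KEY_FILE")"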
Docker Login to NGC
To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Use `$oauthtoken` as the username and `NGC_API_KEY` as the password. The `$oauthtoken` username is a special name that indicates that you will authenticate with an API key and not a username and password.
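Optionally, you can confirm that the login succeeded by pulling the container image ahead of time; the docker run command in the next section will pull it automatically if it is not already present:

# Optional: pre-pull the NIM container image to verify NGC authentication
docker pull nvcr.io/nim/nvidia/megatron-1b-nmt:1.0.0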
The following command launches a Docker container on any of the supported GPUs.
export CONTAINER_NAME=megatron-1b-nmt
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY=$NGC_API_KEY \
-e NIM_MANIFEST_PROFILE=89e2a0b4-477e-11ef-b226-cf5f41e3c684 \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
nvcr.io/nim/nvidia/megatron-1b-nmt:1.0.0
Depending on your network speed, it may take up to 30 minutes from the time the Docker container is started for it to be ready and start accepting requests.
Open a new terminal and run the following command to check if the service is ready to handle inference requests:
curl -X 'GET' 'http://localhost:9000/v1/health/ready'
If the service is ready, you get a response similar to the following.
{"ready":true}
Install the Riva Python client
Riva uses gRPC APIs. You can download the proto files from Riva gRPC Proto files and compile them to a target language using the protoc compiler. Riva clients are available in C++ and Python; the steps below use the Python client.
Install Riva Python client
sudo apt-get install python3-pip
pip install -r https://raw.githubusercontent.com/nvidia-riva/python-clients/main/requirements.txt
pip install --force-reinstall git+https://github.com/nvidia-riva/python-clients.git
Download Riva sample client
git clone https://github.com/nvidia-riva/python-clients.git
Run Text-to-Text translation inference:
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 --text "This will become German words" --source-language-code en --target-language-code de
The above command translates the text from English to German; the output is as shown below.
## Das werden deutsche Wörter
The sample client supports a number of options when making a translation request to the gRPC endpoint, as described below; a usage example follows the list.

- `--text` - Text to translate.
- `--list-models` - List available models and supported languages.
- `--source-language-code` - Source language code. Refer to the support matrix page for all supported language codes.
- `--target-language-code` - Target language code. Refer to the support matrix page for all supported language codes.
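For example, you can check which models and language pairs the running NIM exposes before issuing translation requests:

# List available models and supported languages on the running NIM
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 --list-models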
The flags used in the docker run command above are described in the following table.

Flags | Description
---|---
`-it` | `--interactive` + `--tty` (see Docker docs).
`--rm` | Delete the container after it stops (see Docker docs).
`--name=<container-name>` | Give a name to the NIM container. Use any preferred value.
`--runtime=nvidia` | Ensure NVIDIA drivers are accessible in the container.
`--gpus '"device=0"'` | Expose NVIDIA GPU 0 inside the container. If you are running on a host with multiple GPUs, you need to specify which GPU to use. See GPU Enumeration for further information on mounting specific GPUs.
`--shm-size=8GB` | Allocate host memory for multi-GPU communication.
`-e NGC_API_KEY=$NGC_API_KEY` | Provide the container with the token necessary to download the appropriate models and resources from NGC. See NGC Authentication above.
`-e NIM_MANIFEST_PROFILE=<profile>` | Specify the model to load. Currently only the profile mentioned in the command is supported.
`-p 9000:9000` | Forward the port where the NIM HTTP server is published inside the container to access it from the host system. The left-hand side of `:` is the host port (9000 here), while the right-hand side is the container port where the NIM HTTP server is published. The container port can be any value except 8000.
`-p 50051:50051` | Forward the port where the NIM gRPC server is published inside the container to access it from the host system. The left-hand side of `:` is the host port (50051 here), while the right-hand side is the container port where the NIM gRPC server is published.
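For example, on a host with multiple GPUs you could expose a different device, or all devices, by changing only the `--gpus` flag when launching the container; the other flags stay the same. This is just a sketch of possible flag values:

# Expose GPU 1 instead of GPU 0 on a multi-GPU host
--gpus '"device=1"'
# Or expose every GPU on the host
--gpus all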
On initial startup, the container will download the model from NGC. You can skip this download step on future runs by caching the model locally using a cache directory as in the example below.
# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim_nmt
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 $LOCAL_NIM_CACHE
# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY=$NGC_API_KEY \
-e NIM_MANIFEST_PROFILE=89e2a0b4-477e-11ef-b226-cf5f41e3c684 \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v "$LOCAL_NIM_CACHE:/home/nvs/.cache/nim" \
nvcr.io/nim/nvidia/megatron-1b-nmt:1.0.0
When using the model cache, if you change `NIM_MANIFEST_PROFILE` for any reason, be sure to clear the contents of the cache directory on the host machine before starting the NIM container, as in the example below. This ensures that only the requested model profile is loaded.
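For example, assuming LOCAL_NIM_CACHE still points at the cache directory created above:

# Clear the local model cache before switching to a different NIM_MANIFEST_PROFILE
# (the :? guard aborts if LOCAL_NIM_CACHE is unset, so "/*" is never deleted by accident)
rm -rf "${LOCAL_NIM_CACHE:?}"/*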
The following commands stop and remove the running Docker container.
docker stop $CONTAINER_NAME
docker rm $CONTAINER_NAME