Getting Started
Check the support matrix to make sure that you have the supported hardware and software stack.
NGC Authentication
Generate an API key
An NGC API key is required to access NGC resources. You can generate a key here: https://org.ngc.nvidia.com/setup/personal-keys.
When creating an NGC API Personal key, ensure that at least “NGC Catalog” is selected from the “Services Included” dropdown. More services can be included if this key is to be reused for other purposes.
Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, refer to the NGC User Guide.
Export the API key
Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.
If you’re not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:
export NGC_API_KEY=<value>
Run one of the following commands to make the key available at startup:
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc
# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
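As one way to apply the file-based option, the sketch below loads the key from a file instead of writing it into a shell startup file. The path ~/.ngc/api_key and the function name load_ngc_api_key are assumptions made for illustration; neither is an NGC convention.

```shell
# Hypothetical key location; override NGC_API_KEY_FILE to use another path.
NGC_API_KEY_FILE="${NGC_API_KEY_FILE:-$HOME/.ngc/api_key}"

# Read the key from the file and export it for child processes such as docker.
load_ngc_api_key() {
  if [ ! -s "$NGC_API_KEY_FILE" ]; then
    echo "NGC API key file is missing or empty: $NGC_API_KEY_FILE" >&2
    return 1
  fi
  # Drop the trailing newline and any stray whitespace around the key.
  NGC_API_KEY="$(tr -d '[:space:]' < "$NGC_API_KEY_FILE")"
  export NGC_API_KEY
}
```

Restricting the file to your own user (for example, with chmod 600) and calling load_ngc_api_key before docker run keeps the key out of ~/.bashrc and ~/.zshrc.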
Docker Login to NGC
To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Use $oauthtoken as the username and NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates that you will authenticate with an API key and not a user name and password.
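If NGC_API_KEY is unset, the login command above sends an empty password and fails with a less obvious error. A minimal sketch of a guard, assuming a hypothetical wrapper named ngc_docker_login, checks the variable first and then runs the same docker login command:

```shell
# Log in to nvcr.io, failing early if the key is not set.
ngc_docker_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set; export it before logging in." >&2
    return 1
  fi
  # The single quotes keep $oauthtoken literal; it is not a shell variable.
  printf '%s\n' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```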
Launching the NIM with Docker
The following command launches a Docker container for the nvidia/nv-rerankqa-mistral-4b-v3 compiled model.
# Choose a container name for bookkeeping
export NIM_MODEL_NAME=nvidia/nv-rerankqa-mistral-4b-v3
export CONTAINER_NAME=$(basename $NIM_MODEL_NAME)
# Choose a NIM Image from NGC
export IMG_NAME="nvcr.io/nim/$NIM_MODEL_NAME:1.0.0"
# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
# Start the NIM
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-e NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
Flags | Description |
---|---|
-it | --interactive + --tty (see Docker docs) |
--rm | Delete the container after it stops (see Docker docs) |
--name=$CONTAINER_NAME | Give a name to the NIM container for bookkeeping (here nv-rerankqa-mistral-4b-v3). Use any preferred value. |
--runtime=nvidia | Ensure NVIDIA drivers are accessible in the container. |
--gpus all | Expose all NVIDIA GPUs inside the container. See the configuration page for mounting specific GPUs. |
--shm-size=16GB | Allocate host memory for multi-GPU communication. Not required for single-GPU models or GPUs with NVLink enabled. |
-e NGC_API_KEY | Provide the container with the token necessary to download the appropriate models and resources from NGC. See NGC Authentication above. |
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" | Mount a cache directory from your system (~/.cache/nim here) inside the NIM (defaults to /opt/nim/.cache), allowing downloaded models and artifacts to be reused by follow-up runs. |
-u $(id -u) | Use the same user as your system user inside the NIM container to avoid permission mismatches when downloading models in your local cache directory. |
-p 8000:8000 | Forward the port where the NIM server is published inside the container so that it can be accessed from the host system. The left-hand side of : is the host system ip:port (8000 here), while the right-hand side is the container port where the NIM server is published (defaults to 8000). |
$IMG_NAME | Name and version of the NIM container from NGC. The NIM server automatically starts if no argument is provided after this. |
If you have an issue with permission mismatches when downloading models in your local cache directory, add the -u $(id -u) option to the docker run call.
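A quick pre-flight check can catch this kind of permission mismatch before the container starts. The sketch below uses a hypothetical helper named check_nim_cache (not part of NIM) that creates the cache directory if needed and verifies that the current user can write to it.

```shell
# Create the cache directory if needed and confirm the current user can write to it.
check_nim_cache() {
  cache_dir="$1"
  mkdir -p "$cache_dir" || return 1
  if [ ! -w "$cache_dir" ]; then
    echo "Cache directory is not writable by uid $(id -u): $cache_dir" >&2
    return 1
  fi
  echo "Cache directory is writable by uid $(id -u): $cache_dir"
}
```

Calling check_nim_cache "$LOCAL_NIM_CACHE" before docker run reports a problem early, for example when an earlier run created the directory as a different user.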
If you are running on a host with multiple types of GPU, you may need to specify which GPUs to use in the --gpus argument to docker run. For example, --gpus '"device=0,2"'. The device IDs of 0 and 2 are examples only; replace them with the appropriate values for your system. Device IDs can be found by running nvidia-smi. See GPU Enumeration for further information.
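The nested quoting in --gpus '"device=0,2"' is easy to get wrong when the device list comes from a script. A minimal sketch, assuming a hypothetical helper named gpu_device_arg, builds the value from a list of device IDs:

```shell
# Join device IDs with commas and wrap them in the literal double quotes
# that the docker --gpus option expects around a device list.
gpu_device_arg() {
  ids=""
  for id in "$@"; do
    ids="${ids:+$ids,}$id"
  done
  printf '"device=%s"' "$ids"
}
```

With this helper, the flag can be written as --gpus "$(gpu_device_arg 0 2)" in the docker run call. As before, 0 and 2 are placeholders for the device IDs on your system.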
GPU clusters with GPUs in Multi-Instance GPU (MIG) mode are currently not supported.
NOTE: After the Docker container starts, it may take a few seconds before the service is ready to accept requests.
Confirm the service is ready to handle inference requests:
curl -X 'GET' 'http://localhost:8000/v1/health/ready'
If the service is ready, you will get a response like this:
{"ready":true}
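Because the service may not be ready immediately, scripts usually need to poll the readiness endpoint and not check it only once. The sketch below does that with a hypothetical function named wait_for_ready; the 120-second default timeout and the 2-second polling interval are assumptions, not documented NIM values.

```shell
# Poll the readiness endpoint until it reports ready or the timeout expires.
wait_for_ready() {
  url="${1:-http://localhost:8000/v1/health/ready}"
  timeout_s="${2:-120}"   # assumed default; adjust for your environment
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # -s silences progress output; -f makes curl fail on HTTP errors.
    if curl -sf "$url" | grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'; then
      echo "NIM is ready"
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "Timed out after ${timeout_s}s waiting for $url" >&2
  return 1
}
```

Calling wait_for_ready with no arguments checks the default URL used in this guide; pass a different URL or timeout if you changed the host port mapping.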
Once the service is ready, send a reranking request to the /v1/ranking endpoint:
curl -X "POST" \
"http://localhost:8000/v1/ranking" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "nvidia/nv-rerankqa-mistral-4b-v3",
"query": {"text": "which way should i go?"},
"passages": [
{"text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"},
{"text": "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,"},
{"text": "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back."},
{"text": "i shall be telling this with a sigh somewhere ages and ages hence: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference."}
],
"truncate": "END"
}'
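For longer passage lists, inline JSON on the command line becomes hard to maintain. As a sketch, the same request body can be stored in a file and sent with curl's -d @file syntax. The function name rerank_from_file and the file name request.json are hypothetical.

```shell
# POST a JSON request body stored in a file to the ranking endpoint.
rerank_from_file() {
  payload_file="$1"
  url="${2:-http://localhost:8000/v1/ranking}"
  if [ ! -f "$payload_file" ]; then
    echo "Payload file not found: $payload_file" >&2
    return 1
  fi
  # -d @file reads the request body from the file; -f fails on HTTP errors.
  curl -sf -X POST "$url" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d @"$payload_file"
}
```

For example, save the JSON body shown above as request.json and run rerank_from_file request.json.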
For further information, see the API examples.
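The following commands stop and remove the running Docker container. Because the container above was started with --rm, Docker removes it automatically once it stops, so docker rm is only needed if you launched the container without that flag.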
docker stop $CONTAINER_NAME
docker rm $CONTAINER_NAME