Getting Started
Check the Support Matrix to make sure that you have the supported hardware and software stack.
NGC Authentication
Generate an API Key
An NGC API key is required to access NGC resources. You can generate a key at https://org.ngc.nvidia.com/setup/personal-keys.
When creating an NGC API Personal key, ensure that at least “NGC Catalog” is selected from the “Services Included” dropdown. You can include additional services if the key will be reused for other purposes.
Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, refer to the NGC User Guide.
Export the API Key
Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.
If you are not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:
export NGC_API_KEY=<value>
Run one of the following commands to make the key available at startup:
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc
# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
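Running either echo command more than once appends a duplicate line each time. As a minimal sketch, the helper below appends the export line only when it is not already present. The name persist_ngc_key and its arguments are hypothetical and are not part of any NGC or NIM tooling.

```shell
# persist_ngc_key is a hypothetical helper, not part of NGC or NIM tooling.
# Usage: persist_ngc_key <rc-file> <api-key>
persist_ngc_key() {
    local rc_file line
    rc_file="$1"
    line="export NGC_API_KEY=$2"
    # grep -q: no output, -x: match the whole line, -F: fixed-string pattern.
    # A missing rc file counts as "line not present".
    if ! grep -qxF -- "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

For example, persist_ngc_key ~/.bashrc "$NGC_API_KEY" records the key that is currently exported in the terminal.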
Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
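The file-based option can be sketched as follows, assuming the key is stored on a single line in a file that only your user can read (for example, after chmod 600). The helper name load_ngc_api_key and the default path ~/.ngc_api_key are assumptions made for illustration.

```shell
# load_ngc_api_key is a hypothetical helper; the default path is an assumption.
# Usage: load_ngc_api_key [key-file]
load_ngc_api_key() {
    local key_file
    key_file="${1:-${NGC_API_KEY_FILE:-$HOME/.ngc_api_key}}"
    if [ ! -r "$key_file" ]; then
        echo "Cannot read key file: $key_file" >&2
        return 1
    fi
    # Command substitution strips the trailing newline from the file contents.
    NGC_API_KEY="$(cat "$key_file")"
    export NGC_API_KEY
}
```

Calling load_ngc_api_key in a new terminal exports the key for that session without writing the key itself into a shell startup file.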
Docker Login to NGC
To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Use $oauthtoken as the username and NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates that you will authenticate with an API key and not a user name and password.
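If NGC_API_KEY is empty, the login command sends an empty password and fails with an error that may not point at the real cause. A minimal sketch of a wrapper that checks the variable first is shown below; the function name ngc_docker_login is hypothetical.

```shell
# ngc_docker_login is a hypothetical wrapper around the login command above.
ngc_docker_login() {
    if [ -z "${NGC_API_KEY:-}" ]; then
        echo "NGC_API_KEY is not set; export it before logging in." >&2
        return 1
    fi
    # Single quotes keep $oauthtoken literal so the shell does not expand it.
    printf '%s\n' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

The single quotes around $oauthtoken matter: with double quotes the shell would expand it as a variable, which is normally unset, and Docker would receive an empty username.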
The following command launches the Maxine Eye Contact NIM container with the gRPC service. The runtime parameters for the container are described in the tables later in this section.
docker run -it --rm --name=maxine-eye-contact-nim \
--net host \
--runtime=nvidia \
--gpus all \
--shm-size=8GB \
-e NGC_API_KEY=$NGC_API_KEY \
-e MAXINE_MAX_CONCURRENCY_PER_GPU=1 \
-e NIM_MANIFEST_PROFILE=7f0287aa-35d0-11ef-9bba-57fc54315ba3 \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
nvcr.io/nim/nvidia/maxine-eye-contact:latest
The --gpus all flag assigns all available GPUs to the Docker container.
To assign specific GPUs to the container on a machine that has more than one GPU, use --gpus '"device=0,1,2..."'
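The nested quoting in the device form is easy to get wrong, because the double quotes must reach Docker as part of the value. As a sketch, the hypothetical helper gpu_device_arg below builds that value from a list of GPU indices.

```shell
# gpu_device_arg is a hypothetical helper that builds the value for --gpus.
# Usage: gpu_device_arg 0 2   prints   "device=0,2"   (double quotes included)
gpu_device_arg() {
    local ids
    if [ "$#" -eq 0 ]; then
        echo "usage: gpu_device_arg <gpu-index>..." >&2
        return 1
    fi
    # Inside the subshell, "$*" joins the arguments with the first character of IFS.
    ids="$(IFS=,; printf '%s' "$*")"
    printf '"device=%s"\n' "$ids"
}
```

Passing --gpus "$(gpu_device_arg 0 2)" to docker run is then equivalent to writing --gpus '"device=0,2"' by hand.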
If the docker run command succeeds, the output is similar to the following:
+------------------------+---------+--------+
| Model | Version | Status |
+------------------------+---------+--------+
| GazeRedirectionKey68 | 1 | READY |
| maxine_nvcf_eyecontact | 1 | READY |
+------------------------+---------+--------+
I0903 10:35:41.663046 47 metrics.cc:808] Collecting metrics for GPU 0: GPU Name
I0903 10:35:41.663361 47 metrics.cc:701] Collecting CPU metrics
I0903 10:35:41.663588 47 tritonserver.cc:2385]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.35.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor |
| | _data parameters statistics trace logging |
| model_repository_path[0] | /opt/maxine/models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0903 10:35:41.664874 47 grpc_server.cc:2445] Started GRPCInferenceService at 0.0.0.0:8001
I0903 10:35:41.665204 47 http_server.cc:3555] Started HTTPService at 0.0.0.0:8000
I0903 10:35:41.706437 47 http_server.cc:185] Started Metrics Service at 0.0.0.0:8002
Maxine GRPC Service: Listening to 0.0.0.0:8004
By default, the Maxine Eye Contact gRPC service is hosted on port 8004. Use this port for inference requests.
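To confirm the listening address from a script rather than by reading the log, the address can be extracted from the final log line shown above. The sketch below assumes the log format matches the sample output; the helper name maxine_listen_address is hypothetical.

```shell
# maxine_listen_address is a hypothetical helper: it reads log text on stdin
# and prints the address from the last "Maxine GRPC Service" line, if any.
maxine_listen_address() {
    local addr
    addr="$(sed -n 's/^Maxine GRPC Service: Listening to //p' | tail -n 1)"
    if [ -z "$addr" ]; then
        return 1
    fi
    printf '%s\n' "$addr"
}
```

For a container started with the name used above, docker logs maxine-eye-contact-nim 2>&1 | maxine_listen_address prints the address once the service is up, and returns a non-zero status while the line has not yet appeared.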
The following table describes the environment variables that can be passed into a NIM as a -e argument added to a docker run command:
ENV | Required? | Default | Notes |
---|---|---|---|
NGC_API_KEY | Yes | None | You must set this variable to the value of your personal NGC API key. |
NIM_CACHE_PATH | Yes | /opt/nim/.cache | Location (in container) where the container caches model artifacts. |
NIM_HTTP_API_PORT | Yes | 9000 | Publish the NIM service to the prescribed port inside the container. Make sure to adjust the port passed to the -p/--publish flag of docker run to reflect that (for example, -p $NIM_HTTP_API_PORT:$NIM_HTTP_API_PORT). The left-hand side of the : is your host address:port and does NOT have to match $NIM_HTTP_API_PORT. The right-hand side of the : is the port inside the container, which MUST match NIM_HTTP_API_PORT (or 9000 if not set). Supported endpoints are /v1/license (returns the license information), /v1/metadata (returns metadata including asset information, license information, model information, and version), and /v1/metrics (exposes Prometheus metrics via an ASGI app endpoint). |
NIM_GRPC_API_PORT | Yes | 50051 | Publish the NIM gRPC service to the prescribed port inside the container. Make sure to adjust the port passed to the -p/--publish flag of docker run to reflect that (for example, -p $NIM_GRPC_API_PORT:$NIM_GRPC_API_PORT). The left-hand side of the : is your host address:port and does NOT have to match $NIM_GRPC_API_PORT. The right-hand side of the : is the port inside the container, which MUST match NIM_GRPC_API_PORT (or 50051 if not set). |
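The port rule in the table (host port on the left may differ, container port on the right must equal the environment variable) can be sketched with a small helper that emits both arguments from one container port value. The name nim_port_args and the host port 8080 used in the example are assumptions for illustration.

```shell
# nim_port_args is a hypothetical helper that prints the docker run arguments
# for one NIM port.
# Usage: nim_port_args <env-var-name> <host-port> <container-port>
nim_port_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: nim_port_args <env-var-name> <host-port> <container-port>" >&2
        return 1
    fi
    # -e sets the port the service uses inside the container;
    # -p maps the host port (left) to that same container port (right).
    printf '%s %s=%s %s %s:%s\n' -e "$1" "$3" -p "$2" "$3"
}
```

For instance, nim_port_args NIM_HTTP_API_PORT 8080 9000 prints -e NIM_HTTP_API_PORT=9000 -p 8080:9000. With that mapping, an endpoint such as /v1/metadata would be requested from the host on port 8080, for example with curl http://localhost:8080/v1/metadata.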
The following table describes the flags used in the docker run command above:
Flags | Description |
---|---|
-it | --interactive + --tty (see Docker docs) |
--rm | Delete the container after it stops (see Docker docs) |
--name=container-name | Give a name to the NIM container. Use any preferred value. |
--runtime=nvidia | Ensure NVIDIA drivers are accessible in the container. |
--gpus all | Expose NVIDIA GPUs inside the container. If you are running on a host with multiple GPUs, you need to specify which GPU to use; you can also specify multiple GPUs. See GPU Enumeration for further information on mounting specific GPUs. |
--shm-size=8GB | Allocate host memory for multi-process communication. |
-e NGC_API_KEY=$NGC_API_KEY | Provide the container with the token necessary to download the appropriate models and resources from NGC. See above. |
-e MAXINE_MAX_CONCURRENCY_PER_GPU | Number of concurrent inference requests to be supported by the NIM server per GPU. |
-p 9000:9000 | Forward the port where the NIM HTTP server is published inside the container to access from the host system. The left-hand side of : is the host system ip:port (9000 here), while the right-hand side is the container port where the NIM HTTP server is published. The container port can be any value except 8000. |
-p 50051:50051 | Forward the port where the NIM gRPC server is published inside the container to access from the host system. The left-hand side of : is the host system ip:port (50051 here), while the right-hand side is the container port where the NIM gRPC server is published. |
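To stop and remove the container, run the following commands, where $CONTAINER_NAME is the name passed to --name (maxine-eye-contact-nim in the command above):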
docker stop $CONTAINER_NAME
docker rm $CONTAINER_NAME
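When the container was started with --rm, as in the launch command in this section, Docker removes it automatically once it stops, so the docker rm step can report that the container no longer exists. A minimal sketch that tolerates this case is shown below; the function name stop_nim_container is hypothetical, and the default container name is taken from the launch command.

```shell
# stop_nim_container is a hypothetical helper.
# Usage: stop_nim_container [container-name]
stop_nim_container() {
    local name
    name="${1:-maxine-eye-contact-nim}"
    docker stop "$name" || return 1
    # A container started with --rm is already gone at this point,
    # so a failure from docker rm is ignored.
    docker rm "$name" 2>/dev/null || true
}
```

The helper still returns a non-zero status if docker stop itself fails, for example when no container with that name is running.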