Deploying with Docker#
Note
The time to complete this tutorial is approximately 20 minutes.
Prerequisites#
Install Docker.
You have an NVIDIA API key for access to model endpoints on the NVIDIA API Catalog. Refer to build.nvidia.com if you do not have an API key.
You have an NGC API key for access to the NVIDIA NGC container registry. Refer to NGC Setup if you do not have an API key.
Download and install the NGC CLI. The download is on the same setup page.
Running the Microservice Container#
Log in to NVIDIA NGC so you can pull the container.
Set the NGC_CLI_API_KEY environment variable:
$ export NGC_CLI_API_KEY="<M2...>"
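If you downloaded the NGC CLI as described in the prerequisites, you can optionally store the key in the CLI configuration as well. The ngc config set command prompts for the API key interactively:
$ ngc config set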
Log in to the registry:
$ docker login nvcr.io --username '$oauthtoken' --password-stdin <<< $NGC_CLI_API_KEY
Download the container:
$ docker pull nvcr.io/nvidia/nemo-microservices/guardrails:25.04
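As an optional check, not part of the original steps, you can confirm that the image downloaded successfully by listing the local tags for the repository:
$ docker images nvcr.io/nvidia/nemo-microservices/guardrails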
Set the NVIDIA_API_KEY environment variable:
$ export NVIDIA_API_KEY="nvapi-<...>"
Start the container and use the demonstration configuration:
$ docker run -d \
--name nemo-guardrails-ms \
-p 7331:7331 \
-e NIM_ENDPOINT_API_KEY="${NVIDIA_API_KEY}" \
-e DEMO=True \
nvcr.io/nvidia/nemo-microservices/guardrails:25.04
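Because the container starts in detached mode (-d), the command returns immediately while the service finishes initializing. You can follow the startup logs and poll the health endpoint described in Next Steps to confirm that the microservice is ready:
$ docker logs -f nemo-guardrails-ms
$ curl http://0.0.0.0:7331/v1/health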
The following demo guardrail configurations are loaded when the DEMO environment variable is set:
default: an empty configuration.
self-check: a basic configuration that uses a self-check input rail.
abc: the example ABC bot configuration from the NeMo Guardrails Toolkit.
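To confirm which configurations your instance loaded, you can query the microservice. The following request assumes the configuration listing is exposed at /v1/guardrail/configs; check the API reference for your release if the path differs:
curl http://0.0.0.0:7331/v1/guardrail/configs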
A NeMo Guardrails microservice instance has a default guardrail configuration associated with it. The configuration is set with the DEFAULT_CONFIG_ID environment variable. By default, it is set to the configuration with the ID default.
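For example, to make the self-check configuration the default instead, you could start the container with the variable set explicitly. This is a sketch that varies the run command shown earlier:
$ docker run -d \
--name nemo-guardrails-ms \
-p 7331:7331 \
-e NIM_ENDPOINT_API_KEY="${NVIDIA_API_KEY}" \
-e DEMO=True \
-e DEFAULT_CONFIG_ID=self-check \
nvcr.io/nvidia/nemo-microservices/guardrails:25.04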
Running Inference#
The microservice exposes an OpenAI-compatible API. Run the following query to connect to the microservice. The microservice relays the inference request to an endpoint on the NVIDIA API Catalog.
Run inference using the default guardrails configuration:
curl -X POST http://0.0.0.0:7331/v1/guardrail/chat/completions \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "what can you do for me?"
      }
    ],
    "max_tokens": 16,
    "stream": false,
    "temperature": 1,
    "top_p": 1
  }'
Because no guardrail configuration is specified, the microservice uses a default, empty configuration with the ID default. This default configuration is useful for verifying connectivity to the LLM.
Run inference using the self-check guardrails configuration:
curl -X POST http://0.0.0.0:7331/v1/guardrail/chat/completions \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "You are stupid."
      }
    ],
    "max_tokens": 16,
    "stream": false,
    "temperature": 1,
    "top_p": 1,
    "guardrails": {
      "config_id": "self-check"
    }
  }'
Partial Output
{ ... "choices":[{"index":0,"message":{"role":"assistant","content":"I'm sorry, I can't respond to that."}}], ... }
The message "I'm sorry, I can't respond to that." is a predefined response. The microservice returns this message when the input guardrail is triggered.
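To inspect the guarded response without reading the full JSON body, you can extract just the assistant message. This sketch assumes the jq JSON processor is installed:
curl -s -X POST http://0.0.0.0:7331/v1/guardrail/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "You are stupid."}],
    "max_tokens": 16,
    "guardrails": {"config_id": "self-check"}
  }' | jq -r '.choices[0].message.content'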
Alternative: Connect to a Local NIM Microservice#
As an alternative to connecting to the API Catalog models at integrate.api.nvidia.com/v1, you can connect to a local NIM microservice.
If you have a NIM container running locally on http://0.0.0.0:9999/v1, you can start the NeMo Guardrails container using the following command:
$ docker run -d \
--name nemo-guardrails-ms \
-e NIM_ENDPOINT_URL="http://0.0.0.0:9999/v1" \
-e DEMO=True \
--net=host \
nvcr.io/nvidia/nemo-microservices/guardrails:25.04
When you start the container, make sure the container shares the same network interface as the host. The preceding sample command uses the --net=host argument for this purpose. Alternatively, you can specify -e NIM_ENDPOINT_URL="http://host.docker.internal:9999/v1".
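Before starting the guardrails container, you can verify that the local NIM microservice is reachable. Because NIM exposes an OpenAI-compatible API, listing the served models is a convenient connectivity check; this assumes the standard /v1/models route:
curl http://0.0.0.0:9999/v1/models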
Stopping the Container#
If you launch a Docker container with the --name
command line option, you can execute the Docker stop
and rm
commands using that name, as shown in the following command line examples.
$ docker stop nemo-guardrails-ms
$ docker rm nemo-guardrails-ms
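If you prefer a single step, the docker rm -f command stops and removes a running container in one operation:
$ docker rm -f nemo-guardrails-ms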
Next Steps#
Read about the self-check input guardrail.
Read the guardrail configurations page.
Access the health check endpoint to check when the microservice is ready: curl http://0.0.0.0:7331/v1/health. A readiness-polling sketch follows this list.
Install the NVIDIA Container Toolkit if your system has NVIDIA GPUs and you use a model and configuration that takes advantage of GPU acceleration.
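If you script the deployment, you can block until the health check succeeds. This sketch assumes the endpoint returns a non-error HTTP status once the microservice is ready:
$ until curl -sf http://0.0.0.0:7331/v1/health > /dev/null; do sleep 2; done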