Deploying with Docker#
Note
The time to complete this tutorial is approximately 20 minutes.
Prerequisites#
Install Docker.
You have an NVIDIA API key for access to model endpoints on the NVIDIA API Catalog. Refer to build.nvidia.com if you do not have an API key.
You have an NGC API key for access to the NVIDIA NGC container registry. Refer to NGC Setup if you do not have an API key.
Download and install the NGC CLI. The download is on the same setup page.
Running the Microservice Container#
Log in to NVIDIA NGC so you can pull the container.
Set the NGC_CLI_API_KEY environment variable:
$ export NGC_CLI_API_KEY="<M2...>"
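If you downloaded the NGC CLI as described in the prerequisites, you can optionally store the key in the CLI configuration as well. The ngc config set command prompts for the API key interactively:
$ ngc config set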
Log in to the registry:
$ docker login nvcr.io --username '$oauthtoken' --password-stdin <<< $NGC_CLI_API_KEY
Download the container:
$ docker pull nvcr.io/nvidia/nemo-microservices/guardrails:25.04
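As an optional check, not part of the original steps, you can confirm that the image downloaded successfully by listing the local tags for the repository:
$ docker images nvcr.io/nvidia/nemo-microservices/guardrails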
Set the NVIDIA_API_KEY environment variable:
$ export NVIDIA_API_KEY="nvapi-<...>"
Start the container and use the demonstration configuration:
$ docker run -d \
--name nemo-guardrails-ms \
-p 7331:7331 \
-e NIM_ENDPOINT_API_KEY="${NVIDIA_API_KEY}" \
-e DEMO=True \
nvcr.io/nvidia/nemo-microservices/guardrails:25.04
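Because the container starts in detached mode (-d), the command returns immediately while the service finishes initializing. You can follow the startup logs and poll the health endpoint described in Next Steps to confirm that the microservice is ready:
$ docker logs -f nemo-guardrails-ms
$ curl http://0.0.0.0:7331/v1/health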
The following demo guardrail configurations are loaded when the DEMO environment variable is set:
default: an empty configuration.
self-check: a basic configuration that uses a self-check input rail.
abc: the example ABC bot configuration from the NeMo Guardrails Toolkit.
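To confirm which configurations your instance loaded, you can query the microservice. The following request assumes the configuration listing is exposed at /v1/guardrail/configs; check the API reference for your release if the path differs:
curl http://0.0.0.0:7331/v1/guardrail/configs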
A NeMo Guardrails microservice instance has a default guardrail configuration associated with it. The configuration is set with the DEFAULT_CONFIG_ID environment variable. By default, it is set to the configuration with the ID default.
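For example, to make the self-check configuration the default instead, you could start the container with the variable set explicitly. This is a sketch that varies the run command shown earlier:
$ docker run -d \
--name nemo-guardrails-ms \
-p 7331:7331 \
-e NIM_ENDPOINT_API_KEY="${NVIDIA_API_KEY}" \
-e DEMO=True \
-e DEFAULT_CONFIG_ID=self-check \
nvcr.io/nvidia/nemo-microservices/guardrails:25.04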
Running Inference#
The microservice exposes an OpenAI-compatible API. Run the following query to connect to the microservice. The microservice relays the inference request to an endpoint on the NVIDIA API Catalog.
Run inference using the default guardrails configuration:
curl -X POST http://0.0.0.0:7331/v1/guardrail/chat/completions \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "what can you do for me?"
      }
    ],
    "max_tokens": 16,
    "stream": false,
    "temperature": 1,
    "top_p": 1
  }'
Because no guardrail configuration is specified, the microservice uses a default, empty configuration with the ID default. This default configuration is useful for verifying connectivity to the LLM.
Run inference using the self-check guardrails configuration:
curl -X POST http://0.0.0.0:7331/v1/guardrail/chat/completions \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "You are stupid."
      }
    ],
    "max_tokens": 16,
    "stream": false,
    "temperature": 1,
    "top_p": 1,
    "guardrails": {
      "config_id": "self-check"
    }
  }'
Partial Output
{ ... "choices":[{"index":0,"message":{"role":"assistant","content":"I'm sorry, I can't respond to that."}}], ... }
The message "I'm sorry, I can't respond to that." is a predefined response. The microservice returns this message when the input guardrail is triggered.
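To inspect the guarded response without reading the full JSON body, you can extract just the assistant message. This sketch assumes the jq JSON processor is installed:
curl -s -X POST http://0.0.0.0:7331/v1/guardrail/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "You are stupid."}],
    "max_tokens": 16,
    "guardrails": {"config_id": "self-check"}
  }' | jq -r '.choices[0].message.content'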
Alternative: Connect to a Local NIM Microservice#
As an alternative to connecting to the API Catalog models at integrate.api.nvidia.com/v1, you can connect to a local NIM microservice.
If you have a NIM container running locally on http://0.0.0.0:9999/v1, you can start the NeMo Guardrails container using the following command:
$ docker run -d \
--name nemo-guardrails-ms \
-e NIM_ENDPOINT_URL="http://0.0.0.0:9999/v1" \
-e DEMO=True \
--net=host \
nvcr.io/nvidia/nemo-microservices/guardrails:25.04
When you start the container, make sure the container shares the same network interface as the host. The preceding sample command uses the --net=host argument for this purpose. Alternatively, you can specify -e NIM_ENDPOINT_URL="http://host.docker.internal:9999/v1".
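Before starting the guardrails container, you can verify that the local NIM microservice is reachable. Because NIM exposes an OpenAI-compatible API, listing the served models is a convenient connectivity check; this assumes the standard /v1/models route:
curl http://0.0.0.0:9999/v1/models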
Stopping the Container#
If you launch a Docker container with the --name
command line option, you can execute the Docker stop
and rm
commands using that name, as shown in the following command line examples.
$ docker stop nemo-guardrails-ms
$ docker rm nemo-guardrails-ms
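If you prefer a single step, the docker rm -f command stops and removes a running container in one operation:
$ docker rm -f nemo-guardrails-ms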
Next Steps#
Read about the self-check input guardrail.
Read the guardrail configurations page.
Access the health check endpoint to check when the microservice is ready: curl http://0.0.0.0:7331/v1/health. A readiness-polling sketch follows this list.
Install the NVIDIA Container Toolkit if your system has NVIDIA GPUs and you use a model and configuration that takes advantage of GPU acceleration.
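If you script the deployment, you can block until the health check succeeds. This sketch assumes the endpoint returns a non-error HTTP status once the microservice is ready:
$ until curl -sf http://0.0.0.0:7331/v1/health > /dev/null; do sleep 2; done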