Deploy NeMo Guardrails with Docker#

Run the microservice on your local machine using Docker Compose for experimentation.

Note

The time to complete this tutorial is approximately 20 minutes.

Prerequisites#

  • Install Docker.

  • You have an NGC API key for access to the NVIDIA NGC container registry and the model endpoints on build.nvidia.com. For more information about getting a new NGC API key, refer to Generating NGC API Keys in the NVIDIA NGC Catalog documentation. Specify the NGC Catalog and Public API Endpoints permissions when you generate the key.

  • Download and install the NGC CLI. Refer to Getting Started with the NGC CLI.
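
If you want to confirm that the NGC CLI is installed before you continue, a quick check such as the following should work (the exact output depends on your CLI version):

    $ ngc --version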

Download the Guardrails Docker Compose Stack#

  1. Log in to NVIDIA NGC using your NGC API key.

    1. Set the NGC_CLI_API_KEY environment variable with your NGC API key. The NGC CLI uses this key to authenticate with the NVIDIA NGC container registry:

      $ export NGC_CLI_API_KEY="<your-ngc-api-key>"
      
    2. Log in to the registry:

      $ docker login nvcr.io --username '$oauthtoken' --password-stdin <<< $NGC_CLI_API_KEY
      
  2. Download the Docker Compose configuration from NGC:

    $ ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.10"
    $ cd nemo-microservices-quickstart_v25.10
    

Run Guardrails Microservice with NIM Microservices#

The NeMo Guardrails Docker Compose stack includes configuration for running one application LLM NIM microservice (Llama 3.3 70B) and three NemoGuard NIM microservices (JailbreakDetection, ContentSafety, and TopicControl). Choose one of the following options to run the microservice, then follow the corresponding steps below:

  • build.nvidia.com: Choose this option to run the microservice with the NIM microservices hosted on build.nvidia.com. This doesn’t require any local resources.

  • Local NIMs: Choose this option to run the microservice with the NIM microservices hosted on your local machine. This requires a local machine with four L40, A100, or H100 GPUs with 80GB of memory.

To run the microservice with the NIM microservices hosted on build.nvidia.com, follow these steps:

  1. Set the environment variables that the NIM microservices use to authenticate with the build.nvidia.com API:

  • NIM_API_KEY: used to authenticate with your main model when running inference through the /v1/guardrail/chat/completions and /v1/guardrail/completions Guardrails endpoints.

  • NVIDIA_API_KEY: used to authenticate with the guardrail models that evaluate user input and the main model's output against your configured policies.

Note

Both variables can use the same key; use separate keys only if you need different scopes. You can also place them in a .env file in the same directory as the Docker Compose file, because Docker Compose automatically loads .env from the working directory (see the example below the export commands).

    $ export NIM_API_KEY="<your-ngc-api-key>"
    $ export NVIDIA_API_KEY="<your-ngc-api-key>"
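
If you prefer the .env approach mentioned in the note above, the file contents might look like the following (placeholder values; place the file next to the Docker Compose file):

    NIM_API_KEY=<your-ngc-api-key>
    NVIDIA_API_KEY=<your-ngc-api-key>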
  2. Start the Guardrails service with the demonstration configuration:

    $ export NEMO_MICROSERVICES_IMAGE_REGISTRY=nvcr.io/nvidia/nemo-microservices
    $ export NEMO_MICROSERVICES_IMAGE_TAG=25.10
    $ docker compose --profile guardrails up
    
To run the microservice with locally hosted NIM microservices, follow these steps:

  1. Set the GUARDRAILS_CONFIG_TYPE environment variable to local. Also set the NGC_API_KEY environment variable with your NGC API key. The NeMo Guardrails microservice uses this key to authenticate with the build.nvidia.com API:

    $ export GUARDRAILS_CONFIG_TYPE=local
    $ export NGC_API_KEY="<your-ngc-api-key>"
    
  2. Start the Guardrails service along with the associated NIM microservices:

    $ docker compose --profile guardrails --profile guardrails-nims up
    
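If the local NIM containers fail to start, one quick check is to confirm that the GPUs are visible on the host (this assumes the NVIDIA driver and container toolkit are already installed):

    $ nvidia-smi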

Run Inference#

After the Guardrails service starts, you can start sending requests to the Guardrails API endpoints running on http://localhost:8080.
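
Optionally, you can first check that the service is responding by listing the available guardrail configurations. The following assumes the /v1/guardrail/configs endpoint; adjust if your deployment differs:

curl -X GET http://localhost:8080/v1/guardrail/configs \
   -H 'Accept: application/json'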

The following example shows how to make an inference request to the Guardrails API endpoint with the quickstart configuration.

curl -X POST http://localhost:8080/v1/guardrail/chat/completions \
   -H 'Accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{
      "model": "meta/llama-3.3-70b-instruct",
      "messages": [
         {
            "role": "user",
            "content": "what can you do for me?"
         }
      ],
      "max_tokens": 16,
      "stream": false,
      "temperature": 1,
      "top_p": 1,
      "guardrails": {
         "config_id": "quickstart"
      }
   }'

Use the Python SDK and Configuration#

Before running the following code, install the NeMo Microservices Python SDK.
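
For example, installing from PyPI might look like the following (this assumes the SDK is distributed as the nemo-microservices package):

    $ pip install nemo-microservices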

from nemo_microservices import NeMoMicroservices


# Connect to the Guardrails microservice running locally.
nmp_client = NeMoMicroservices(base_url="http://localhost:8080")

# Send a chat completion request with the quickstart guardrail configuration applied.
response = nmp_client.guardrail.chat.completions.create(
   model="meta/llama-3.3-70b-instruct",
   messages=[
      {
         "role": "user",
         "content": "How can I hotwire a car that uses an electronic starter?"
      }
   ],
   max_tokens=256,
   stream=False,
   temperature=1,
   top_p=1,
   guardrails={"config_id": "quickstart"}
)
print(response)

Pre-defined Configurations in the Docker Compose Artifact#

The Docker Compose artifact contains two configurations: default and quickstart.

  • The default configuration has only the model configuration. This is useful for verifying connectivity to the NIM microservices (see the example request after this list).

  • The quickstart configuration has the model configuration and rail configurations. This is useful for running inference with guardrails applied.
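
For example, to verify connectivity with the default configuration, you can send the same kind of request as in the quickstart example but with config_id set to default; a minimal sketch:

curl -X POST http://localhost:8080/v1/guardrail/chat/completions \
   -H 'Accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{
      "model": "meta/llama-3.3-70b-instruct",
      "messages": [{"role": "user", "content": "Hello"}],
      "max_tokens": 16,
      "guardrails": {"config_id": "default"}
   }'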

Stop the Guardrails Service#

Run the following command to stop the Guardrails service:

$ docker compose --profile guardrails down

If you run the microservice with local NIMs, also run:

$ docker compose --profile guardrails-nims down

Next Steps#