NeMo Guardrails Quickstart Using Docker Compose#

Run the microservice on your local machine using Docker Compose for experimentation.

Note

The time to complete this tutorial is approximately 20 minutes.

Prerequisites#

  • Install Docker.

  • Get an NGC API key for access to the NVIDIA NGC container registry and model endpoints on build.nvidia.com. For more information about getting a new NGC API key, refer to Generating NGC API Keys in the NVIDIA NGC Catalog documentation.

  • Download and install the NGC CLI from the NGC Setup page.

Download the Guardrails Docker Compose Stack#

  1. Log in to NVIDIA NGC using your NGC API key.

    1. Set the NGC_CLI_API_KEY environment variable with your NGC API key. The NGC CLI uses this key to authenticate with the NVIDIA NGC container registry:

      $ export NGC_CLI_API_KEY="<your-ngc-api-key>"
      
    2. Log in to the registry:

      $ docker login nvcr.io --username '$oauthtoken' --password-stdin <<< $NGC_CLI_API_KEY
      
  2. Download the Docker Compose configuration from NGC:

    $ ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.09"
    $ cd nemo-microservices-quickstart_v25.09
    

Run Guardrails Microservice with NIM Microservices#

The NeMo Guardrails Docker Compose stack includes configuration for running one application LLM NIM microservice (Llama 3.3 70B) and three NemoGuard NIM microservices (JailbreakDetection, ContentSafety, and TopicControl). Choose one of the following options to run the microservice:

  • build.nvidia.com: Choose this option to run the microservice with the NIM microservices hosted on build.nvidia.com. This doesn’t require any local resources.

  • Local NIMs: Choose this option to run the microservice with the NIM microservices hosted on your local machine. This requires a local machine with four L40, A100, or H100 GPUs with 80GB of memory.

  Option 1: build.nvidia.com

  1. Set the NIM_API_KEY environment variable that the NIM microservices use to authenticate with the build.nvidia.com API:

    $ export NIM_API_KEY="<your-ngc-api-key>"
    
  2. Start the Guardrails service with the demonstration configuration:

    $ export NEMO_MICROSERVICES_IMAGE_REGISTRY=nvcr.io/nvidia/nemo-microservices
    $ export NEMO_MICROSERVICES_IMAGE_TAG=25.09
    $ docker compose --profile guardrails up
    
  Option 2: Local NIMs

  1. Set the GUARDRAILS_CONFIG_TYPE environment variable to local, and set the NGC_API_KEY environment variable with your NGC API key. The NeMo Guardrails microservice uses this key to authenticate with the build.nvidia.com API:

    $ export GUARDRAILS_CONFIG_TYPE=local
    $ export NGC_API_KEY="<your-ngc-api-key>"
    
  2. Start the Guardrails service along with the associated NIM microservices:

    $ docker compose --profile guardrails --profile guardrails-nims up
    

Run Inference#

After the Guardrails service starts, you can send requests to the Guardrails API endpoints at http://localhost:8080.
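
Before sending requests, you may want to wait until the endpoint is actually accepting connections, especially with local NIMs, which can take several minutes to load models. The following is a minimal sketch; the `wait_for_port` helper is ours and not part of the NeMo Microservices SDK:

```python
# Sketch: poll until the Guardrails API accepts TCP connections.
# This helper is not part of the NeMo Microservices SDK.
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)
    return False


# Increase the timeout when running local NIMs; model loading can take minutes.
print(wait_for_port("localhost", 8080, timeout=5.0))
```

If the function returns False, check the `docker compose` logs before proceeding.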

The following example shows how to make an inference request to the Guardrails API endpoint with the quickstart configuration.

curl -X POST http://localhost:8080/v1/guardrail/chat/completions \
   -H 'Accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{
      "model": "meta/llama-3.3-70b-instruct",
      "messages": [
         {
            "role": "user",
            "content": "what can you do for me?"
         }
      ],
      "max_tokens": 16,
      "stream": false,
      "temperature": 1,
      "top_p": 1,
      "guardrails": {
         "config_id": "quickstart"
      }
   }'
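
For reference, the same request can be made from Python with only the standard library. This is a sketch: it assumes the service is reachable at localhost:8080, and the `choices[0]["message"]["content"]` path assumes the response follows the OpenAI-compatible chat completions shape.

```python
import json
import urllib.error
import urllib.request

url = "http://localhost:8080/v1/guardrail/chat/completions"
payload = {
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "what can you do for me?"}],
    "max_tokens": 16,
    "stream": False,
    "temperature": 1,
    "top_p": 1,
    "guardrails": {"config_id": "quickstart"},
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Accept": "application/json", "Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(request, timeout=30) as response:
        completion = json.load(response)
        # Assumes the OpenAI-compatible response shape.
        print(completion["choices"][0]["message"]["content"])
except urllib.error.URLError as exc:
    print(f"Request failed -- is the Guardrails service running? ({exc})")
```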

Use the Python SDK and Configuration#

Before running the following code, install the NeMo Microservices Python SDK.

from nemo_microservices import NeMoMicroservices


nmp_client = NeMoMicroservices(base_url="http://localhost:8080")

response = nmp_client.guardrail.chat.completions.create(
   model="meta/llama-3.3-70b-instruct",
   messages=[
      {
         "role": "user",
         "content": "How can I hotwire a car that uses an electronic starter?"
      }
   ],
   max_tokens=256,
   stream=False,
   temperature=1,
   top_p=1,
   guardrails={"config_id": "quickstart"}
)
print(response)

Pre-defined Configurations in the Docker Compose Artifact#

The Docker Compose artifact contains two configurations: default and quickstart.

  • The default configuration has only the model configuration. This is useful for verifying connectivity to the NIM microservices.

  • The quickstart configuration has the model configuration and rail configurations. This is useful for running inference with guardrails applied.
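
To switch between the two configurations, only the `config_id` in the request body needs to change. The following sketch shows this; the `guardrails_payload` helper is ours, not part of the SDK:

```python
def guardrails_payload(prompt: str, config_id: str = "quickstart") -> dict:
    """Build a request body for POST /v1/guardrail/chat/completions."""
    return {
        "model": "meta/llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "stream": False,
        "guardrails": {"config_id": config_id},
    }


# Verify NIM connectivity with the bare model configuration:
print(guardrails_payload("Hello!", config_id="default")["guardrails"])
# → {'config_id': 'default'}
```

Use `config_id="default"` to confirm the model endpoints are reachable, then `config_id="quickstart"` to exercise the rails.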

Stop the Guardrails Service#

Run the following command to stop the Guardrails service:

$ docker compose --profile guardrails down

If you run the microservice with local NIMs, also run:

$ docker compose --profile guardrails-nims down

Next Steps#