Manage NeMo Guardrails Access to Models#

How Guardrails Uses Models#

A NeMo Guardrails configuration specifies a main, or application, model that end users interact with for chat and chat-like interactions. The configuration can also specify task-specific models, such as the content safety model provided by the Llama 3.1 NemoGuard 8B ContentSafety NIM microservice.
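For example, in the open-source NeMo Guardrails configuration schema, the models list names the application model with the type main alongside any task-specific models. The following Python literal is a minimal sketch only; the exact field names are defined in the Guardrails configuration documentation, and the model names shown here are placeholders for models available in your deployment.

# Illustrative sketch of the models list in a guardrail configuration.
# Field names loosely follow the open-source NeMo Guardrails schema (type, engine, model);
# the exact schema and the model names are assumptions for illustration only.
guardrail_config_models = [
    {"type": "main", "engine": "nim", "model": "meta/llama-3.3-70b-instruct"},
    {
        "type": "content_safety",
        "engine": "nim",
        "model": "nvidia/llama-3.1-nemoguard-8b-content-safety",
    },
]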

How you configure access to models depends on the value of the NIM_ENDPOINT_URL environment variable. The value typically depends on how you install the NeMo Guardrails microservice. The following table summarizes the different processes.

| NIM_ENDPOINT_URL Value | Guardrails Installation | Management Process |
| --- | --- | --- |
| http://nemo-nim-proxy:8000/v1 | Installed as part of the NeMo microservices platform. The NIM_ENDPOINT_URL environment variable is set to the NVIDIA NeMo NIM Proxy URL in the NeMo Microservices Helm Chart. | Add and remove access to models by using NeMo Deployment Management to deploy and undeploy NIM for LLMs. The management microservice registers and deregisters each model with NIM Proxy. After a model is registered with the proxy, you can specify the model name in a guardrail configuration. |
| https://integrate.api.nvidia.com/v1 | Installed individually. The NVIDIA API Catalog URL is the default value for the microservice and typically indicates that the microservice runs as a Docker container or is installed in Kubernetes with the individual service Helm chart. | Manage access to models by sending REST requests to the /v1/guardrail/models endpoint of the NeMo Guardrails microservice, as described on this page. |

Fetching Models at Container Start#

When NIM_ENDPOINT_URL is set to its default value, https://integrate.api.nvidia.com/v1, the microservice does not retrieve the model names available from the NVIDIA API Catalog by default. As a result, a GET request to the /v1/guardrail/models endpoint returns an empty list.

To configure the container to retrieve the model names available from the NVIDIA API Catalog, set the FETCH_NIM_APP_MODELS environment variable to True. When the environment variable is True, the container retrieves the available model names at startup. Afterward, a GET request to the /v1/guardrail/models endpoint returns a list of model objects that follows the OpenAI list structure:

{
  "object": "list",
  "data": [
    {
      "id": "01-ai/yi-large",
      "object": "model",
      "created": 735790403,
      "owned_by": "system"
    },
    {
      "id": "abacusai/dracarys-llama-3.1-70b-instruct",
      "object": "model",
      "created": 735790403,
      "owned_by": "system"
    },
    // ...
  ]
}

If you enable FETCH_NIM_APP_MODELS to retrieve the model names, you can then manage the list of model objects by using the API endpoints shown on this page.

If you do not enable FETCH_NIM_APP_MODELS, you can still use the models from the NVIDIA API Catalog; the model names are simply not included in the response to a GET request on the /v1/guardrail/models endpoint. For example, if you know that the nvidia/llama-3.1-nemotron-nano-4b-v1.1 model is available from the catalog, you can specify the model name in the model field of an inference request. Alternatively, you can send a POST request to add the model explicitly, as shown on this page.
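For example, the following Python sketch sends an inference request that names the catalog model directly. It assumes the microservice exposes the /v1/guardrail/chat/completions endpoint and that a guardrail configuration with the ID default exists in your environment; adjust both to match your deployment.

import os
import json
import requests

url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/chat/completions"

headers = {"Accept": "application/json", "Content-Type": "application/json"}

# The model name comes from the NVIDIA API Catalog even though it is not listed
# by GET /v1/guardrail/models. The "default" config ID is an assumption; use a
# guardrail configuration that exists in your environment.
data = {
    "model": "nvidia/llama-3.1-nemotron-nano-4b-v1.1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "guardrails": {"config_id": "default"},
}

response = requests.post(url, headers=headers, json=data)
print(json.dumps(response.json(), indent=2))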

Common Actions#

You can send a GET request to the /v1/guardrail/models and /v1/guardrail/models/{model-id} endpoints regardless of how the microservice is installed or the value of the NIM_ENDPOINT_URL environment variable.

Listing All Models#

  • Send a GET request to the /v1/guardrail/models endpoint.

    curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
      -H 'Accept: application/json' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
    
    response = requests.get(url)
    print(json.dumps(response.json(), indent=2))
    

    Example Output

    {
      "object": "list",
      "data": [
        {
          "id": "meta-llama-3.3-70b-instruct",
          "object": "model",
          "created": 1748352890965,
          "owned_by": "system"
        }
      ]
    }
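
    If you only need the model IDs, for example to reference them in a guardrail configuration, you can extract them from the data array of the list response. The following sketch reuses the same GUARDRAILS_BASE_URL environment variable.

    import os
    import requests

    # Collect just the model IDs from the list response for use in later requests.
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
    model_ids = [model["id"] for model in requests.get(url).json()["data"]]
    print(model_ids)  # for example: ['meta-llama-3.3-70b-instruct']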
    

Getting One Model#

  • Send a GET request to the /v1/guardrail/models/{model-id} endpoint.

    curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
      -H 'Accept: application/json' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"
    
    response = requests.get(url)
    print(json.dumps(response.json(), indent=2))
    

    Example Output

    {
      "model_id": "meta-llama-3.3-70b-instruct",
      "engine": "nimchat",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.6,
        "max_tokens": 10,
        "top_p": 0.8,
        "model": "meta/llama-3.3-70b-instruct"
      },
      "created": 1748352890965
    }
    

Actions for Individual Installation#

You can access the following endpoints when the NIM_ENDPOINT_URL environment variable is set to its default value, https://integrate.api.nvidia.com/v1.

Adding a Model#

  • Send a POST request to the /v1/guardrail/models endpoint.

    curl -X POST "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      -d '{
        "data": {
          "model_id": "meta-llama-3.3-70b-instruct",
          "engine": "nim",
          "model": "meta/llama-3.3-70b-instruct",
          "base_url": "https://integrate.api.nvidia.com/v1",
          "parameters": {
            "temperature": 0.6,
            "max_tokens": 10,
            "top_p": 0.8
          }
        }
      }' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
    
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    
    data = {
        "data": {
            "model_id": "meta-llama-3.3-70b-instruct",
            "engine": "nim",
            "model": "meta/llama-3.3-70b-instruct",
            "base_url": "https://integrate.api.nvidia.com/v1",
            "parameters": {
                "temperature": 0.6,
                "max_tokens": 10,
                "top_p": 0.8
            }
        }
    }
    
    response = requests.post(url, headers=headers, json=data)
    print(json.dumps(response.json(), indent=2))
    

    For information about the fields in the request body, refer to Guardrails API.

    Example Output

    {
      "model_id": "meta-llama-3.3-70b-instruct",
      "engine": "nimchat",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.6,
        "max_tokens": 10,
        "top_p": 0.8,
        "model": "meta/llama-3.3-70b-instruct"
      },
      "created": 1748352890965
    }
    

Updating a Model#

  • Send a PATCH request to the /v1/guardrail/models/{model-id} endpoint.

    curl -X PATCH "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      -d '{
        "data": {
          "engine": "nim",
          "model": "meta/llama-3.3-70b-instruct",
          "base_url": "https://integrate.api.nvidia.com/v1",
          "parameters": {
            "temperature": 0.8,
            "max_tokens": 1024,
            "top_p": 1
          }
        }
      }' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"
    
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    
    data = {
        "data": {
            "engine": "nim",
            "model": "meta/llama-3.3-70b-instruct",
            "base_url": "https://integrate.api.nvidia.com/v1",
            "parameters": {
                "temperature": 0.8,
                "max_tokens": 1024,
                "top_p": 1
            }
        }
    }
    
    response = requests.patch(url, headers=headers, json=data)
    print(json.dumps(response.json(), indent=2))
    

    Example Output

    {
      "model_id": null,
      "engine": "nimchat",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.8,
        "max_tokens": 1024,
        "top_p": 1,
        "model": "meta/llama-3.3-70b-instruct"
      },
      "created": 1748352890986
    }
    

Deleting a Model#

  • Send a DELETE request to the /v1/guardrail/models/{model-id} endpoint.

    curl -X DELETE "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
      -H 'Accept: application/json' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"
    
    response = requests.delete(url)
    print(json.dumps(response.json(), indent=2))
    

    Example Output

    {
      "message": "Deleted Application Model ID meta-llama-3.3-70b-instruct",
      "id": "meta-llama-3.3-70b-instruct",
      "deleted_at": "2025-05-27T13:34:53.578698"
    }
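
    To confirm the deletion, list the models again and check that the ID is no longer present. The following sketch reuses the same GUARDRAILS_BASE_URL environment variable.

    import os
    import requests

    # After a successful delete, the model ID no longer appears in the list response.
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
    remaining_ids = [model["id"] for model in requests.get(url).json()["data"]]
    assert "meta-llama-3.3-70b-instruct" not in remaining_ids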