Manage NeMo Guardrails Access to Models#

How Guardrails Uses Models#

A NeMo Guardrails configuration specifies a main, or application, model that end users interact with for chat and chat-like interactions. The configuration can also specify task-specific models, such as the content safety model provided by the Llama 3.1 NemoGuard 8B ContentSafety NIM microservice.
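For example, in the open-source NeMo Guardrails configuration schema, the models list names the application model with the type main alongside any task-specific models. The following Python literal is a minimal sketch only; the exact field names are defined in the Guardrails configuration documentation, and the model names shown here are placeholders for models available in your deployment.

# Illustrative sketch of the models list in a guardrail configuration.
# Field names loosely follow the open-source NeMo Guardrails schema (type, engine, model);
# the exact schema and the model names are assumptions for illustration only.
guardrail_config_models = [
    {"type": "main", "engine": "nim", "model": "meta/llama-3.3-70b-instruct"},
    {
        "type": "content_safety",
        "engine": "nim",
        "model": "nvidia/llama-3.1-nemoguard-8b-content-safety",
    },
]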

How you configure access to models depends on the value of the NIM_ENDPOINT_URL environment variable. The value typically depends on how you install the NeMo Guardrails microservice. The following table summarizes the different processes.

| NIM_ENDPOINT_URL Value | Guardrails Installation | Management Process |
| --- | --- | --- |
| http://nemo-nim-proxy:8000/v1 | Installed as part of the NeMo microservices platform. The NIM_ENDPOINT_URL environment variable is set to the NVIDIA NeMo NIM Proxy URL in the NeMo Microservices Helm Chart. | Add and remove access to models by using NeMo Deployment Management to deploy and undeploy NIM for LLMs. The management microservice registers and deregisters each model with NIM Proxy. After a model is registered with the proxy, you can specify the model name in a guardrail configuration. |
| https://integrate.api.nvidia.com/v1 | Installed individually. The NVIDIA API Catalog URL is the default value for the microservice and typically indicates that the microservice runs as a Docker container or is installed in Kubernetes with the individual service Helm chart. | Manage access to models by sending REST requests to the /v1/guardrail/models endpoint of the NeMo Guardrails microservice, as described on this page. |

Fetching Models at Container Start#

When NIM_ENDPOINT_URL is set to its default value, https://integrate.api.nvidia.com/v1, the microservice does not retrieve the model names available from the NVIDIA API Catalog by default. As a result, a GET request to the /v1/guardrail/models endpoint returns an empty list.

To configure the container to retrieve the model names available from the NVIDIA API Catalog, set the FETCH_NIM_APP_MODELS environment variable to True. When the environment variable is True, the container retrieves the available model names at startup. Afterward, a GET request to the /v1/guardrail/models endpoint returns a list of model objects that follows the OpenAI list structure:

{
  "object": "list",
  "data": [
    {
      "id": "01-ai/yi-large",
      "object": "model",
      "created": 735790403,
      "owned_by": "system"
    },
    {
      "id": "abacusai/dracarys-llama-3.1-70b-instruct",
      "object": "model",
      "created": 735790403,
      "owned_by": "system"
    },
    // ...
  ]
}

If you enable FETCH_NIM_APP_MODELS to retrieve the model names, you can then manage the list of model objects by using the API endpoints shown on this page.

If you do not enable FETCH_NIM_APP_MODELS, you can still use the models from the NVIDIA API Catalog; the model names are simply not included in the response to a GET request on the /v1/guardrail/models endpoint. For example, if you know that the nvidia/llama-3.1-nemotron-nano-4b-v1.1 model is available from the catalog, you can specify the model name in the model field of an inference request. Alternatively, you can send a POST request to add the model explicitly, as shown on this page.
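For example, the following Python sketch sends an inference request that names the catalog model directly. It assumes the microservice exposes the /v1/guardrail/chat/completions endpoint and that a guardrail configuration with the ID default exists in your environment; adjust both to match your deployment.

import os
import json
import requests

url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/chat/completions"

headers = {"Accept": "application/json", "Content-Type": "application/json"}

# The model name comes from the NVIDIA API Catalog even though it is not listed
# by GET /v1/guardrail/models. The "default" config ID is an assumption; use a
# guardrail configuration that exists in your environment.
data = {
    "model": "nvidia/llama-3.1-nemotron-nano-4b-v1.1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "guardrails": {"config_id": "default"},
}

response = requests.post(url, headers=headers, json=data)
print(json.dumps(response.json(), indent=2))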

Common Actions#

You can send a GET request to the /v1/guardrail/models and /v1/guardrail/models/{model-id} endpoints regardless of how the microservice is installed or the value of the NIM_ENDPOINT_URL environment variable.

Listing All Models#

  • Send a GET request to the /v1/guardrail/models endpoint.

    curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
      -H 'Accept: application/json' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
    
    response = requests.get(url)
    print(json.dumps(response.json(), indent=2))
    

    Example Output

    {
      "object": "list",
      "data": [
        {
          "id": "meta-llama-3.3-70b-instruct",
          "object": "model",
          "created": 1748352890965,
          "owned_by": "system"
        }
      ]
    }
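
    If you only need the model IDs, for example to reference them in a guardrail configuration, you can extract them from the data array of the list response. The following sketch reuses the same GUARDRAILS_BASE_URL environment variable.

    import os
    import requests

    # Collect just the model IDs from the list response for use in later requests.
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
    model_ids = [model["id"] for model in requests.get(url).json()["data"]]
    print(model_ids)  # for example: ['meta-llama-3.3-70b-instruct']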
    

Getting One Model#

  • Send a GET request to the /v1/guardrail/models/{model-id} endpoint.

    curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
      -H 'Accept: application/json' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"
    
    response = requests.get(url)
    print(json.dumps(response.json(), indent=2))
    

    Example Output

    {
      "model_id": "meta-llama-3.3-70b-instruct",
      "engine": "nimchat",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.6,
        "max_tokens": 10,
        "top_p": 0.8,
        "model": "meta/llama-3.3-70b-instruct"
      },
      "created": 1748352890965
    }
    

Actions for Individual Installation#

You can access the following endpoints when the NIM_ENDPOINT_URL environment variable is set to its default value, https://integrate.api.nvidia.com/v1.

Adding a Model#

  • Send a POST request to the /v1/guardrail/models endpoint.

    curl -X POST "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      -d '{
        "data": {
          "model_id": "meta-llama-3.3-70b-instruct",
          "engine": "nim",
          "model": "meta/llama-3.3-70b-instruct",
          "base_url": "https://integrate.api.nvidia.com/v1",
          "parameters": {
            "temperature": 0.6,
            "max_tokens": 10,
            "top_p": 0.8
          }
        }
      }' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
    
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    
    data = {
        "data": {
            "model_id": "meta-llama-3.3-70b-instruct",
            "engine": "nim",
            "model": "meta/llama-3.3-70b-instruct",
            "base_url": "https://integrate.api.nvidia.com/v1",
            "parameters": {
                "temperature": 0.6,
                "max_tokens": 10,
                "top_p": 0.8
            }
        }
    }
    
    response = requests.post(url, headers=headers, json=data)
    print(json.dumps(response.json(), indent=2))
    

    For information about the fields in the request body, refer to Guardrails API.

    Example Output

    {
      "model_id": "meta-llama-3.3-70b-instruct",
      "engine": "nimchat",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.6,
        "max_tokens": 10,
        "top_p": 0.8,
        "model": "meta/llama-3.3-70b-instruct"
      },
      "created": 1748352890965
    }
    

Updating a Model#

  • Send a PATCH request to the /v1/guardrail/models/{model-id} endpoint.

    curl -X PATCH "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      -d '{
        "data": {
          "engine": "nim",
          "model": "meta/llama-3.3-70b-instruct",
          "base_url": "https://integrate.api.nvidia.com/v1",
          "parameters": {
            "temperature": 0.8,
            "max_tokens": 1024,
            "top_p": 1
          }
        }
      }' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"
    
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    
    data = {
        "data": {
            "engine": "nim",
            "model": "meta/llama-3.3-70b-instruct",
            "base_url": "https://integrate.api.nvidia.com/v1",
            "parameters": {
                "temperature": 0.8,
                "max_tokens": 1024,
                "top_p": 1
            }
        }
    }
    
    response = requests.patch(url, headers=headers, json=data)
    print(json.dumps(response.json(), indent=2))
    

    Example Output

    {
      "model_id": null,
      "engine": "nimchat",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.8,
        "max_tokens": 1024,
        "top_p": 1,
        "model": "meta/llama-3.3-70b-instruct"
      },
      "created": 1748352890986
    }
    

Deleting a Model#

  • Send a DELETE request to the /v1/guardrail/models/{model-id} endpoint.

    curl -X DELETE "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
      -H 'Accept: application/json' | jq
    
    import os
    import json
    import requests
    
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"
    
    response = requests.delete(url)
    print(json.dumps(response.json(), indent=2))
    

    Example Output

    {
      "message": "Deleted Application Model ID meta-llama-3.3-70b-instruct",
      "id": "meta-llama-3.3-70b-instruct",
      "deleted_at": "2025-05-27T13:34:53.578698"
    }
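
    To confirm the deletion, list the models again and check that the ID is no longer present. The following sketch reuses the same GUARDRAILS_BASE_URL environment variable.

    import os
    import requests

    # After a successful delete, the model ID no longer appears in the list response.
    url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
    remaining_ids = [model["id"] for model in requests.get(url).json()["data"]]
    assert "meta-llama-3.3-70b-instruct" not in remaining_ids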