Manage NeMo Guardrails Access to Models#

A NeMo Guardrails configuration specifies models. The main, or application, model is the one that end users interact with for chat and chat-like interactions. For example:

{
  "type": "main",
  "engine": "nim",
  "model": "meta/llama-3.1-8b-instruct"
}

The configuration can also specify task-specific models such as the content safety model provided by the Llama 3.1 NemoGuard 8B ContentSafety NIM microservice. For example:

{
  "type": "content_safety",
  "engine": "nim",
  "model": "meta/llama-3.1-nemoguard-8b-content-safety"
}
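
Each of the preceding snippets is an entry in the models list of a guardrail configuration. As a minimal sketch, assuming the guardrail.configs.create method of the NeMoMicroservices Python client and a hypothetical configuration named demo-config in the default namespace, you can create a configuration that includes both entries as follows.

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(base_url=os.environ["GUARDRAILS_BASE_URL"])

# Create a guardrail configuration whose "models" list holds both entries.
# The configuration name and namespace are hypothetical placeholders.
response = client.guardrail.configs.create(
    name="demo-config",
    namespace="default",
    data={
        "models": [
            {
                "type": "main",
                "engine": "nim",
                "model": "meta/llama-3.1-8b-instruct"
            },
            {
                "type": "content_safety",
                "engine": "nim",
                "model": "meta/llama-3.1-nemoguard-8b-content-safety"
            }
        ]
    }
)
print(response)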

Configure Access to Main Model#

How you configure access to the main model depends on the value of the NIM_ENDPOINT_URL environment variable. The value typically depends on how you install the NeMo Guardrails microservice. The following entries summarize the management process for each value.

NIM_ENDPOINT_URL value: http://nemo-nim-proxy:8000/v1

Guardrails installation: Installed as part of the NeMo microservices platform. The NIM_ENDPOINT_URL environment variable is set to the NVIDIA NeMo NIM Proxy service in the NeMo Microservices Helm Chart.

Management process: Add and remove access to models by using NeMo Deployment Management to deploy and undeploy NIM for LLMs. The management microservice registers and deregisters each model with NIM Proxy. After a model is registered with the proxy, you can specify the model name in a guardrail configuration.

NIM_ENDPOINT_URL value: https://integrate.api.nvidia.com/v1

Guardrails installation: Installed individually.

Management process: The NVIDIA API Catalog URL is the default value for the microservice and typically indicates that the microservice runs as a Docker container or is installed in Kubernetes by using the individual service Helm chart.

Configure Access to Task-Specific Models#

By default, task-specific models use https://integrate.api.nvidia.com/v1. You can override this default by setting parameters.base_url in your configuration. For example, if you installed the NeMo microservices platform Helm chart and deployed the meta/llama-3.1-nemoguard-8b-content-safety model locally, set base_url to the NIM Proxy URL:

{
  "type": "content_safety",
  "engine": "nim",
  "model": "meta/llama-3.1-nemoguard-8b-content-safety",
  "parameters": {
    "base_url": "http://nemo-nim-proxy:8000/v1"
  }
}

Common Actions#

You can send GET requests to the /v1/guardrail/models and /v1/guardrail/models/{model-id} endpoints regardless of how the microservice is installed. Both endpoints fetch models from NIM_ENDPOINT_URL.

To List All Models#

Choose one of the following options to list all models.

Set up a NeMoMicroservices client instance using the base URL of the NeMo Guardrails microservice and perform the task as follows.

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["GUARDRAILS_BASE_URL"],
    inference_base_url=os.environ["NIM_BASE_URL"]
)
response = client.guardrail.models.list()
print(response)

Make a GET request to the /v1/guardrail/models endpoint.

curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
  -H 'Accept: application/json' | jq
Example Output
{
  "object": "list",
  "data": [
    {
      "id": "meta-llama-3.3-70b-instruct",
      "object": "model",
      "created": 1748352890965,
      "owned_by": "system"
    }
  ]
}

To Get Details of a Model#

Choose one of the following options to get the details of a model.

Set up a NeMoMicroservices client instance using the base URL of the NeMo Guardrails microservice and perform the task as follows.

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["GUARDRAILS_BASE_URL"],
    inference_base_url=os.environ["NIM_BASE_URL"]
)
response = client.guardrail.models.retrieve(model_name="meta/llama-3.1-8b-instruct")
print(response)

Make a GET request to the /v1/guardrail/models/{model-id} endpoint.

curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
  -H 'Accept: application/json' | jq
Example Output
{
  "model_id": "meta/llama-3.3-70b-instruct",
  "object": "model",
  "created": 1758830491,
  "owned_by": "system"
}

Using Models With Guardrails#

All main models use NIM_ENDPOINT_URL for inference, so the main model that you specify must be accessible from that endpoint. You can make a GET request to the /v1/models endpoint of the service that NIM_ENDPOINT_URL points to, such as http://nemo-nim-proxy:8000/v1/models, to see the list of models available through the endpoint.
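
For example, the following minimal sketch lists the model IDs available through the endpoint. It assumes NIM_ENDPOINT_URL is exported in your shell with the same value the microservice uses and that the value already ends with /v1.

import os
import requests

# NIM_ENDPOINT_URL is assumed to be exported in your shell, for example
# http://nemo-nim-proxy:8000/v1 or https://integrate.api.nvidia.com/v1.
nim_endpoint_url = os.environ["NIM_ENDPOINT_URL"].rstrip("/")

headers = {"Accept": "application/json"}
# The NVIDIA API Catalog endpoint requires an API key; a local NIM Proxy
# deployment typically does not.
if "NVIDIA_API_KEY" in os.environ:
    headers["Authorization"] = f"Bearer {os.environ['NVIDIA_API_KEY']}"

response = requests.get(f"{nim_endpoint_url}/models", headers=headers, timeout=30)
response.raise_for_status()
for model in response.json().get("data", []):
    print(model["id"])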

If your NIM_ENDPOINT_URL points to NIM Proxy, refer to the Deploy NVIDIA NIM tutorial to deploy a new model. If you want to use a model hosted at an external endpoint, such as OpenAI, you can use NeMo Deployment Management to create a custom deployment configuration; refer to Deployment with Pre-defined Configurations. This way, you can use the same NIM_ENDPOINT_URL to access external models.
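
After the model is reachable through NIM_ENDPOINT_URL and referenced in a guardrail configuration, you can send guardrailed requests to it. The following is a minimal sketch, assuming the guardrail.chat.completions.create method of the Python client and a hypothetical guardrail configuration named demo-config.

import os
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["GUARDRAILS_BASE_URL"],
    inference_base_url=os.environ["NIM_BASE_URL"]
)

# Send a chat completion request through the guardrails microservice.
# "demo-config" is a hypothetical guardrail configuration name.
response = client.guardrail.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What can you help me with?"}],
    guardrails={"config_id": "demo-config"}
)
print(response)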