Manage NeMo Guardrails Access to Models#
How Guardrails Uses Models#
A NeMo Guardrails configuration uses models in two ways: as the main, or application, model that an end user interacts with for chat and chat-like tasks, and as task-specific models, such as the content safety model provided by the Llama 3.1 NemoGuard 8B ContentSafety NIM microservice.
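For illustration, the following sketch creates a guardrail configuration that names both an application (main) model and a content safety model. The `/v1/guardrail/configs` endpoint, the request body shape, and the model names shown here are assumptions for illustration only; refer to the Guardrails API reference for the exact schema.

```bash
# Illustrative sketch: the configs endpoint, body shape, and model names are assumptions.
curl -X POST "${GUARDRAILS_BASE_URL}/v1/guardrail/configs" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "demo-guardrails",
    "namespace": "default",
    "data": {
      "models": [
        {
          "type": "main",
          "engine": "nim",
          "model": "meta/llama-3.3-70b-instruct"
        },
        {
          "type": "content_safety",
          "engine": "nim",
          "model": "nvidia/llama-3.1-nemoguard-8b-content-safety"
        }
      ]
    }
  }' | jq
```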
How you configure access to models depends on the value of the `NIM_ENDPOINT_URL` environment variable.
The value of the environment variable typically depends on how you install the NeMo Guardrails microservice.
The following table summarizes the different processes.
| `NIM_ENDPOINT_URL` Value | Guardrails Installation | Management Process |
|---|---|---|
| `http://nemo-nim-proxy:8000/v1` | Installed as part of the NeMo microservices platform. | Add and remove access to models by using NeMo Deployment Management to deploy and undeploy NIM for LLMs. The management microservice registers and deregisters each model with NIM Proxy. After a model is registered with the proxy, you can specify the model name in a guardrail configuration. |
| `https://integrate.api.nvidia.com/v1` | Installed individually. | The NVIDIA API Catalog URL is the default value for the microservice and typically indicates that the microservice runs as a Docker container or is installed in Kubernetes by using the individual service Helm chart. You can manage access to models by sending REST requests to the microservice. Refer to the information on this page. |
Fetching Models at Container Start#
By default, when `NIM_ENDPOINT_URL` is set to the default value, `https://integrate.api.nvidia.com/v1`, the microservice does not retrieve the model names available from the NVIDIA API Catalog.
As a result, a GET request to the `/v1/guardrail/models` endpoint returns an empty list.

To configure the container to retrieve the model names available from the NVIDIA API Catalog, set the `FETCH_NIM_APP_MODELS` environment variable to `True`.
When the environment variable is `True`, the container retrieves the available model names at startup.
Afterward, a GET request to the `/v1/guardrail/models` endpoint returns a list of model objects that complies with the OpenAI structure:
```json
{
  "object": "list",
  "data": [
    {
      "id": "01-ai/yi-large",
      "object": "model",
      "created": 735790403,
      "owned_by": "system"
    },
    {
      "id": "abacusai/dracarys-llama-3.1-70b-instruct",
      "object": "model",
      "created": 735790403,
      "owned_by": "system"
    }
    // ...
  ]
}
```
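If you run the microservice as a Docker container, you can set the variable when you start the container. The following is a minimal sketch; the image reference and the `NVIDIA_API_KEY` variable name are assumptions, and you typically pass additional flags such as port mappings.

```bash
# Minimal sketch: the image reference and the API key variable name are assumptions.
docker run -d \
  -e NIM_ENDPOINT_URL="https://integrate.api.nvidia.com/v1" \
  -e NVIDIA_API_KEY="${NVIDIA_API_KEY}" \
  -e FETCH_NIM_APP_MODELS=True \
  <guardrails-image>:<tag>
```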
If you enable `FETCH_NIM_APP_MODELS` to retrieve model names, you can then manage the list of model objects by using the API endpoints shown on this page.

If you do not enable `FETCH_NIM_APP_MODELS`, you can still access the models from the NVIDIA API Catalog; the model names are just not included by default in the response to a GET request on the `/v1/guardrail/models` endpoint.
For example, if you know that the `nvidia/llama-3.1-nemotron-nano-4b-v1.1` model is available from the catalog, you can specify the model name in the `model` field of an inference request to use the model.
Alternatively, you can send a POST request to add the model explicitly, as shown on this page.
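The following is a minimal sketch of an inference request that specifies the catalog model name directly. The `/v1/guardrail/chat/completions` endpoint path and the `guardrails` field that names a configuration are assumptions for illustration; refer to the Guardrails API reference for the exact request schema.

```bash
# Illustrative sketch: the endpoint path and the guardrails field are assumptions.
curl -X POST "${GUARDRAILS_BASE_URL}/v1/guardrail/chat/completions" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/llama-3.1-nemotron-nano-4b-v1.1",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "guardrails": {"config_id": "default"},
    "max_tokens": 128
  }' | jq
```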
Common Actions#
You can send a GET request to the /v1/guardrail/models
and /v1/guardrail/models/{model-id}
endpoints regardless of how the microservice is installed or the value of the NIM_ENDPOINT_URL
environment variable.
Listing All Models#
Send a GET request to the `/v1/guardrail/models` endpoint.

```bash
curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
  -H 'Accept: application/json' | jq
```
```python
import os
import json

import requests

url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"

response = requests.get(url)
print(json.dumps(response.json(), indent=2))
```
Example Output
{ "object": "list", "data": [ { "id": "meta-llama-3.3-70b-instruct", "object": "model", "created": 1748352890965, "owned_by": "system" } ] }
Get One Model#
Send a GET request to the `/v1/guardrail/models/{model-id}` endpoint.

```bash
curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
  -H 'Accept: application/json' | jq
```
```python
import os
import json

import requests

url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"

response = requests.get(url)
print(json.dumps(response.json(), indent=2))
```
Example Output
{ "model_id": "meta-llama-3.3-70b-instruct", "engine": "nimchat", "model": "meta/llama-3.3-70b-instruct", "base_url": "https://integrate.api.nvidia.com/v1", "parameters": { "temperature": 0.6, "max_tokens": 10, "top_p": 0.8, "model": "meta/llama-3.3-70b-instruct" }, "created": 1748352890965 }
Actions for Individual Installation#
You can access the following endpoints when the NIM_ENDPOINT_URL
environment variable is set to its default value, https://integrate.api.nvidia.com/v1.
Adding a Model#
Send a POST request to the `/v1/guardrail/models` endpoint.

```bash
curl -X POST "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "model_id": "meta-llama-3.3-70b-instruct",
      "engine": "nim",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.6,
        "max_tokens": 10,
        "top_p": 0.8
      }
    }
  }' | jq
```
```python
import os
import json

import requests

url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models"
headers = {"Accept": "application/json", "Content-Type": "application/json"}
data = {
    "data": {
        "model_id": "meta-llama-3.3-70b-instruct",
        "engine": "nim",
        "model": "meta/llama-3.3-70b-instruct",
        "base_url": "https://integrate.api.nvidia.com/v1",
        "parameters": {
            "temperature": 0.6,
            "max_tokens": 10,
            "top_p": 0.8
        }
    }
}

response = requests.post(url, headers=headers, json=data)
print(json.dumps(response.json(), indent=2))
```
For information about the fields in the request body, refer to Guardrails API.
Example Output
{ "model_id": "meta-llama-3.3-70b-instruct", "engine": "nimchat", "model": "meta/llama-3.3-70b-instruct", "base_url": "https://integrate.api.nvidia.com/v1", "parameters": { "temperature": 0.6, "max_tokens": 10, "top_p": 0.8, "model": "meta/llama-3.3-70b-instruct" }, "created": 1748352890965 }
Update a Model#
Send a PATCH request to the `/v1/guardrail/models/{model-id}` endpoint.

```bash
curl -X PATCH "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "engine": "nim",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.8,
        "max_tokens": 1024,
        "top_p": 1
      }
    }
  }' | jq
```
```python
import os
import json

import requests

url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"
headers = {"Accept": "application/json", "Content-Type": "application/json"}
data = {
    "data": {
        "engine": "nim",
        "model": "meta/llama-3.3-70b-instruct",
        "base_url": "https://integrate.api.nvidia.com/v1",
        "parameters": {
            "temperature": 0.8,
            "max_tokens": 1024,
            "top_p": 1
        }
    }
}

response = requests.patch(url, headers=headers, json=data)
print(json.dumps(response.json(), indent=2))
```
Example Output
{ "model_id": null, "engine": "nimchat", "model": "meta/llama-3.3-70b-instruct", "base_url": "https://integrate.api.nvidia.com/v1", "parameters": { "temperature": 0.8, "max_tokens": 1024, "top_p": 1, "model": "meta/llama-3.3-70b-instruct" }, "created": 1748352890986 }
Delete a Model#
Send a DELETE request to the `/v1/guardrail/models/{model-id}` endpoint.

```bash
curl -X DELETE "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
  -H 'Accept: application/json' | jq
```
```python
import os
import json

import requests

url = f"{os.environ['GUARDRAILS_BASE_URL']}/v1/guardrail/models/meta-llama-3.3-70b-instruct"

response = requests.delete(url)
print(json.dumps(response.json(), indent=2))
```
Example Output
{ "message": "Deleted Application Model ID meta-llama-3.3-70b-instruct", "id": "meta-llama-3.3-70b-instruct", "deleted_at": "2025-05-27T13:34:53.578698" }