Manage NeMo Guardrails Access to Models#
A NeMo Guardrails configuration references models in two roles: a main, or application, model that end users interact with for chat and chat-like requests, and optional task-specific models such as the content safety model provided by the Llama 3.1 NemoGuard 8B ContentSafety NIM microservice.
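For example, the `models` section of a guardrails configuration might pair an application model with a content safety model. The following sketch is illustrative only; the model identifiers are assumptions that you should replace with models available in your deployment.

```python
# Illustrative sketch of the "models" section of a guardrails configuration.
# The model names are assumptions; substitute models that your deployment
# can reach through NIM_ENDPOINT_URL.
guardrails_config = {
    "models": [
        {
            "type": "main",            # application model for chat interactions
            "engine": "nim",
            "model": "meta/llama-3.3-70b-instruct",
        },
        {
            "type": "content_safety",  # task-specific guard model
            "engine": "nim",
            "model": "nvidia/llama-3.1-nemoguard-8b-content-safety",
        },
    ]
}
```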
How you configure access to models depends on the value of the `NIM_ENDPOINT_URL` environment variable. The value of the environment variable typically depends on how you install the NeMo Guardrails microservice. The following table summarizes the different processes.
| `NIM_ENDPOINT_URL` Value | Guardrails Installation | Management Process |
|---|---|---|
| `http://nemo-nim-proxy:8000/v1` | Installed as part of the NeMo microservices platform. | Add and remove access to models by using NeMo Deployment Management to deploy and undeploy NIM for LLMs. The management microservice registers and deregisters each model with NIM Proxy. After a model is registered with the proxy, you can specify the model name in a guardrail configuration. |
| `https://integrate.api.nvidia.com/v1` | Installed individually. | This NVIDIA API Catalog URL is the default value for the microservice and typically indicates that the microservice runs as a Docker container or is installed in Kubernetes using the individual service Helm chart. You can manage access to models by sending REST requests to the endpoints described on this page. |
Fetching Models at Container Start#
By default, when `NIM_ENDPOINT_URL` is set to the default value, `https://integrate.api.nvidia.com/v1`, the microservice does not retrieve the model names available from the NVIDIA API Catalog. As a result, a GET request to the `/v1/guardrail/models` endpoint returns an empty list.

To configure the container to retrieve the model names available from the NVIDIA API Catalog, set the `FETCH_NIM_APP_MODELS` environment variable to `True`.
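To confirm the effect of the flag, you can query the endpoint after the container starts. The following is a minimal sketch using the `requests` package and the same `GUARDRAILS_BASE_URL` environment variable as the examples later on this page.

```python
import os

import requests

# Query the models endpoint; the list is empty unless FETCH_NIM_APP_MODELS
# was set to True when the container started.
base_url = os.environ["GUARDRAILS_BASE_URL"]
response = requests.get(f"{base_url}/v1/guardrail/models")
response.raise_for_status()

models = response.json()["data"]
print(f"{len(models)} model(s) registered")
```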
When the environment variable is `True`, the container retrieves the available model names at startup. Afterward, a GET request to the `/v1/guardrail/models` endpoint returns a list of model objects that complies with the OpenAI structure:
```json
{
  "object": "list",
  "data": [
    {
      "id": "01-ai/yi-large",
      "object": "model",
      "created": 735790403,
      "owned_by": "system"
    },
    {
      "id": "abacusai/dracarys-llama-3.1-70b-instruct",
      "object": "model",
      "created": 735790403,
      "owned_by": "system"
    }
    // ...
  ]
}
```
If you enable `FETCH_NIM_APP_MODELS` to retrieve model names, you can then manage the list of model objects using the API endpoints shown on this page.

If you do not enable `FETCH_NIM_APP_MODELS`, you can still access the models from the NVIDIA API Catalog; the model names are just not included by default in the response to a GET request to the `/v1/guardrail/models` endpoint.
For example, if you know the `nvidia/llama-3.1-nemotron-nano-4b-v1.1` model is available from the catalog, you can specify the model name in the `model` field of an inference request to use the model. Alternatively, you can send a POST request to add the model explicitly, as shown on this page.
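As a sketch of the first approach, the following call sends a chat completion request through the guardrails endpoint and names the catalog model directly. The `/v1/guardrail/chat/completions` path and the `"default"` config ID are assumptions based on a typical guardrails setup; adjust them to match your deployment.

```python
import os

import requests

base_url = os.environ["GUARDRAILS_BASE_URL"]

# Name the catalog model directly in the "model" field; it does not need to
# appear in the /v1/guardrail/models list first.
payload = {
    "model": "nvidia/llama-3.1-nemotron-nano-4b-v1.1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "guardrails": {"config_id": "default"},  # assumption: a config named "default" exists
    "max_tokens": 200,
}
response = requests.post(f"{base_url}/v1/guardrail/chat/completions", json=payload)
print(response.json())
```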
Common Actions#
You can send a GET request to the `/v1/guardrail/models` and `/v1/guardrail/models/{model-id}` endpoints regardless of how the microservice is installed or the value of the `NIM_ENDPOINT_URL` environment variable.
To List All Models#
Choose one of the following options to list all models.
Set up a `NeMoMicroservices` client instance using the base URL of the NeMo Guardrails microservice and perform the task as follows.

```python
import os

from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["GUARDRAILS_BASE_URL"],
    inference_base_url=os.environ["NIM_BASE_URL"]
)
response = client.guardrail.models.list()
print(response)
```
Make a GET request to the `/v1/guardrail/models` endpoint.

```bash
curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
  -H 'Accept: application/json' | jq
```
Example Output

```json
{
  "object": "list",
  "data": [
    {
      "id": "meta-llama-3.3-70b-instruct",
      "object": "model",
      "created": 1748352890965,
      "owned_by": "system"
    }
  ]
}
```
To Get Details of a Model#
Choose one of the following options to get the details of a model.
Set up a `NeMoMicroservices` client instance using the base URL of the NeMo Guardrails microservice and perform the task as follows.

```python
import os

from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["GUARDRAILS_BASE_URL"],
    inference_base_url=os.environ["NIM_BASE_URL"]
)
response = client.guardrail.models.retrieve(model_name="meta/llama-3.1-8b-instruct")
print(response)
```
Make a GET request to the `/v1/guardrail/models/{model-id}` endpoint.

```bash
curl -X GET "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
  -H 'Accept: application/json' | jq
```
Example Output

```json
{
  "model_id": "meta-llama-3.3-70b-instruct",
  "engine": "nimchat",
  "model": "meta/llama-3.3-70b-instruct",
  "base_url": "https://integrate.api.nvidia.com/v1",
  "parameters": {
    "temperature": 0.6,
    "max_tokens": 10,
    "top_p": 0.8,
    "model": "meta/llama-3.3-70b-instruct"
  },
  "created": 1748352890965
}
```
Actions for Individual Installation#
You can access the following endpoints when the `NIM_ENDPOINT_URL` environment variable is set to its default value, `https://integrate.api.nvidia.com/v1`.
To Add a Model#
Choose one of the following options to add a model.
The NeMo Microservices Python SDK doesn't provide a method to add a model. You can add a model in the following ways:

- Use the NeMo Deployment Management microservice to add a model. Refer to Deploy NIM Microservice.
- Send a POST request to the `/v1/guardrail/models` endpoint. Choose the cURL option to see a sample request.

For information about the fields in the request body, refer to Guardrails API.

```{literalinclude} _snippets/input/config-self-check-output-sdk.py
:language: python
:start-after: "# start-post-model"
:end-before: "# end-post-model"
```
Make a POST request to the `/v1/guardrail/models` endpoint.

```bash
curl -X POST "${GUARDRAILS_BASE_URL}/v1/guardrail/models" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "model_id": "meta-llama-3.3-70b-instruct",
      "engine": "nim",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.6,
        "max_tokens": 10,
        "top_p": 0.8
      }
    }
  }' | jq
```
Example Output

```json
{
  "model_id": "meta-llama-3.3-70b-instruct",
  "engine": "nimchat",
  "model": "meta/llama-3.3-70b-instruct",
  "base_url": "https://integrate.api.nvidia.com/v1",
  "parameters": {
    "temperature": 0.6,
    "max_tokens": 10,
    "top_p": 0.8,
    "model": "meta/llama-3.3-70b-instruct"
  },
  "created": 1748352890965
}
```
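Because the Python SDK has no dedicated add method, a plain HTTP POST from Python can mirror the cURL request above. The following is a minimal sketch using the `requests` package, with the same illustrative payload as the cURL sample.

```python
import os

import requests

base_url = os.environ["GUARDRAILS_BASE_URL"]

# Same request body as the cURL sample above.
payload = {
    "data": {
        "model_id": "meta-llama-3.3-70b-instruct",
        "engine": "nim",
        "model": "meta/llama-3.3-70b-instruct",
        "base_url": "https://integrate.api.nvidia.com/v1",
        "parameters": {"temperature": 0.6, "max_tokens": 10, "top_p": 0.8},
    }
}
response = requests.post(f"{base_url}/v1/guardrail/models", json=payload)
response.raise_for_status()
print(response.json())
```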
To Update a Model#
Choose one of the following options to update a model.
Set up a `NeMoMicroservices` client instance using the base URL of the NeMo Guardrails microservice and perform the task as follows.

```python
import os

from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["GUARDRAILS_BASE_URL"],
    inference_base_url=os.environ["NIM_BASE_URL"]
)
response = client.guardrail.models.update(
    model_name="meta/llama-3.1-8b-instruct",
    base_url="https://integrate.api.nvidia.com/v1",
    guardrails={
        "engine": "nim",
        "model": "meta/llama-3.3-70b-instruct",
        "base_url": "https://integrate.api.nvidia.com/v1",
        "parameters": {
            "temperature": 0.8,
            "max_tokens": 1024,
            "top_p": 1
        }
    }
)
print(response)
```
Make a PATCH request to the `/v1/guardrail/models/{model-id}` endpoint.

```bash
curl -X PATCH "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "engine": "nim",
      "model": "meta/llama-3.3-70b-instruct",
      "base_url": "https://integrate.api.nvidia.com/v1",
      "parameters": {
        "temperature": 0.8,
        "max_tokens": 1024,
        "top_p": 1
      }
    }
  }' | jq
```
Example Output

```json
{
  "model_id": null,
  "engine": "nimchat",
  "model": "meta/llama-3.3-70b-instruct",
  "base_url": "https://integrate.api.nvidia.com/v1",
  "parameters": {
    "temperature": 0.8,
    "max_tokens": 1024,
    "top_p": 1,
    "model": "meta/llama-3.3-70b-instruct"
  },
  "created": 1748352890986
}
```
To Delete a Model#
Choose one of the following options to delete a model.
Set up a `NeMoMicroservices` client instance using the base URL of the NeMo Guardrails microservice and perform the task as follows.

```python
import os

from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["GUARDRAILS_BASE_URL"],
    inference_base_url=os.environ["NIM_BASE_URL"]
)
response = client.guardrail.models.delete(model_name="meta/llama-3.1-8b-instruct")
print(response)
```
Make a DELETE request to the `/v1/guardrail/models/{model-id}` endpoint.

```bash
curl -X DELETE "${GUARDRAILS_BASE_URL}/v1/guardrail/models/meta-llama-3.3-70b-instruct" \
  -H 'Accept: application/json' | jq
```
Example Output

```json
{
  "message": "Deleted Application Model ID meta-llama-3.3-70b-instruct",
  "id": "meta-llama-3.3-70b-instruct",
  "deleted_at": "2025-05-27T13:34:53.578698"
}
```
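To confirm the model was removed, you can request the deleted model ID again. The following sketch assumes the endpoint responds with a 404 status for unknown model IDs, which is an assumption rather than documented behavior.

```python
import os

import requests

base_url = os.environ["GUARDRAILS_BASE_URL"]
model_id = "meta-llama-3.3-70b-instruct"

# After a successful delete, fetching the same ID should fail.
response = requests.get(f"{base_url}/v1/guardrail/models/{model_id}")
if response.status_code == 404:  # assumption: 404 indicates the model is gone
    print(f"{model_id} is no longer registered")
else:
    print(response.json())
```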