Model Configuration for NeMo Guardrails

On this page, learn how to configure the models section of your Guardrails config.yml file. For a complete reference of all configuration options, refer to the Configuration YAML Schema Reference.

NVIDIA NIM Configuration

The NVIDIA NeMo Guardrails library integrates with NVIDIA NIM microservices:

models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct

This provides access to:

  • Locally deployed NIMs. You can run models on your own infrastructure with optimized inference.
  • NVIDIA API Catalog. You can access hosted models on build.nvidia.com.
  • Specialized NIMs. You can use purpose-built models such as NemoGuard Content Safety, Topic Control, and Jailbreak Detect.

Local NIM Deployment

For locally deployed NIMs, specify the base URL:

models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
    parameters:
      base_url: http://localhost:8000/v1

Task-Specific Models

Configure different models for specific tasks:

models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct

  - type: self_check_input
    engine: nim
    model: meta/llama3-8b-instruct

  - type: self_check_output
    engine: nim
    model: meta/llama-3.1-70b-instruct

  - type: generate_user_intent
    engine: nim
    model: meta/llama-3.1-8b-instruct

Configuration Examples

OpenAI

The following example shows how to configure an OpenAI model as the main application LLM:

models:
  - type: main
    engine: openai
    model: gpt-4o

Azure OpenAI

The following example shows how to configure an Azure OpenAI deployment as the main application LLM:

models:
  - type: main
    engine: azure
    model: gpt-4
    parameters:
      azure_endpoint: https://my-resource.openai.azure.com/
      azure_deployment: my-gpt4-deployment
      api_version: "2024-02-15-preview"

You can supply the resource endpoint as azure_endpoint (preferred, matches the OpenAI Python SDK) or base_url (v0.21-compatibility alias). Both fields accept only the resource URL. The framework composes the deployment path. Setting both raises an error.
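As an illustrative sketch of the alias described above, the same Azure configuration can be written with base_url in place of azure_endpoint. The resource and deployment names are the placeholders used in this section, not real values:

```yaml
models:
  - type: main
    engine: azure
    model: gpt-4
    parameters:
      # v0.21-compatibility alias for azure_endpoint; resource URL only.
      # Do not also set azure_endpoint: setting both raises an error.
      base_url: https://my-resource.openai.azure.com/
      azure_deployment: my-gpt4-deployment
      api_version: "2024-02-15-preview"
```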

Set AZURE_OPENAI_API_KEY in the environment, set api_key_env_var on the model entry, or pass parameters.api_key directly. The framework constructs the deployment URL, sets api-version as a query parameter, and authenticates with the api-key header.
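If the key lives in a differently named environment variable, the api_key_env_var option might be used as in the following sketch. MY_AZURE_OPENAI_KEY is a hypothetical variable name, and the top-level placement of api_key_env_var is assumed to match the placement described for OpenAI-compatible engines on this page:

```yaml
models:
  - type: main
    engine: azure
    model: gpt-4
    # Hypothetical variable name; export it before loading the config.
    api_key_env_var: MY_AZURE_OPENAI_KEY
    parameters:
      azure_endpoint: https://my-resource.openai.azure.com/
      azure_deployment: my-gpt4-deployment
      api_version: "2024-02-15-preview"
```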

Azure OpenAI is supported natively on the default framework in v0.22 with key-based authentication. For Azure AD or token-based authentication, configure engine: openai manually or use LangChain with NEMOGUARDRAILS_LLM_FRAMEWORK=langchain. Refer to Migrating to 0.22 for both alternatives.

Anthropic

The following example shows how to configure an Anthropic model as the main application LLM:

models:
  - type: main
    engine: anthropic
    model: claude-3-5-sonnet-20241022

Anthropic’s API is not OpenAI-compatible, so this engine is opt-in. Set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install langchain-anthropic. For background, refer to Migrating to 0.22.

vLLM (OpenAI-Compatible)

vLLM exposes an OpenAI-compatible API, so the recommended configuration uses engine: openai pointed at the vLLM endpoint. The built-in client handles it with no LangChain dependency.

models:
  - type: main
    engine: openai
    model: meta-llama/Llama-3.1-8B-Instruct
    parameters:
      base_url: http://localhost:5000/v1
      api_key: EMPTY

The following example shows how to configure Llama Guard as a guardrail model using the same pattern:

models:
  - type: llama_guard
    engine: openai
    model: meta-llama/LlamaGuard-7b
    parameters:
      base_url: http://localhost:5000/v1
      api_key: EMPTY

When a self-hosted vLLM server does not enforce authentication, set parameters.api_key to a non-empty placeholder such as EMPTY. If your deployment requires a real token, either replace the placeholder in parameters.api_key with the literal token, or omit parameters.api_key and set api_key_env_var at the top level of the model entry, not inside parameters:

- type: main
  engine: openai
  model: meta-llama/Llama-3.1-8B-Instruct
  api_key_env_var: MY_VLLM_API_KEY
  parameters:
    base_url: http://localhost:5000/v1

Set the referenced environment variable before calling RailsConfig.from_content or RailsConfig.from_path. Otherwise, config loading fails with the error "Model API Key environment variable 'X' not set.", where X is the variable name. The failure happens at load time because a Pydantic validator on the model schema performs the check eagerly.

The legacy engine: vllm_openai with parameters.openai_api_base form is only needed when running under NEMOGUARDRAILS_LLM_FRAMEWORK=langchain. For new configurations, prefer the form above.
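For reference, a legacy configuration might look like the following sketch. The engine name and openai_api_base come from the description above. Passing the model name as parameters.model_name follows the LangChain VLLMOpenAI class and is an assumption here, so verify it against the LangChain documentation for your installed version:

```yaml
# Legacy form: only applies when NEMOGUARDRAILS_LLM_FRAMEWORK=langchain is set.
models:
  - type: main
    engine: vllm_openai
    parameters:
      openai_api_base: http://localhost:5000/v1
      # Assumed field name, taken from LangChain's VLLMOpenAI class
      model_name: meta-llama/Llama-3.1-8B-Instruct
```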

Other OpenAI-Compatible Endpoints

The same engine: openai plus parameters.base_url pattern works for any provider whose wire protocol is OpenAI-compatible. Examples include OpenRouter, Together.ai, Fireworks.ai, Groq, DeepSeek’s hosted API at https://api.deepseek.com/v1, TGI deployments that expose /v1/chat/completions, and the llama.cpp server with --api. Provide parameters.base_url and either parameters.api_key or a top-level api_key_env_var.
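As a minimal sketch of this pattern, the following entry points the built-in client at DeepSeek's hosted API. The model identifier deepseek-chat and the variable name DEEPSEEK_API_KEY are assumptions for illustration; substitute the values your provider documents:

```yaml
models:
  - type: main
    engine: openai
    # Assumed model identifier; use the name your provider publishes.
    model: deepseek-chat
    # Hypothetical variable name; set at the top level, not inside parameters.
    api_key_env_var: DEEPSEEK_API_KEY
    parameters:
      base_url: https://api.deepseek.com/v1
```

The same shape should apply to the other providers listed, with only base_url, model, and the key source changing.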

Google Vertex AI

The following example shows how to configure a Google Vertex AI model as the main application LLM:

models:
  - type: main
    engine: vertexai
    model: gemini-1.0-pro

Vertex AI’s API is not OpenAI-compatible, so this engine is opt-in. Set NEMOGUARDRAILS_LLM_FRAMEWORK=langchain and install langchain-google-vertexai. For background, refer to Migrating to 0.22.

Complete Example

The following example shows how to configure the main application LLM, embeddings model, and a dedicated NemoGuard model for input and output checking:

models:
  # Main application LLM
  - type: main
    engine: nim
    model: meta/llama-3.1-70b-instruct
    parameters:
      temperature: 0.7
      max_tokens: 2000

  # Embeddings for knowledge base
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

  # Dedicated model for input checking
  - type: self_check_input
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

  # Dedicated model for output checking
  - type: self_check_output
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

Model Parameters

Pass additional parameters to the underlying LLM client. For engines served by the built-in client, such as any OpenAI-compatible endpoint, the runtime forwards parameters to the OpenAI-compatible HTTP request. Examples include temperature, max_tokens, base_url, api_key, default_query, and default_headers. For LangChain engines, parameters follow the conventions of the underlying LangChain class.

models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      temperature: 0.7
      max_tokens: 1000
      top_p: 0.9
8 top_p: 0.9

Common parameters vary by provider. For built-in engines, see the OpenAI-compatible client options. For LangChain engines, refer to the corresponding LangChain provider documentation.
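Client-level options such as default_headers and default_query can sit alongside sampling parameters in the same block. The following is a sketch only: the header name, query key, and their values are hypothetical, and the mapping shape is assumed to mirror the OpenAI Python SDK client options of the same names:

```yaml
models:
  - type: main
    engine: openai
    model: meta-llama/Llama-3.1-8B-Instruct
    parameters:
      base_url: http://localhost:5000/v1
      api_key: EMPTY
      temperature: 0.2
      # Hypothetical header, assumed to be sent with every request
      default_headers:
        X-Request-Source: guardrails-demo
      # Hypothetical query parameter, assumed to be appended to every request URL
      default_query:
        trace: "true"
```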