Generation Options Reference | NVIDIA NeMo Guardrails Library Developer Guide

The NeMo Guardrails library exposes a set of generation options that give you fine-grained control over how the LLM generation is performed (for example, what rails are enabled, additional parameters that should be passed to the LLM, what context data should be returned, what logging information should be returned).

To use generation options, provide the options keyword argument to the generate() or generate_async() methods:

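from nemoguardrails import LLMRails, RailsConfig

# Placeholder path: point this at your own guardrails configuration directory.
rails = LLMRails(RailsConfig.from_path("path/to/config"))

messages = [{
    "role": "user",
    "content": "..."
}]
rails.generate(messages=messages, options={...})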
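The same options argument applies to the asynchronous method. The sketch below wraps generate_async() in a small coroutine. The names ask and _EchoRails are illustrative and not part of the library; the stand-in class exists only so the snippet runs without a configured LLM, and it returns a plain dict rather than a real response object.

```python
import asyncio


async def ask(rails, text, options=None):
    """Send one user message through `rails.generate_async` with `options`."""
    messages = [{"role": "user", "content": text}]
    return await rails.generate_async(messages=messages, options=options)


class _EchoRails:
    """Stand-in for an LLMRails instance; it only echoes what it receives."""

    async def generate_async(self, messages, options=None):
        return {"messages": messages, "options": options}


# Demonstration with the stand-in; pass a real LLMRails object in practice.
result = asyncio.run(ask(_EchoRails(), "Hello!", options={"rails": ["input"]}))
print(result["options"])
```

Keeping the message construction in one place means callers only decide which options to pass.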

Generation options are also available when using Chat Completions; see Chat Completions: Control Generation Options.

Disabling Rails

You can choose which categories of rails to apply by using the rails generation option. The four supported categories are input, dialog, retrieval, and output. By default, all four are enabled, so the following call:

res = rails.generate(messages=messages)

is equivalent to:

res = rails.generate(messages=messages, options={
    "rails": ["input", "dialog", "retrieval", "output"]
})
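When the set of enabled categories is decided at runtime (for example, from feature flags), a small helper can build the rails value and reject typos early. This is a minimal sketch; rails_option and RAIL_CATEGORIES are hypothetical names, not part of the library.

```python
# Hypothetical helper (not part of NeMo Guardrails): builds the "rails" option
# from the four categories named above, keeping them in a fixed order.
RAIL_CATEGORIES = ("input", "dialog", "retrieval", "output")


def rails_option(*enabled: str) -> dict:
    """Return an options dict that enables only the given rail categories."""
    unknown = set(enabled) - set(RAIL_CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown rail categories: {sorted(unknown)}")
    # With no arguments, enable every category, which matches the default.
    selected = enabled or RAIL_CATEGORIES
    return {"rails": [c for c in RAIL_CATEGORIES if c in selected]}


# Usage, assuming `rails` and `messages` are defined as in the examples above:
# res = rails.generate(messages=messages, options=rails_option("input", "output"))
```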

Input Rails Only

If you only want to check a user’s input by running the input rails from a guardrails configuration, you must disable all the others:

res = rails.generate(messages=[{
    "role": "user",
    "content": "Some user input."
}], options={
    "rails": ["input"]
})

If the input is allowed as is, the response contains the same string:

{
  "role": "assistant",
  "content": "Some user input."
}

If some of the rails alter the input, for example, to mask sensitive information, then the returned value is the altered input:

{
  "role": "assistant",
  "content": "Some altered user input."
}

If the input is blocked, the response is the predefined bot refuse to respond message (by default, “I’m sorry, I can’t respond to that.”):

{
  "role": "assistant",
  "content": "I'm sorry, I can't respond to that."
}
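The three outcomes above can be told apart by comparing the returned content with the original input. The following is a rough sketch that relies only on string comparison; classify_input_check is a hypothetical helper, and the refusal text is the default quoted above, which a configuration may override.

```python
# Hypothetical helper (not part of NeMo Guardrails). DEFAULT_REFUSAL is the
# default message quoted above; pass your own text if your configuration
# defines a different "bot refuse to respond" message.
DEFAULT_REFUSAL = "I'm sorry, I can't respond to that."


def classify_input_check(user_text: str, returned_text: str,
                         refusal: str = DEFAULT_REFUSAL) -> str:
    """Label an input-rails-only result as 'allowed', 'blocked', or 'altered'."""
    if returned_text == user_text:
        return "allowed"
    if returned_text.strip() == refusal:
        return "blocked"
    return "altered"


print(classify_input_check("Some user input.", "Some altered user input."))
```

String comparison is only a heuristic, since an altered input could in principle coincide with the refusal text.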

For more details on which rails were triggered, use the log.activated_rails generation option.

Input and Output Rails Only

If you want to check both the user input and an output that was generated outside of the guardrails configuration, you must disable the dialog rails and the retrieval rails, and provide a bot message as well when making the call:

res = rails.generate(messages=[{
    "role": "user",
    "content": "Some user input."
}, {
    "role": "assistant",
    "content": "Some bot output."
}], options={
    "rails": ["input", "output"]
})

The response is one of the following: the exact bot message provided, if it was allowed; an altered version, if an output rail changed it (for example, to remove sensitive information); or the predefined bot refuse to respond message, if the message was blocked.

For details on which rails were triggered, use the log.activated_rails generation option.

Worked Example: Compare All Rails to Input and Output Rails

The topical rails tutorial uses an ABC bot configuration with input, dialog, generation, and output rails. When all rails are enabled, a simple greeting can activate several rails and trigger multiple LLM calls:

{'type': 'input', 'name': 'self check input'}
{'type': 'dialog', 'name': 'generate user intent'}
{'type': 'dialog', 'name': 'generate next step'}
{'type': 'generation', 'name': 'generate bot message'}
{'type': 'output', 'name': 'self check output'}
{'type': 'output', 'name': 'check blocked terms'}

The explain() method can show the corresponding LLM call count:

info = rails.explain()
info.print_llm_calls_summary()

Summary: 5 LLM call(s) took 3.54 seconds and used 1621 tokens.

If you only need to validate an already-generated assistant message, provide both the user and assistant messages and set options={"rails": ["input", "output"]}. This skips dialog, retrieval, and generation rails while still applying the configured input and output checks.

For validation-only use cases, prefer the check() and check_async() APIs, which run input and output rails without invoking full generation.
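To see what restricting the categories removes in the greeting example, the activated rails listed earlier can be tallied by type. This is plain Python over those entries, not output captured from a new run, and it assumes the restricted call would activate the same input and output rails as the full one.

```python
from collections import Counter

# Entries copied from the activated-rails listing in this section.
activated = [
    {"type": "input", "name": "self check input"},
    {"type": "dialog", "name": "generate user intent"},
    {"type": "dialog", "name": "generate next step"},
    {"type": "generation", "name": "generate bot message"},
    {"type": "output", "name": "self check output"},
    {"type": "output", "name": "check blocked terms"},
]


def count_by_type(entries):
    """Count activated rails per category."""
    return Counter(entry["type"] for entry in entries)


def keep_categories(entries, categories):
    """Keep only the entries whose type is in `categories`."""
    return [entry for entry in entries if entry["type"] in categories]


remaining = keep_categories(activated, {"input", "output"})
print(count_by_type(activated))
print([entry["name"] for entry in remaining])
```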

Output Rails Only

To apply output rails exclusively to an LLM response, disable the input rails and provide an empty input.

res = rails.generate(messages=[{
    "role": "user",
    "content": ""
}, {
    "role": "assistant",
    "content": "Some bot output."
}], options={
    "rails": ["output"]
})
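The two validation patterns (input plus output, and output only) differ only in whether the user message carries content and whether the input category is enabled. A minimal sketch of a helper that builds the arguments for either case follows; validation_request is a hypothetical name, not a library function.

```python
from typing import Optional


def validation_request(bot_text: str, user_text: Optional[str] = None) -> dict:
    """Build keyword arguments for checking an externally generated reply.

    With `user_text`, both input and output rails are requested.
    Without it, an empty user message is sent and only output rails are requested.
    """
    check_input = user_text is not None
    return {
        "messages": [
            {"role": "user", "content": user_text if check_input else ""},
            {"role": "assistant", "content": bot_text},
        ],
        "options": {"rails": ["input", "output"] if check_input else ["output"]},
    }


# Usage, assuming `rails` is an initialized LLMRails instance:
# res = rails.generate(**validation_request("Some bot output.", "Some user input."))
# res = rails.generate(**validation_request("Some bot output."))
```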

Detailed Logging Information

You can obtain detailed information about what happened during the generation process by setting the log generation option. The log option has four sub-options:

activated_rails: Include detailed information about the rails that were activated during generation.
llm_calls: Include information about all the LLM calls that were made. This includes: prompt, completion, token usage, raw response, etc.
internal_events: Include the array of internal generated events.
colang_history: Include the history of the conversation in Colang format.

res = rails.generate(messages=messages, options={
    "log": {
        "activated_rails": True,
        "llm_calls": True,
        "internal_events": True,
        "colang_history": True
    }
})

{
  "response": [...],
  "log": {
    "activated_rails": {
      ...
    },
    "stats": {...},
    "llm_calls": [...],
    "internal_events": [...],
    "colang_history": "..."
  }
}
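Enabling every log section can make the response large, so it may help to request only the sections you need. The sketch below builds the log option from section names and sets all four flags explicitly; log_option and LOG_SECTIONS are hypothetical names, not part of the library.

```python
# Hypothetical helper (not part of NeMo Guardrails): the four names are the
# log sub-options listed above.
LOG_SECTIONS = ("activated_rails", "llm_calls", "internal_events", "colang_history")


def log_option(*sections: str) -> dict:
    """Return an options dict that enables only the requested log sections."""
    unknown = [name for name in sections if name not in LOG_SECTIONS]
    if unknown:
        raise ValueError(f"Unknown log sections: {unknown}")
    # Every flag is set explicitly so the result does not depend on defaults.
    return {"log": {name: name in sections for name in LOG_SECTIONS}}


# Usage, assuming `rails` and `messages` are defined as in the examples above:
# res = rails.generate(messages=messages, options=log_option("activated_rails"))
```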

In the Python API, the log is an object that also provides a print_summary() method, which prints a simplified version of the log information. A sample call and its output follow.

res.log.print_summary()

# General stats

- Total time: 2.85s
  - [0.56s][19.64%]: INPUT Rails
  - [1.40s][49.02%]: DIALOG Rails
  - [0.58s][20.22%]: GENERATION Rails
  - [0.31s][10.98%]: OUTPUT Rails
- 5 LLM calls, 2.74s total duration, 1641 total prompt tokens, 103 total completion tokens, 1744 total tokens.

# Detailed stats

- [0.56s] INPUT (self check input): 1 actions (self_check_input), 1 llm calls [0.56s]
- [0.43s] DIALOG (generate user intent): 1 actions (generate_user_intent), 1 llm calls [0.43s]
- [0.96s] DIALOG (generate next step): 1 actions (generate_next_step), 1 llm calls [0.95s]
- [0.58s] GENERATION (generate bot message): 2 actions (retrieve_relevant_chunks, generate_bot_message), 1 llm calls [0.49s]
- [0.31s] OUTPUT (self check output): 1 actions (self_check_output), 1 llm calls [0.31s]

Output Variables

Some rails store additional information in context variables (see Colang 1.0 Language Syntax: Variables). To return the content of these variables, set the output_vars generation option to the list of variable names you are interested in. To return the complete context, which also includes some predefined variables, set output_vars to True.

rails.generate(messages=messages, options={
    "output_vars": ["some_input_rail_score", "some_output_rail_score"]
})

You can find the returned data in the output_data key of the response:

{
  "response": [...],
  "output_data": {
    "some_input_rail_score": 0.7,
    "some_output_rail_score": 0.8
  }
}
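Returned variables are often used for a follow-up decision in application code. The sketch below filters numeric scores above a threshold, using the example values shown above. The helper name scores_above and the 0.75 threshold are arbitrary choices for illustration.

```python
def scores_above(output_data: dict, threshold: float) -> dict:
    """Return the numeric entries of `output_data` that exceed `threshold`."""
    return {
        name: value
        for name, value in output_data.items()
        # Booleans are a subclass of int in Python, so exclude them explicitly.
        if isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value > threshold
    }


# Values copied from the example response above.
output_data = {"some_input_rail_score": 0.7, "some_output_rail_score": 0.8}
flagged = scores_above(output_data, threshold=0.75)
print(flagged)
```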

Additional LLM Parameters

To pass additional parameters to the LLM call that generates the final message, use the llm_params option. The following example sets a lower value for temperature:

rails.generate(messages=messages, options={
    "llm_params": {
        "temperature": 0.2
    }
})

The available parameters are determined by the specific LLM engine in use. The NeMo Guardrails library passes the values defined in llm_params to the LLM without modification.
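If an application keeps a shared set of default options, per-call LLM parameters can be layered on top without mutating the defaults. This is a minimal sketch; with_llm_params is a hypothetical helper, and the base options are example values only.

```python
import copy


def with_llm_params(options: dict, **params) -> dict:
    """Return a copy of `options` with `params` merged into "llm_params"."""
    merged = copy.deepcopy(options)  # leave the caller's dict untouched
    merged.setdefault("llm_params", {}).update(params)
    return merged


# Example defaults; the values are illustrative.
base = {"log": {"llm_calls": True}, "llm_params": {"temperature": 0.7}}
precise = with_llm_params(base, temperature=0.2)
print(precise["llm_params"])

# Usage, assuming `rails` and `messages` are defined as in the examples above:
# rails.generate(messages=messages, options=precise)
```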

Additional LLM Output

You can receive additional output from the LLM generation by setting llm_output to True through the options parameter.

rails.generate(messages=messages, options={
    "llm_output": True
})

The returned data is highly dependent on the underlying implementation of the LangChain connector for the LLM provider. For example, for OpenAI, it only returns token_usage and model_name.

Limitations

Generation options are supported only for the generate() and generate_async() methods, not for generate_events() or generate_events_async().
Specifying which individual rails of a particular type to activate is not yet supported.