Generation Options Reference
The NeMo Guardrails library exposes a set of generation options that give you fine-grained control over how the LLM generation is performed (for example, what rails are enabled, additional parameters that should be passed to the LLM, what context data should be returned, what logging information should be returned).
To use generation options, provide the options keyword argument to the generate() or generate_async() methods:
Generation options are also available through Chat Completions: Control Generation Options.
Disabling Rails
You can choose which categories of rails you want to apply by using the rails generation option. The four supported categories are: input, dialog, retrieval and output. By default, all are enabled.
is equivalent to:
Input Rails Only
If you only want to check a user’s input by running the input rails from a guardrails configuration, you must disable all the others:
The response will be the same string if the input was allowed “as is”:
If some of the rails alter the input, for example, to mask sensitive information, then the returned value is the altered input.
If the input was blocked, you will get the predefined response bot refuse to respond (by default “I’m sorry, I can’t respond to that”).
For more details on what rails was triggered, use the log.activated_rails generation option.
Input and Output Rails Only
If you want to check both the user input and an output that was generated outside of the guardrails configuration, you must disable the dialog rails and the retrieval rails, and provide a bot message as well when making the call:
The response will be the exact bot message provided, if allowed, an altered version if an output rail decides to change it, for example, to remove sensitive information, or the predefined message for bot refuse to respond, if the message was blocked.
For receive details on what rails are triggered, use the log.activated_rails generation option.
Worked Example: Compare All Rails to Input and Output Rails
The topical rails tutorial uses an ABC bot configuration with input, dialog, generation, and output rails. When all rails are enabled, a simple greeting can activate several rails and trigger multiple LLM calls:
The explain() method can show the corresponding LLM call count:
If you only need to validate an already-generated assistant message, provide both the user and assistant messages and set options={"rails": ["input", "output"]}.
This skips dialog, retrieval, and generation rails while still applying the configured input and output checks.
For validation-only use cases, prefer the check() and check_async() APIs, which run input and output rails without invoking full generation.
Output Rails Only
To apply output rails exclusively to an LLM response, disable the input rails and provide an empty input.
Detailed Logging Information
You can obtain detailed information about what happened under the hood during the generation process by setting the log generation option. This option has four different inner-options:
activated_rails: Include detailed information about the rails that were activated during generation.llm_calls: Include information about all the LLM calls that were made. This includes: prompt, completion, token usage, raw response, etc.internal_events: Include the array of internal generated events.colang_history: Include the history of the conversation in Colang format.
When using the Python API, the log is an object that also has a print_summary method. When called, it will print a simplified version of the log information. Below is a sample output.
Output Variables
Some rails can store additional information in Colang 1.0 Language Syntax: Variables. You can return the content of these variables by setting the output_vars generation option to the list of names for all the variables that you are interested in. If you want to return the complete context (this will also include some predefined variables), you can set output_vars to True.
You can find the returned data in the output_data key of the response:
Additional LLM Parameters
To supply additional parameters to the LLM call during final message generation, utilize the llm_params option. The following example demonstrates how to apply a lower value for temperature:
The available parameters are determined by the specific LLM engine in use. The NeMo Guardrails library transmits values defined in the options parameter without modification.
Additional LLM Output
You can receive additional output from the LLM generation by setting llm_output to True through the options parameter.
The returned data is highly dependent on the underlying implementation of the LangChain connector for the LLM provider. For example, for OpenAI, it only returns token_usage and model_name.
Limitations
- Only supported for the
generate/generate_asyncmethods (not forgenerate_events/generate_events_async). - Specifying which individual rails of a particular type to activate is not yet supported.