Generation Options
NeMo Guardrails exposes a set of generation options that give you fine-grained control over how the LLM generation is performed (e.g., what rails are enabled, additional parameters that should be passed to the LLM, what context data should be returned, what logging information should be returned).
The generation options can be used both in the Python API and through the server API.
To use the generation options through the Python API, you must provide the options keyword argument:
messages = [{
    "role": "user",
    "content": "..."
}]
rails.generate(messages=messages, options={...})
To use the generation options through the server API, you must provide the options as part of the request body:
POST /v1/chat/completions
{
    "config_id": "...",
    "messages": [{
        "role": "user",
        "content": "..."
    }],
    "options": {
        ...
    }
}
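As a minimal sketch of the request above, the following builds the POST request with the Python standard library without sending it. The server address http://localhost:8000, the config id my_config, and the helper name build_chat_request are assumptions made for illustration; they are not part of NeMo Guardrails.

```python
import json
import urllib.request


def build_chat_request(base_url, config_id, messages, options=None):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {"config_id": config_id, "messages": messages}
    if options:
        # Generation options travel under the "options" key of the request body.
        body["options"] = options
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000",  # assumed server address
    "my_config",              # hypothetical config id
    [{"role": "user", "content": "Hello!"}],
    options={"llm_params": {"temperature": 0.2}},
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the exact JSON body before it reaches the server.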
Output Variables
Some rails can store additional information in context variables. You can return the content of these variables by setting the output_vars generation option to the list of names of the variables that you are interested in. To return the complete context (which also includes some predefined variables), set output_vars to True.
rails.generate(messages=messages, options={
    "output_vars": ["some_input_rail_score", "some_output_rail_score"]
})
The returned data will be included in the output_data key of the response:
{
    "response": [...],
    "output_data": {
        "some_input_rail_score": 0.7,
        "some_output_rail_score": 0.8
    }
}
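One way to consume this data is sketched below, assuming the result is available as a plain dictionary with the shape shown above (for example, the parsed JSON body from the server API). The helper name read_output_vars is hypothetical, and with the Python API the same data may be exposed as attributes rather than dictionary keys.

```python
def read_output_vars(result, names):
    """Return the requested variables from a dict-shaped result.

    Variables that were not returned are mapped to None instead of raising.
    """
    output_data = result.get("output_data") or {}
    return {name: output_data.get(name) for name in names}


# Sample result following the shape shown above (the scores are illustrative).
result = {
    "response": [{"role": "assistant", "content": "..."}],
    "output_data": {
        "some_input_rail_score": 0.7,
        "some_output_rail_score": 0.8,
    },
}

scores = read_output_vars(result, ["some_input_rail_score", "missing_var"])
```

Mapping missing names to None keeps the caller from failing when a rail did not set a variable during a particular generation.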
Additional LLM Parameters
You can pass additional parameters to the LLM call that is used to generate the final message by using the llm_params generation option. For example, to use a lower temperature than the default one:
rails.generate(messages=messages, options={
    "llm_params": {
        "temperature": 0.2
    }
})
The supported parameters depend on the underlying LLM engine. NeMo Guardrails passes them “as is”.
Additional LLM Output
You can receive the additional output from the LLM generation by using the llm_output generation option.
rails.generate(messages=messages, options={
    "llm_output": True
})
NOTE: The data that is returned is highly dependent on the underlying implementation of the LangChain connector for the LLM provider. For example, for OpenAI, it only returns token_usage and model_name.
Detailed Logging Information
You can obtain detailed information about what happened under the hood during the generation process by setting the log generation option. This option has four inner options:
- activated_rails: Include detailed information about the rails that were activated during generation.
- llm_calls: Include information about all the LLM calls that were made. This includes: prompt, completion, token usage, raw response, etc.
- internal_events: Include the array of internal generated events.
- colang_history: Include the history of the conversation in Colang format.
res = rails.generate(messages=messages, options={
    "log": {
        "activated_rails": True,
        "llm_calls": True,
        "internal_events": True,
        "colang_history": True
    }
})
{
    "response": [...],
    "log": {
        "activated_rails": {
            ...
        },
        "stats": {...},
        "llm_calls": [...],
        "internal_events": [...],
        "colang_history": "..."
    }
}
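Because the log option is a nested dictionary of flags, a small helper can guard against misspelled names. The sketch below relies only on the four inner option names listed above; the helper name build_log_options is hypothetical and not part of NeMo Guardrails.

```python
LOG_OPTIONS = ("activated_rails", "llm_calls", "internal_events", "colang_history")


def build_log_options(*enabled):
    """Return an options dict that turns on only the named log sections."""
    unknown = sorted(set(enabled) - set(LOG_OPTIONS))
    if unknown:
        raise ValueError(f"Unknown log option(s): {', '.join(unknown)}")
    # Keep the documented order so the resulting dict is predictable.
    return {"log": {name: True for name in LOG_OPTIONS if name in enabled}}


options = build_log_options("llm_calls", "activated_rails")
# Intended use: rails.generate(messages=messages, options=options)
```

Requesting only the sections you need keeps the returned log small, which matters most for internal_events and llm_calls.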
When using the Python API, the log is an object that also has a print_summary method. When called, it prints a simplified version of the log information. Below is a sample output:
res.log.print_summary()
# General stats
- Total time: 2.85s
  - [0.56s][19.64%]: INPUT Rails
  - [1.40s][49.02%]: DIALOG Rails
  - [0.58s][20.22%]: GENERATION Rails
  - [0.31s][10.98%]: OUTPUT Rails
- 5 LLM calls, 2.74s total duration, 1641 total prompt tokens, 103 total completion tokens, 1744 total tokens.
# Detailed stats
- [0.56s] INPUT (self check input): 1 actions (self_check_input), 1 llm calls [0.56s]
- [0.43s] DIALOG (generate user intent): 1 actions (generate_user_intent), 1 llm calls [0.43s]
- [0.96s] DIALOG (generate next step): 1 actions (generate_next_step), 1 llm calls [0.95s]
- [0.58s] GENERATION (generate bot message): 2 actions (retrieve_relevant_chunks, generate_bot_message), 1 llm calls [0.49s]
- [0.31s] OUTPUT (self check output): 1 actions (self_check_output), 1 llm calls [0.31s]
TODO: add more details about the returned data.
Disabling Rails
You can choose which categories of rails you want to apply by using the rails generation option. The four supported categories are: input, dialog, retrieval and output. By default, all are enabled.
res = rails.generate(messages=messages)
is equivalent to:
res = rails.generate(messages=messages, options={
    "rails": ["input", "dialog", "retrieval", "output"]
})
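Since the rails option lists the categories to keep rather than the ones to drop, it can be convenient to derive that list from the categories you want to disable. The following is a minimal sketch based only on the four category names above; the helper name rails_except is hypothetical.

```python
RAIL_CATEGORIES = ("input", "dialog", "retrieval", "output")


def rails_except(*disabled):
    """Return an options dict that enables every category except the given ones."""
    unknown = sorted(set(disabled) - set(RAIL_CATEGORIES))
    if unknown:
        raise ValueError(f"Unknown rail category: {', '.join(unknown)}")
    return {"rails": [name for name in RAIL_CATEGORIES if name not in disabled]}


options = rails_except("dialog", "retrieval")
# Intended use: rails.generate(messages=messages, options=options)
```

Calling rails_except() with no arguments yields all four categories, which matches the default behavior described above.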
Input Rails Only
If you only want to check a user’s input by running the input rails from a guardrails configuration, you must disable all the others:
res = rails.generate(messages=[{
    "role": "user",
    "content": "Some user input."
}], options={
    "rails": ["input"]
})
If the input was allowed “as is”, the response content will be the same string as the user input:
{
    "role": "assistant",
    "content": "Some user input."
}
If some of the rails alter the input, e.g., to mask sensitive information, then the returned value is the altered input:
{
    "role": "assistant",
    "content": "Some altered user input."
}
If the input was blocked, you will get the predefined response for bot refuse to respond (by default, “I’m sorry, I can’t respond to that”):
{
    "role": "assistant",
    "content": "I'm sorry, I can't respond to that."
}
For more details on which rails were triggered, use the log.activated_rails generation option.
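The three outcomes described above (allowed, altered, blocked) can be told apart by comparing the returned content with the original input. This is a rough sketch: it assumes the default refusal message shown above is in use, and the helper name classify_input_check is hypothetical. A configuration that customizes the refusal text would need to pass its own refusal argument.

```python
DEFAULT_REFUSAL = "I'm sorry, I can't respond to that."


def classify_input_check(user_input, response_content, refusal=DEFAULT_REFUSAL):
    """Classify the result of an input-rails-only call.

    Returns "allowed" when the input came back unchanged, "blocked" when the
    refusal message came back, and "altered" for any other content.
    """
    if response_content == user_input:
        return "allowed"
    if response_content == refusal:
        return "blocked"
    return "altered"


outcome = classify_input_check("Some user input.", "Some altered user input.")
```

Exact string comparison is fragile, so when the distinction matters, the log.activated_rails data is the more reliable source.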
Input and Output Rails Only
If you want to check both the user input and an output that was generated outside of the guardrails configuration, you must disable the dialog rails and the retrieval rails, and provide a bot message as well when making the call:
res = rails.generate(messages=[{
    "role": "user",
    "content": "Some user input."
}, {
    "role": "bot",
    "content": "Some bot output."
}], options={
    "rails": ["input", "output"]
})
The response will be the exact bot message provided if it was allowed, an altered version if an output rail decided to change it (e.g., to remove sensitive information), or the predefined message for bot refuse to respond if the message was blocked.
For more details on which rails were triggered, use the log.activated_rails generation option.
Output Rails Only
If you want to apply only the output rails to an LLM output, you must also disable the input rails and provide an empty user input:
res = rails.generate(messages=[{
    "role": "user",
    "content": ""
}, {
    "role": "bot",
    "content": "Some bot output."
}], options={
    "rails": ["output"]
})
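The two patterns for checking an externally generated output differ only in whether a user input is supplied, so they can share one helper. The sketch below assembles the keyword arguments for the call without invoking the library; the helper name external_output_check_args is hypothetical.

```python
def external_output_check_args(bot_output, user_input=None):
    """Build keyword arguments for checking a bot message produced elsewhere.

    With a user input, both input and output rails are enabled; without one,
    only the output rails are enabled and an empty user message is sent.
    """
    rails = ["input", "output"] if user_input else ["output"]
    messages = [
        {"role": "user", "content": user_input or ""},
        {"role": "bot", "content": bot_output},
    ]
    return {"messages": messages, "options": {"rails": rails}}


kwargs = external_output_check_args("Some bot output.")
# Intended use: res = rails.generate(**kwargs)
```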
Limitations
- Only supported for the generate/generate_async methods (not for generate_events/generate_events_async).
- Specifying which individual rails of a particular type to activate is not yet supported.