> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/guardrails/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/guardrails/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/guardrails/_mcp/server.

# Generation Options Reference

> Configure logging, LLM parameters, and rail selection for generation.

The NeMo Guardrails library exposes a set of **generation options** that give you fine-grained control over how the LLM generation is performed (for example, what rails are enabled, additional parameters that should be passed to the LLM, what context data should be returned, what logging information should be returned).

To use generation options, provide the `options` keyword argument to the `generate()` or `generate_async()` methods:

```python
messages = [{
    "role": "user",
    "content": "..."
}]
rails.generate(messages=messages, options={...})
```

Generation options are also available through [Chat Completions: Control Generation Options](/run-guardrailed-inference/using-fastapi-server/chat-with-guardrailed-model#control-generation-options).

## Disabling Rails

You can choose which categories of rails you want to apply by using the `rails` generation option. The four supported categories are: `input`, `dialog`, `retrieval` and `output`. By default, all are enabled.

```python
res = rails.generate(messages=messages)
```

is equivalent to:

```python
res = rails.generate(messages=messages, options={
    "rails": ["input", "dialog", "retrieval", "output"]
})
```

### Input Rails Only

If you only want to check a user's input by running the input rails from a guardrails configuration, you must disable all the others:

```python
res = rails.generate(messages=[{
    "role": "user",
    "content": "Some user input."
}], options={
    "rails": ["input"]
})
```

The response will be the same string if the input was allowed "as is":

```json
{
  "role": "assistant",
  "content": "Some user input."
}
```

If some of the rails alter the input, for example, to mask sensitive information, then the returned value is the altered input.

```json
{
  "role": "assistant",
  "content": "Some altered user input."
}
```

If the input was blocked, you will get the predefined response `bot refuse to respond` (by default "I'm sorry, I can't respond to that").

```json
{
  "role": "assistant",
  "content": "I'm sorry, I can't respond to that."
}
```

For more details on what rails was triggered, use the `log.activated_rails` generation option.

### Input and Output Rails Only

If you want to check both the user input and an output that was generated outside of the guardrails configuration, you must disable the dialog rails and the retrieval rails, and provide a bot message as well when making the call:

```python
res = rails.generate(messages=[{
    "role": "user",
    "content": "Some user input."
}, {
    "role": "assistant",
    "content": "Some bot output."
}], options={
    "rails": ["input", "output"]
})
```

The response will be the exact bot message provided, if allowed, an altered version if an output rail decides to change it, for example, to remove sensitive information, or the predefined message for `bot refuse to respond`, if the message was blocked.

For receive details on what rails are triggered, use the `log.activated_rails` generation option.

#### Worked Example: Compare All Rails to Input and Output Rails

The [topical rails tutorial](/configure-guardrails/colang/colang-1/tutorials/6-topical-rails) uses an ABC bot configuration with input, dialog, generation, and output rails.
When all rails are enabled, a simple greeting can activate several rails and trigger multiple LLM calls:

```text
{'type': 'input', 'name': 'self check input'}
{'type': 'dialog', 'name': 'generate user intent'}
{'type': 'dialog', 'name': 'generate next step'}
{'type': 'generation', 'name': 'generate bot message'}
{'type': 'output', 'name': 'self check output'}
{'type': 'output', 'name': 'check blocked terms'}
```

The `explain()` method can show the corresponding LLM call count:

```python
info = rails.explain()
info.print_llm_calls_summary()
```

```text
Summary: 5 LLM call(s) took 3.54 seconds and used 1621 tokens.
```

If you only need to validate an already-generated assistant message, provide both the user and assistant messages and set `options={"rails": ["input", "output"]}`.
This skips dialog, retrieval, and generation rails while still applying the configured input and output checks.

For validation-only use cases, prefer the [`check()` and `check_async()` APIs](/run-guardrailed-inference/using-python-apis/check-messages), which run input and output rails without invoking full generation.

### Output Rails Only

To apply output rails exclusively to an LLM response, disable the input rails and provide an empty input.

```python
res = rails.generate(messages=[{
    "role": "user",
    "content": ""
}, {
    "role": "assistant",
    "content": "Some bot output."
}], options={
    "rails": ["output"]
})

```

## Detailed Logging Information

You can obtain detailed information about what happened under the hood during the generation process by setting the `log` generation option. This option has four different inner-options:

* `activated_rails`: Include detailed information about the rails that were activated during generation.
* `llm_calls`: Include information about all the LLM calls that were made. This includes: prompt, completion, token usage, raw response, etc.
* `internal_events`: Include the array of internal generated events.
* `colang_history`: Include the history of the conversation in Colang format.

```python
res = rails.generate(messages=messages, options={
    "log": {
        "activated_rails": True,
        "llm_calls": True,
        "internal_events": True,
        "colang_history": True
    }
})
```

```text
{
  "response": [...],
  "log": {
    "activated_rails": {
      ...
    },
    "stats": {...},
    "llm_calls": [...],
    "internal_events": [...],
    "colang_history": "..."
  }
}
```

When using the Python API, the `log` is an object that also has a `print_summary` method. When called, it will print a simplified version of the log information. Below is a sample output.

```python
res.log.print_summary()
```

```markdown
# General stats

- Total time: 2.85s
  - [0.56s][19.64%]: INPUT Rails
  - [1.40s][49.02%]: DIALOG Rails
  - [0.58s][20.22%]: GENERATION Rails
  - [0.31s][10.98%]: OUTPUT Rails
- 5 LLM calls, 2.74s total duration, 1641 total prompt tokens, 103 total completion tokens, 1744 total tokens.

# Detailed stats

- [0.56s] INPUT (self check input): 1 actions (self_check_input), 1 llm calls [0.56s]
- [0.43s] DIALOG (generate user intent): 1 actions (generate_user_intent), 1 llm calls [0.43s]
- [0.96s] DIALOG (generate next step): 1 actions (generate_next_step), 1 llm calls [0.95s]
- [0.58s] GENERATION (generate bot message): 2 actions (retrieve_relevant_chunks, generate_bot_message), 1 llm calls [0.49s]
- [0.31s] OUTPUT (self check output): 1 actions (self_check_output), 1 llm calls [0.31s]
```

## Output Variables

Some rails can store additional information in [Colang 1.0 Language Syntax: Variables](/configure-guardrails/colang/colang-1/colang-language-syntax-guide#variables). You can return the content of these variables by setting the `output_vars` generation option to the list of names for all the variables that you are interested in. If you want to return the complete context (this will also include some predefined variables), you can set `output_vars` to `True`.

```python
rails.generate(messages=messages, options={
    "output_vars": ["some_input_rail_score", "some_output_rail_score"]
})
```

You can find the returned data in the `output_data` key of the response:

```text
{
  "response": [...],
  "output_data": {
    "some_input_rail_score": 0.7,
    "some_output_rail_score": 0.8
  }
}
```

## Additional LLM Parameters

To supply additional parameters to the LLM call during final message generation, utilize the `llm_params` option. The following example demonstrates how to apply a lower value for `temperature`:

```python
rails.generate(messages=messages, options={
    "llm_params": {
        "temperature": 0.2
    }
})
```

The available parameters are determined by the specific LLM engine in use. The NeMo Guardrails library transmits values defined in the options parameter without modification.

## Additional LLM Output

You can receive additional output from the LLM generation by setting `llm_output` to `True` through the `options` parameter.

```python
rails.generate(messages=messages, options={
    "llm_output": True
})
```

The returned data is highly dependent on the underlying implementation of the LangChain connector for the LLM provider. For example, for OpenAI, it only returns `token_usage` and `model_name`.

## Limitations

* Only supported for the `generate`/`generate_async` methods (not for `generate_events`/`generate_events_async`).
* Specifying which individual rails of a particular type to activate is not yet supported.