
# nemoguardrails.rails.llm.llmrails

LLM Rails entry point.

## Module Contents

### Classes

| Name                                                      | Description                           |
| --------------------------------------------------------- | ------------------------------------- |
| [`LLMRails`](#nemoguardrails-rails-llm-llmrails-LLMRails) | Rails based on a given configuration. |

### Functions

| Name                                                                                                  | Description |
| ----------------------------------------------------------------------------------------------------- | ----------- |
| [`_determine_rails_from_messages`](#nemoguardrails-rails-llm-llmrails-_determine_rails_from_messages) | -           |
| [`_get_blocking_rail`](#nemoguardrails-rails-llm-llmrails-_get_blocking_rail)                         | -           |
| [`_get_last_content_by_role`](#nemoguardrails-rails-llm-llmrails-_get_last_content_by_role)           | -           |
| [`_get_last_response_content`](#nemoguardrails-rails-llm-llmrails-_get_last_response_content)         | -           |
| [`_normalize_messages_for_rails`](#nemoguardrails-rails-llm-llmrails-_normalize_messages_for_rails)   | -           |
| [`_wrap_legacy_llm`](#nemoguardrails-rails-llm-llmrails-_wrap_legacy_llm)                             | -           |

### Data

[`log`](#nemoguardrails-rails-llm-llmrails-log)

[`process_events_semaphore`](#nemoguardrails-rails-llm-llmrails-process_events_semaphore)

### API

```python
class nemoguardrails.rails.llm.llmrails.LLMRails(
    config: nemoguardrails.rails.llm.config.RailsConfig,
    llm: typing.Optional[nemoguardrails.types.LLMModel] = None,
    verbose: bool = False
)
```

**Bases:** [BaseGuardrails](/guardrails-python-sdk/nemoguardrails/base_guardrails#nemoguardrails-base_guardrails-BaseGuardrails)

Rails based on a given configuration.

`passthrough_fn` is an optional passthrough function that bypasses LLM generation.

When set, the rails pipeline calls this function instead of the main LLM
to generate responses. Because `LLMGenerationActions` is private, only
`passthrough_fn` is exposed as public API.
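The passthrough idea can be illustrated with a minimal sketch. `TinyPipeline`, `_default_llm`, and `echo_passthrough` are hypothetical names used only for illustration; this is not the NeMo Guardrails implementation, and the real function signature may differ.

```python
import asyncio
from typing import Awaitable, Callable, Optional


# Hypothetical stand-in for a rails pipeline; not the NeMo Guardrails implementation.
class TinyPipeline:
    def __init__(self) -> None:
        # Optional function that replaces the default generation step when set.
        self.passthrough_fn: Optional[Callable[[str], Awaitable[str]]] = None

    async def _default_llm(self, prompt: str) -> str:
        # Placeholder for a call to the main LLM.
        return f"LLM answer to: {prompt}"

    async def generate(self, prompt: str) -> str:
        # Prefer the passthrough function when one has been set.
        if self.passthrough_fn is not None:
            return await self.passthrough_fn(prompt)
        return await self._default_llm(prompt)


async def echo_passthrough(prompt: str) -> str:
    return f"passthrough: {prompt}"


pipeline = TinyPipeline()
default_answer = asyncio.run(pipeline.generate("hi"))
pipeline.passthrough_fn = echo_passthrough
passthrough_answer = asyncio.run(pipeline.generate("hi"))
```

In this sketch the same `generate` call returns the placeholder LLM answer first and the passthrough answer once the function is set.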

```python
nemoguardrails.rails.llm.llmrails.LLMRails.__getstate__()
```

```python
nemoguardrails.rails.llm.llmrails.LLMRails.__setstate__(
    state
)
```

```python
nemoguardrails.rails.llm.llmrails.LLMRails._create_model_cache(
    model
) -> nemoguardrails.llm.cache.LFUCache
```

Create cache instance for a model based on its configuration.

**Parameters:**

* `model`: The model configuration object.

**Returns:** `LFUCache`

The cache instance.
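For readers unfamiliar with least-frequently-used eviction, the sketch below shows the general idea. `TinyLFUCache` is a hypothetical class written for this illustration; the library's `LFUCache` may expose a different API and handle ties, sizing, and statistics differently.

```python
from collections import Counter
from typing import Any, Dict, Hashable, Optional


# Hypothetical minimal LFU cache; the library's LFUCache may differ in API and behavior.
class TinyLFUCache:
    def __init__(self, maxsize: int) -> None:
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self.maxsize = maxsize
        self._values: Dict[Hashable, Any] = {}
        self._hits: Counter = Counter()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._values:
            return None
        self._hits[key] += 1
        return self._values[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key not in self._values and len(self._values) >= self.maxsize:
            # Evict the least frequently used key (ties go to the oldest inserted key).
            victim = min(self._values, key=lambda k: self._hits[k])
            del self._values[victim]
            del self._hits[victim]
        self._values[key] = value
        self._hits[key] += 1


cache = TinyLFUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now used more often than "b"
cache.put("c", 3)  # the cache is full, so the least used key ("b") is evicted
```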

```python
nemoguardrails.rails.llm.llmrails.LLMRails._ensure_explain_info() -> nemoguardrails.logging.explain.ExplainInfo
```

staticmethod

Ensure that the `ExplainInfo` variable is present in the current context.

**Returns:** `ExplainInfo`

An `ExplainInfo` object containing the statistics for the LLM calls.

```python
nemoguardrails.rails.llm.llmrails.LLMRails._get_embeddings_search_provider_instance(
    esp_config: typing.Optional[nemoguardrails.rails.llm.config.EmbeddingSearchProvider] = None
) -> nemoguardrails.embeddings.index.EmbeddingsIndex
```

```python
nemoguardrails.rails.llm.llmrails.LLMRails._get_events_for_messages(
    messages: typing.List[dict],
    state: typing.Any
)
```

Return the list of events corresponding to the provided messages.

Tries to find a prefix of the messages for which a list of events is already
in the cache. The remaining messages are converted as is.

This cache exists so that events generated in previous turns can be reused.
Recomputing them would be expensive (e.g., it could involve multiple LLM calls).

Once an explicit state object is added, this mechanism can be removed.

**Parameters:**

* `messages`: The list of messages.

**Returns:**

A list of events.
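The prefix lookup described for `_get_events_for_messages` can be sketched roughly as follows. The cache key, the `events_cache` dictionary, and the simplified event shape are all assumptions made for this illustration; the library's internal cache and event schema are not shown in this reference.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical cache keyed by a serialized message prefix; names are illustrative only.
events_cache: Dict[str, List[dict]] = {}


def _key(messages: List[dict]) -> str:
    return json.dumps(messages, sort_keys=True)


def message_to_event(message: dict) -> dict:
    # Simplified "as is" conversion; the real event schema is richer.
    return {"type": f"{message['role']}_message", "text": message["content"]}


def events_for_messages(messages: List[dict]) -> Tuple[List[dict], int]:
    """Return the events plus the number of messages served from the cache."""
    # Try the longest prefix first, then progressively shorter ones.
    for end in range(len(messages), 0, -1):
        cached = events_cache.get(_key(messages[:end]))
        if cached is not None:
            rest = [message_to_event(m) for m in messages[end:]]
            return cached + rest, end
    return [message_to_event(m) for m in messages], 0


history = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
]
# Pretend a previous turn produced these (expensive) events for the history.
events_cache[_key(history)] = [{"type": "expensive_event_1"}, {"type": "expensive_event_2"}]

events, reused = events_for_messages(history + [{"role": "user", "content": "Bye"}])
```

Here the first two messages reuse the cached events, and only the new user message is converted.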

```python
nemoguardrails.rails.llm.llmrails.LLMRails._init_kb()
```

async

Initializes the knowledge base.

```python
nemoguardrails.rails.llm.llmrails.LLMRails._init_llms()
```

Initializes the LLM engines specified in the configuration.
The config can specify multiple LLM engines and types.
The main LLM engine is used for all core guardrails generations.
Other LLM engines can be specified for use in specific actions.

Decoupling the main LLM engine from the action LLMs makes it possible
to use specialized LLM engines for specific actions.

**Raises:**

* `ModelInitializationError`: If any model initialization fails

```python
nemoguardrails.rails.llm.llmrails.LLMRails._initialize_model_caches() -> None
```

Initialize caches for configured models.

```python
nemoguardrails.rails.llm.llmrails.LLMRails._prepare_model_kwargs(
    model_config
)
```

Prepare kwargs for model initialization, including API key from environment variable.

**Parameters:**

* `model_config`: The model configuration object.

**Returns:**

The prepared kwargs for model initialization.
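A minimal sketch of how an API key might be pulled from an environment variable into model kwargs is shown below. `ModelConfigSketch`, its field names, and the `api_key` kwarg are assumptions for illustration; they are not taken from the library's configuration schema.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Hypothetical model config; the field names are assumptions, not the library's schema.
@dataclass
class ModelConfigSketch:
    parameters: Dict[str, Any] = field(default_factory=dict)
    api_key_env_var: Optional[str] = None


def prepare_model_kwargs(model_config: ModelConfigSketch) -> Dict[str, Any]:
    # Copy so the config's own parameters are not mutated.
    kwargs = dict(model_config.parameters)
    if model_config.api_key_env_var:
        api_key = os.environ.get(model_config.api_key_env_var)
        if api_key:
            kwargs["api_key"] = api_key
    return kwargs


os.environ["SKETCH_API_KEY"] = "secret-123"  # set here only for the demonstration
config = ModelConfigSketch(parameters={"temperature": 0.2}, api_key_env_var="SKETCH_API_KEY")
kwargs = prepare_model_kwargs(config)
```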

```python
nemoguardrails.rails.llm.llmrails.LLMRails._run_output_rails_in_streaming(
    streaming_handler: typing.AsyncIterator[str],
    output_rails_streaming_config: nemoguardrails.rails.llm.config.OutputRailsStreamingConfig,
    prompt: typing.Optional[str] = None,
    messages: typing.Optional[typing.List[dict]] = None,
    stream_first: typing.Optional[bool] = None
) -> typing.AsyncIterator[str]
```

async

1. Buffers tokens from `streaming_handler` via `BufferStrategy`.
2. Runs the flows sequentially for each chunk (parallel execution for Colang 2.0 may come in the future).
3. Yields the chunk if it is not blocked, or STOP if it is blocked.
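The three steps above can be sketched with plain `asyncio`. This is only an illustration under stated assumptions: fixed-size chunking stands in for `BufferStrategy`, `is_blocked` stands in for the output rail flows, and the `STOP` sentinel value is hypothetical.

```python
import asyncio
from typing import AsyncIterator, Callable, List

STOP = "<STOP>"  # hypothetical sentinel; the library's stop signal may differ


async def token_source(tokens: List[str]) -> AsyncIterator[str]:
    for token in tokens:
        yield token


async def guarded_stream(
    tokens: AsyncIterator[str],
    is_blocked: Callable[[str], bool],
    chunk_size: int = 3,
) -> AsyncIterator[str]:
    buffer: List[str] = []
    async for token in tokens:
        buffer.append(token)  # step 1: buffer tokens
        if len(buffer) < chunk_size:
            continue
        chunk = "".join(buffer)
        buffer.clear()
        if is_blocked(chunk):  # step 2: check the chunk
            yield STOP         # step 3: signal the block and end the stream
            return
        yield chunk            # step 3: pass the chunk through
    if buffer:  # flush the final partial chunk
        chunk = "".join(buffer)
        yield STOP if is_blocked(chunk) else chunk


async def collect(stream: AsyncIterator[str]) -> List[str]:
    return [item async for item in stream]


def contains_secret(chunk: str) -> bool:
    return "secret" in chunk


safe = asyncio.run(
    collect(guarded_stream(token_source(["Hi ", "there", ", ", "friend"]), contains_secret))
)
blocked = asyncio.run(
    collect(guarded_stream(token_source(["the ", "secret ", "is ", "42"]), contains_secret))
)
```

One consequence of buffering is that a blocked chunk is never partially emitted: the consumer sees either the whole chunk or the stop signal.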

```python
nemoguardrails.rails.llm.llmrails.LLMRails._validate_config()
```

Runs additional validation checks on the config.

```python
nemoguardrails.rails.llm.llmrails.LLMRails._validate_public_state(
    state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]
) -> None
```

Validate public dict state passed through generate/generate\_async.

```python
nemoguardrails.rails.llm.llmrails.LLMRails._validate_streaming_with_output_rails() -> None
```

```python
nemoguardrails.rails.llm.llmrails.LLMRails.check(
    messages: typing.List[dict],
    rail_types: typing.Optional[typing.List[nemoguardrails.rails.llm.options.RailType]] = None
) -> nemoguardrails.rails.llm.options.RailsResult
```

Run rails on messages based on their content (synchronous).

This is a synchronous wrapper around `check_async()`.

**Parameters:**

* `messages`: List of message dicts with 'role' and 'content' fields.
* `rail_types`: Optional list of rail types to run. See `check_async()` for details.

**Returns:** `RailsResult`

RailsResult containing status, content, and optional blocking rail name.
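One common way to build a synchronous wrapper around a coroutine is sketched below. `check_sketch` and `check_async_sketch` are hypothetical names, and the library's own wrapper may handle an already running event loop differently.

```python
import asyncio
from typing import List


async def check_async_sketch(messages: List[dict]) -> dict:
    # Placeholder for asynchronous rail execution.
    await asyncio.sleep(0)
    return {"status": "passed", "checked": len(messages)}


def check_sketch(messages: List[dict]) -> dict:
    """Blocking wrapper: runs the coroutine to completion on a fresh event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so it is safe to start one.
        return asyncio.run(check_async_sketch(messages))
    raise RuntimeError("Already inside an event loop; await check_async_sketch() instead.")


result = check_sketch([{"role": "user", "content": "Hello!"}])
```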

```python
nemoguardrails.rails.llm.llmrails.LLMRails.check_async(
    messages: typing.List[dict],
    rail_types: typing.Optional[typing.List[nemoguardrails.rails.llm.options.RailType]] = None
) -> nemoguardrails.rails.llm.options.RailsResult
```

async

Run rails on messages based on their content (asynchronous).

When `rail_types` is not provided, automatically determines which rails
to run based on message roles:

* Only user messages: runs input rails
* Only assistant messages: runs output rails
* Both user and assistant messages: runs both input and output rails
* No user/assistant messages: logs warning and returns passing result

When `rail_types` is provided, runs exactly the specified rail types,
skipping the auto-detection logic.

**Parameters:**

* `messages`: List of message dicts with 'role' and 'content' fields.
  Messages can contain any roles, but only user/assistant roles
  determine which rails execute when `rail_types` is not provided.
* `rail_types`: Optional list of rail types to run, e.g.
  `[RailType.INPUT]` or `[RailType.OUTPUT]`.
  When provided, overrides automatic detection.

**Returns:** `RailsResult`

RailsResult containing status, content, and optional blocking rail name.

**Examples:**

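```python
# Assumes `rails` is an initialized LLMRails instance and that the calls run
# inside a coroutine. RailStatus is assumed to be importable from the same
# module as RailType.
from nemoguardrails.rails.llm.options import RailStatus, RailType

# Check user input (auto-detected).
result = await rails.check_async([{"role": "user", "content": "Hello!"}])
if result.status == RailStatus.BLOCKED:
    print(f"Blocked by: {result.rail}")

# Check bot output with context (auto-detected).
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
]
result = await rails.check_async(messages)

# Run only input rails explicitly.
result = await rails.check_async(messages, rail_types=[RailType.INPUT])
```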
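The role-based auto-detection rules listed for `check_async` can be expressed as a small standalone function. This is a sketch of the documented behavior only: `RailTypeSketch` and `detect_rail_types` are hypothetical names, and the module's own `_determine_rails_from_messages` helper has a different return type.

```python
from enum import Enum
from typing import List


# Hypothetical stand-in for RailType with only the two members mentioned above.
class RailTypeSketch(Enum):
    INPUT = "input"
    OUTPUT = "output"


def detect_rail_types(messages: List[dict]) -> List[RailTypeSketch]:
    roles = {m.get("role") for m in messages}
    detected: List[RailTypeSketch] = []
    if "user" in roles:
        detected.append(RailTypeSketch.INPUT)
    if "assistant" in roles:
        detected.append(RailTypeSketch.OUTPUT)
    # An empty list means there is nothing to run (no user/assistant messages).
    return detected


user_only = detect_rail_types([{"role": "user", "content": "Hello!"}])
both = detect_rail_types([
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
])
neither = detect_rail_types([{"role": "context", "content": {}}])
```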

```python
nemoguardrails.rails.llm.llmrails.LLMRails.explain() -> nemoguardrails.logging.explain.ExplainInfo
```

Helper function to return the latest ExplainInfo object.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.generate(
    prompt: typing.Optional[str] = None,
    messages: typing.Optional[typing.List[dict]] = None,
    options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
    state: typing.Optional[dict] = None
)
```

Synchronous version of generate\_async.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.generate_async(
    prompt: typing.Optional[str] = None,
    messages: typing.Optional[typing.List[dict]] = None,
    options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
    state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]] = None,
    streaming_handler: typing.Optional[nemoguardrails.streaming.StreamingHandler] = None
) -> typing.Union[str, dict, nemoguardrails.rails.llm.options.GenerationResponse, typing.Tuple[dict, dict]]
```

async

Generate a completion or a next message.

The format for messages is the following:

\[
\{"role": "context", "content": \{"user\_name": "John"}},
\{"role": "user", "content": "Hello! How are you?"},
\{"role": "assistant", "content": "I am fine, thank you!"},
\{"role": "event", "event": \{"type": "UserSilent"}},
...
]

System messages are not yet supported.

**Parameters:**

* `prompt`: The prompt to be used for completion.
* `messages`: The history of messages to be used to generate the next message.
* `options`: Options specific for the generation.
* `state`: The state object that should be used as the starting point.
* `streaming_handler`: If specified, and the config supports streaming, the
  provided handler will be used for streaming.

**Returns:** `Union[str, dict, GenerationResponse, Tuple[dict, dict]]`

The completion (when a prompt is provided) or the next message.
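The message format described for `generate_async` can be written as a plain Python list. The helper below, `unsupported_roles`, is a hypothetical pre-flight check that is not part of the library; it simply reports roles outside the four shown in the format, such as `system`.

```python
from typing import List

# Roles that appear in the documented message format.
SUPPORTED_ROLES = {"context", "user", "assistant", "event"}


def unsupported_roles(messages: List[dict]) -> List[str]:
    """Return the roles in `messages` that the documented format does not list."""
    return sorted({m.get("role", "<missing>") for m in messages} - SUPPORTED_ROLES)


messages = [
    {"role": "context", "content": {"user_name": "John"}},
    {"role": "user", "content": "Hello! How are you?"},
    {"role": "assistant", "content": "I am fine, thank you!"},
    {"role": "event", "event": {"type": "UserSilent"}},
]

ok = unsupported_roles(messages)
with_system = unsupported_roles(messages + [{"role": "system", "content": "Be brief."}])
```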

```python
nemoguardrails.rails.llm.llmrails.LLMRails.generate_events(
    events: typing.List[dict]
) -> typing.List[dict]
```

Synchronous version of `LLMRails.generate_events_async`.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.generate_events_async(
    events: typing.List[dict]
) -> typing.List[dict]
```

async

Generate the next events based on the provided history.

The format for events is the following:

\[
\{"type": "...", ...},
...
]

**Parameters:**

* `events`: The history of events to be used to generate the next events.

The options to be used for the generation.

**Returns:** `List[dict]`

The newly generated event(s).

```python
nemoguardrails.rails.llm.llmrails.LLMRails.process_events(
    events: typing.List[dict],
    state: typing.Union[typing.Optional[dict], nemoguardrails.colang.v2_x.runtime.flows.State] = None,
    blocking: bool = False
) -> typing.Tuple[typing.List[dict], typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]
```

Synchronous version of `LLMRails.process_events_async`.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.process_events_async(
    events: typing.List[dict],
    state: typing.Union[typing.Optional[dict], nemoguardrails.colang.v2_x.runtime.flows.State] = None,
    blocking: bool = False
) -> typing.Tuple[typing.List[dict], typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]
```

async

Process a sequence of events in a given state.

The events will be processed one by one, in the input order.

**Parameters:**

* `events`: A sequence of events that needs to be processed.
* `state`: The state that should be used as the starting point. If not provided,
  a clean state will be used.

**Returns:** `Tuple[List[dict], Union[dict, State]]`

A tuple `(output_events, output_state)` containing the sequence of output events
and the output state.
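The calling pattern of `process_events_async` (events processed in order, an optional starting state, and a returned `(output_events, output_state)` pair that can seed the next call) can be sketched as follows. The `seen` counter and the `Processed` event type are invented for this illustration and say nothing about the real runtime state.

```python
import asyncio
from typing import List, Optional, Tuple


async def process_events_sketch(
    events: List[dict], state: Optional[dict] = None
) -> Tuple[List[dict], dict]:
    # Start from a clean state when none is given; copy so the caller's state is untouched.
    current = dict(state) if state is not None else {"seen": 0}
    output_events: List[dict] = []
    for event in events:  # strictly in input order
        current["seen"] = current.get("seen", 0) + 1
        output_events.append(
            {"type": "Processed", "source": event["type"], "index": current["seen"]}
        )
    return output_events, current


out_events, out_state = asyncio.run(process_events_sketch([{"type": "A"}, {"type": "B"}]))
# Feed the returned state back in to continue from where the first call stopped.
more_events, more_state = asyncio.run(process_events_sketch([{"type": "C"}], state=out_state))
```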

```python
nemoguardrails.rails.llm.llmrails.LLMRails.register_action(
    action: typing.Callable,
    name: typing.Optional[str] = None
) -> typing_extensions.Self
```

Register a custom action for the rails configuration.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.register_action_param(
    name: str,
    value: typing.Any
) -> typing_extensions.Self
```

Registers a custom action parameter.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.register_embedding_provider(
    name: typing.Optional[str] = None
) -> typing_extensions.Self
```

Register a custom embedding provider.

**Parameters:**

The embedding model class.

* `name`: The name of the embedding engine. If omitted, the engine name defined by the model is used, when available.

**Raises:**

* `ValueError`: If the engine name is not provided and the model does not have an engine name.
* `ValueError`: If the model does not have 'encode' or 'encode\_async' methods.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.register_embedding_search_provider(
    name: str
) -> typing_extensions.Self
```

Register a new embedding search provider.

**Parameters:**

* `name`: The name of the embedding search provider that will be used.

The class that will be used to generate and search embeddings.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.register_filter(
    filter_fn: typing.Callable,
    name: typing.Optional[str] = None
) -> typing_extensions.Self
```

Register a custom filter for the rails configuration.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.register_output_parser(
    output_parser: typing.Callable,
    name: str
) -> typing_extensions.Self
```

Register a custom output parser for the rails configuration.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.register_prompt_context(
    name: str,
    value_or_fn: typing.Any
) -> typing_extensions.Self
```

Register a value to be included in the prompt context.

**Parameters:**

* `name`: The name of the variable or function that will be used.
* `value_or_fn`: The value or function that will be used to generate the value.
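The "value or function" idea behind `register_prompt_context` can be illustrated with a small registry. `PromptContextSketch` and its `render` method are hypothetical; when and how the library evaluates registered functions is an assumption here.

```python
from typing import Any, Dict


# Hypothetical registry; not the library's prompt context implementation.
class PromptContextSketch:
    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def register(self, name: str, value_or_fn: Any) -> "PromptContextSketch":
        self._entries[name] = value_or_fn
        return self  # returning self allows chained registrations

    def render(self) -> Dict[str, Any]:
        # Callables are evaluated at render time; plain values are used as is.
        return {k: (v() if callable(v) else v) for k, v in self._entries.items()}


counter = {"calls": 0}


def next_call_number() -> int:
    counter["calls"] += 1
    return counter["calls"]


context = (
    PromptContextSketch()
    .register("app_name", "demo")
    .register("call_number", next_call_number)
)
first = context.render()
second = context.render()
```

Registering a function rather than a value means the entry is recomputed on each render, which suits values such as the current time.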

```python
nemoguardrails.rails.llm.llmrails.LLMRails.stream_async(
    prompt: typing.Optional[str] = None,
    messages: typing.Optional[typing.List[dict]] = None,
    options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
    state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]] = None,
    include_metadata: typing.Optional[bool] = False,
    generator: typing.Optional[typing.AsyncIterator[str]] = None,
    include_generation_metadata: typing.Optional[bool] = None
) -> typing.AsyncIterator[typing.Union[str, dict]]
```

Simplified interface for getting the streamed tokens directly from the LLM.

```python
nemoguardrails.rails.llm.llmrails.LLMRails.update_llm(
    llm: nemoguardrails.types.LLMModel
)
```

Replace the main LLM with the provided one.

**Parameters:**

* `llm`: The new LLM that should be used.

```python
nemoguardrails.rails.llm.llmrails._determine_rails_from_messages(
    messages: typing.List[dict]
) -> typing.Optional[dict]
```

```python
nemoguardrails.rails.llm.llmrails._get_blocking_rail(
    response: nemoguardrails.rails.llm.options.GenerationResponse
) -> typing.Optional[str]
```

```python
nemoguardrails.rails.llm.llmrails._get_last_content_by_role(
    messages: typing.List[dict],
    role: str
) -> str
```

```python
nemoguardrails.rails.llm.llmrails._get_last_response_content(
    response: nemoguardrails.rails.llm.options.GenerationResponse
) -> str
```

```python
nemoguardrails.rails.llm.llmrails._normalize_messages_for_rails(
    messages: typing.List[dict],
    rails: typing.List[str]
) -> typing.List[dict]
```

```python
nemoguardrails.rails.llm.llmrails._wrap_legacy_llm(
    llm
)
```

```python
nemoguardrails.rails.llm.llmrails.log = logging.getLogger(__name__)
```

```python
nemoguardrails.rails.llm.llmrails.process_events_semaphore = asyncio.Semaphore(1)
```
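A semaphore initialized with `1` lets only one coroutine at a time enter the guarded section. The sketch below shows that effect with generic workers; how `process_events_semaphore` is actually used inside the module is not described in this reference, so the `worker` function is purely illustrative.

```python
import asyncio
from typing import List


async def worker(name: str, semaphore: asyncio.Semaphore, log: List[str]) -> None:
    async with semaphore:
        log.append(f"{name}:start")
        await asyncio.sleep(0)  # yield control; other workers still cannot enter
        log.append(f"{name}:end")


async def run_workers() -> List[str]:
    semaphore = asyncio.Semaphore(1)  # created inside the running loop
    log: List[str] = []
    await asyncio.gather(*(worker(n, semaphore, log) for n in ("a", "b", "c")))
    return log


trace = asyncio.run(run_workers())
# Each worker's "start" is immediately followed by its own "end": no interleaving.
no_interleaving = all(
    trace[i].split(":")[0] == trace[i + 1].split(":")[0] for i in range(0, len(trace), 2)
)
```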