nemoguardrails.rails.llm.llmrails | NVIDIA NeMo Guardrails Library Developer Guide

LLM Rails entry point.

Module Contents

Classes

Name	Description
`LLMRails`	Rails based on a given configuration.

Functions

Name	Description
`_determine_rails_from_messages`	-
`_get_blocking_rail`	-
`_get_last_content_by_role`	-
`_get_last_response_content`	-
`_normalize_messages_for_rails`	-
`_wrap_legacy_llm`	-

Data

log

process_events_semaphore

API

class nemoguardrails.rails.llm.llmrails.LLMRails(
    config: nemoguardrails.rails.llm.config.RailsConfig,
    llm: typing.Optional[nemoguardrails.types.LLMModel] = None,
    verbose: bool = False
)

Bases: BaseGuardrails

Rails based on a given configuration.

_default_embedding_engine = 'FastEmbed'

_default_embedding_model = 'all-MiniLM-L6-v2'

_default_embedding_params = {}

_embedding_search_providers = {}

_llm_generation_actions

_log_adapters = create_log_adapters(config.tracing)

events_history_cache = {}

llm: Optional[LLMModel]

passthrough_fn

The optional passthrough function that bypasses LLM generation.

When set, the rails pipeline calls this function instead of the main LLM to generate responses. LLMGenerationActions is private; only passthrough_fn is exposed as a public API.

runtime: Runtime

nemoguardrails.rails.llm.llmrails.LLMRails.__getstate__()

nemoguardrails.rails.llm.llmrails.LLMRails.__setstate__(
    state
)

nemoguardrails.rails.llm.llmrails.LLMRails._create_model_cache(
    model
) -> nemoguardrails.llm.cache.LFUCache

Create cache instance for a model based on its configuration.

Parameters:

model

The model configuration object

Returns: LFUCache

The cache instance
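The returned object is an LFU (least frequently used) cache. As a rough illustration of that eviction policy only, here is a minimal sketch with hypothetical `get`/`put` methods; it is not `nemoguardrails.llm.cache.LFUCache`, whose interface and tie-breaking may differ.

```python
from typing import Any, Dict, Hashable, Optional


class SimpleLFUCache:
    """Hypothetical LFU cache: evicts the least frequently used key when full.

    Ties are broken by insertion order (the oldest of the least used keys).
    Illustrative sketch only, not nemoguardrails.llm.cache.LFUCache.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._values: Dict[Hashable, Any] = {}
        self._counts: Dict[Hashable, int] = {}

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        if key not in self._values:
            return default
        self._counts[key] += 1  # every hit raises the key's frequency
        return self._values[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._values:
            self._values[key] = value
            self._counts[key] += 1
            return
        if len(self._values) >= self.capacity:
            # min() returns the first minimal key in insertion order.
            victim = min(self._counts, key=self._counts.__getitem__)
            del self._values[victim]
            del self._counts[victim]
        self._values[key] = value
        self._counts[key] = 1


cache = SimpleLFUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now used more often than "b"
cache.put("c", 3)  # evicts "b", the least frequently used key
```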

nemoguardrails.rails.llm.llmrails.LLMRails._ensure_explain_info() -> nemoguardrails.logging.explain.ExplainInfo

staticmethod

Ensure that the ExplainInfo variable is present in the current context.

Returns: ExplainInfo

An ExplainInfo object containing statistics about the LLM calls.
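"Present in the current context" suggests a lazily created, per-context variable. A minimal sketch of that pattern with the standard-library `contextvars` module, assuming a hypothetical `ExplainInfoSketch` dataclass in place of the real `ExplainInfo` (the library's actual mechanism is not shown here):

```python
import contextvars
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExplainInfoSketch:
    """Hypothetical stand-in for ExplainInfo: collects LLM call records."""
    llm_calls: List[dict] = field(default_factory=list)


# One value per execution context (e.g. per asyncio task); None until first use.
explain_info_var = contextvars.ContextVar("explain_info", default=None)


def ensure_explain_info() -> ExplainInfoSketch:
    """Return the object for the current context, creating it if missing."""
    info = explain_info_var.get()
    if info is None:
        info = ExplainInfoSketch()
        explain_info_var.set(info)
    return info


first = ensure_explain_info()
first.llm_calls.append({"task": "example"})
second = ensure_explain_info()  # same object: it is already set in this context
```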

nemoguardrails.rails.llm.llmrails.LLMRails._get_embeddings_search_provider_instance(
    esp_config: typing.Optional[nemoguardrails.rails.llm.config.EmbeddingSearchProvider] = None
) -> nemoguardrails.embeddings.index.EmbeddingsIndex

nemoguardrails.rails.llm.llmrails.LLMRails._get_events_for_messages(
    messages: typing.List[dict],
    state: typing.Any
)

Return the list of events corresponding to the provided messages.

It tries to find a prefix of the messages for which a list of events is already cached; the remaining messages are converted as is.

The cache exists so that events generated in previous turns can be reused rather than recomputed, since recomputing them would be expensive (e.g., when it involves multiple LLM calls).

Once an explicit state object is added, this mechanism can be removed.

Parameters:

messages

List[dict]

The list of messages.

Returns:

A list of events.

nemoguardrails.rails.llm.llmrails.LLMRails._init_kb()

async

Initializes the knowledge base.

nemoguardrails.rails.llm.llmrails.LLMRails._init_llms()

Initializes the right LLM engines based on the configuration. Multiple LLM engines and types can be specified in the config. The main LLM engine is used for all the core guardrails generations; other LLM engines can be specified for use in specific actions.

The reason we provide an option for decoupling the main LLM engine from the action LLM is to allow for flexibility in using specialized LLM engines for specific actions.

Raises:

ModelInitializationError: If any model initialization fails

nemoguardrails.rails.llm.llmrails.LLMRails._initialize_model_caches() -> None

Initialize caches for configured models.

nemoguardrails.rails.llm.llmrails.LLMRails._prepare_model_kwargs(
    model_config
)

Prepare the kwargs for model initialization, including the API key read from an environment variable.

Parameters:

model_config

The model configuration object

Returns:

The prepared kwargs for model initialization
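A minimal sketch of the idea, assuming the model configuration carries a `parameters` dict and the *name* of an environment variable that holds the API key (called `api_key_env_var` here), and that the key is passed on as an `api_key` kwarg. All of these names are assumptions; the real method may handle more cases.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in for a model configuration object."""
    engine: str
    model: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    api_key_env_var: Optional[str] = None  # assumed field name


def prepare_model_kwargs(model_config: ModelConfigSketch) -> Dict[str, Any]:
    # Copy so the configuration object itself is never mutated.
    kwargs = dict(model_config.parameters)
    if model_config.api_key_env_var:
        api_key = os.environ.get(model_config.api_key_env_var)
        if api_key:
            # Only set the key when the variable is actually defined.
            kwargs["api_key"] = api_key  # assumed kwarg name
    return kwargs


os.environ["EXAMPLE_LLM_API_KEY"] = "example-key"  # hypothetical variable
config = ModelConfigSketch(
    engine="example",
    model="example-model",
    parameters={"temperature": 0.0},
    api_key_env_var="EXAMPLE_LLM_API_KEY",
)
kwargs = prepare_model_kwargs(config)
```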

nemoguardrails.rails.llm.llmrails.LLMRails._run_output_rails_in_streaming(
    streaming_handler: typing.AsyncIterator[str],
    output_rails_streaming_config: nemoguardrails.rails.llm.config.OutputRailsStreamingConfig,
    prompt: typing.Optional[str] = None,
    messages: typing.Optional[typing.List[dict]] = None,
    stream_first: typing.Optional[bool] = None
) -> typing.AsyncIterator[str]

async

Buffers tokens from streaming_handler using a BufferStrategy and runs the output rail flows on each chunk sequentially (parallel execution is planned for Colang 2.0). Yields each chunk that is not blocked, or STOP if a chunk is blocked.
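The buffer-then-check loop can be sketched with plain `asyncio`. The fixed chunk size, the `is_blocked` callback, and the `"STOP"` string stand in for the configured BufferStrategy, the output rail flows, and the real stop marker; all are hypothetical, and this sketch has no overlap between chunks.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

STOP = "STOP"  # hypothetical sentinel yielded when a chunk is blocked


async def _chunks(tokens: AsyncIterator[str], chunk_size: int) -> AsyncIterator[str]:
    """Group incoming tokens into fixed-size chunks (a stand-in for BufferStrategy)."""
    buffer: List[str] = []
    async for token in tokens:
        buffer.append(token)
        if len(buffer) == chunk_size:
            yield "".join(buffer)
            buffer = []
    if buffer:  # flush the last, possibly shorter chunk
        yield "".join(buffer)


async def guarded_stream(
    tokens: AsyncIterator[str],
    is_blocked: Callable[[str], Awaitable[bool]],
    chunk_size: int = 3,
) -> AsyncIterator[str]:
    """Check each chunk before yielding it; yield STOP and end on the first block."""
    async for chunk in _chunks(tokens, chunk_size):
        if await is_blocked(chunk):
            yield STOP
            return
        yield chunk


async def fake_llm() -> AsyncIterator[str]:
    for token in ["Hello", " ", "there", ",", " ", "secret", " ", "stuff"]:
        yield token


async def contains_secret(chunk: str) -> bool:
    return "secret" in chunk


async def allow_all(chunk: str) -> bool:
    return False


async def collect(is_blocked: Callable[[str], Awaitable[bool]]) -> List[str]:
    return [chunk async for chunk in guarded_stream(fake_llm(), is_blocked)]


blocked = asyncio.run(collect(contains_secret))  # ["Hello there", "STOP"]
unblocked = asyncio.run(collect(allow_all))      # ["Hello there", ", secret", " stuff"]
```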

nemoguardrails.rails.llm.llmrails.LLMRails._validate_config()

Runs additional validation checks on the config.

nemoguardrails.rails.llm.llmrails.LLMRails._validate_public_state(
    state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]
) -> None

Validate public dict state passed through generate/generate_async.

nemoguardrails.rails.llm.llmrails.LLMRails._validate_streaming_with_output_rails() -> None

nemoguardrails.rails.llm.llmrails.LLMRails.check(
    messages: typing.List[dict],
    rail_types: typing.Optional[typing.List[nemoguardrails.rails.llm.options.RailType]] = None
) -> nemoguardrails.rails.llm.options.RailsResult

Run rails on messages based on their content (synchronous).

This is a synchronous wrapper around check_async().

Parameters:

messages

List[dict]

List of message dicts with ‘role’ and ‘content’ fields.

rail_types

Optional[List[RailType]], defaults to None

Optional list of rail types to run. See check_async() for details.

Returns: RailsResult

RailsResult containing status, content, and optional blocking rail name.
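`check()` is described as a synchronous wrapper around `check_async()`. One common way to write such a wrapper is `asyncio.run`, shown here on a hypothetical `RailsSketch` class with a toy check; the library may handle an already running event loop differently.

```python
import asyncio
from typing import List


class RailsSketch:
    """Hypothetical class illustrating a sync wrapper over an async method."""

    async def check_async(self, messages: List[dict]) -> str:
        await asyncio.sleep(0)  # stand-in for actually running the rails
        blocked = any("forbidden" in m.get("content", "") for m in messages)
        return "blocked" if blocked else "passed"

    def check(self, messages: List[dict]) -> str:
        # asyncio.run() starts a fresh event loop and raises RuntimeError if one
        # is already running, so async code should call check_async() instead.
        return asyncio.run(self.check_async(messages))


rails = RailsSketch()
status = rails.check([{"role": "user", "content": "Hello!"}])
```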

nemoguardrails.rails.llm.llmrails.LLMRails.check_async(
    messages: typing.List[dict],
    rail_types: typing.Optional[typing.List[nemoguardrails.rails.llm.options.RailType]] = None
) -> nemoguardrails.rails.llm.options.RailsResult

async

Run rails on messages based on their content (asynchronous).

When rail_types is not provided, automatically determines which rails to run based on message roles:

Only user messages: runs input rails
Only assistant messages: runs output rails
Both user and assistant messages: runs both input and output rails
No user/assistant messages: logs a warning and returns a passing result

When rail_types is provided, runs exactly the specified rail types, skipping the auto-detection logic.
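The auto-detection rules above can be sketched as a small function. It returns plain strings instead of `RailType` members, and it is not the library's `_determine_rails_from_messages`, whose return type differs; an empty result stands for the "passing result" case.

```python
import logging
from typing import List, Optional

log = logging.getLogger(__name__)


def rails_to_run(messages: List[dict], rail_types: Optional[List[str]] = None) -> List[str]:
    """Pick which rails to run from message roles, unless rail_types is given."""
    if rail_types is not None:
        return list(rail_types)  # an explicit choice skips auto-detection
    roles = {m.get("role") for m in messages}
    selected = []
    if "user" in roles:
        selected.append("input")
    if "assistant" in roles:
        selected.append("output")
    if not selected:
        log.warning("No user or assistant messages; nothing to check.")
    return selected
```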

Parameters:

messages

List[dict]

List of message dicts with ‘role’ and ‘content’ fields. Messages can contain any roles, but only user/assistant roles determine which rails execute when rail_types is not provided.

rail_types

Optional[List[RailType]], defaults to None

Optional list of rail types to run, e.g. [RailType.INPUT] or [RailType.OUTPUT]. When provided, overrides automatic detection.

Returns: RailsResult

RailsResult containing the status, the content, and the name of the blocking rail (if any).

Examples:

Check user input (auto-detected)::

    result = await rails.check_async([{"role": "user", "content": "Hello!"}])
    if result.status == RailStatus.BLOCKED:
        print(f"Blocked by: {result.rail}")

Check bot output with context (auto-detected)::

    result = await rails.check_async([
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi there!"}
    ])

Run only input rails explicitly::

    result = await rails.check_async(messages, rail_types=[RailType.INPUT])

nemoguardrails.rails.llm.llmrails.LLMRails.explain() -> nemoguardrails.logging.explain.ExplainInfo

Helper function to return the latest ExplainInfo object.

nemoguardrails.rails.llm.llmrails.LLMRails.generate(
    prompt: typing.Optional[str] = None,
    messages: typing.Optional[typing.List[dict]] = None,
    options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
    state: typing.Optional[dict] = None
)

Synchronous version of generate_async.

nemoguardrails.rails.llm.llmrails.LLMRails.generate_async(
    prompt: typing.Optional[str] = None,
    messages: typing.Optional[typing.List[dict]] = None,
    options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
    state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]] = None,
    streaming_handler: typing.Optional[nemoguardrails.streaming.StreamingHandler] = None
) -> typing.Union[str, dict, nemoguardrails.rails.llm.options.GenerationResponse, typing.Tuple[dict, dict]]

async

Generate a completion or a next message.

The format for messages is the following::

    [
        {"role": "context", "content": {"user_name": "John"}},
        {"role": "user", "content": "Hello! How are you?"},
        {"role": "assistant", "content": "I am fine, thank you!"},
        {"role": "event", "event": {"type": "UserSilent"}},
        ...
    ]

System messages are not yet supported.
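A hypothetical client-side check a caller might run before calling `generate_async`, based only on the roles in the documented message format and on the note that system messages are not yet supported. It is not part of the library, which may accept other roles or shapes.

```python
from typing import List

# Roles taken from the documented message format.
SUPPORTED_ROLES = {"context", "user", "assistant", "event"}


def find_message_problems(messages: List[dict]) -> List[str]:
    """Describe every message that does not match the documented format."""
    problems = []
    for i, message in enumerate(messages):
        role = message.get("role")
        if role == "system":
            problems.append(f"message {i}: system messages are not yet supported")
            continue
        if role not in SUPPORTED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
            continue
        # "event" messages carry an "event" dict; the other roles carry "content".
        required_key = "event" if role == "event" else "content"
        if required_key not in message:
            problems.append(f"message {i}: missing {required_key!r}")
    return problems


messages = [
    {"role": "context", "content": {"user_name": "John"}},
    {"role": "user", "content": "Hello! How are you?"},
    {"role": "event", "event": {"type": "UserSilent"}},
]
```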

Parameters:

prompt

Optional[str], defaults to None

The prompt to be used for completion.

messages

Optional[List[dict]], defaults to None

The history of messages to be used to generate the next message.

options

Optional[Union[dict, GenerationOptions]], defaults to None

Options specific for the generation.

state

Optional[Union[dict, State]], defaults to None

The state object that should be used as the starting point.

streaming_handler

Optional[StreamingHandler], defaults to None

If specified, and the config supports streaming, the provided handler will be used for streaming.

Returns: Union[str, dict, GenerationResponse, Tuple[dict, dict]]

The completion (when a prompt is provided) or the next message.

nemoguardrails.rails.llm.llmrails.LLMRails.generate_events(
    events: typing.List[dict]
) -> typing.List[dict]

Synchronous version of LLMRails.generate_events_async.

nemoguardrails.rails.llm.llmrails.LLMRails.generate_events_async(
    events: typing.List[dict]
) -> typing.List[dict]

async

Generate the next events based on the provided history.

The format for events is the following::

    [
        {"type": "...", ...},
        ...
    ]

Parameters:

events

List[dict]

The history of events to be used to generate the next events.

options

The options to be used for the generation.

Returns: List[dict]

The newly generated event(s).

nemoguardrails.rails.llm.llmrails.LLMRails.process_events(
    events: typing.List[dict],
    state: typing.Union[typing.Optional[dict], nemoguardrails.colang.v2_x.runtime.flows.State] = None,
    blocking: bool = False
) -> typing.Tuple[typing.List[dict], typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]

Synchronous version of LLMRails.process_events_async.

nemoguardrails.rails.llm.llmrails.LLMRails.process_events_async(
    events: typing.List[dict],
    state: typing.Union[typing.Optional[dict], nemoguardrails.colang.v2_x.runtime.flows.State] = None,
    blocking: bool = False
) -> typing.Tuple[typing.List[dict], typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]

async

Process a sequence of events in a given state.

The events will be processed one by one, in the input order.

Parameters:

events

List[dict]

A sequence of events that needs to be processed.

state

Union[Optional[dict], State], defaults to None

The state that should be used as the starting point. If not provided, a clean state will be used.

Returns: Tuple[List[dict], Union[dict, State]]

A tuple (output_events, output_state) containing the output events and the output state.

nemoguardrails.rails.llm.llmrails.LLMRails.register_action(
    action: typing.Callable,
    name: typing.Optional[str] = None
) -> typing_extensions.Self

nemoguardrails.rails.llm.llmrails.LLMRails.register_action_param(
    name: str,
    value: typing.Any
) -> typing_extensions.Self

Registers a custom action parameter.

nemoguardrails.rails.llm.llmrails.LLMRails.register_embedding_provider(
    name: typing.Optional[str] = None
) -> typing_extensions.Self

Parameters:

model

Type[EmbeddingModel]

The embedding model class.

name

Optional[str], defaults to None

The name of the embedding engine. If it is not provided, the engine name defined on the model is used, if available.

Raises:

ValueError: If the engine name is not provided and the model does not have an engine name.
ValueError: If the model does not have ‘encode’ or ‘encode_async’ methods.
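The two `ValueError` conditions can be sketched as follows. The `resolve_engine_name` helper and the `engine_name` class attribute are hypothetical, chosen to mirror the description; this sketch gives an explicit name precedence over the model's attribute and requires both encoding methods, and either choice may differ from the library's actual checks.

```python
from typing import Optional


def resolve_engine_name(model: type, name: Optional[str] = None) -> str:
    """Validate an embedding model class and return the engine name to use."""
    # Assumption: an explicit name wins; otherwise fall back to the model's attribute.
    engine_name = name or getattr(model, "engine_name", None)
    if not engine_name:
        raise ValueError("Engine name not provided and the model has no engine name.")
    # Assumption: both encoding methods are required.
    if not (hasattr(model, "encode") and hasattr(model, "encode_async")):
        raise ValueError("The model must define 'encode' and 'encode_async'.")
    return engine_name


class ToyEmbeddingModel:
    """Hypothetical embedding model with both encoding methods."""

    engine_name = "toy"

    def encode(self, documents):
        return [[float(len(doc))] for doc in documents]

    async def encode_async(self, documents):
        return self.encode(documents)


class NamelessModel(ToyEmbeddingModel):
    engine_name = None  # no engine name available on the model


error = None
try:
    resolve_engine_name(NamelessModel)
except ValueError as exc:
    error = str(exc)
```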

nemoguardrails.rails.llm.llmrails.LLMRails.register_embedding_search_provider(
    name: str
) -> typing_extensions.Self

Parameters:

name

str

The name of the embedding search provider that will be used.

cls

Type[EmbeddingsIndex]

The class that will be used to generate and search embeddings.

nemoguardrails.rails.llm.llmrails.LLMRails.register_filter(
    filter_fn: typing.Callable,
    name: typing.Optional[str] = None
) -> typing_extensions.Self

nemoguardrails.rails.llm.llmrails.LLMRails.register_output_parser(
    output_parser: typing.Callable,
    name: str
) -> typing_extensions.Self

nemoguardrails.rails.llm.llmrails.LLMRails.register_prompt_context(
    name: str,
    value_or_fn: typing.Any
) -> typing_extensions.Self

Parameters:

name

str

The name of the variable or function that will be used.

value_or_fn

Any

The value or function that will be used to generate the value.
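A sketch of how a registry might resolve `value_or_fn` when a prompt is rendered: plain values are used as is and callables are called each time. The `PromptContextSketch` class and its `render_context` method are hypothetical; returning `self` mirrors the `Self` return type, which allows chained registrations.

```python
import itertools
from typing import Any, Dict


class PromptContextSketch:
    """Hypothetical registry for prompt context values or value-producing functions."""

    def __init__(self) -> None:
        self._context: Dict[str, Any] = {}

    def register_prompt_context(self, name: str, value_or_fn: Any) -> "PromptContextSketch":
        self._context[name] = value_or_fn
        return self  # returning self allows chained registrations

    def render_context(self) -> Dict[str, Any]:
        # Plain values are used as is; callables are called on every render.
        return {name: v() if callable(v) else v for name, v in self._context.items()}


turn_counter = itertools.count(1)
ctx = (
    PromptContextSketch()
    .register_prompt_context("company_name", "ACME")
    .register_prompt_context("turn", turn_counter.__next__)
)
first = ctx.render_context()
second = ctx.render_context()
```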

nemoguardrails.rails.llm.llmrails.LLMRails.stream_async(
    prompt: typing.Optional[str] = None,
    messages: typing.Optional[typing.List[dict]] = None,
    options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
    state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]] = None,
    include_metadata: typing.Optional[bool] = False,
    generator: typing.Optional[typing.AsyncIterator[str]] = None,
    include_generation_metadata: typing.Optional[bool] = None
) -> typing.AsyncIterator[typing.Union[str, dict]]

Simplified interface for getting the streamed tokens directly from the LLM.

nemoguardrails.rails.llm.llmrails.LLMRails.update_llm(
    llm: nemoguardrails.types.LLMModel
)

Replace the main LLM with the provided one.

Parameters:

llm

LLMModel

The new LLM that should be used.

nemoguardrails.rails.llm.llmrails._determine_rails_from_messages(
    messages: typing.List[dict]
) -> typing.Optional[dict]

nemoguardrails.rails.llm.llmrails._get_blocking_rail(
    response: nemoguardrails.rails.llm.options.GenerationResponse
) -> typing.Optional[str]

nemoguardrails.rails.llm.llmrails._get_last_content_by_role(
    messages: typing.List[dict],
    role: str
) -> str

nemoguardrails.rails.llm.llmrails._get_last_response_content(
    response: nemoguardrails.rails.llm.options.GenerationResponse
) -> str

nemoguardrails.rails.llm.llmrails._normalize_messages_for_rails(
    messages: typing.List[dict],
    rails: typing.List[str]
) -> typing.List[dict]

nemoguardrails.rails.llm.llmrails._wrap_legacy_llm(
    llm
)

nemoguardrails.rails.llm.llmrails.log = logging.getLogger(__name__)

nemoguardrails.rails.llm.llmrails.process_events_semaphore = asyncio.Semaphore(1)
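An `asyncio.Semaphore(1)` admits one holder at a time, so it behaves like a lock. The sketch below shows how such a semaphore serializes concurrent coroutines; `process_events_sketch` is hypothetical, and how `process_events_async` actually uses the semaphore is not shown here. The semaphore is created inside `main()` so the example runs the same on older Python versions.

```python
import asyncio
from typing import List, Tuple


async def process_events_sketch(
    name: str, semaphore: asyncio.Semaphore, trace_log: List[Tuple[str, str]]
) -> None:
    async with semaphore:  # only one coroutine at a time gets past this point
        trace_log.append(("start", name))
        await asyncio.sleep(0.01)  # stand-in for the actual event processing
        trace_log.append(("end", name))


async def main() -> List[Tuple[str, str]]:
    semaphore = asyncio.Semaphore(1)  # same construction as process_events_semaphore
    trace_log: List[Tuple[str, str]] = []
    await asyncio.gather(*(process_events_sketch(n, semaphore, trace_log) for n in "abc"))
    return trace_log


trace = asyncio.run(main())
```

Because each "start" is immediately followed by the matching "end", no two calls overlap even though they were launched concurrently.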