nemoguardrails.rails.llm.llmrails

LLM Rails entry point.

Module Contents

Classes

Name        Description
LLMRails    Rails based on a given configuration.

Functions

Data

log

process_events_semaphore

API

class nemoguardrails.rails.llm.llmrails.LLMRails(
config: nemoguardrails.rails.llm.config.RailsConfig,
llm: typing.Optional[nemoguardrails.types.LLMModel] = None,
verbose: bool = False
)

Bases: BaseGuardrails

Rails based on a given configuration.

_default_embedding_engine
= 'FastEmbed'
_default_embedding_model
= 'all-MiniLM-L6-v2'
_default_embedding_params
= {}
_embedding_search_providers
= {}
_llm_generation_actions
_log_adapters
= create_log_adapters(config.tracing)
events_history_cache
= {}
llm
Optional[LLMModel]
passthrough_fn

The optional passthrough function that bypasses LLM generation.

When set, the rails pipeline calls this function instead of the main LLM to generate responses. LLMGenerationActions is private, so only passthrough_fn is exposed as public API.

runtime
Runtime
nemoguardrails.rails.llm.llmrails.LLMRails.__getstate__()
nemoguardrails.rails.llm.llmrails.LLMRails.__setstate__(
state
)
nemoguardrails.rails.llm.llmrails.LLMRails._create_model_cache(
model
) -> nemoguardrails.llm.cache.LFUCache

Create cache instance for a model based on its configuration.

Parameters:

model

The model configuration object

Returns: LFUCache

The cache instance

nemoguardrails.rails.llm.llmrails.LLMRails._ensure_explain_info() -> nemoguardrails.logging.explain.ExplainInfo
staticmethod

Ensure that the ExplainInfo variable is present in the current context

Returns: ExplainInfo

An ExplainInfo object containing statistics about the LLM calls.

nemoguardrails.rails.llm.llmrails.LLMRails._get_embeddings_search_provider_instance(
esp_config: typing.Optional[nemoguardrails.rails.llm.config.EmbeddingSearchProvider] = None
) -> nemoguardrails.embeddings.index.EmbeddingsIndex
nemoguardrails.rails.llm.llmrails.LLMRails._get_events_for_messages(
messages: typing.List[dict],
state: typing.Any
)

Return the list of events corresponding to the provided messages.

Tries to find a prefix of the messages for which a list of events is already cached. The remaining messages are converted as is.

This cache exists so that events generated in previous turns can be reused, because recomputing them would be expensive (e.g., it could involve multiple LLM calls).

Once an explicit state object is added, this mechanism can be removed.

Parameters:

messages
List[dict]

The list of messages.

Returns:

A list of events.
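
The prefix lookup described above can be illustrated with a minimal sketch. It is not the library's implementation: the cache layout, the `cache_key` helper, and the placeholder `MessageConverted` event type are all assumptions made for illustration.

```python
import json
from typing import Dict, List

# Hypothetical cache layout: serialized message prefix -> events from earlier turns.
EventsCache = Dict[str, List[dict]]


def cache_key(messages: List[dict]) -> str:
    """Serialize a message list into a stable cache key (assumed scheme)."""
    return json.dumps(messages, sort_keys=True)


def events_for_messages(messages: List[dict], cache: EventsCache) -> List[dict]:
    """Reuse cached events for the longest known prefix, convert the rest."""
    events: List[dict] = []
    start = 0
    # Try the longest prefix first, then progressively shorter ones.
    for end in range(len(messages), 0, -1):
        cached = cache.get(cache_key(messages[:end]))
        if cached is not None:
            events = list(cached)  # copy so the cache entry is not mutated
            start = end
            break
    # Messages after the cached prefix are converted as is.
    # "MessageConverted" is a placeholder event type, not a real one.
    for msg in messages[start:]:
        events.append(
            {"type": "MessageConverted", "role": msg["role"], "content": msg["content"]}
        )
    return events
```

With a cache entry for the first two messages of a conversation, only the third message is converted. The cached events are returned unchanged at the front of the list.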

nemoguardrails.rails.llm.llmrails.LLMRails._init_kb()
async

Initializes the knowledge base.

nemoguardrails.rails.llm.llmrails.LLMRails._init_llms()

Initializes the right LLM engines based on the configuration. There can be multiple LLM engines and types that can be specified in the config. The main LLM engine is the one that will be used for all the core guardrails generations. Other LLM engines can be specified for use in specific actions.

The reason we provide an option for decoupling the main LLM engine from the action LLM is to allow for flexibility in using specialized LLM engines for specific actions.

Raises:

  • ModelInitializationError: If any model initialization fails
nemoguardrails.rails.llm.llmrails.LLMRails._initialize_model_caches() -> None

Initialize caches for configured models.

nemoguardrails.rails.llm.llmrails.LLMRails._prepare_model_kwargs(
model_config
)

Prepare kwargs for model initialization, including API key from environment variable.

Parameters:

model_config

The model configuration object

Returns:

The prepared kwargs for model initialization
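
As a rough illustration of reading an API key from an environment variable while preparing kwargs, consider the sketch below. The `ModelConfig` dataclass and its field names (`api_key_env_var`, `parameters`) are stand-ins assumed for this example and may not match the real configuration object.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ModelConfig:
    """Hypothetical stand-in for a model configuration entry."""

    engine: str
    model: str
    api_key_env_var: Optional[str] = None  # assumed field name
    parameters: Dict[str, Any] = field(default_factory=dict)


def prepare_model_kwargs(model_config: ModelConfig) -> Dict[str, Any]:
    """Copy the configured parameters and add the API key if one is set."""
    kwargs = dict(model_config.parameters)  # copy, so the config is untouched
    if model_config.api_key_env_var:
        api_key = os.environ.get(model_config.api_key_env_var)
        if api_key:
            kwargs["api_key"] = api_key
    return kwargs
```

Keeping the key in the environment rather than in the configuration file means the configuration can be shared or committed without exposing secrets.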

nemoguardrails.rails.llm.llmrails.LLMRails._run_output_rails_in_streaming(
streaming_handler: typing.AsyncIterator[str],
output_rails_streaming_config: nemoguardrails.rails.llm.config.OutputRailsStreamingConfig,
prompt: typing.Optional[str] = None,
messages: typing.Optional[typing.List[dict]] = None,
stream_first: typing.Optional[bool] = None
) -> typing.AsyncIterator[str]
async
  1. Buffers tokens from ‘streaming_handler’ via BufferStrategy.
  2. Runs sequential (parallel for colang 2.0 in future) flows for each chunk.
  3. Yields the chunk if not blocked, or STOP if blocked.
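
The three steps above can be sketched with plain asyncio generators. This is a simplified illustration only: the fixed-size buffering, the `STOP` sentinel value, and the `is_blocked` callback are assumptions, and the real BufferStrategy and rail flows are more involved.

```python
import asyncio
from typing import AsyncIterator, Callable, List

STOP = "<STOP>"  # hypothetical sentinel; the real stop marker may differ


async def buffered_chunks(tokens: AsyncIterator[str], chunk_size: int) -> AsyncIterator[str]:
    """Step 1: group incoming tokens into chunks of `chunk_size` tokens."""
    buffer: List[str] = []
    async for token in tokens:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            yield "".join(buffer)
            buffer = []
    if buffer:  # flush the remainder
        yield "".join(buffer)


async def run_output_rails(
    tokens: AsyncIterator[str],
    is_blocked: Callable[[str], bool],
    chunk_size: int = 3,
) -> AsyncIterator[str]:
    """Steps 2 and 3: check each chunk, yield it or stop the stream."""
    async for chunk in buffered_chunks(tokens, chunk_size):
        if is_blocked(chunk):  # stand-in for running the output rail flows
            yield STOP
            return
        yield chunk


async def demo_tokens() -> AsyncIterator[str]:
    for token in ["Hello", " ", "world", ", ", "secret", "!", " more"]:
        yield token


async def collect(stream: AsyncIterator[str]) -> List[str]:
    return [item async for item in stream]
```

Running `asyncio.run(collect(run_output_rails(demo_tokens(), lambda c: "secret" in c)))` yields the first chunk, then the stop marker once the second chunk is blocked. Nothing after the blocked chunk is emitted.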
nemoguardrails.rails.llm.llmrails.LLMRails._validate_config()

Runs additional validation checks on the config.

nemoguardrails.rails.llm.llmrails.LLMRails._validate_public_state(
state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]
) -> None

Validate public dict state passed through generate/generate_async.

nemoguardrails.rails.llm.llmrails.LLMRails._validate_streaming_with_output_rails() -> None
nemoguardrails.rails.llm.llmrails.LLMRails.check(
messages: typing.List[dict],
rail_types: typing.Optional[typing.List[nemoguardrails.rails.llm.options.RailType]] = None
) -> nemoguardrails.rails.llm.options.RailsResult

Run rails on messages based on their content (synchronous).

This is a synchronous wrapper around check_async().

Parameters:

messages
List[dict]

List of message dicts with ‘role’ and ‘content’ fields.

rail_types
Optional[List[RailType]], defaults to None

Optional list of rail types to run. See check_async() for details.

Returns: RailsResult

RailsResult containing status, content, and optional blocking rail name.

nemoguardrails.rails.llm.llmrails.LLMRails.check_async(
messages: typing.List[dict],
rail_types: typing.Optional[typing.List[nemoguardrails.rails.llm.options.RailType]] = None
) -> nemoguardrails.rails.llm.options.RailsResult
async

Run rails on messages based on their content (asynchronous).

When rail_types is not provided, automatically determines which rails to run based on message roles:

  • Only user messages: runs input rails
  • Only assistant messages: runs output rails
  • Both user and assistant messages: runs both input and output rails
  • No user/assistant messages: logs warning and returns passing result

When rail_types is provided, runs exactly the specified rail types, skipping the auto-detection logic.
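
The auto-detection rules can be written as a small pure function. The sketch below is an illustration under stated assumptions: the function name and the plain-string rail labels are hypothetical, and the library uses RailType values instead.

```python
from typing import List, Optional


def detect_rail_types(messages: List[dict]) -> Optional[List[str]]:
    """Pick which rails to run from the roles present in `messages`.

    Returns None when there are no user or assistant messages, which
    corresponds to the "log a warning and return a passing result" case.
    """
    roles = {message.get("role") for message in messages}
    has_user = "user" in roles
    has_assistant = "assistant" in roles
    if has_user and has_assistant:
        return ["input", "output"]
    if has_user:
        return ["input"]
    if has_assistant:
        return ["output"]
    return None
```

Other roles, such as context, are ignored by the detection itself, so a context message combined with a user message still selects only the input rails.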

Parameters:

messages
List[dict]

List of message dicts with ‘role’ and ‘content’ fields. Messages can contain any roles, but only user/assistant roles determine which rails execute when rail_types is not provided.

rail_types
Optional[List[RailType]], defaults to None

Optional list of rail types to run, e.g. [RailType.INPUT] or [RailType.OUTPUT]. When provided, overrides automatic detection.

Returns: RailsResult

RailsResult containing status, content, and optional blocking rail name.

Examples:

Check user input (auto-detected)::

    result = await rails.check_async([{"role": "user", "content": "Hello!"}])
    if result.status == RailStatus.BLOCKED:
        print(f"Blocked by: {result.rail}")

Check bot output with context (auto-detected)::

    result = await rails.check_async([
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi there!"}
    ])

Run only input rails explicitly::

    result = await rails.check_async(messages, rail_types=[RailType.INPUT])
nemoguardrails.rails.llm.llmrails.LLMRails.explain() -> nemoguardrails.logging.explain.ExplainInfo

Helper function to return the latest ExplainInfo object.

nemoguardrails.rails.llm.llmrails.LLMRails.generate(
prompt: typing.Optional[str] = None,
messages: typing.Optional[typing.List[dict]] = None,
options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
state: typing.Optional[dict] = None
)

Synchronous version of generate_async.

nemoguardrails.rails.llm.llmrails.LLMRails.generate_async(
prompt: typing.Optional[str] = None,
messages: typing.Optional[typing.List[dict]] = None,
options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]] = None,
streaming_handler: typing.Optional[nemoguardrails.streaming.StreamingHandler] = None
) -> typing.Union[str, dict, nemoguardrails.rails.llm.options.GenerationResponse, typing.Tuple[dict, dict]]
async

Generate a completion or a next message.

The format for messages is the following::

    [
        {"role": "context", "content": {"user_name": "John"}},
        {"role": "user", "content": "Hello! How are you?"},
        {"role": "assistant", "content": "I am fine, thank you!"},
        {"role": "event", "event": {"type": "UserSilent"}},
        ...
    ]

System messages are not yet supported.

Parameters:

prompt
Optional[str], defaults to None

The prompt to be used for completion.

messages
Optional[List[dict]], defaults to None

The history of messages to be used to generate the next message.

options
Optional[Union[dict, GenerationOptions]], defaults to None

Options specific for the generation.

state
Optional[Union[dict, State]], defaults to None

The state object that should be used as the starting point.

streaming_handler
Optional[StreamingHandler], defaults to None

If specified, and the config supports streaming, the provided handler will be used for streaming.

Returns: Union[str, dict, GenerationResponse, Tuple[dict, dict]]

The completion (when a prompt is provided) or the next message.

nemoguardrails.rails.llm.llmrails.LLMRails.generate_events(
events: typing.List[dict]
) -> typing.List[dict]

Synchronous version of LLMRails.generate_events_async.

nemoguardrails.rails.llm.llmrails.LLMRails.generate_events_async(
events: typing.List[dict]
) -> typing.List[dict]
async

Generate the next events based on the provided history.

The format for events is the following::

    [ {"type": "...", ...}, ... ]

Parameters:

events
List[dict]

The history of events to be used to generate the next events.

options

The options to be used for the generation.

Returns: List[dict]

The newly generated event(s).

nemoguardrails.rails.llm.llmrails.LLMRails.process_events(
events: typing.List[dict],
state: typing.Union[typing.Optional[dict], nemoguardrails.colang.v2_x.runtime.flows.State] = None,
blocking: bool = False
) -> typing.Tuple[typing.List[dict], typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]

Synchronous version of LLMRails.process_events_async.

nemoguardrails.rails.llm.llmrails.LLMRails.process_events_async(
events: typing.List[dict],
state: typing.Union[typing.Optional[dict], nemoguardrails.colang.v2_x.runtime.flows.State] = None,
blocking: bool = False
) -> typing.Tuple[typing.List[dict], typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]]
async

Process a sequence of events in a given state.

The events will be processed one by one, in the input order.

Parameters:

events
List[dict]

A sequence of events that needs to be processed.

state
Union[Optional[dict], State], defaults to None

The state that should be used as the starting point. If not provided, a clean state will be used.

Returns: Tuple[List[dict], Union[dict, State]]

A tuple (output_events, output_state): the sequence of output events and the resulting state.
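
The "one by one, in the input order" behavior and the (output_events, output_state) return shape can be sketched as follows. The module also defines process_events_semaphore as an asyncio.Semaphore(1). This sketch assumes a semaphore like that is used to serialize processing, which the reference does not state explicitly. The `handle_event` function and the `Echo` event type are hypothetical.

```python
import asyncio
from typing import List, Optional, Tuple


async def handle_event(event: dict, state: dict) -> Tuple[List[dict], dict]:
    """Hypothetical handler: echo the event and count it in the state."""
    new_state = {**state, "processed": state.get("processed", 0) + 1}
    return [{"type": "Echo", "source": event["type"]}], new_state


async def process_events(
    events: List[dict],
    state: Optional[dict] = None,
    semaphore: Optional[asyncio.Semaphore] = None,
) -> Tuple[List[dict], dict]:
    """Process events sequentially, threading the state through each step."""
    current = dict(state) if state is not None else {}  # clean state if none given
    output_events: List[dict] = []
    semaphore = semaphore or asyncio.Semaphore(1)
    async with semaphore:  # only one batch of events is processed at a time
        for event in events:
            new_events, current = await handle_event(event, current)
            output_events.extend(new_events)
    return output_events, current
```

Passing the returned state back in on the next call continues from where the previous call left off, which is the point of returning it alongside the events.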

nemoguardrails.rails.llm.llmrails.LLMRails.register_action(
action: typing.Callable,
name: typing.Optional[str] = None
) -> typing_extensions.Self

Register a custom action for the rails configuration.

nemoguardrails.rails.llm.llmrails.LLMRails.register_action_param(
name: str,
value: typing.Any
) -> typing_extensions.Self

Registers a custom action parameter.

nemoguardrails.rails.llm.llmrails.LLMRails.register_embedding_provider(
name: typing.Optional[str] = None
) -> typing_extensions.Self

Register a custom embedding provider.

Parameters:

model
Type[EmbeddingModel]

The embedding model class.

name
str, defaults to None

The name of the embedding engine. If available in the model, it will be used.

Raises:

  • ValueError: If the engine name is not provided and the model does not have an engine name.
  • ValueError: If the model does not have ‘encode’ or ‘encode_async’ methods.
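
The two documented error conditions can be illustrated with a small validation helper. This is a sketch under assumptions, not the library's code: the `engine_name` class attribute, the precedence of an explicit `name` over it, and the requirement that both methods be present are all guesses made for the example.

```python
from typing import List, Optional


def resolve_engine_name(model_cls: type, name: Optional[str] = None) -> str:
    """Validate an embedding model class and work out its engine name."""
    # Assumption: an explicit name wins, else fall back to a class attribute.
    engine_name = name or getattr(model_cls, "engine_name", None)
    if not engine_name:
        raise ValueError("Provide an engine name or define one on the model.")
    # Assumption: both methods are required.
    for method in ("encode", "encode_async"):
        if not callable(getattr(model_cls, method, None)):
            raise ValueError(f"The model must implement '{method}'.")
    return engine_name


class DemoEmbeddingModel:
    """Hypothetical embedding model used only to exercise the helper."""

    engine_name = "demo"

    def encode(self, documents: List[str]) -> List[List[float]]:
        return [[float(len(doc))] for doc in documents]

    async def encode_async(self, documents: List[str]) -> List[List[float]]:
        return self.encode(documents)


class IncompleteModel:
    """Lacks encode/encode_async, so validation should fail."""

    engine_name = "incomplete"
```

Here `resolve_engine_name(DemoEmbeddingModel)` returns the class-level name, while `resolve_engine_name(IncompleteModel)` raises ValueError because the encode methods are missing.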
nemoguardrails.rails.llm.llmrails.LLMRails.register_embedding_search_provider(
name: str
) -> typing_extensions.Self

Register a new embedding search provider.

Parameters:

name
str

The name of the embedding search provider that will be used.

cls
Type[EmbeddingsIndex]

The class that will be used to generate and search embeddings.

nemoguardrails.rails.llm.llmrails.LLMRails.register_filter(
filter_fn: typing.Callable,
name: typing.Optional[str] = None
) -> typing_extensions.Self

Register a custom filter for the rails configuration.

nemoguardrails.rails.llm.llmrails.LLMRails.register_output_parser(
output_parser: typing.Callable,
name: str
) -> typing_extensions.Self

Register a custom output parser for the rails configuration.

nemoguardrails.rails.llm.llmrails.LLMRails.register_prompt_context(
name: str,
value_or_fn: typing.Any
) -> typing_extensions.Self

Register a value to be included in the prompt context.

Parameters:

name
str

The name of the variable or function that will be used.

value_or_fn
Any

The value or function that will be used to generate the value.
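
Because the value can be either a constant or a function, and the register methods return Self, registrations can be chained and resolved lazily. The sketch below shows that pattern with a hypothetical `PromptContextRegistry` class. The `render_context` method is an assumption about how such values might be resolved, not part of the documented API.

```python
from typing import Any, Dict


class PromptContextRegistry:
    """Hypothetical stand-in showing value-or-function registration."""

    def __init__(self) -> None:
        self._context: Dict[str, Any] = {}

    def register_prompt_context(self, name: str, value_or_fn: Any) -> "PromptContextRegistry":
        """Store a constant or a zero-argument function under `name`."""
        self._context[name] = value_or_fn
        return self  # returning self allows chained registrations

    def render_context(self) -> Dict[str, Any]:
        """Resolve every entry, calling functions at render time."""
        return {
            name: (value() if callable(value) else value)
            for name, value in self._context.items()
        }
```

Registering a function rather than a value is useful for anything that changes between requests, such as the current date, since it is evaluated each time the context is rendered.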

nemoguardrails.rails.llm.llmrails.LLMRails.stream_async(
prompt: typing.Optional[str] = None,
messages: typing.Optional[typing.List[dict]] = None,
options: typing.Optional[typing.Union[dict, nemoguardrails.rails.llm.options.GenerationOptions]] = None,
state: typing.Optional[typing.Union[dict, nemoguardrails.colang.v2_x.runtime.flows.State]] = None,
include_metadata: typing.Optional[bool] = False,
generator: typing.Optional[typing.AsyncIterator[str]] = None,
include_generation_metadata: typing.Optional[bool] = None
) -> typing.AsyncIterator[typing.Union[str, dict]]

Simplified interface for directly getting the streamed tokens from the LLM.

nemoguardrails.rails.llm.llmrails.LLMRails.update_llm(
llm: nemoguardrails.types.LLMModel
)

Replace the main LLM with the provided one.

Parameters:

llm
LLMModel

The new LLM that should be used.

nemoguardrails.rails.llm.llmrails._determine_rails_from_messages(
messages: typing.List[dict]
) -> typing.Optional[dict]
nemoguardrails.rails.llm.llmrails._get_blocking_rail(
response: nemoguardrails.rails.llm.options.GenerationResponse
) -> typing.Optional[str]
nemoguardrails.rails.llm.llmrails._get_last_content_by_role(
messages: typing.List[dict],
role: str
) -> str
nemoguardrails.rails.llm.llmrails._get_last_response_content(
response: nemoguardrails.rails.llm.options.GenerationResponse
) -> str
nemoguardrails.rails.llm.llmrails._normalize_messages_for_rails(
messages: typing.List[dict],
rails: typing.List[str]
) -> typing.List[dict]
nemoguardrails.rails.llm.llmrails._wrap_legacy_llm(
llm
)
nemoguardrails.rails.llm.llmrails.log = logging.getLogger(__name__)
nemoguardrails.rails.llm.llmrails.process_events_semaphore = asyncio.Semaphore(1)