nemoguardrails.rails.llm.llmrails
LLM Rails entry point.
Module Contents
Classes
Functions
Data
API
Bases: BaseGuardrails
Rails based on a given configuration.
The optional passthrough function that bypasses LLM generation.
When set, the rails pipeline calls this function instead of the main LLM
for generating responses. LLMGenerationActions is private, expose only
passthrough_fn as a public API
Create cache instance for a model based on its configuration.
Parameters:
The model configuration object
Returns: LFUCache
The cache instance
Ensure that the ExplainInfo variable is present in the current context
Returns: ExplainInfo
A ExplainInfo class containing the llm calls’ statistics
Return the list of events corresponding to the provided messages.
Tries to find a prefix of messages for which we have already a list of events in the cache. For the rest, they are converted as is.
The reason this cache exists is that we want to benefit from events generated in previous turns, which can’t be computed again because it would be expensive (e.g., involving multiple LLM calls).
When an explicit state object will be added, this mechanism can be removed.
Parameters:
The list of messages.
Returns:
A list of events.
Initializes the knowledge base.
Initializes the right LLM engines based on the configuration. There can be multiple LLM engines and types that can be specified in the config. The main LLM engine is the one that will be used for all the core guardrails generations. Other LLM engines can be specified for use in specific actions.
The reason we provide an option for decoupling the main LLM engine from the action LLM is to allow for flexibility in using specialized LLM engines for specific actions.
Raises:
ModelInitializationError: If any model initialization fails
Initialize caches for configured models.
Prepare kwargs for model initialization, including API key from environment variable.
Parameters:
The model configuration object
Returns:
The prepared kwargs for model initialization
- Buffers tokens from ‘streaming_handler’ via BufferStrategy.
- Runs sequential (parallel for colang 2.0 in future) flows for each chunk.
- Yields the chunk if not blocked, or STOP if blocked.
Runs additional validation checks on the config.
Validate public dict state passed through generate/generate_async.
Run rails on messages based on their content (synchronous).
This is a synchronous wrapper around check_async().
Parameters:
List of message dicts with ‘role’ and ‘content’ fields.
Optional list of rail types to run. See check_async() for details.
Returns: RailsResult
RailsResult containing status, content, and optional blocking rail name.
Run rails on messages based on their content (asynchronous).
When rail_types is not provided, automatically determines which rails
to run based on message roles:
- Only user messages: runs input rails
- Only assistant messages: runs output rails
- Both user and assistant messages: runs both input and output rails
- No user/assistant messages: logs warning and returns passing result
When rail_types is provided, runs exactly the specified rail types,
skipping the auto-detection logic.
Parameters:
List of message dicts with ‘role’ and ‘content’ fields.
Messages can contain any roles, but only user/assistant roles
determine which rails execute when rail_types is not provided.
Optional list of rail types to run, e.g.
[RailType.INPUT] or [RailType.OUTPUT].
When provided, overrides automatic detection.
Returns: RailsResult
RailsResult containing:
Examples:
Helper function to return the latest ExplainInfo object.
Synchronous version of generate_async.
Generate a completion or a next message.
The format for messages is the following::
[ {“role”: “context”, “content”: {“user_name”: “John”}}, {“role”: “user”, “content”: “Hello! How are you?”}, {“role”: “assistant”, “content”: “I am fine, thank you!”}, {“role”: “event”, “event”: {“type”: “UserSilent”}}, … ]
System messages are not yet supported.
Parameters:
The prompt to be used for completion.
The history of messages to be used to generate the next message.
Options specific for the generation.
The state object that should be used as the starting point.
If specified, and the config supports streaming, the provided handler will be used for streaming.
Returns: Union[str, dict, GenerationResponse, Tuple[dict, dict]]
The completion (when a prompt is provided) or the next message.
Synchronous version of LLMRails.generate_events_async.
Generate the next events based on the provided history.
The format for events is the following::
[ {“type”: ”…”, …}, … ]
Parameters:
The history of events to be used to generate the next events.
The options to be used for the generation.
Returns: List[dict]
The newly generate event(s).
Synchronous version of LLMRails.process_events_async.
Process a sequence of events in a given state.
The events will be processed one by one, in the input order.
Parameters:
A sequence of events that needs to be processed.
The state that should be used as the starting point. If not provided, a clean state will be used.
Returns: Tuple[List[dict], Union[dict, State]]
(output_events, output_state) Returns a sequence of output events and an output state.
Register a custom action for the rails configuration.
Registers a custom action parameter.
Register a custom embedding provider.
Parameters:
The embedding model class.
The name of the embedding engine. If available in the model, it will be used.
Raises:
ValueError: If the engine name is not provided and the model does not have an engine name.ValueError: If the model does not have ‘encode’ or ‘encode_async’ methods.
Register a new embedding search provider.
Parameters:
The name of the embedding search provider that will be used.
The class that will be used to generate and search embedding
Register a custom filter for the rails configuration.
Register a custom output parser for the rails configuration.
Register a value to be included in the prompt context.
:name: The name of the variable or function that will be used. :value_or_fn: The value or function that will be used to generate the value.
Simplified interface for getting directly the streamed tokens from the LLM.
Replace the main LLM with the provided one.
Parameters:
The new LLM that should be used.