Engine Feature Support

View as Markdown

The NVIDIA NeMo Guardrails library supports two engines: LLMRails and IORails. This page explains what each engine is optimized for, how to select one, and which features each engine supports.

The LLMRails and IORails Engines

Both engines read the same RailsConfig object, but they support different feature sets.

LLMRails is designed for flexibility, and it supports all rail types with Colang 1.0 and 2.x so that you can define custom dialog flows. IORails is optimized for low-latency input, output, and tool rails. The Guardrails facade selects the optimal engine to use, based on the Guardrails configuration.

LLMRails

LLMRails is the full-featured, event-driven engine. It runs the complete Colang 1.0 and 2.x runtime, including dialog rails, input and output rails, retrieval (RAG and knowledge base) rails, execution rails (custom Python actions), tool rails, and embeddings. It is optimized for flexibility and complete conversational guardrailing, and it is the engine behind every capability that depends on the Colang runtime, custom actions, embeddings, or a custom LLM.

Instantiate it directly:

1from nemoguardrails import LLMRails, RailsConfig
2
3config = RailsConfig.from_path("path/to/config")
4rails = LLMRails(config)

IORails

IORails is optimized for accelerated input and output rail inference. It includes tool-calling rails. It runs the built-in NeMoGuard safety models (content safety, topic control, and jailbreak detection) and tool validation directly against the model endpoints, with optional parallel rail execution, admission control through an AsyncWorkQueue, OpenTelemetry token metrics, and optional speculative generation. It does not run the Colang dialog runtime, retrieval, custom actions, or accept a custom LLM, and it accepts Colang 1.0 configurations only.

IORails has a start() and stop() lifecycle that initializes and releases the engine’s model clients and work queue. Both generate_async() and stream_async() call start() automatically (it is idempotent), so a bare IORails does not need a manual start() before use; call start() at service startup to warm the clients and stop() at shutdown to release them. When you use the Guardrails facade described below, that lifecycle is managed for you through startup() and shutdown() (or by using Guardrails as an async context manager).

Choosing an Engine

The recommended entry point is the Guardrails facade, which routes a configuration to the appropriate engine automatically.

1from nemoguardrails import Guardrails, RailsConfig
2
3config = RailsConfig.from_path("path/to/config")
4
5# Auto-route: use IORails when the config is supported, otherwise fall back to LLMRails.
6rails = Guardrails(config)
7
8# Always use LLMRails.
9rails = Guardrails(config, use_iorails=False)
10
11# Require IORails and raise if the config is not supported.
12rails = Guardrails(config, require_iorails=True)

Guardrails(config) selects IORails when all of the following hold:

Otherwise the facade falls back to LLMRails and logs the reason. You can inspect that decision directly with IORails.unsupported_reason(config, llm), which returns the human-readable fallback reason, or None when IORails can handle the config.

Feature Support

Each section below covers one capability area, with a support table followed by a comparison of the two engines.

Legend: ✓ supported · ✗ not supported · ◐ partial (see notes).

Rail Types

FeatureLLMRailsIORailsNotes
Input rails
Output rails
Dialog railsRequire the Colang runtime, which IORails does not run
Retrieval (RAG and knowledge base) railsLLMRails only
Execution rails (custom actions)LLMRails only
Tool railsIORails: tool-call and tool-result validation; see Tool calling

LLMRails runs every rail direction through the Colang runtime: input, output, dialog, retrieval, and execution (custom action) rails. Input and output rails wrap the model call, dialog rails drive multi-turn conversation flows, retrieval rails guard a knowledge base, and execution rails run custom Python actions. Execution rails govern those custom actions; validating the model’s own tool calls and tool results is covered separately under Tool calling.

IORails runs input, output, and tool rails only, and it does so without the Colang runtime. Input rails run before the model call and output rails run after it, using a fixed set of built-in flows. Dialog, retrieval, and execution rails are not available on IORails; configurations that use them fall back to LLMRails.

Colang Language Support

FeatureLLMRailsIORailsNotes
Colang 1.0 configurations
Colang 2.x configurationsIORails accepts Colang 1.0 only

LLMRails runs both the Colang 1.0 and Colang 2.x runtimes, selecting the runtime from config.colang_version.

IORails accepts Colang 1.0 configurations only and runs no dialog flows. A Colang 2.x configuration is a fallback condition: Guardrails routes it to LLMRails.

Built-In NeMoGuard Safety Rails

FeatureLLMRailsIORailsNotes
Content safetyIORails: input and output
Topic controlIORails: input only
Jailbreak detection (NIM)IORails: input only

Both engines support the built-in NeMoGuard safety models: content safety, topic control, and jailbreak detection. On LLMRails these run as Colang flows and can be placed on input or output as the configuration allows.

IORails supports a fixed set of these flows per direction. On input it supports content safety, topic control, and jailbreak detection; on output it supports content safety only. A topic-control or jailbreak flow on the output rail is a fallback condition.

Tool Calling

FeatureLLMRailsIORailsNotes
Tool-call passthrough
Tool-call validation railIORails flow: tool call validation
Tool-result validation railIORails flow: tool result validation

Both engines support passing model tool calls through to the caller and validating tool calls and tool results. LLMRails handles these through the Colang runtime and tool rails.

IORails validates tool calls and tool results through directional flows: tool call validation on the tool-output rail and tool result validation on the tool-input rail. Tool calls are returned in the OpenAI-style tool_calls field of the response message.

Generation and Validation API

FeatureLLMRailsIORailsNotes
generate / generate_async
stream_async
Event-based API (generate_events / process_events)Requires the Colang runtime
check / check_async (rails-only validation)LLMRails only
GenerationOptionsIORails uses llm_params and rail toggles; no log or output_data
GenerationResponse (structured response object)IORails returns an OpenAI-style message dict
explain() / ExplainInfoLLMRails only

Both engines expose generate, generate_async, and stream_async. LLMRails can return a rich GenerationResponse and processes the full GenerationOptions object, including rail toggles, llm_params, logging options, and output_data. It also exposes the event-based API (generate_events and process_events), the rails-only validation methods (check and check_async), and explain() for debugging.

IORails returns an OpenAI-style message dictionary with role, content, and optional tool_calls, rather than a GenerationResponse. It accepts GenerationOptions but uses only llm_params and rail toggles. The event-based API, check and check_async, and explain() are not available; on the Guardrails facade these raise NotImplementedError when IORails is the active engine.

Streaming

FeatureLLMRailsIORailsNotes
Output-rail streaming
Streaming usage and metadataIORails: include_metadata=True
Parallel streaming output railsLLMRails streaming-buffer feature

Both engines stream responses through stream_async and support streaming output rails. Both can include streaming metadata; on IORails, pass include_metadata=True to receive dictionary-framed chunks such as {"text": ...} instead of plain strings. IORails does not add a separate metadata field to each streamed text chunk.

Parallel streaming output rails, where the output rail validates streamed chunks using the streaming buffer, is an LLMRails feature. IORails runs output rails over the streamed response but does not use the parallel streaming-buffer path, and speculative generation falls back to sequential execution while streaming.

Parallelism and Concurrency

FeatureLLMRailsIORailsNotes
Parallel rail executionrails.input.parallel / rails.output.parallel
Speculative generationInput rails race generation; non-streaming only
Admission control and concurrency limitsAsyncWorkQueue plus a streaming semaphore

Both engines run multiple rails in the same direction concurrently when rails.input.parallel or rails.output.parallel is set; the first rail to block short-circuits the result. For YAML examples, see Parallel Execution of Input and Output Rails.

IORails adds two concurrency capabilities that LLMRails does not provide. Speculative generation (rails.input.speculative_generation) runs input rails concurrently with model generation and discards the generation if an input rail blocks, reducing latency on the safe path; it applies to non-streaming generation only. For a configuration example, see Speculative Generation. Admission control through an AsyncWorkQueue (and a separate semaphore for streaming) bounds the number of in-flight requests and rejects work when the queue is full.

Reasoning-Model Support

FeatureLLMRailsIORailsNotes
Reasoning trace handling (<think> tags or reasoning field)
reasoning_content in a structured responseRequires GenerationResponse

Both engines preserve model reasoning traces, whether the model returns them in a dedicated reasoning field or inline within <think> tags, and both keep reasoning out of the prompt history sent back to the model.

LLMRails can expose reasoning in the structured response through reasoning_content. Because IORails returns a message dictionary rather than a GenerationResponse, the structured reasoning_content field is an LLMRails capability.

Multimodal

FeatureLLMRailsIORailsNotes
Multimodal (vision) input and output railsLLMRails only

Multimodal (vision) input and output rails, which run safety checks over image content alongside text, are supported by LLMRails.

IORails does not run multimodal safety rails over image content on its input and output rails; multimodal configurations route to LLMRails.

Observability

FeatureLLMRailsIORailsNotes
Tracing (OpenTelemetry spans)
Metrics (OpenTelemetry token and duration)LLMRails surfaces token statistics through logging
Prometheus exportThrough the OpenTelemetry metrics exporter
Logging (verbose and call statistics)
Content capture (span content)

Both engines support OpenTelemetry tracing and content capture on spans, and both emit logs. LLMRails surfaces token usage and timing through its logging and statistics output and verbose mode.

OpenTelemetry token and duration metrics (for example, gen_ai.client.token.usage and gen_ai.client.operation.duration) are an IORails capability, and those metrics can be exported to Prometheus through an OpenTelemetry metrics exporter. For more information, see the Observability documentation.

LLM Frameworks and Providers

FeatureLLMRailsIORailsNotes
Default framework (OpenAI-compatible)
LangChain integration (opt-in)Passing a LangChain LLM routes to LLMRails
Custom LLM injection (llm= / update_llm)A custom llm forces LLMRails

Both engines use the default OpenAI-compatible framework to call models defined in the configuration.

The LangChain integration is opt-in and available on LLMRails. Passing a custom llm to the constructor, including a LangChain model, forces LLMRails, because IORails resolves its models from the configuration rather than from an injected LLM and does not support update_llm.

Knowledge Base and Embeddings

FeatureLLMRailsIORailsNotes
Knowledge base, embeddings, and custom providersLLMRails only

The knowledge base, embedding providers, and custom embedding or embedding-search providers are part of the Colang retrieval pipeline and are supported by LLMRails.

IORails does not initialize a knowledge base or embeddings; configurations that rely on retrieval route to LLMRails.

Community and Third-Party Rail Catalog

FeatureLLMRailsIORailsNotes
Community integrations (PII, AlignScore, ActiveFence, and others)Run as LLMRails actions and flows

The community and third-party integrations in the Guardrail Catalog (for example, PII detection, AlignScore, ActiveFence, Fiddler, Pangea, and others) run as LLMRails actions and flows.

IORails ships only the built-in NeMoGuard safety models and tool validation, so catalog integrations route to LLMRails.

Server and Deployment

FeatureLLMRailsIORailsNotes
Guardrails server (OpenAI-compatible REST API)Bundled server runs LLMRails; use IORails through the Python API
Server-side threads and multi-configLLMRails only

The bundled Guardrails server exposes an OpenAI-compatible REST API and runs on LLMRails. Server-side threads and multi-config serving are provided through that server.

IORails is consumed through the in-process Guardrails Python API rather than the bundled server.

Configuration and Operations

FeatureLLMRailsIORailsNotes
Configuration serialization and conversation stateIORails is stateless
.railsignore and multi-config loadingShared configuration-loading layer

LLMRails supports configuration serialization and maintains conversation state across turns, which the event-based and process_events APIs build on.

IORails is stateless and does not serialize conversation state. Configuration loading, including .railsignore and multi-config loading, is handled by a shared layer and behaves the same for both engines.