# Release Notes
The following sections summarize and highlight the changes for each release. For a complete record of changes in a release, refer to the CHANGELOG.md in the GitHub repository.
## 0.21.0

### Key Features

- Added the `IORails` class, a new optimized execution engine that runs NemoGuard input and output rails, such as content safety, topic safety, and jailbreak detection, in parallel. The engine is opt-in: set `NEMO_GUARDRAILS_IORAILS_ENGINE=1` to enable it. When enabled, the configuration is validated for compatibility and falls back to `LLMRails` if unsupported flows are detected. For more information, refer to IORails Engine.
- Added the `check_async()` and `check()` methods on `LLMRails` to validate messages against input and output rails without triggering full LLM generation. Both return a `RailsResult` with a `PASSED`, `MODIFIED`, or `BLOCKED` status. For more information, refer to Checking Messages Against Rails.
- The guardrails server now exposes a fully OpenAI-compatible REST API. The `/v1/chat/completions` endpoint accepts standard `ChatCompletion` requests with a `guardrails` field for config selection. A new `/v1/models` endpoint lists available models from the configured provider. The `openai` package is now a required component of the optional `server` extra (#1623). For more information, refer to Overview of the NeMo Guardrails Library API Server.
- Added the `GuardrailsMiddleware` class, a new middleware that integrates with LangChain's Agent Middleware protocol, applying input and output rail checks before and after every model call in the agent loop. It includes the `InputRailsMiddleware` and `OutputRailsMiddleware` convenience subclasses. For more information, refer to Agent Middleware.
- Added three new community rails: PolicyAI for policy-based content moderation, CrowdStrike AIDR for AI-powered detection and response, and Regex Detection for pattern-based content filtering on input, output, and retrieval.
- Jailbreak detection configuration is now validated at creation time. Invalid thresholds and malformed URLs raise errors immediately. For more information, refer to Configuration Validation.
- Embedding indexes are now initialized lazily. FastEmbed models are downloaded only when semantic search is needed, reducing startup time for configurations that use only input and output rails.
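The new check methods lend themselves to a pre-flight moderation step. The sketch below illustrates the intended control flow with a stand-in object in place of a configured `LLMRails` instance; the `status` attribute name on `RailsResult` is an assumption based on these notes, not a verified API detail.

```python
# Sketch of the 0.21.0 check flow. _StubRails stands in for a configured
# LLMRails instance; the `status` attribute on the result object is assumed
# from the release notes rather than verified against the library.
import asyncio


class _StubResult:
    """Stand-in for RailsResult."""
    status = "PASSED"


class _StubRails:
    """Stand-in for LLMRails; check_async() validates without generating."""
    async def check_async(self, messages):
        return _StubResult()


async def moderate(rails, messages):
    # Validate the conversation against input/output rails only;
    # no full LLM generation is triggered.
    result = await rails.check_async(messages=messages)
    if result.status == "BLOCKED":
        return "blocked"
    if result.status == "MODIFIED":
        return "modified"
    return "passed"


verdict = asyncio.run(moderate(_StubRails(), [{"role": "user", "content": "Hello"}]))
print(verdict)
```

In a real application, `moderate()` would receive an `LLMRails` instance built from your guardrails configuration, and a `BLOCKED` verdict would typically short-circuit the request before any generation cost is incurred.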
### Breaking Changes

- **Streaming metadata parameter renamed.** The `include_generation_metadata` parameter on `LLMRails.stream_async()` and `StreamingHandler` is deprecated in favor of `include_metadata`. The `generation_info` field in streaming chunk dicts is renamed to `metadata`. The deprecated parameter still works and emits a `DeprecationWarning`.

  ```python
  # Before (deprecated)
  async for chunk in rails.stream_async(messages=messages, include_generation_metadata=True):
      info = chunk["generation_info"]

  # After
  async for chunk in rails.stream_async(messages=messages, include_metadata=True):
      info = chunk["metadata"]
  ```

- **`StreamingHandler` no longer inherits from LangChain `AsyncCallbackHandler`.** Streaming now uses `llm.astream()` with direct `push_chunk()` calls. If your code depends on `StreamingHandler` as a LangChain callback, update it to use the new `push_chunk()` interface.
- **Removed the `stream_usage` parameter.** The `stream_usage=True` parameter is no longer automatically added to LLM call kwargs. Streaming metadata is now captured through `response_metadata` and `usage_metadata` on final chunks.
- **Server request and response format changed.** The `/v1/chat/completions` endpoint now uses OpenAI-compatible request and response schemas. The previous `RequestBody` and `ResponseBody` classes are removed. For the new format, refer to Overview of the NeMo Guardrails Library API Server.
- **ChatNVIDIA streaming patch removed.** The custom `_langchain_nvidia_ai_endpoints_patch.py` module is removed. The standard `ChatNVIDIA` from `langchain_nvidia_ai_endpoints` is used directly.
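Under the new OpenAI-compatible server schema, a request to `/v1/chat/completions` is a standard `ChatCompletion` body plus the extra `guardrails` field for config selection. These notes do not spell out the exact shape of that field, so the `{"config_id": ...}` form and the model name below are illustrative assumptions only.

```python
import json

# Standard ChatCompletion request body with the extra `guardrails` field.
# The {"config_id": ...} shape and the model name are assumptions for
# illustration; check the API server documentation for the exact schema.
payload = {
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "guardrails": {"config_id": "content_safety"},
}

body = json.dumps(payload)
# Send with any HTTP client, for example:
#   curl -X POST http://localhost:8000/v1/chat/completions \
#        -H "Content-Type: application/json" \
#        -d '<body>'
print(body)
```

Because the body is a plain `ChatCompletion` request, existing OpenAI client code should need only the added `guardrails` field and a base URL pointing at the guardrails server.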
### Bug Fixes

- Fixed a naming mismatch where the `generate_next_step` action did not match the `generate_next_steps` task enum value, which prevented task-specific LLM configuration from working correctly (#1603).
- Added the `valid` alias to action results in the GuardrailsAI integration so that Colang flows checking `$result["valid"]` work as expected (#1611).
- Filtered the `stop` parameter for OpenAI reasoning models (such as GPT-5) that do not accept it, preventing `400` errors during dialogue rail execution (#1653).
- Fixed GLiNER PII detection to use "bot refuse to respond" instead of "bot inform answer unknown", which returned a misleading "I don't know" message (#1671).
- Fixed a `TypeError` when `stop=None` is passed to `StreamingHandler` by coercing `None` to an empty list (#1685).
- Fixed a `TypeError` in `RollingBuffer.format_chunks` when `include_metadata=True` is used with output rail streaming enabled. Dict chunks are now normalized to strings at the input boundary (#1687).
- Fixed `GuardrailsMiddleware` silently dropping content when rails return `MODIFIED` status. Input rails now replace the last user message and output rails replace the last AI message with the sanitized content (#1714).
- Cache hit statistics are now visible in the Stats log line. Cache stats are also visible in verbose mode (#1666, #1667).
### Other Changes

- Updated the Fiddler Guardrails API to match the new specification: the `prompt` field is renamed to `input`, faithfulness uses strings instead of lists, and a new `fdl_roleplaying` category is added (#1619).
- Updated the Trend Micro Vision One AI Guard integration from the beta endpoint to the officially released GA endpoint. A required `TMV1-Application-Name` header is added, and the request key is changed from `guard` to `prompt` (#1546).
- Added a Locust stress-test benchmark for load testing (#1629).
- Removed the `multi_kb` example (#1673).
- Removed the AI Virtual Assistant Blueprint notebook (#1682).
- Updated the Pangea User-Agent repo URL (#1610).
- Updated dependencies for the jailbreak detection Docker container (#1596).
- Major documentation revamp with improved structure and navigation.