nemoguardrails.guardrails.engine_registry
Engine registry for IORails engine.
Manages a collection of ModelEngine and APIEngine instances, one per configured model type. Each engine owns its own RetryClient with per-model settings.
Module Contents
Classes
Data
API
Registry of ModelEngine and APIEngine instances for IORails.
Creates one engine per configured model or API service, keyed by name. Each engine owns its own HTTP client with per-model retry and timeout settings.
Async context manager entry: start all engine clients.
Async context manager exit: stop all engine clients.
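The entry and exit behaviour described above can be sketched with stand-in classes. This is only an illustration of the pattern, not the library's implementation: `StubEngine`, `SketchRegistry`, and `demo` are hypothetical names, and the assumption that each engine exposes async `start()` and `stop()` methods is not confirmed by this page.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for an engine that owns a client."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        self.running = False


class SketchRegistry:
    """Hypothetical registry: starts every engine on entry, stops them on exit."""

    def __init__(self, engines: dict) -> None:
        self._engines = dict(engines)

    async def start(self) -> None:
        for engine in self._engines.values():
            await engine.start()

    async def stop(self) -> None:
        for engine in self._engines.values():
            await engine.stop()

    async def __aenter__(self) -> "SketchRegistry":
        await self.start()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and when the body raises.
        await self.stop()


async def demo():
    engines = {"main": StubEngine(), "moderation": StubEngine()}
    async with SketchRegistry(engines):
        during = [e.running for e in engines.values()]
    after = [e.running for e in engines.values()]
    return during, after
```

Using `async with` keeps startup and shutdown paired even when the body raises, which is the usual reason to offer a context manager alongside explicit start and stop methods.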
Look up an engine by name, verifying its type.
Route an API request to the named API engine.
Raises:
KeyError: If no engine is registered with the given name.
TypeError: If the named engine is not an APIEngine.
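A minimal sketch of a name lookup with type verification, showing when each of the two exceptions would arise. The classes below are empty stand-ins that reuse the names from this page, and `SketchRegistry.get_engine` is a hypothetical method name and signature, not the documented one.

```python
class ModelEngine:
    """Empty stand-in for the real ModelEngine."""


class APIEngine:
    """Empty stand-in for the real APIEngine."""


class SketchRegistry:
    """Hypothetical registry keyed by engine name."""

    def __init__(self, engines: dict) -> None:
        self._engines = dict(engines)

    def get_engine(self, name: str, expected_type: type):
        # A plain dict lookup raises KeyError for an unknown name.
        engine = self._engines[name]
        # A registered engine of the wrong kind raises TypeError.
        if not isinstance(engine, expected_type):
            raise TypeError(
                f"engine {name!r} is {type(engine).__name__}, "
                f"expected {expected_type.__name__}"
            )
        return engine
```

Checking the type at lookup time means a misrouted request fails immediately with a clear error instead of failing later inside the wrong engine.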
Route a chat completion request to the named model engine.
Returns the structured LLMResponse from the engine: content,
reasoning (when the provider exposes it), usage, and finish reason.
Callers that only want the assistant text should access .content.
When metrics are enabled, emits gen_ai.client.operation.duration
(with error.type on exception) and gen_ai.client.token.usage
(one observation each for input and output token types,
only when LLMResponse.usage is populated).
Raises:
KeyError: If no engine is registered with the given name.
TypeError: If the named engine is not a ModelEngine.
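The metric rules described above (a duration observation on every call, tagged with error.type on failure, and token observations only when usage is present) can be sketched as follows. This is an illustrative stand-in, not the library's code: `Usage`, `Response`, `record`, and `timed_call` are hypothetical, observations go into a plain list rather than a real metrics backend, the call is synchronous for brevity, and the `gen_ai.token.type` attribute key is assumed from the OpenTelemetry GenAI semantic conventions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


@dataclass
class Response:
    content: str
    usage: Optional[Usage] = None


# Each entry is (metric name, value, attributes); stands in for a metrics backend.
observations: list = []


def record(metric: str, value: float, attributes: dict) -> None:
    observations.append((metric, value, attributes))


def timed_call(call: Callable[[], Response]) -> Response:
    start = time.perf_counter()
    try:
        response = call()
    except Exception as exc:
        # Duration is still recorded on failure, tagged with the error type.
        record(
            "gen_ai.client.operation.duration",
            time.perf_counter() - start,
            {"error.type": type(exc).__name__},
        )
        raise
    record("gen_ai.client.operation.duration", time.perf_counter() - start, {})
    if response.usage is not None:
        # One observation per token type, only when usage is populated.
        record("gen_ai.client.token.usage", response.usage.input_tokens,
               {"gen_ai.token.type": "input"})
        record("gen_ai.client.token.usage", response.usage.output_tokens,
               {"gen_ai.token.type": "output"})
    return response
```

A caller that only wants the assistant text would read `timed_call(...).content` in this sketch, mirroring the `.content` access mentioned above.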
Start all engine clients. Call this during service startup.
Stop all engine clients. Call this during service shutdown.
Stream chat completion chunks from the named model engine.
Yields LLMResponseChunk objects. The surrounding
llm_call_span wraps the full generator lifetime: it opens
before the first chunk and closes when the generator exhausts or
raises.
When metrics are enabled, emits gen_ai.client.operation.duration
for the full stream lifetime (with error.type on exception)
and gen_ai.client.token.usage after stream completion using
the UsageInfo carried on the terminal SSE chunk (when the
provider returns one; this is controlled by include_usage_in_stream,
which defaults to True for OpenAI-compatible engines). No token
observation is emitted on early consumer cancellation or on
provider error mid-stream.
Raises:
KeyError: If no engine is registered with the given name.
TypeError: If the named engine is not a ModelEngine.
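The streaming rule above, where token usage is recorded only after the stream completes and is skipped on early cancellation or a mid-stream error, can be illustrated with a wrapping async generator. Everything here is a hypothetical stand-in: `Chunk`, `fake_provider`, `stream_with_usage`, and the `token_observations` list are not part of the library, and the real LLMResponseChunk and UsageInfo types may look different.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class Chunk:
    delta: str
    usage: Optional[dict] = None  # populated only on the terminal chunk


async def fake_provider() -> AsyncIterator[Chunk]:
    yield Chunk("Hel")
    yield Chunk("lo")
    yield Chunk("", usage={"input": 4, "output": 2})


# Stands in for the token usage metric.
token_observations: list = []


async def stream_with_usage(source: AsyncIterator[Chunk]) -> AsyncIterator[Chunk]:
    usage = None
    async for chunk in source:
        if chunk.usage is not None:
            usage = chunk.usage
        yield chunk
    # Reached only when the source is exhausted normally. If the consumer
    # stops early or the source raises, control never gets here.
    if usage is not None:
        token_observations.append(usage)


async def consume_all() -> str:
    return "".join([c.delta async for c in stream_with_usage(fake_provider())])


async def consume_first() -> str:
    gen = stream_with_usage(fake_provider())
    first = ""
    async for chunk in gen:
        first = chunk.delta
        break
    await gen.aclose()  # early cancellation by the consumer
    return first
```

Placing the recording step after the loop, rather than in a `finally` block, is what makes it skip the cancelled and failed cases in this sketch.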