nemoguardrails.guardrails.model_engine | NVIDIA NeMo Guardrails Library Developer Guide

Model engine for IORails.

Wraps a single Model config and makes raw HTTP calls to its OpenAI-compatible /v1/chat/completions endpoint via aiohttp. Retries are handled by aiohttp-retry (ExponentialRetry).

Module Contents

Classes

Name	Description
`ModelEngine`	Wraps a single Model config and makes HTTP calls to its endpoint.
`ModelEngineError`	Raised when a model engine call fails.
`_RequestParams`	Pre-built parameters for an HTTP request to the completions endpoint.

Functions

Name	Description
`_accumulate_tool_call_delta`	Update the tool-call accumulator with any tool_call deltas from a raw SSE chunk.
`_extract_tool_exchanges_nim`	Extract NIM tool exchanges. NIM uses the OpenAI Chat Completions shape.
`_extract_tool_exchanges_openai`	Group an OpenAI Chat Completions conversation into per-turn `ToolExchange`es.
`_extract_tool_results_nim`	Extract NIM tool results. NIM uses the OpenAI Chat Completions shape.
`_extract_tool_results_openai`	Extract OpenAI Chat Completions tool results into `ToolResult` objects.
`_finalize_tool_calls`	Assemble accumulated tool-call fragments into ToolCall objects.
`_parse_chat_completion`	Convert a /v1/chat/completions response dict into an LLMResponse.
`_parse_chat_completion_chunk`	Build an LLMResponseChunk from an SSE chunk dict.
`_parse_tools_nim`	Parse NIM tool definitions. NIM uses the OpenAI Chat Completions tool shape.
`_parse_tools_openai`	Parse OpenAI Chat Completions tool definitions into `Tool` objects.
`_parse_usage`	Build UsageInfo from an OpenAI-format usage dict.
`_tool_calls_from_message`	Extract the tool calls from one assistant message into `ToolCall` objects.
`_tool_result_from_message`	Normalize one OpenAI Chat Completions `role:"tool"` message into a `ToolResult`.

Data

_CHAT_COMPLETIONS_ENDPOINT

_ENGINE_BASE_URLS

_RESERVED_LLM_PARAMETERS

_RESULT_EXTRACTORS

_TOOL_EXCHANGE_EXTRACTORS

_TOOL_PARSERS

log

API

class nemoguardrails.guardrails.model_engine.ModelEngine(
    model_config: nemoguardrails.rails.llm.config.Model
)

Bases: BaseEngine

Wraps a single Model config and makes HTTP calls to its endpoint.

Each ModelEngine owns its own RetryClient with per-model timeout, retry, and connection pool settings.

api_key

Optional[str] = self._resolve_api_key(model_config.engine)

base_url

str = self._resolve_base_url()

body_param_defaults

Mapping[str, Any]

model_name

str = model_config.model or ''

nemoguardrails.guardrails.model_engine.ModelEngine._ensure_running() -> None

Raise if the engine has not been started.

nemoguardrails.guardrails.model_engine.ModelEngine._get_environment_variable(
    variable_name: str
) -> str | None

Return the value stored in environment variable variable_name.

nemoguardrails.guardrails.model_engine.ModelEngine._prepare_request(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    kwargs: typing.Any = {}
) -> nemoguardrails.guardrails.model_engine._RequestParams

Build the client, URL, headers, and body common to every request.

nemoguardrails.guardrails.model_engine.ModelEngine._raise_for_status(
    response: aiohttp.ClientResponse,
    req_id: str,
    t0: float
) -> None

async

Raise ModelEngineError if the HTTP status indicates an error.

nemoguardrails.guardrails.model_engine.ModelEngine._resolve_api_key(
    engine: str | None
) -> typing.Optional[str]

Resolve the API key from model config or environment.

nemoguardrails.guardrails.model_engine.ModelEngine._resolve_base_url() -> str

Resolve the base URL from model parameters or engine type.

Strips an optional trailing “/v1” so users can follow the OpenAI / LLMRails convention of including “/v1” in base_url without producing a doubled “/v1/v1/chat/completions” path when _CHAT_COMPLETIONS_ENDPOINT is appended.

nemoguardrails.guardrails.model_engine.ModelEngine._wrap_exception(
    exc: Exception,
    req_id: str,
    t0: float,
    label: str = 'Request'
) -> nemoguardrails.guardrails.model_engine.ModelEngineError

Wrap an unexpected exception in a ModelEngineError.

nemoguardrails.guardrails.model_engine.ModelEngine.call(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    kwargs: typing.Any = {}
) -> dict

async

Make a POST request to the /v1/chat/completions endpoint.

Retries on transient failures (429, 5xx, connection errors) are handled automatically by the RetryClient with exponential backoff.

Parameters:

messages

LLMMessages

List of message dicts in OpenAI format.

**kwargs

AnyDefaults to {}

Additional parameters for the request body (temperature, max_tokens, etc.)

Returns: dict

The parsed JSON response dict from the API.

Raises:

ModelEngineError: If the request fails after all retries.

nemoguardrails.guardrails.model_engine.ModelEngine.chat_completion(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    kwargs: typing.Any = {}
) -> nemoguardrails.types.LLMResponse

async

Generate a chat completion and return a structured LLMResponse.

Calls the /v1/chat/completions endpoint and parses the OpenAI-format response into an LLMResponse carrying content, reasoning (when the provider exposes reasoning_content), usage, finish reason, and request id.

Raises:

ModelEngineError: If the request fails or the response format is unexpected.

nemoguardrails.guardrails.model_engine.ModelEngine.extract_tool_exchanges(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[nemoguardrails.guardrails.tool_schema.ToolExchange]

Group messages into per-turn (tool_calls, tool_results) exchanges.

Each exchange pairs one assistant turn’s tool calls with the tool results that answer it, so RailsManager.are_tool_results_safe can validate call_id linkage turn-locally rather than across the whole flattened history (the latter falsely flags ids reused across turns, which the OpenAI spec permits). Keyed on the model’s engine (_TOOL_EXCHANGE_EXTRACTORS); OpenAI and NIM share the Chat Completions shape and an engine with no registered extractor falls back to it.

nemoguardrails.guardrails.model_engine.ModelEngine.extract_tool_results(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[nemoguardrails.guardrails.tool_schema.ToolResult]

Extract incoming tool results from messages into ToolResult objects.

Pulls the provider’s tool-result messages out of the conversation and normalizes them into the internal ToolResult shape the ToolResultRail consumes, keyed on the model’s engine (_RESULT_EXTRACTORS). OpenAI and NIM share the Chat Completions shape (role:"tool" messages); an engine with no registered extractor falls back to it. Returns an empty list when there are no tool results.

nemoguardrails.guardrails.model_engine.ModelEngine.parse_tools(
    llm_params: typing.Optional[dict]
) -> nemoguardrails.guardrails.tool_schema.Toolset

Parse the provider tool block in llm_params into a Toolset.

Reads the opaque tools block forwarded via GenerationOptions.llm_params and normalizes it into the internal Toolset the tool rails validate against, keyed on the model’s engine (_TOOL_PARSERS). OpenAI and NIM share the Chat Completions shape; an engine with no registered parser falls back to it. Returns an empty Toolset when no tools are declared.

nemoguardrails.guardrails.model_engine.ModelEngine.stream_call(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    kwargs: typing.Any = {}
) -> collections.abc.AsyncGenerator[nemoguardrails.types.LLMResponseChunk, None]

async

Make a streaming POST request to the /v1/chat/completions endpoint.

Sends stream=True and yields one LLMResponseChunk per SSE event that carries a content delta, reasoning delta, OR a usage payload. Role-only, finish-only, and empty-choices events without usage are skipped. Retries are handled by the RetryClient (same as call()).

Note: when the upstream payload includes stream_options.include_usage=true (default for the OpenAI-compatible client), the provider sends a final usage-only chunk with empty choices after the last content chunk. That terminal chunk is yielded as LLMResponseChunk(usage=...) with both delta_content and delta_reasoning unset — callers that only care about content should gate on chunk.delta_content rather than assuming every yielded chunk carries one.

Tool calls (when the request declared tools) are accumulated from streamed delta.tool_calls fragments and surfaced as a single LLMResponseChunk whose delta_tool_calls carries the COMPLETE finalized list exactly once — on the first chunk with a finish_reason ("tool_calls" for a free choice, "stop" for a forced tool_choice), or via a post-loop safety net if the provider omits a parseable finish frame. No other chunk carries delta_tool_calls, so consumers may treat it as last-write-wins.

Parameters:

messages

LLMMessages

List of message dicts in OpenAI format.

**kwargs

AnyDefaults to {}

Additional parameters for the request body (temperature, max_tokens, etc.)

Raises:

ModelEngineError: If the request fails after all retries.

nemoguardrails.guardrails.model_engine.ModelEngine.stream_chat_completion(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    kwargs: typing.Any = {}
) -> collections.abc.AsyncGenerator[nemoguardrails.types.LLMResponseChunk, None]

async

Stream a chat completion and yield LLMResponseChunk objects.

Thin pass-through over stream_call — see that method’s docstring for the contract, including the terminal usage-only chunk emitted when stream_options.include_usage is on.

Raises:

ModelEngineError: If the request fails after all retries.

class nemoguardrails.guardrails.model_engine.ModelEngineError(
    message: str,
    model_name: str,
    status: int | None = None
)

Exception

Bases: Exception

Raised when a model engine call fails.

class nemoguardrails.guardrails.model_engine._RequestParams()

Bases: NamedTuple

Pre-built parameters for an HTTP request to the completions endpoint.

body

dict[str, Any]

client

RetryClient

headers

dict[str, str]

url

str

nemoguardrails.guardrails.model_engine._accumulate_tool_call_delta(
    tool_calls: dict[int, dict],
    raw_chunk: dict
) -> None

Update the tool-call accumulator with any tool_call deltas from a raw SSE chunk.

OpenAI streams argument JSON as fragments across many chunks; NIM delivers complete arguments in one delta. Both are handled uniformly: tool_calls is keyed by the OpenAI index field and mutated in place on every call. Finalize with _finalize_tool_calls once finish_reason=="tool_calls".

nemoguardrails.guardrails.model_engine._extract_tool_exchanges_nim(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[nemoguardrails.guardrails.tool_schema.ToolExchange]

Extract NIM tool exchanges. NIM uses the OpenAI Chat Completions shape.

nemoguardrails.guardrails.model_engine._extract_tool_exchanges_openai(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[nemoguardrails.guardrails.tool_schema.ToolExchange]

Group an OpenAI Chat Completions conversation into per-turn ToolExchangees.

nemoguardrails.guardrails.model_engine._extract_tool_results_nim(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[nemoguardrails.guardrails.tool_schema.ToolResult]

Extract NIM tool results. NIM uses the OpenAI Chat Completions shape.

nemoguardrails.guardrails.model_engine._extract_tool_results_openai(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[nemoguardrails.guardrails.tool_schema.ToolResult]

Extract OpenAI Chat Completions tool results into ToolResult objects.

Chat Completions carries each tool result as a top-level {"role": "tool", "tool_call_id", "content"} message (optionally name).

nemoguardrails.guardrails.model_engine._finalize_tool_calls(
    tool_calls: dict[int, dict]
) -> list[nemoguardrails.types.ToolCall]

Assemble accumulated tool-call fragments into ToolCall objects.

Called once when the stream emits finish_reason=‘tool_calls’. An empty buffer (no argument fragments streamed) is a no-argument call and becomes {}; a non-empty buffer that is not a valid JSON object (e.g. arguments truncated mid-stream) raises ValueError so the malformed call fails closed rather than silently degrading to empty arguments that could pass the tool-call rail. This mirrors the non-streaming parser (ChatMessage.from_dict), which raises on the same bytes; stream_call wraps the error into ModelEngineError exactly as the non-streaming path does.

nemoguardrails.guardrails.model_engine._parse_chat_completion(
    response: dict
) -> nemoguardrails.types.LLMResponse

Convert a /v1/chat/completions response dict into an LLMResponse.

Reasoning is read from message.reasoning_content when the provider exposes it (NIM, DeepSeek-style). Tool calls are parsed from message.tool_calls (OpenAI shape) into LLMResponse.tool_calls via ChatMessage.from_dict, which normalizes JSON-string arguments into a dict. content is None on a tool-call-only response and is normalized to an empty string; a None content with no tool calls is treated as a malformed response.

nemoguardrails.guardrails.model_engine._parse_chat_completion_chunk(
    chunk: dict
) -> typing.Optional[nemoguardrails.types.LLMResponseChunk]

Build an LLMResponseChunk from an SSE chunk dict.

Returns None for chunks without one of: content delta, reasoning delta, a usage payload, or a finish_reason. Role-only first events map to None.

Finish-only frames are preserved: a delta with no content/reasoning (OpenAI sends delta: {}, NIM sends delta: {"content": ""}) and no usage, carrying only a finish_reason. Dropping them would strip gen_ai.response.finish_reasons from the LLM span. (Some providers instead attach finish_reason to the final content chunk — that case is already captured, since content keeps the chunk alive.) When stream_options.include_usage=true the usage payload arrives in a separate later frame with empty choices — so finish_reason and usage do not share a frame.

Last chunk from OpenAI-compatible providers has a usage field when stream_options.include_usage=true. This is passed through to capture the token usage metadata.

nemoguardrails.guardrails.model_engine._parse_tools_nim(
    tools: list
) -> list[nemoguardrails.guardrails.tool_schema.Tool]

Parse NIM tool definitions. NIM uses the OpenAI Chat Completions tool shape.

nemoguardrails.guardrails.model_engine._parse_tools_openai(
    tools: list
) -> list[nemoguardrails.guardrails.tool_schema.Tool]

Parse OpenAI Chat Completions tool definitions into Tool objects.

Each entry has the nested shape {"type": "function", "function": {"name", "description", "parameters", "strict"}}; function.parameters (the JSON Schema) maps to Tool.arguments_schema. Entries that are not a dict, lack a function block, or whose function has no non-empty name are skipped.

nemoguardrails.guardrails.model_engine._parse_usage(
    usage_dict: dict
) -> nemoguardrails.types.UsageInfo

Build UsageInfo from an OpenAI-format usage dict.

Picks up reasoning_tokens from completion_tokens_details (OpenAI reasoning models) and cached_tokens from prompt_tokens_details when present.

nemoguardrails.guardrails.model_engine._tool_calls_from_message(
    message: dict
) -> list[nemoguardrails.types.ToolCall]

Extract the tool calls from one assistant message into ToolCall objects. Malformed tool calls fall back to just id, type, function

nemoguardrails.guardrails.model_engine._tool_result_from_message(
    message: dict
) -> nemoguardrails.guardrails.tool_schema.ToolResult

Normalize one OpenAI Chat Completions role:"tool" message into a ToolResult.

This shape has no error flag, so is_error is always False.

nemoguardrails.guardrails.model_engine._CHAT_COMPLETIONS_ENDPOINT = '/v1/chat/completions'

nemoguardrails.guardrails.model_engine._ENGINE_BASE_URLS = {'nim': 'https://integrate.api.nvidia.com', 'openai': 'https://api.openai.com'}

nemoguardrails.guardrails.model_engine._RESERVED_LLM_PARAMETERS = frozenset({'base_url', 'timeout', 'timeout_connect', 'max_attempts', 'api_key', ...

nemoguardrails.guardrails.model_engine._RESULT_EXTRACTORS = {'openai': _extract_tool_results_openai, 'nim': _extract_tool_results_nim}

nemoguardrails.guardrails.model_engine._TOOL_EXCHANGE_EXTRACTORS = {'openai': _extract_tool_exchanges_openai, 'nim': _extract_tool_exchanges_nim}

nemoguardrails.guardrails.model_engine._TOOL_PARSERS = {'openai': _parse_tools_openai, 'nim': _parse_tools_nim}

nemoguardrails.guardrails.model_engine.log = logging.getLogger(__name__)