nemoguardrails.guardrails.telemetry
Inline OpenTelemetry instrumentation for the IORails engine.
All OpenTelemetry API imports are isolated in this module so the rest of the
guardrails package never imports opentelemetry directly. When the
opentelemetry-api package is not installed, the public entry points
is_tracing_enabled, get_tracer, get_meter, and traced_request
degrade gracefully (returning False, None, or a no-span / no-metric
passthrough respectively). Lower-level helpers like request_span and
trace_id_to_request_id require OTEL to be available and are only
reachable through traced_request when a non-None tracer is provided.
Module Contents
Classes
Functions
Data
API
Request-level OTEL instruments for the IORails engine.
Field names mirror the emitted metric names (minus the guardrails.
prefix). The saturation-metric group covers the full request lifecycle:
- Aggregate:
requests_active(guardrails.requests.active) - Non-streaming path:
nonstream_rejections(guardrails.nonstream.rejections); the two gaugesnonstream.queuedandnonstream.activeare registered separately viaregister_nonstream_saturation_gaugesbecause ObservableGauges need a live queue reference. - Streaming path:
stream_active(guardrails.stream.active) andstream_rejections(guardrails.stream.rejections).
Bases: NamedTuple
Handle yielded by traced_request.
span is the IORails guardrails.request span when tracing is
enabled, or None when it is not. request_id is always a
16-char hex string. Unpacks as (span, request_id) for callers
that prefer positional access.
Reset the request-ID ContextVar from a cleanup path, tolerating the
one expected ValueError.
ContextVar.reset() raises ValueError("... was created in a different Context") when called from a different asyncio Context
than where .set() was called. That happens during async-generator
cleanup (aclose() running in an outer task’s context) and is the
only ValueError that reset_request_id raises today. Any
other ValueError indicates an unexpected bug in the helper and is
re-raised so callers see it.
Lazily create the request-level instruments and return them as a
:class:RequestInstruments. Returns None when the OTEL API is not
installed.
Return the OTEL GenAI gen_ai.input.messages form for non-system messages.
Each non-system message is role-wrapped as {"role": role, "parts": [{"type": "text", "content": content}]}. Named for the attribute it
populates rather than “parts” because — unlike
:func:_system_parts_from_messages — it keeps the role wrapper.
Example::
>>> _non_system_input_messages([ … {“role”: “system”, “content”: “be helpful”}, … {“role”: “user”, “content”: “hi”}, … ]) [{“role”: “user”, “parts”: [{“type”: “text”, “content”: “hi”}]}]
Legacy-event branch of :func:set_llm_call_content.
Adds one span event per input message (gen_ai.system.message /
gen_ai.user.message / gen_ai.assistant.message /
gen_ai.tool.message) plus a gen_ai.choice event for the
assistant output. Roles not in :data:_LEGACY_EVENT_BY_ROLE
(e.g. function) are skipped silently.
JSON-attribute branch of :func:set_llm_call_content.
Sets gen_ai.input.messages, gen_ai.output.messages, and
gen_ai.system_instructions as JSON-encoded span attributes per
the latest experimental OTEL GenAI semantic conventions. Attributes
are only set when non-empty so backends can distinguish “no system
instructions” from “system instructions == ””.
Return the bare OTEL GenAI parts for system messages only.
Feeds gen_ai.system_instructions, which the spec defines as a flat
list of parts with no role wrapper (every entry is implicitly system).
Asymmetric with :func:_non_system_input_messages, which keeps the role
wrapper — the two attributes have different shapes by spec. Entries
missing role or content are skipped silently.
Example::
>>> _system_parts_from_messages([ … {“role”: “system”, “content”: “be helpful”}, … {“role”: “user”, “content”: “hi”}, … ]) [{“type”: “text”, “content”: “be helpful”}]
Return True iff OTEL_SEMCONV_STABILITY_OPT_IN selects JSON span attrs.
The env var holds a comma-separated list of opt-in tokens. When
gen_ai_latest_experimental is present, content is emitted as
JSON-encoded span attributes, otherwise as legacy per-message span events.
Read fresh each call so runtime changes to the env var take effect
immediately.
Create a guardrails.action INTERNAL span for a rail action execution.
Yields the span (or None when tracer is None).
Create a CLIENT span for a non-LLM API call (e.g., jailbreak detection).
Uses the api.name attribute rather than gen_ai.operation.name
because these APIs are plain HTTP endpoints, not GenAI operations.
http.* transport attributes can be added additively later without
conflict. Yields the span (or None when tracer is None).
Return True when inline OTEL metrics should be emitted.
Requires the opentelemetry-api package to be installed and
config.metrics.enabled to be True. Independent of
:func:is_tracing_enabled — OTEL signals (traces, metrics, logs) are
designed to be toggled independently so customers can, for example,
run metrics-only for cost-optimized SLO dashboards without the
overhead of full trace export.
Return a cached OpenTelemetry meter for nemo-guardrails, or None.
The meter is obtained via the OTEL API (not SDK), following the library
instrumentation best practice. The application is responsible for
configuring a MeterProvider before any metrics are recorded; without
one, the API returns a no-op meter and all emissions are silently
discarded.
Return a cached OpenTelemetry tracer for nemo-guardrails, or None.
The tracer is obtained via the OTEL API (not SDK), following the library
instrumentation best practice. The application is responsible for
configuring a TracerProvider before any spans are created.
Return True when message content should be captured onto spans.
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT is the
primary control — when set, it overrides any config-file value so
operators have a single OTEL-standard env var that flips capture
across all services regardless of what the deployed config says.
Recognized values (case-insensitive, surrounding whitespace
stripped): true / 1 enable; false / 0 disable; any
other value falls through to the config field.
When the env var is absent or unrecognized, capture is on iff
config.tracing.enable_content_capture is True.
Callers should ALSO require :func:is_tracing_enabled before
treating capture as active — there is no point capturing content
onto spans that won’t be exported. This helper deliberately does
not perform that check itself so it stays orthogonal to the
tracing-enabled signal (and so tests can exercise each independently).
Return True when inline OTEL tracing should be active.
Requires the opentelemetry-api package to be installed and
config.tracing.enabled to be True. Other TracingConfig
fields (adapters, span_format) are used by the LLMRails
post-hoc tracing path and are ignored here.
Create a CLIENT span for an LLM call following GenAI semantic conventions.
Span name follows the OTEL pattern: "{operation_name} {model_name}".
operation_name defaults to "chat" because IORails only issues
chat completions. In the future if any other non-chat LLM operations are
supported, callers should pass an explicit operation_name from the
OTEL GenAI semantic conventions.
Yields the span (or None when tracer is None).
Set rail.stop=True on a rail span when the rail blocked the request.
Safe to call with None (no-op) so callers don’t have to branch on
whether a real span was produced — matches the record_span_error
idiom. Only marks stop when is_safe is False; a passing rail
leaves the attribute unset.
Create a guardrails.rail INTERNAL span for a single rail execution.
Yields the span (or None when tracer is None).
The caller should set rail.stop on the span after execution if the
rail blocked the request.
Increment guardrails.nonstream.rejections by 1.
Called from the non-streaming path when the admission queue rejects a
submission with asyncio.QueueFull (the queue’s reject_on_full
behaviour, triggered when NONSTREAM_QUEUE_DEPTH is exceeded).
Increment guardrails.requests.blocked with a rail.type label.
Fires at the block sites in iorails.py (_do_generate for the
non-streaming path, _generation_task for streaming) whenever the
request returns REFUSAL_MESSAGE because an input or output rail
flagged it. The counter is cumulative over the process lifetime; a
per-rail grain (rail.name) will be added in split-2 alongside
guardrails.rail.blocked.
No-op when the OTEL API is unavailable or instruments cannot be created.
Increment guardrails.requests.errors with an error.type label.
request_metrics already bumps this counter when an exception
propagates through its except branch (the non-streaming path).
Streaming code paths catch-and-swallow exceptions inside
_generation_task — converting them to error-payload chunks —
so the counter never sees them via propagation. Those paths should
call this helper explicitly so the errors counter reflects ALL failed
requests, not just those whose exceptions bubble up.
No-op when the OTEL API is unavailable or instruments cannot be created. Best-effort: a failure inside the meter SDK is swallowed so it can never mask the original exception the caller is about to re-raise.
Record an exception on an OTEL span and set its status to ERROR.
Also sets the error.type attribute to the exception’s class name
(per OTEL GenAI conditional-required convention). Safe to call with
None (no-op). Use from every span helper’s except block and
from callers that swallow exceptions before they can propagate.
Best-effort: any failure while annotating the span (e.g. a broken
exporter or SDK) is swallowed so it can never mask the original
exception the caller is about to re-raise — notably CancelledError
/ GeneratorExit on a cancelled stream. Only Exception is
suppressed, so a BaseException raised inside the SDK still
propagates.
Increment guardrails.stream.rejections by 1.
Called from the streaming path when a request arrives while the stream
concurrency semaphore is fully occupied (_stream_semaphore.locked()).
Register guardrails.nonstream.queued + guardrails.nonstream.active
ObservableGauges on the module-level Meter.
ObservableGauges read live state at collection time, so both metrics reflect the current non-streaming queue + worker occupancy with no drift risk vs. an UpDownCounter lineage.
is_running is a zero-arg callable returning bool, deferred
so each collection re-reads the current state (passing the bool
directly would bake its start-time value into the closure). The
callbacks return an empty observation list when it returns False
— the state the flag holds after IORails.stop() flips
self._running back to False. OTEL Python has no public
unregister API for observable instruments, so this “no data points”
fallback is the only way to stop a dead IORails instance from
polluting collection.
No-op when the OTEL API is unavailable or no MeterProvider is configured.
Emit request-level OTEL metrics around the wrapped block.
Increments guardrails.requests on entry, bumps
guardrails.requests.active (UpDownCounter) for the duration of
the block, records guardrails.request.duration in seconds on
exit, and increments guardrails.requests.errors with an
error.type attribute when the block raises.
requests.active covers both non-streaming (queue-wait + execution)
and streaming (semaphore hold) requests.
Summing the per-path saturation metrics
(nonstream.queued, nonstream.active, stream.active)
should approximate this value at any collection instant.
Instruments are created lazily on first use. No-op when the OTEL API is not installed or instruments cannot be created.
Create a live guardrails.request SERVER span.
Yields (span, request_id) where request_id is derived from the
OTEL trace ID. The span is ended automatically when the block exits.
If an exception propagates, the span records it and sets ERROR status
before re-raising.
Capture input/output messages on a span representing a model interaction.
Used for both gen_ai.* CLIENT spans (LLM calls) and the
guardrails.request SERVER span — the OTEL GenAI semconv
attribute names apply to any span that represents a model
interaction, so reusing the names lets backends correlate the outer
guardrails request with the inner LLM call by attribute name alone.
Dispatches on :func:_use_json_span_format:
- JSON attrs (
OTEL_SEMCONV_STABILITY_OPT_INincludesgen_ai_latest_experimental): :func:_set_llm_call_content_jsonsets the JSON-encodedgen_ai.input.messages,gen_ai.output.messages, andgen_ai.system_instructionsspan attributes per the latest experimental OTEL GenAI semantic conventions. - Legacy events (default): :func:
_set_llm_call_content_eventsadds one span event per input message plus agen_ai.choiceevent for the assistant output.
Safe to call with span=None (no-op) so callers don’t have to
branch on whether tracing is enabled. Caller is responsible for
checking the content-capture flag — this helper does NOT re-check
:func:is_content_capture_enabled so it stays cheap on hot paths.
Capture rail input + (optionally) block reason on a guardrails.rail span.
Sets guardrails.rail.input to the JSON-encoded rail_input dict
(typically {"messages": [...], "bot_response": ...}). When
reason is non-None, also sets guardrails.rail.reason — caller
passes the human-readable block reason from the failing rail (or
None when the rail passed, in which case only the input
attribute is recorded).
Safe to call with span=None (no-op). No GenAI semconv covers
rail spans, so these attributes live under the guardrails.* namespace
alongside rail.type / rail.name / rail.stop.
Capture caller-facing input/output on the guardrails.request SERVER span.
Uses guardrails.request.input (JSON-encoded input messages) and
guardrails.request.output (the text actually returned to the caller)
rather than the gen_ai.* attribute names used on LLM CLIENT spans.
This distinction matters on block paths: the LLM CLIENT span records the
raw model response, while the SERVER span records the refusal message —
the same gen_ai.output.messages name on both spans would carry
different values and confuse backends correlating the two.
guardrails.request.input is always a JSON-encoded list of role/content
message objects matching the caller’s input. guardrails.request.output
is the plain string that IORails returned (REFUSAL_MESSAGE on block paths,
the model’s response text on the success path). output_text=None
suppresses the output attribute entirely — used by the streaming path when
the stream produced no content, so an empty output is not falsely recorded.
Safe to call with span=None (no-op).
Stamp speculative-generation outcome attributes on a request span.
Records which branch of the speculative race finished first
(input rails vs. main LLM generation) and which one ultimately
rejected the request, on the IORails guardrails.request span.
Safe to call with None (no-op) so callers don’t have to branch
on whether tracing is enabled — matches the record_span_error /
mark_rail_stop idiom.
Context manager that tracks a stream as active for its full lifetime.
+1 on enter / -1 on exit (finally) on
guardrails.stream.active (UpDownCounter). No-op when metrics are
unavailable. Wrap the block where the stream holds a semaphore permit.
Derive a human-readable request ID from the span’s OTEL trace ID.
Returns the last REQUEST_ID_HEX_CHARS hex characters of the 128-bit
trace ID (the low 64 bits, which carry the highest entropy). When the
trace ID is zero (e.g. a NoOpTracerProvider is active) a random
fallback is used.
Unified request context: sets request ID, optionally creates a span and/or emits request-level metrics.
The two signals are gated independently:
tracer is not None→ a liveguardrails.requestSERVER span is created and the request ID is derived from its trace ID.metrics_enabled=True→ emit request-level OTEL metrics
All four combinations are valid. Metrics-only (tracer=None, metrics_enabled=True) is a supported setup for customers running
cheap SLO dashboards without full trace export.
Yields a :class:TracedRequest (span, request_id). Callers
that want to mark the request span ERROR from a deeply-nested scope
should capture the yielded span and pass it explicitly to
record_span_error — never rely on trace.get_current_span()
which can return the host app’s ambient span when IORails tracing is
disabled.
The request-ID ContextVar is always cleaned up on exit via
:func:_cleanup_request_id, which tolerates the expected
cross-context ValueError that async-generator cleanup can raise.