nemoguardrails.guardrails.telemetry

View as Markdown

Inline OpenTelemetry instrumentation for the IORails engine.

All OpenTelemetry API imports are isolated in this module so the rest of the guardrails package never imports opentelemetry directly. When the opentelemetry-api package is not installed, the public entry points is_tracing_enabled, get_tracer, get_meter, and traced_request degrade gracefully (returning False, None, or a no-span / no-metric passthrough respectively). Lower-level helpers like request_span and trace_id_to_request_id require OTEL to be available and are only reachable through traced_request when a non-None tracer is provided.

Module Contents

Classes

NameDescription
RequestInstrumentsRequest-level OTEL instruments for the IORails engine.
TracedRequestHandle yielded by traced_request.

Functions

NameDescription
_cleanup_request_idReset the request-ID ContextVar from a cleanup path, tolerating the
_ensure_request_instrumentsLazily create the request-level instruments and return them as a
_non_system_input_messagesReturn the OTEL GenAI gen_ai.input.messages form for non-system messages.
_set_llm_call_content_eventsLegacy-event branch of :func:set_llm_call_content.
_set_llm_call_content_jsonJSON-attribute branch of :func:set_llm_call_content.
_system_parts_from_messagesReturn the bare OTEL GenAI parts for system messages only.
_use_json_span_formatReturn True iff OTEL_SEMCONV_STABILITY_OPT_IN selects JSON span attrs.
action_spanCreate a guardrails.action INTERNAL span for a rail action execution.
api_call_spanCreate a CLIENT span for a non-LLM API call (e.g., jailbreak detection).
are_metrics_enabledReturn True when inline OTEL metrics should be emitted.
get_meterReturn a cached OpenTelemetry meter for nemo-guardrails, or None.
get_tracerReturn a cached OpenTelemetry tracer for nemo-guardrails, or None.
is_content_capture_enabledReturn True when message content should be captured onto spans.
is_tracing_enabledReturn True when inline OTEL tracing should be active.
llm_call_spanCreate a CLIENT span for an LLM call following GenAI semantic conventions.
mark_rail_stopSet rail.stop=True on a rail span when the rail blocked the request.
rail_spanCreate a guardrails.rail INTERNAL span for a single rail execution.
record_nonstream_rejectedIncrement guardrails.nonstream.rejections by 1.
record_request_blockedIncrement guardrails.requests.blocked with a rail.type label.
record_request_errorIncrement guardrails.requests.errors with an error.type label.
record_span_errorRecord an exception on an OTEL span and set its status to ERROR.
record_stream_rejectedIncrement guardrails.stream.rejections by 1.
register_nonstream_saturation_gaugesRegister guardrails.nonstream.queued + guardrails.nonstream.active
request_metricsEmit request-level OTEL metrics around the wrapped block.
request_spanCreate a live guardrails.request SERVER span.
set_llm_call_contentCapture input/output messages on a span representing a model interaction.
set_rail_contentCapture rail input + (optionally) block reason on a guardrails.rail span.
set_request_contentCapture caller-facing input/output on the guardrails.request SERVER span.
set_speculative_span_attrsStamp speculative-generation outcome attributes on a request span.
stream_active_metricContext manager that tracks a stream as active for its full lifetime.
trace_id_to_request_idDerive a human-readable request ID from the span’s OTEL trace ID.
traced_requestUnified request context: sets request ID, optionally creates a span

Data

_INVALID_TRACE_ID

_LEGACY_EVENT_BY_ROLE

_OTEL_AVAILABLE

_meter

_request_instruments

_tracer

log

API

class nemoguardrails.guardrails.telemetry.RequestInstruments(
requests: opentelemetry.metrics.Counter,
errors: opentelemetry.metrics.Counter,
blocked: opentelemetry.metrics.Counter,
duration: opentelemetry.metrics.Histogram,
requests_active: opentelemetry.metrics.UpDownCounter,
nonstream_rejections: opentelemetry.metrics.Counter,
stream_active: opentelemetry.metrics.UpDownCounter,
stream_rejections: opentelemetry.metrics.Counter
)
Dataclass

Request-level OTEL instruments for the IORails engine.

Field names mirror the emitted metric names (minus the guardrails. prefix). The saturation-metric group covers the full request lifecycle:

  • Aggregate: requests_active (guardrails.requests.active)
  • Non-streaming path: nonstream_rejections (guardrails.nonstream.rejections); the two gauges nonstream.queued and nonstream.active are registered separately via register_nonstream_saturation_gauges because ObservableGauges need a live queue reference.
  • Streaming path: stream_active (guardrails.stream.active) and stream_rejections (guardrails.stream.rejections).
blocked
Counter
duration
Histogram
errors
Counter
nonstream_rejections
Counter
requests
Counter
requests_active
UpDownCounter
stream_active
UpDownCounter
stream_rejections
Counter
class nemoguardrails.guardrails.telemetry.TracedRequest()

Bases: NamedTuple

Handle yielded by traced_request.

span is the IORails guardrails.request span when tracing is enabled, or None when it is not. request_id is always a 16-char hex string. Unpacks as (span, request_id) for callers that prefer positional access.

request_id
str
span
Optional[Span]
nemoguardrails.guardrails.telemetry._cleanup_request_id(
token
) -> None

Reset the request-ID ContextVar from a cleanup path, tolerating the one expected ValueError.

ContextVar.reset() raises ValueError("... was created in a different Context") when called from a different asyncio Context than where .set() was called. That happens during async-generator cleanup (aclose() running in an outer task’s context) and is the only ValueError that reset_request_id raises today. Any other ValueError indicates an unexpected bug in the helper and is re-raised so callers see it.

nemoguardrails.guardrails.telemetry._ensure_request_instruments() -> typing.Optional[nemoguardrails.guardrails.telemetry.RequestInstruments]

Lazily create the request-level instruments and return them as a :class:RequestInstruments. Returns None when the OTEL API is not installed.

nemoguardrails.guardrails.telemetry._non_system_input_messages(
messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[dict]

Return the OTEL GenAI gen_ai.input.messages form for non-system messages.

Each non-system message is role-wrapped as {"role": role, "parts": [{"type": "text", "content": content}]}. Named for the attribute it populates rather than “parts” because — unlike :func:_system_parts_from_messages — it keeps the role wrapper.

Example::

>>> _non_system_input_messages([ … {“role”: “system”, “content”: “be helpful”}, … {“role”: “user”, “content”: “hi”}, … ]) [{“role”: “user”, “parts”: [{“type”: “text”, “content”: “hi”}]}]

nemoguardrails.guardrails.telemetry._set_llm_call_content_events(
span: opentelemetry.trace.Span,
input_messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
output_text: typing.Optional[str]
) -> None

Legacy-event branch of :func:set_llm_call_content.

Adds one span event per input message (gen_ai.system.message / gen_ai.user.message / gen_ai.assistant.message / gen_ai.tool.message) plus a gen_ai.choice event for the assistant output. Roles not in :data:_LEGACY_EVENT_BY_ROLE (e.g. function) are skipped silently.

nemoguardrails.guardrails.telemetry._set_llm_call_content_json(
span: opentelemetry.trace.Span,
input_messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
output_text: typing.Optional[str]
) -> None

JSON-attribute branch of :func:set_llm_call_content.

Sets gen_ai.input.messages, gen_ai.output.messages, and gen_ai.system_instructions as JSON-encoded span attributes per the latest experimental OTEL GenAI semantic conventions. Attributes are only set when non-empty so backends can distinguish “no system instructions” from “system instructions == ””.

nemoguardrails.guardrails.telemetry._system_parts_from_messages(
messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[dict]

Return the bare OTEL GenAI parts for system messages only.

Feeds gen_ai.system_instructions, which the spec defines as a flat list of parts with no role wrapper (every entry is implicitly system). Asymmetric with :func:_non_system_input_messages, which keeps the role wrapper — the two attributes have different shapes by spec. Entries missing role or content are skipped silently.

Example::

>>> _system_parts_from_messages([ … {“role”: “system”, “content”: “be helpful”}, … {“role”: “user”, “content”: “hi”}, … ]) [{“type”: “text”, “content”: “be helpful”}]

nemoguardrails.guardrails.telemetry._use_json_span_format() -> bool

Return True iff OTEL_SEMCONV_STABILITY_OPT_IN selects JSON span attrs.

The env var holds a comma-separated list of opt-in tokens. When gen_ai_latest_experimental is present, content is emitted as JSON-encoded span attributes, otherwise as legacy per-message span events. Read fresh each call so runtime changes to the env var take effect immediately.

nemoguardrails.guardrails.telemetry.action_span(
tracer: typing.Optional[opentelemetry.trace.Tracer],
action_name: str
) -> typing.Generator[typing.Optional[opentelemetry.trace.Span], None, None]

Create a guardrails.action INTERNAL span for a rail action execution.

Yields the span (or None when tracer is None).

nemoguardrails.guardrails.telemetry.api_call_span(
tracer: typing.Optional[opentelemetry.trace.Tracer],
api_name: str
) -> typing.Generator[typing.Optional[opentelemetry.trace.Span], None, None]

Create a CLIENT span for a non-LLM API call (e.g., jailbreak detection).

Uses the api.name attribute rather than gen_ai.operation.name because these APIs are plain HTTP endpoints, not GenAI operations. http.* transport attributes can be added additively later without conflict. Yields the span (or None when tracer is None).

nemoguardrails.guardrails.telemetry.are_metrics_enabled(
config_metrics: typing.Optional[nemoguardrails.rails.llm.config.MetricsConfig]
) -> bool

Return True when inline OTEL metrics should be emitted.

Requires the opentelemetry-api package to be installed and config.metrics.enabled to be True. Independent of :func:is_tracing_enabled — OTEL signals (traces, metrics, logs) are designed to be toggled independently so customers can, for example, run metrics-only for cost-optimized SLO dashboards without the overhead of full trace export.

nemoguardrails.guardrails.telemetry.get_meter() -> typing.Optional[opentelemetry.metrics.Meter]

Return a cached OpenTelemetry meter for nemo-guardrails, or None.

The meter is obtained via the OTEL API (not SDK), following the library instrumentation best practice. The application is responsible for configuring a MeterProvider before any metrics are recorded; without one, the API returns a no-op meter and all emissions are silently discarded.

nemoguardrails.guardrails.telemetry.get_tracer() -> typing.Optional[opentelemetry.trace.Tracer]

Return a cached OpenTelemetry tracer for nemo-guardrails, or None.

The tracer is obtained via the OTEL API (not SDK), following the library instrumentation best practice. The application is responsible for configuring a TracerProvider before any spans are created.

nemoguardrails.guardrails.telemetry.is_content_capture_enabled(
config_tracing: typing.Optional[nemoguardrails.rails.llm.config.TracingConfig]
) -> bool

Return True when message content should be captured onto spans.

OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT is the primary control — when set, it overrides any config-file value so operators have a single OTEL-standard env var that flips capture across all services regardless of what the deployed config says. Recognized values (case-insensitive, surrounding whitespace stripped): true / 1 enable; false / 0 disable; any other value falls through to the config field.

When the env var is absent or unrecognized, capture is on iff config.tracing.enable_content_capture is True.

Callers should ALSO require :func:is_tracing_enabled before treating capture as active — there is no point capturing content onto spans that won’t be exported. This helper deliberately does not perform that check itself so it stays orthogonal to the tracing-enabled signal (and so tests can exercise each independently).

nemoguardrails.guardrails.telemetry.is_tracing_enabled(
config_tracing: typing.Optional[nemoguardrails.rails.llm.config.TracingConfig]
) -> bool

Return True when inline OTEL tracing should be active.

Requires the opentelemetry-api package to be installed and config.tracing.enabled to be True. Other TracingConfig fields (adapters, span_format) are used by the LLMRails post-hoc tracing path and are ignored here.

nemoguardrails.guardrails.telemetry.llm_call_span(
tracer: typing.Optional[opentelemetry.trace.Tracer],
model_name: str,
provider_name: str,
operation_name: str = 'chat'
) -> typing.Generator[typing.Optional[opentelemetry.trace.Span], None, None]

Create a CLIENT span for an LLM call following GenAI semantic conventions.

Span name follows the OTEL pattern: "{operation_name} {model_name}".

operation_name defaults to "chat" because IORails only issues chat completions. In the future if any other non-chat LLM operations are supported, callers should pass an explicit operation_name from the OTEL GenAI semantic conventions.

Yields the span (or None when tracer is None).

nemoguardrails.guardrails.telemetry.mark_rail_stop(
span: typing.Optional[opentelemetry.trace.Span],
is_safe: bool
) -> None

Set rail.stop=True on a rail span when the rail blocked the request.

Safe to call with None (no-op) so callers don’t have to branch on whether a real span was produced — matches the record_span_error idiom. Only marks stop when is_safe is False; a passing rail leaves the attribute unset.

nemoguardrails.guardrails.telemetry.rail_span(
tracer: typing.Optional[opentelemetry.trace.Tracer],
flow: str,
direction: nemoguardrails.guardrails.guardrails_types.RailDirection
) -> typing.Generator[typing.Optional[opentelemetry.trace.Span], None, None]

Create a guardrails.rail INTERNAL span for a single rail execution.

Yields the span (or None when tracer is None). The caller should set rail.stop on the span after execution if the rail blocked the request.

nemoguardrails.guardrails.telemetry.record_nonstream_rejected() -> None

Increment guardrails.nonstream.rejections by 1.

Called from the non-streaming path when the admission queue rejects a submission with asyncio.QueueFull (the queue’s reject_on_full behaviour, triggered when NONSTREAM_QUEUE_DEPTH is exceeded).

nemoguardrails.guardrails.telemetry.record_request_blocked(
direction: nemoguardrails.guardrails.guardrails_types.RailDirection
) -> None

Increment guardrails.requests.blocked with a rail.type label.

Fires at the block sites in iorails.py (_do_generate for the non-streaming path, _generation_task for streaming) whenever the request returns REFUSAL_MESSAGE because an input or output rail flagged it. The counter is cumulative over the process lifetime; a per-rail grain (rail.name) will be added in split-2 alongside guardrails.rail.blocked.

No-op when the OTEL API is unavailable or instruments cannot be created.

nemoguardrails.guardrails.telemetry.record_request_error(
exc: BaseException
) -> None

Increment guardrails.requests.errors with an error.type label.

request_metrics already bumps this counter when an exception propagates through its except branch (the non-streaming path). Streaming code paths catch-and-swallow exceptions inside _generation_task — converting them to error-payload chunks — so the counter never sees them via propagation. Those paths should call this helper explicitly so the errors counter reflects ALL failed requests, not just those whose exceptions bubble up.

No-op when the OTEL API is unavailable or instruments cannot be created. Best-effort: a failure inside the meter SDK is swallowed so it can never mask the original exception the caller is about to re-raise.

nemoguardrails.guardrails.telemetry.record_span_error(
span: typing.Optional[opentelemetry.trace.Span],
exc: BaseException
) -> None

Record an exception on an OTEL span and set its status to ERROR.

Also sets the error.type attribute to the exception’s class name (per OTEL GenAI conditional-required convention). Safe to call with None (no-op). Use from every span helper’s except block and from callers that swallow exceptions before they can propagate.

Best-effort: any failure while annotating the span (e.g. a broken exporter or SDK) is swallowed so it can never mask the original exception the caller is about to re-raise — notably CancelledError / GeneratorExit on a cancelled stream. Only Exception is suppressed, so a BaseException raised inside the SDK still propagates.

nemoguardrails.guardrails.telemetry.record_stream_rejected() -> None

Increment guardrails.stream.rejections by 1.

Called from the streaming path when a request arrives while the stream concurrency semaphore is fully occupied (_stream_semaphore.locked()).

nemoguardrails.guardrails.telemetry.register_nonstream_saturation_gauges(
queue: nemoguardrails.guardrails.async_work_queue.AsyncWorkQueue,
is_running: typing.Callable[[], bool]
) -> None

Register guardrails.nonstream.queued + guardrails.nonstream.active ObservableGauges on the module-level Meter.

ObservableGauges read live state at collection time, so both metrics reflect the current non-streaming queue + worker occupancy with no drift risk vs. an UpDownCounter lineage.

is_running is a zero-arg callable returning bool, deferred so each collection re-reads the current state (passing the bool directly would bake its start-time value into the closure). The callbacks return an empty observation list when it returns False — the state the flag holds after IORails.stop() flips self._running back to False. OTEL Python has no public unregister API for observable instruments, so this “no data points” fallback is the only way to stop a dead IORails instance from polluting collection.

No-op when the OTEL API is unavailable or no MeterProvider is configured.

nemoguardrails.guardrails.telemetry.request_metrics() -> typing.Generator[None, None, None]

Emit request-level OTEL metrics around the wrapped block.

Increments guardrails.requests on entry, bumps guardrails.requests.active (UpDownCounter) for the duration of the block, records guardrails.request.duration in seconds on exit, and increments guardrails.requests.errors with an error.type attribute when the block raises.

requests.active covers both non-streaming (queue-wait + execution) and streaming (semaphore hold) requests. Summing the per-path saturation metrics (nonstream.queued, nonstream.active, stream.active) should approximate this value at any collection instant.

Instruments are created lazily on first use. No-op when the OTEL API is not installed or instruments cannot be created.

nemoguardrails.guardrails.telemetry.request_span(
tracer: opentelemetry.trace.Tracer
) -> typing.Generator[typing.Tuple[opentelemetry.trace.Span, str], None, None]

Create a live guardrails.request SERVER span.

Yields (span, request_id) where request_id is derived from the OTEL trace ID. The span is ended automatically when the block exits. If an exception propagates, the span records it and sets ERROR status before re-raising.

nemoguardrails.guardrails.telemetry.set_llm_call_content(
span: typing.Optional[opentelemetry.trace.Span],
input_messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
output_text: typing.Optional[str] = None
) -> None

Capture input/output messages on a span representing a model interaction.

Used for both gen_ai.* CLIENT spans (LLM calls) and the guardrails.request SERVER span — the OTEL GenAI semconv attribute names apply to any span that represents a model interaction, so reusing the names lets backends correlate the outer guardrails request with the inner LLM call by attribute name alone.

Dispatches on :func:_use_json_span_format:

  • JSON attrs (OTEL_SEMCONV_STABILITY_OPT_IN includes gen_ai_latest_experimental): :func:_set_llm_call_content_json sets the JSON-encoded gen_ai.input.messages, gen_ai.output.messages, and gen_ai.system_instructions span attributes per the latest experimental OTEL GenAI semantic conventions.
  • Legacy events (default): :func:_set_llm_call_content_events adds one span event per input message plus a gen_ai.choice event for the assistant output.

Safe to call with span=None (no-op) so callers don’t have to branch on whether tracing is enabled. Caller is responsible for checking the content-capture flag — this helper does NOT re-check :func:is_content_capture_enabled so it stays cheap on hot paths.

nemoguardrails.guardrails.telemetry.set_rail_content(
span: typing.Optional[opentelemetry.trace.Span],
rail_input: dict[str, typing.Any],
reason: typing.Optional[str] = None
) -> None

Capture rail input + (optionally) block reason on a guardrails.rail span.

Sets guardrails.rail.input to the JSON-encoded rail_input dict (typically {"messages": [...], "bot_response": ...}). When reason is non-None, also sets guardrails.rail.reason — caller passes the human-readable block reason from the failing rail (or None when the rail passed, in which case only the input attribute is recorded).

Safe to call with span=None (no-op). No GenAI semconv covers rail spans, so these attributes live under the guardrails.* namespace alongside rail.type / rail.name / rail.stop.

nemoguardrails.guardrails.telemetry.set_request_content(
span: typing.Optional[opentelemetry.trace.Span],
input_messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
output_text: typing.Optional[str] = None
) -> None

Capture caller-facing input/output on the guardrails.request SERVER span.

Uses guardrails.request.input (JSON-encoded input messages) and guardrails.request.output (the text actually returned to the caller) rather than the gen_ai.* attribute names used on LLM CLIENT spans. This distinction matters on block paths: the LLM CLIENT span records the raw model response, while the SERVER span records the refusal message — the same gen_ai.output.messages name on both spans would carry different values and confuse backends correlating the two.

guardrails.request.input is always a JSON-encoded list of role/content message objects matching the caller’s input. guardrails.request.output is the plain string that IORails returned (REFUSAL_MESSAGE on block paths, the model’s response text on the success path). output_text=None suppresses the output attribute entirely — used by the streaming path when the stream produced no content, so an empty output is not falsely recorded.

Safe to call with span=None (no-op).

nemoguardrails.guardrails.telemetry.set_speculative_span_attrs(
span: typing.Optional[opentelemetry.trace.Span],
first_completed: str,
first_rejector: str
) -> None

Stamp speculative-generation outcome attributes on a request span.

Records which branch of the speculative race finished first (input rails vs. main LLM generation) and which one ultimately rejected the request, on the IORails guardrails.request span. Safe to call with None (no-op) so callers don’t have to branch on whether tracing is enabled — matches the record_span_error / mark_rail_stop idiom.

nemoguardrails.guardrails.telemetry.stream_active_metric() -> typing.Generator[None, None, None]

Context manager that tracks a stream as active for its full lifetime.

+1 on enter / -1 on exit (finally) on guardrails.stream.active (UpDownCounter). No-op when metrics are unavailable. Wrap the block where the stream holds a semaphore permit.

nemoguardrails.guardrails.telemetry.trace_id_to_request_id(
span: opentelemetry.trace.Span
) -> str

Derive a human-readable request ID from the span’s OTEL trace ID.

Returns the last REQUEST_ID_HEX_CHARS hex characters of the 128-bit trace ID (the low 64 bits, which carry the highest entropy). When the trace ID is zero (e.g. a NoOpTracerProvider is active) a random fallback is used.

nemoguardrails.guardrails.telemetry.traced_request(
tracer: typing.Optional[opentelemetry.trace.Tracer],
metrics_enabled: bool = False
) -> typing.Generator[nemoguardrails.guardrails.telemetry.TracedRequest, None, None]

Unified request context: sets request ID, optionally creates a span and/or emits request-level metrics.

The two signals are gated independently:

  • tracer is not None → a live guardrails.request SERVER span is created and the request ID is derived from its trace ID.
  • metrics_enabled=True → emit request-level OTEL metrics

All four combinations are valid. Metrics-only (tracer=None, metrics_enabled=True) is a supported setup for customers running cheap SLO dashboards without full trace export.

Yields a :class:TracedRequest (span, request_id). Callers that want to mark the request span ERROR from a deeply-nested scope should capture the yielded span and pass it explicitly to record_span_error — never rely on trace.get_current_span() which can return the host app’s ambient span when IORails tracing is disabled.

The request-ID ContextVar is always cleaned up on exit via :func:_cleanup_request_id, which tolerates the expected cross-context ValueError that async-generator cleanup can raise.

nemoguardrails.guardrails.telemetry._INVALID_TRACE_ID = 0
nemoguardrails.guardrails.telemetry._LEGACY_EVENT_BY_ROLE = {'system': EventNames.GEN_AI_SYSTEM_MESSAGE, 'user': EventNames.GEN_AI_USER_MESS...
nemoguardrails.guardrails.telemetry._OTEL_AVAILABLE: bool = True
nemoguardrails.guardrails.telemetry._meter = None
nemoguardrails.guardrails.telemetry._request_instruments: Optional[RequestInstruments] = None
nemoguardrails.guardrails.telemetry._tracer = None
nemoguardrails.guardrails.telemetry.log = logging.getLogger(__name__)