# nemoguardrails.guardrails.telemetry

Inline OpenTelemetry instrumentation for the IORails engine.

All OpenTelemetry API imports are isolated in this module so the rest of the
guardrails package never imports `opentelemetry` directly.  When the
`opentelemetry-api` package is not installed, the public entry points
`is_tracing_enabled`, `get_tracer`, `get_meter`, and `traced_request`
degrade gracefully (returning `False`, `None`, or a no-span / no-metric
passthrough respectively).  Lower-level helpers like `request_span` and
`trace_id_to_request_id` require OTEL to be available and are only
reachable through `traced_request` when a non-`None` tracer is provided.
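
The degradation pattern described above can be sketched roughly as follows. This is a minimal illustration rather than the module's actual source: the `*_sketch` names and the `OTEL_AVAILABLE` flag are hypothetical, and the only real OpenTelemetry call used is `opentelemetry.trace.get_tracer`.

```python
from typing import Any, Optional

try:
    # Real OpenTelemetry API import; it may be absent in minimal installs.
    from opentelemetry import trace
    OTEL_AVAILABLE = True
except ImportError:
    trace = None
    OTEL_AVAILABLE = False


def is_tracing_enabled_sketch(config_enabled: bool) -> bool:
    """Tracing is on only when OTEL is importable AND the config enables it."""
    return OTEL_AVAILABLE and bool(config_enabled)


def get_tracer_sketch() -> Optional[Any]:
    """Return a tracer when OTEL is installed, otherwise None."""
    if not OTEL_AVAILABLE:
        return None
    # "nemo-guardrails-sketch" is an illustrative instrumentation name.
    return trace.get_tracer("nemo-guardrails-sketch")
```

Because the guarded import lives in one place, callers only ever need to check for `False` or `None`.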

## Module Contents

### Classes

| Name                                                                            | Description                                            |
| ------------------------------------------------------------------------------- | ------------------------------------------------------ |
| [`RequestInstruments`](#nemoguardrails-guardrails-telemetry-RequestInstruments) | Request-level OTEL instruments for the IORails engine. |
| [`TracedRequest`](#nemoguardrails-guardrails-telemetry-TracedRequest)           | Handle yielded by `traced_request`.                    |

### Functions

| Name                                                                                                                | Description                                                                 |
| ------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| [`_cleanup_request_id`](#nemoguardrails-guardrails-telemetry-_cleanup_request_id)                                   | Reset the request-ID ContextVar from a cleanup path, tolerating the one expected `ValueError`. |
| [`_ensure_request_instruments`](#nemoguardrails-guardrails-telemetry-_ensure_request_instruments)                   | Lazily create the request-level instruments and return them as a `RequestInstruments`. |
| [`_non_system_input_messages`](#nemoguardrails-guardrails-telemetry-_non_system_input_messages)                     | Return the OTEL GenAI `gen_ai.input.messages` form for non-system messages. |
| [`_set_llm_call_content_events`](#nemoguardrails-guardrails-telemetry-_set_llm_call_content_events)                 | Legacy-event branch of `set_llm_call_content`.                              |
| [`_set_llm_call_content_json`](#nemoguardrails-guardrails-telemetry-_set_llm_call_content_json)                     | JSON-attribute branch of `set_llm_call_content`.                            |
| [`_system_parts_from_messages`](#nemoguardrails-guardrails-telemetry-_system_parts_from_messages)                   | Return the bare OTEL GenAI `parts` for system messages only.                |
| [`_use_json_span_format`](#nemoguardrails-guardrails-telemetry-_use_json_span_format)                               | Return `True` iff `OTEL_SEMCONV_STABILITY_OPT_IN` selects JSON span attrs.  |
| [`action_span`](#nemoguardrails-guardrails-telemetry-action_span)                                                   | Create a `guardrails.action` INTERNAL span for a rail action execution.     |
| [`api_call_span`](#nemoguardrails-guardrails-telemetry-api_call_span)                                               | Create a CLIENT span for a non-LLM API call (e.g., jailbreak detection).    |
| [`are_metrics_enabled`](#nemoguardrails-guardrails-telemetry-are_metrics_enabled)                                   | Return `True` when inline OTEL metrics should be emitted.                   |
| [`get_meter`](#nemoguardrails-guardrails-telemetry-get_meter)                                                       | Return a cached OpenTelemetry meter for nemo-guardrails, or `None`.         |
| [`get_tracer`](#nemoguardrails-guardrails-telemetry-get_tracer)                                                     | Return a cached OpenTelemetry tracer for nemo-guardrails, or `None`.        |
| [`is_content_capture_enabled`](#nemoguardrails-guardrails-telemetry-is_content_capture_enabled)                     | Return True when message content should be captured onto spans.             |
| [`is_tracing_enabled`](#nemoguardrails-guardrails-telemetry-is_tracing_enabled)                                     | Return `True` when inline OTEL tracing should be active.                    |
| [`llm_call_span`](#nemoguardrails-guardrails-telemetry-llm_call_span)                                               | Create a CLIENT span for an LLM call following GenAI semantic conventions.  |
| [`mark_rail_stop`](#nemoguardrails-guardrails-telemetry-mark_rail_stop)                                             | Set `rail.stop=True` on a rail span when the rail blocked the request.      |
| [`rail_span`](#nemoguardrails-guardrails-telemetry-rail_span)                                                       | Create a `guardrails.rail` INTERNAL span for a single rail execution.       |
| [`record_nonstream_rejected`](#nemoguardrails-guardrails-telemetry-record_nonstream_rejected)                       | Increment `guardrails.nonstream.rejections` by 1.                           |
| [`record_request_blocked`](#nemoguardrails-guardrails-telemetry-record_request_blocked)                             | Increment `guardrails.requests.blocked` with a `rail.type` label.           |
| [`record_request_error`](#nemoguardrails-guardrails-telemetry-record_request_error)                                 | Increment `guardrails.requests.errors` with an `error.type` label.          |
| [`record_span_error`](#nemoguardrails-guardrails-telemetry-record_span_error)                                       | Record an exception on an OTEL span and set its status to ERROR.            |
| [`record_stream_rejected`](#nemoguardrails-guardrails-telemetry-record_stream_rejected)                             | Increment `guardrails.stream.rejections` by 1.                              |
| [`register_nonstream_saturation_gauges`](#nemoguardrails-guardrails-telemetry-register_nonstream_saturation_gauges) | Register `guardrails.nonstream.queued` + `guardrails.nonstream.active` ObservableGauges on the module-level Meter. |
| [`request_metrics`](#nemoguardrails-guardrails-telemetry-request_metrics)                                           | Emit request-level OTEL metrics around the wrapped block.                   |
| [`request_span`](#nemoguardrails-guardrails-telemetry-request_span)                                                 | Create a live `guardrails.request` SERVER span.                             |
| [`set_llm_call_content`](#nemoguardrails-guardrails-telemetry-set_llm_call_content)                                 | Capture input/output messages on a span representing a model interaction.   |
| [`set_rail_content`](#nemoguardrails-guardrails-telemetry-set_rail_content)                                         | Capture rail input + (optionally) block reason on a `guardrails.rail` span. |
| [`set_request_content`](#nemoguardrails-guardrails-telemetry-set_request_content)                                   | Capture caller-facing input/output on the `guardrails.request` SERVER span. |
| [`set_speculative_span_attrs`](#nemoguardrails-guardrails-telemetry-set_speculative_span_attrs)                     | Stamp speculative-generation outcome attributes on a request span.          |
| [`stream_active_metric`](#nemoguardrails-guardrails-telemetry-stream_active_metric)                                 | Context manager that tracks a stream as active for its full lifetime.       |
| [`trace_id_to_request_id`](#nemoguardrails-guardrails-telemetry-trace_id_to_request_id)                             | Derive a human-readable request ID from the span's OTEL trace ID.           |
| [`traced_request`](#nemoguardrails-guardrails-telemetry-traced_request)                                             | Unified request context: sets request ID, optionally creates a span and/or emits request-level metrics. |

### Data

[`_INVALID_TRACE_ID`](#nemoguardrails-guardrails-telemetry-_INVALID_TRACE_ID)

[`_LEGACY_EVENT_BY_ROLE`](#nemoguardrails-guardrails-telemetry-_LEGACY_EVENT_BY_ROLE)

[`_OTEL_AVAILABLE`](#nemoguardrails-guardrails-telemetry-_OTEL_AVAILABLE)

[`_meter`](#nemoguardrails-guardrails-telemetry-_meter)

[`_request_instruments`](#nemoguardrails-guardrails-telemetry-_request_instruments)

[`_tracer`](#nemoguardrails-guardrails-telemetry-_tracer)

[`log`](#nemoguardrails-guardrails-telemetry-log)

### API

```python
class nemoguardrails.guardrails.telemetry.RequestInstruments(
    requests: opentelemetry.metrics.Counter,
    errors: opentelemetry.metrics.Counter,
    blocked: opentelemetry.metrics.Counter,
    duration: opentelemetry.metrics.Histogram,
    requests_active: opentelemetry.metrics.UpDownCounter,
    nonstream_rejections: opentelemetry.metrics.Counter,
    stream_active: opentelemetry.metrics.UpDownCounter,
    stream_rejections: opentelemetry.metrics.Counter
)
```

Dataclass

Request-level OTEL instruments for the IORails engine.

Field names mirror the emitted metric names (minus the `guardrails.`
prefix).  The saturation-metric group covers the full request lifecycle:

* Aggregate: `requests_active` (`guardrails.requests.active`)
* Non-streaming path: `nonstream_rejections`
  (`guardrails.nonstream.rejections`); the two gauges
  `nonstream.queued` and `nonstream.active` are registered
  separately via `register_nonstream_saturation_gauges` because
  ObservableGauges need a live queue reference.
* Streaming path: `stream_active` (`guardrails.stream.active`)
  and `stream_rejections` (`guardrails.stream.rejections`).

```python
class nemoguardrails.guardrails.telemetry.TracedRequest()
```

**Bases:** `NamedTuple`

Handle yielded by `traced_request`.

`span` is the IORails `guardrails.request` span when tracing is
enabled, or `None` when it is not.  `request_id` is always a
16-char hex string.  Unpacks as `(span, request_id)` for callers
that prefer positional access.

```python
nemoguardrails.guardrails.telemetry._cleanup_request_id(
    token
) -> None
```

Reset the request-ID ContextVar from a cleanup path, tolerating the
one expected `ValueError`.

`ContextVar.reset()` raises `ValueError("... was created in a
different Context")` when called from a different asyncio Context
than where `.set()` was called.  That happens during async-generator
cleanup (`aclose()` running in an outer task's context) and is the
only `ValueError` that `reset_request_id` raises today.  Any
other `ValueError` indicates an unexpected bug in the helper and is
re-raised so callers see it.

```python
nemoguardrails.guardrails.telemetry._ensure_request_instruments() -> typing.Optional[nemoguardrails.guardrails.telemetry.RequestInstruments]
```

Lazily create the request-level instruments and return them as a
`RequestInstruments`.  Returns `None` when the OTEL API is not
installed.

```python
nemoguardrails.guardrails.telemetry._non_system_input_messages(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[dict]
```

Return the OTEL GenAI `gen_ai.input.messages` form for non-system messages.

Each non-system message is role-wrapped as `{"role": role, "parts": [{"type": "text", "content": content}]}`.  Named for the attribute it
populates rather than "parts" because, unlike
`_system_parts_from_messages`, it keeps the role wrapper.

Example:

    >>> _non_system_input_messages([
    ...     {"role": "system", "content": "be helpful"},
    ...     {"role": "user", "content": "hi"},
    ... ])
    [{"role": "user", "parts": [{"type": "text", "content": "hi"}]}]

```python
nemoguardrails.guardrails.telemetry._set_llm_call_content_events(
    span: opentelemetry.trace.Span,
    input_messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    output_text: typing.Optional[str]
) -> None
```

Legacy-event branch of `set_llm_call_content`.

Adds one span event per input message (`gen_ai.system.message` /
`gen_ai.user.message` / `gen_ai.assistant.message` /
`gen_ai.tool.message`) plus a `gen_ai.choice` event for the
assistant output.  Roles not in `_LEGACY_EVENT_BY_ROLE`
(e.g. `function`) are skipped silently.

```python
nemoguardrails.guardrails.telemetry._set_llm_call_content_json(
    span: opentelemetry.trace.Span,
    input_messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    output_text: typing.Optional[str]
) -> None
```

JSON-attribute branch of `set_llm_call_content`.

Sets `gen_ai.input.messages`, `gen_ai.output.messages`, and
`gen_ai.system_instructions` as JSON-encoded span attributes per
the latest experimental OTEL GenAI semantic conventions.  Attributes
are only set when non-empty so backends can distinguish "no system
instructions" from "system instructions == ''".

```python
nemoguardrails.guardrails.telemetry._system_parts_from_messages(
    messages: nemoguardrails.guardrails.guardrails_types.LLMMessages
) -> list[dict]
```

Return the bare OTEL GenAI `parts` for system messages only.

Feeds `gen_ai.system_instructions`, which the spec defines as a flat
list of parts with no role wrapper (every entry is implicitly system).
Asymmetric with `_non_system_input_messages`, which keeps the role
wrapper: the two attributes have different shapes by spec.  Entries
missing `role` or `content` are skipped silently.

Example:

    >>> _system_parts_from_messages([
    ...     {"role": "system", "content": "be helpful"},
    ...     {"role": "user", "content": "hi"},
    ... ])
    [{"type": "text", "content": "be helpful"}]
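
To make the two shapes concrete, here is a standalone sketch of both transformations and of the JSON encoding that a span attribute would carry. The `*_sketch` functions are hypothetical re-implementations written from the descriptions above. Skipping non-system entries that lack `role` or `content` is an assumption, since the text only states that behavior for system messages.

```python
import json
from typing import Any, Dict, List

Message = Dict[str, Any]


def non_system_input_messages_sketch(messages: List[Message]) -> List[Message]:
    """Role-wrapped form used for `gen_ai.input.messages`."""
    out = []
    for m in messages:
        role, content = m.get("role"), m.get("content")
        if role is None or content is None or role == "system":
            continue  # assumption: incomplete entries are skipped here too
        out.append({"role": role, "parts": [{"type": "text", "content": content}]})
    return out


def system_parts_sketch(messages: List[Message]) -> List[Message]:
    """Flat list of parts (no role wrapper) for `gen_ai.system_instructions`."""
    return [
        {"type": "text", "content": m["content"]}
        for m in messages
        if m.get("role") == "system" and m.get("content") is not None
    ]


messages = [
    {"role": "system", "content": "be helpful"},
    {"role": "user", "content": "hi"},
]
# JSON-encode the two shapes, as a span attribute value would be.
input_attr = json.dumps(non_system_input_messages_sketch(messages))
system_attr = json.dumps(system_parts_sketch(messages))
```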

```python
nemoguardrails.guardrails.telemetry._use_json_span_format() -> bool
```

Return `True` iff `OTEL_SEMCONV_STABILITY_OPT_IN` selects JSON span attrs.

The env var holds a comma-separated list of opt-in tokens.  When
`gen_ai_latest_experimental` is present, content is emitted as
JSON-encoded span attributes, otherwise as legacy per-message span events.
Read fresh each call so runtime changes to the env var take effect
immediately.
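
A minimal sketch of this token check, assuming whitespace around each token is ignored (the text above does not say either way). `use_json_span_format_sketch` is a hypothetical name.

```python
import os


def use_json_span_format_sketch() -> bool:
    """Read the env var on every call so runtime changes are picked up."""
    raw = os.environ.get("OTEL_SEMCONV_STABILITY_OPT_IN", "")
    # Split the comma-separated list; stripping whitespace is an assumption.
    tokens = {tok.strip() for tok in raw.split(",")}
    return "gen_ai_latest_experimental" in tokens
```

Because nothing is cached, toggling the variable between two calls changes the result of the second call.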

```python
nemoguardrails.guardrails.telemetry.action_span(
    tracer: typing.Optional[opentelemetry.trace.Tracer],
    action_name: str
) -> typing.Generator[typing.Optional[opentelemetry.trace.Span], None, None]
```

Create a `guardrails.action` INTERNAL span for a rail action execution.

Yields the span (or `None` when *tracer* is `None`).

```python
nemoguardrails.guardrails.telemetry.api_call_span(
    tracer: typing.Optional[opentelemetry.trace.Tracer],
    api_name: str
) -> typing.Generator[typing.Optional[opentelemetry.trace.Span], None, None]
```

Create a CLIENT span for a non-LLM API call (e.g., jailbreak detection).

Uses the `api.name` attribute rather than `gen_ai.operation.name`
because these APIs are plain HTTP endpoints, not GenAI operations.
`http.*` transport attributes can be added additively later without
conflict.  Yields the span (or `None` when *tracer* is `None`).

```python
nemoguardrails.guardrails.telemetry.are_metrics_enabled(
    config_metrics: typing.Optional[nemoguardrails.rails.llm.config.MetricsConfig]
) -> bool
```

Return `True` when inline OTEL metrics should be emitted.

Requires the `opentelemetry-api` package to be installed **and**
`config.metrics.enabled` to be `True`.  Independent of
`is_tracing_enabled`: OTEL signals (traces, metrics, logs) are
designed to be toggled independently so customers can, for example,
run metrics-only for cost-optimized SLO dashboards without the
overhead of full trace export.

```python
nemoguardrails.guardrails.telemetry.get_meter() -> typing.Optional[opentelemetry.metrics.Meter]
```

Return a cached OpenTelemetry meter for nemo-guardrails, or `None`.

The meter is obtained via the OTEL API (not SDK), following the library
instrumentation best practice.  The application is responsible for
configuring a `MeterProvider` before any metrics are recorded; without
one, the API returns a no-op meter and all emissions are silently
discarded.

```python
nemoguardrails.guardrails.telemetry.get_tracer() -> typing.Optional[opentelemetry.trace.Tracer]
```

Return a cached OpenTelemetry tracer for nemo-guardrails, or `None`.

The tracer is obtained via the OTEL API (not SDK), following the library
instrumentation best practice.  The application is responsible for
configuring a `TracerProvider` before any spans are created.

```python
nemoguardrails.guardrails.telemetry.is_content_capture_enabled(
    config_tracing: typing.Optional[nemoguardrails.rails.llm.config.TracingConfig]
) -> bool
```

Return True when message content should be captured onto spans.

`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` is the
primary control — when set, it overrides any config-file value so
operators have a single OTEL-standard env var that flips capture
across all services regardless of what the deployed config says.
Recognized values (case-insensitive, surrounding whitespace
stripped): `true` / `1` enable; `false` / `0` disable; any
other value falls through to the config field.

When the env var is absent or unrecognized, capture is on iff
`config.tracing.enable_content_capture` is True.

Callers should ALSO require `is_tracing_enabled` before
treating capture as active, since there is no point capturing content
onto spans that won't be exported.  This helper deliberately does
not perform that check itself so it stays orthogonal to the
tracing-enabled signal (and so tests can exercise each independently).

```python
nemoguardrails.guardrails.telemetry.is_tracing_enabled(
    config_tracing: typing.Optional[nemoguardrails.rails.llm.config.TracingConfig]
) -> bool
```

Return `True` when inline OTEL tracing should be active.

Requires the `opentelemetry-api` package to be installed **and**
`config.tracing.enabled` to be `True`.  Other `TracingConfig`
fields (`adapters`, `span_format`) are used by the LLMRails
post-hoc tracing path and are ignored here.

```python
nemoguardrails.guardrails.telemetry.llm_call_span(
    tracer: typing.Optional[opentelemetry.trace.Tracer],
    model_name: str,
    provider_name: str,
    operation_name: str = 'chat'
) -> typing.Generator[typing.Optional[opentelemetry.trace.Span], None, None]
```

Create a CLIENT span for an LLM call following GenAI semantic conventions.

Span name follows the OTEL pattern: `"{operation_name} {model_name}"`.

`operation_name` defaults to `"chat"` because IORails only issues
chat completions.  If non-chat LLM operations are supported in the
future, callers should pass an explicit `operation_name` from the
OTEL GenAI semantic conventions.

Yields the span (or `None` when *tracer* is `None`).

```python
nemoguardrails.guardrails.telemetry.mark_rail_stop(
    span: typing.Optional[opentelemetry.trace.Span],
    is_safe: bool
) -> None
```

Set `rail.stop=True` on a rail span when the rail blocked the request.

Safe to call with `None` (no-op) so callers don't have to branch on
whether a real span was produced — matches the `record_span_error`
idiom.  Only marks stop when *is\_safe* is `False`; a passing rail
leaves the attribute unset.
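
The `None`-safe idiom can be illustrated with a stand-in span. `FakeSpan` and `mark_rail_stop_sketch` are hypothetical; the only thing borrowed from OpenTelemetry is the `set_attribute(key, value)` method shape.

```python
from typing import Any, Dict, Optional


class FakeSpan:
    """Hypothetical stand-in exposing the `set_attribute` method of a span."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def mark_rail_stop_sketch(span: Optional[FakeSpan], is_safe: bool) -> None:
    """Set `rail.stop=True` only for a real span whose rail blocked."""
    if span is None or is_safe:
        return
    span.set_attribute("rail.stop", True)


blocked, passed = FakeSpan(), FakeSpan()
mark_rail_stop_sketch(blocked, is_safe=False)
mark_rail_stop_sketch(passed, is_safe=True)
mark_rail_stop_sketch(None, is_safe=False)  # no span: silently does nothing
```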

```python
nemoguardrails.guardrails.telemetry.rail_span(
    tracer: typing.Optional[opentelemetry.trace.Tracer],
    flow: str,
    direction: nemoguardrails.guardrails.guardrails_types.RailDirection
) -> typing.Generator[typing.Optional[opentelemetry.trace.Span], None, None]
```

Create a `guardrails.rail` INTERNAL span for a single rail execution.

Yields the span (or `None` when *tracer* is `None`).
The caller should set `rail.stop` on the span after execution if the
rail blocked the request.

```python
nemoguardrails.guardrails.telemetry.record_nonstream_rejected() -> None
```

Increment `guardrails.nonstream.rejections` by 1.

Called from the non-streaming path when the admission queue rejects a
submission with `asyncio.QueueFull` (the queue's `reject_on_full`
behaviour, triggered when `NONSTREAM_QUEUE_DEPTH` is exceeded).

```python
nemoguardrails.guardrails.telemetry.record_request_blocked(
    direction: nemoguardrails.guardrails.guardrails_types.RailDirection
) -> None
```

Increment `guardrails.requests.blocked` with a `rail.type` label.

Fires at the block sites in `iorails.py` (`_do_generate` for the
non-streaming path, `_generation_task` for streaming) whenever the
request returns `REFUSAL_MESSAGE` because an input or output rail
flagged it.  The counter is cumulative over the process lifetime; a
per-rail grain (`rail.name`) will be added in split-2 alongside
`guardrails.rail.blocked`.

No-op when the OTEL API is unavailable or instruments cannot be created.

```python
nemoguardrails.guardrails.telemetry.record_request_error(
    exc: BaseException
) -> None
```

Increment `guardrails.requests.errors` with an `error.type` label.

`request_metrics` already bumps this counter when an exception
propagates through its `except` branch (the non-streaming path).
Streaming code paths catch-and-swallow exceptions inside
`_generation_task` — converting them to error-payload chunks —
so the counter never sees them via propagation.  Those paths should
call this helper explicitly so the errors counter reflects ALL failed
requests, not just those whose exceptions bubble up.

No-op when the OTEL API is unavailable or instruments cannot be created.
Best-effort: a failure inside the meter SDK is swallowed so it can never
mask the original exception the caller is about to re-raise.

```python
nemoguardrails.guardrails.telemetry.record_span_error(
    span: typing.Optional[opentelemetry.trace.Span],
    exc: BaseException
) -> None
```

Record an exception on an OTEL span and set its status to ERROR.

Also sets the `error.type` attribute to the exception's class name
(per OTEL GenAI conditional-required convention).  Safe to call with
`None` (no-op).  Use from every span helper's `except` block and
from callers that swallow exceptions before they can propagate.

Best-effort: any failure while annotating the span (e.g. a broken
exporter or SDK) is swallowed so it can never mask the original
exception the caller is about to re-raise — notably `CancelledError`
/ `GeneratorExit` on a cancelled stream.  Only `Exception` is
suppressed, so a `BaseException` raised *inside* the SDK still
propagates.
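
The best-effort behavior can be sketched with a stand-in span whose methods mirror the names of the OpenTelemetry `Span` API (`record_exception`, `set_status`, `set_attribute`). The `FakeSpan` class, the plain `"ERROR"` status string, and the `*_sketch` function are assumptions made so the example runs without OpenTelemetry installed.

```python
from typing import Any, Dict, List, Optional


class FakeSpan:
    """Hypothetical span; `broken=True` simulates a failing SDK/exporter."""

    def __init__(self, broken: bool = False) -> None:
        self.broken = broken
        self.attributes: Dict[str, Any] = {}
        self.status: Optional[str] = None
        self.exceptions: List[BaseException] = []

    def record_exception(self, exc: BaseException) -> None:
        if self.broken:
            raise RuntimeError("exporter failure")
        self.exceptions.append(exc)

    def set_status(self, status: str) -> None:
        self.status = status

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def record_span_error_sketch(span: Optional[FakeSpan], exc: BaseException) -> None:
    if span is None:
        return
    try:
        span.record_exception(exc)
        span.set_status("ERROR")
        span.set_attribute("error.type", type(exc).__name__)
    except Exception:
        # Swallow annotation failures so the caller's exception is not masked.
        pass


good, bad = FakeSpan(), FakeSpan(broken=True)
record_span_error_sketch(good, ValueError("bad input"))
record_span_error_sketch(bad, ValueError("bad input"))  # does not raise
```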

```python
nemoguardrails.guardrails.telemetry.record_stream_rejected() -> None
```

Increment `guardrails.stream.rejections` by 1.

Called from the streaming path when a request arrives while the stream
concurrency semaphore is fully occupied (`_stream_semaphore.locked()`).

```python
nemoguardrails.guardrails.telemetry.register_nonstream_saturation_gauges(
    queue: nemoguardrails.guardrails.async_work_queue.AsyncWorkQueue,
    is_running: typing.Callable[[], bool]
) -> None
```

Register `guardrails.nonstream.queued` + `guardrails.nonstream.active`
ObservableGauges on the module-level Meter.

ObservableGauges read live state at collection time, so both metrics
reflect the *current* non-streaming queue and worker occupancy, without
the drift risk of tracking the value through UpDownCounter increments.

`is_running` is a zero-arg callable returning `bool`, deferred
so each collection re-reads the current state (passing the bool
directly would bake its start-time value into the closure).  The
callbacks return an empty observation list when it returns `False`
— the state the flag holds after `IORails.stop()` flips
`self._running` back to False.  OTEL Python has no public
unregister API for observable instruments, so this "no data points"
fallback is the only way to stop a dead IORails instance from
polluting collection.

No-op when the OTEL API is unavailable or no MeterProvider is
configured.
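
The deferred `is_running` check can be sketched without OpenTelemetry. In this illustration the callback returns plain integers where a real observable-gauge callback would return observation objects, and every name is hypothetical.

```python
from typing import Callable, Dict, List


def make_depth_callback(
    get_depth: Callable[[], int], is_running: Callable[[], bool]
) -> Callable[[], List[int]]:
    """Build a collection-time callback that reports nothing once stopped."""

    def callback() -> List[int]:
        # Re-read the flag on every collection instead of capturing a bool.
        if not is_running():
            return []
        return [get_depth()]

    return callback


state: Dict[str, object] = {"running": True, "queued": 3}
callback = make_depth_callback(lambda: state["queued"], lambda: state["running"])
first = callback()        # engine running: one data point
state["running"] = False  # simulates the engine being stopped
second = callback()       # engine stopped: no data points
```

Passing `state["running"]` directly instead of a lambda would freeze the value at registration time, which is the pitfall the callable avoids.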

```python
nemoguardrails.guardrails.telemetry.request_metrics() -> typing.Generator[None, None, None]
```

Emit request-level OTEL metrics around the wrapped block.

Increments `guardrails.requests` on entry, bumps
`guardrails.requests.active` (UpDownCounter) for the duration of
the block, records `guardrails.request.duration` in seconds on
exit, and increments `guardrails.requests.errors` with an
`error.type` attribute when the block raises.

`requests.active` covers both non-streaming (queue-wait + execution)
and streaming (semaphore hold) requests.
Summing the per-path saturation metrics
(`nonstream.queued`, `nonstream.active`, `stream.active`)
should approximate this value at any collection instant.

Instruments are created lazily on first use.  No-op when the OTEL
API is not installed or instruments cannot be created.
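
The bookkeeping described above can be sketched with in-memory counters in place of OTEL instruments. The names below are hypothetical, and catching `Exception` (rather than `BaseException`) for the error counter is an assumption.

```python
import time
from collections import Counter
from contextlib import contextmanager
from typing import Iterator, List

counters: Counter = Counter()
durations: List[float] = []
error_types: List[str] = []


@contextmanager
def request_metrics_sketch() -> Iterator[None]:
    counters["guardrails.requests"] += 1
    counters["guardrails.requests.active"] += 1
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        counters["guardrails.requests.errors"] += 1
        error_types.append(type(exc).__name__)  # stands in for `error.type`
        raise
    finally:
        counters["guardrails.requests.active"] -= 1
        durations.append(time.perf_counter() - start)  # seconds


with request_metrics_sketch():
    pass  # successful request

try:
    with request_metrics_sketch():
        raise KeyError("boom")  # failing request
except KeyError:
    pass
```

After both requests the active count is back to zero, two durations are recorded, and only the failing request contributed to the error counter.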

```python
nemoguardrails.guardrails.telemetry.request_span(
    tracer: opentelemetry.trace.Tracer
) -> typing.Generator[typing.Tuple[opentelemetry.trace.Span, str], None, None]
```

Create a live `guardrails.request` SERVER span.

Yields `(span, request_id)` where *request\_id* is derived from the
OTEL trace ID.  The span is ended automatically when the block exits.
If an exception propagates, the span records it and sets ERROR status
before re-raising.

```python
nemoguardrails.guardrails.telemetry.set_llm_call_content(
    span: typing.Optional[opentelemetry.trace.Span],
    input_messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    output_text: typing.Optional[str] = None
) -> None
```

Capture input/output messages on a span representing a model interaction.

Used for both `gen_ai.*` CLIENT spans (LLM calls) and the
`guardrails.request` SERVER span — the OTEL GenAI semconv
attribute names apply to any span that represents a model
interaction, so reusing the names lets backends correlate the outer
guardrails request with the inner LLM call by attribute name alone.

Dispatches on `_use_json_span_format`:

* **JSON attrs** (`OTEL_SEMCONV_STABILITY_OPT_IN` includes
  `gen_ai_latest_experimental`): `_set_llm_call_content_json`
  sets the JSON-encoded `gen_ai.input.messages`,
  `gen_ai.output.messages`, and `gen_ai.system_instructions`
  span attributes per the latest experimental OTEL GenAI semantic
  conventions.
* **Legacy events** (default): `_set_llm_call_content_events`
  adds one span event per input message plus a `gen_ai.choice`
  event for the assistant output.

Safe to call with `span=None` (no-op) so callers don't have to
branch on whether tracing is enabled.  Caller is responsible for
checking the content-capture flag; this helper does NOT re-check
`is_content_capture_enabled` so it stays cheap on hot paths.

```python
nemoguardrails.guardrails.telemetry.set_rail_content(
    span: typing.Optional[opentelemetry.trace.Span],
    rail_input: dict[str, typing.Any],
    reason: typing.Optional[str] = None
) -> None
```

Capture rail input + (optionally) block reason on a `guardrails.rail` span.

Sets `guardrails.rail.input` to the JSON-encoded *rail\_input* dict
(typically `{"messages": [...], "bot_response": ...}`).  When
*reason* is non-None, also sets `guardrails.rail.reason`: the caller
passes the human-readable block reason from the failing rail (or
`None` when the rail passed, in which case only the input
attribute is recorded).

Safe to call with `span=None` (no-op).  No GenAI semconv covers
rail spans, so these attributes live under the `guardrails.*` namespace
alongside `rail.type` / `rail.name` / `rail.stop`.

```python
nemoguardrails.guardrails.telemetry.set_request_content(
    span: typing.Optional[opentelemetry.trace.Span],
    input_messages: nemoguardrails.guardrails.guardrails_types.LLMMessages,
    output_text: typing.Optional[str] = None
) -> None
```

Capture caller-facing input/output on the `guardrails.request` SERVER span.

Uses `guardrails.request.input` (JSON-encoded input messages) and
`guardrails.request.output` (the text actually returned to the caller)
rather than the `gen_ai.*` attribute names used on LLM CLIENT spans.
This distinction matters on block paths: the LLM CLIENT span records the
raw model response, while the SERVER span records the refusal message —
the same `gen_ai.output.messages` name on both spans would carry
different values and confuse backends correlating the two.

`guardrails.request.input` is always a JSON-encoded list of role/content
message objects matching the caller's input.  `guardrails.request.output`
is the plain string that IORails returned (`REFUSAL_MESSAGE` on block paths,
the model's response text on the success path).  `output_text=None`
suppresses the output attribute entirely; the streaming path uses this when
the stream produced no content, so an empty output is not falsely recorded.

Safe to call with `span=None` (no-op).
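
A small sketch of the attribute logic, using a stand-in span so it runs without OpenTelemetry. `FakeSpan` and `set_request_content_sketch` are hypothetical; the attribute names come from the description above.

```python
import json
from typing import Any, Dict, List, Optional


class FakeSpan:
    """Hypothetical stand-in exposing the `set_attribute` method of a span."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def set_request_content_sketch(
    span: Optional[FakeSpan],
    input_messages: List[Dict[str, str]],
    output_text: Optional[str] = None,
) -> None:
    if span is None:
        return
    span.set_attribute("guardrails.request.input", json.dumps(input_messages))
    # None means "no output to record", which differs from an empty string.
    if output_text is not None:
        span.set_attribute("guardrails.request.output", output_text)


msgs = [{"role": "user", "content": "hi"}]
with_output, without_output = FakeSpan(), FakeSpan()
set_request_content_sketch(with_output, msgs, "hello there")
set_request_content_sketch(without_output, msgs)  # e.g. an empty stream
```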

```python
nemoguardrails.guardrails.telemetry.set_speculative_span_attrs(
    span: typing.Optional[opentelemetry.trace.Span],
    first_completed: str,
    first_rejector: str
) -> None
```

Stamp speculative-generation outcome attributes on a request span.

Records which branch of the speculative race finished first
(input rails vs. main LLM generation) and which one ultimately
rejected the request, on the IORails `guardrails.request` span.
Safe to call with `None` (no-op) so callers don't have to branch
on whether tracing is enabled — matches the `record_span_error` /
`mark_rail_stop` idiom.

```python
nemoguardrails.guardrails.telemetry.stream_active_metric() -> typing.Generator[None, None, None]
```

Context manager that tracks a stream as active for its full lifetime.

`+1` on enter / `-1` on exit (`finally`) on
`guardrails.stream.active` (UpDownCounter).  No-op when metrics are
unavailable.  Wrap the block where the stream holds a semaphore permit.

```python
nemoguardrails.guardrails.telemetry.trace_id_to_request_id(
    span: opentelemetry.trace.Span
) -> str
```

Derive a human-readable request ID from the span's OTEL trace ID.

Returns the last `REQUEST_ID_HEX_CHARS` hex characters of the 128-bit
trace ID (the low 64 bits, which carry the highest entropy).  When the
trace ID is zero (e.g. a `NoOpTracerProvider` is active) a random
fallback is used.
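
The derivation can be sketched on a raw integer trace ID. The sketch assumes `REQUEST_ID_HEX_CHARS` is 16, consistent with the 16-character request ID and "low 64 bits" mentioned in this module. It takes the integer directly, whereas the documented helper takes a span; `secrets.token_hex` as the random fallback is also an assumption.

```python
import secrets

REQUEST_ID_HEX_CHARS = 16  # assumed value: 16 hex chars == 64 bits


def trace_id_to_request_id_sketch(trace_id: int) -> str:
    """Return the low 64 bits of a 128-bit trace ID as 16 hex characters."""
    if trace_id == 0:
        # Invalid (all-zero) trace ID: fall back to a random ID.
        return secrets.token_hex(REQUEST_ID_HEX_CHARS // 2)
    # Zero-pad to 32 hex chars, then keep the trailing 16.
    return format(trace_id, "032x")[-REQUEST_ID_HEX_CHARS:]


request_id = trace_id_to_request_id_sketch(0x000102030405060708090A0B0C0D0E0F)
# request_id == "08090a0b0c0d0e0f"
```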

```python
nemoguardrails.guardrails.telemetry.traced_request(
    tracer: typing.Optional[opentelemetry.trace.Tracer],
    metrics_enabled: bool = False
) -> typing.Generator[nemoguardrails.guardrails.telemetry.TracedRequest, None, None]
```

Unified request context: sets request ID, optionally creates a span
and/or emits request-level metrics.

The two signals are gated **independently**:

* `tracer is not None` → a live `guardrails.request` SERVER span
  is created and the request ID is derived from its trace ID.
* `metrics_enabled=True` → emit request-level OTEL metrics

All four combinations are valid.  Metrics-only (`tracer=None,
metrics_enabled=True`) is a supported setup for customers running
cheap SLO dashboards without full trace export.

Yields a `TracedRequest` (`span`, `request_id`).  Callers
that want to mark the request span ERROR from a deeply-nested scope
should capture the yielded span and pass it explicitly to
`record_span_error`; never rely on `trace.get_current_span()`,
which can return the host app's ambient span when IORails tracing is
disabled.

The request-ID ContextVar is always cleaned up on exit via
`_cleanup_request_id`, which tolerates the expected
cross-context `ValueError` that async-generator cleanup can raise.

```python
nemoguardrails.guardrails.telemetry._INVALID_TRACE_ID = 0
```

```python
nemoguardrails.guardrails.telemetry._LEGACY_EVENT_BY_ROLE = {'system': EventNames.GEN_AI_SYSTEM_MESSAGE, 'user': EventNames.GEN_AI_USER_MESS...
```

```python
nemoguardrails.guardrails.telemetry._OTEL_AVAILABLE: bool = True
```

```python
nemoguardrails.guardrails.telemetry._meter = None
```

```python
nemoguardrails.guardrails.telemetry._request_instruments: Optional[RequestInstruments] = None
```

```python
nemoguardrails.guardrails.telemetry._tracer = None
```

```python
nemoguardrails.guardrails.telemetry.log = logging.getLogger(__name__)
```