Span Reference
The NeMo Guardrails library emits OpenTelemetry spans, allowing you to trace individual requests. This reference documents the spans and attributes each engine produces. It covers the default LLMRails engine first, then the opt-in IORails engine.
LLMRails
LLMRails is the default engine.
It records each request in an interaction log and, after the request completes, reconstructs the spans from that log and passes them to the configured tracing adapter.
With the OpenTelemetry adapter and span_format: opentelemetry (the default), the reconstructed spans carry the OpenTelemetry GenAI semantic-convention attributes, including the token-level usage and response attributes on the LLM span.
Enable Tracing for LLMRails
LLMRails is the default engine, so no opt-in flag is needed.
Configure an OpenTelemetry TracerProvider in your application (see Quick Start), then enable tracing and route spans through the OpenTelemetry adapter.
- Unlike IORails, LLMRails routes spans through a tracing adapter, so OpenTelemetry must be listed under tracing.adapters.
- span_format selects opentelemetry (GenAI semantic-convention attributes, the default) or legacy (a flat metrics dictionary, deprecated). The token-level attributes documented here require opentelemetry.
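The two rules above can be checked mechanically. The following is a minimal sketch, not library code: it mirrors the tracing section of a config as a plain Python dict, and check_llmrails_tracing is a hypothetical helper introduced only for illustration. The shape of an adapter entry (a mapping with a name key) is an assumption.

```python
from typing import Any, Dict, List


def check_llmrails_tracing(tracing: Dict[str, Any]) -> List[str]:
    """Return the reasons the token-level attributes would be missing (hypothetical helper)."""
    problems = []
    if not tracing.get("enabled", False):
        problems.append("tracing.enabled is not true")

    # Assumption: each adapter entry is a mapping with a "name" key.
    adapters = tracing.get("adapters") or []
    names = {a.get("name") for a in adapters if isinstance(a, dict)}
    if "OpenTelemetry" not in names:
        problems.append("OpenTelemetry is not listed under tracing.adapters")

    # span_format defaults to opentelemetry when it is omitted.
    if tracing.get("span_format", "opentelemetry") != "opentelemetry":
        problems.append("span_format is not opentelemetry")
    return problems


tracing_section = {
    "enabled": True,
    "adapters": [{"name": "OpenTelemetry"}],
    "span_format": "opentelemetry",
}
print(check_llmrails_tracing(tracing_section))
```

An empty list means the section satisfies both rules. Switching span_format to legacy or dropping the adapter entry produces one message per violated rule.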
LLMRails Span Hierarchy
A single request produces one tree.
LLMRails reconstructs these spans from the interaction log after the request completes, so the nesting reflects the rails and actions that ran.
With span_format: legacy the trace is a flat metrics dictionary instead, and the GenAI attributes below are not emitted.
LLMRails Span Reference
LLMRails sets the attributes below when span_format is opentelemetry.
Every span also carries a span.kind attribute mirroring the OpenTelemetry span kind.
guardrails.request is the root span. Span kind SERVER.
service.name appears here as a span attribute, not the OpenTelemetry
resource attribute of the same name. LLMRails reconstructs spans after
the request and attaches no Resource, so it records service.name on the
span itself. It coexists with, and does not replace, any service.name you
set on your TracerProvider resource.
guardrails.rail is one span per activated rail. Span kind INTERNAL.
guardrails.action is one span per action. Span kind INTERNAL.
{operation} {model} is one span per LLM call, named following the GenAI convention. Span kind CLIENT.
These attributes are always set:
These are read from the response and request when available:
LLMRails does not set gen_ai.usage.reasoning.output_tokens or gen_ai.request.stream.
See LLMRails and IORails Attribute Differences for the full comparison.
IORails
This section describes every span the IORails engine emits.
IORails creates spans while a request executes and emits them directly to the OpenTelemetry API.
To enable tracing, set tracing.enabled: true and configure a TracerProvider in your application.
The LLM span carries the token-level GenAI attributes for token usage, response metadata, and request sampling parameters that follow the OpenTelemetry GenAI semantic conventions.
Experimental Feature
The spans in this section are emitted by the opt-in IORails engine. Enable
it either by constructing Guardrails(config, use_iorails=True) or by
setting NEMO_GUARDRAILS_IORAILS_ENGINE=1, which aliases the top-level
LLMRails import to Guardrails. IORails is an early-release feature, and
span names and attributes can change as the OpenTelemetry GenAI semantic
conventions evolve.
Enable Tracing for IORails
Tracing needs two things: the IORails engine selected, and an OpenTelemetry TracerProvider configured by your application.
The library depends on the OpenTelemetry API only; without a TracerProvider the API returns a no-op tracer and every span is silently discarded.
- Install the library with tracing support and the OpenTelemetry SDK.
- Configure the SDK before constructing the engine, select IORails, and enable tracing in the config.
For production exporters (OTLP) and ecosystem compatibility, see OpenTelemetry.
The configuration above does not set adapters or span_format; IORails does not use them.
IORails Span Hierarchy
A single request produces one tree. Span nesting follows execution, so the parent of an LLM or API span depends on where the call is made.
- The main generation call is a child of guardrails.request.
- An LLM or API call made by a rail action is a child of that guardrails.action span.
- Streaming requests propagate trace context across the internal task boundary, so streamed spans keep the same parent.
Every span sets ERROR status and records the exception when one propagates through it.
IORails Span Reference
guardrails.request is the root span for a request. Span kind SERVER.
When speculative generation is active, the request span also carries these attributes:
guardrails.rail is one span per rail that runs, wrapping the rail’s execution. Span kind INTERNAL.
guardrails.action is one span per action a rail invokes. Span kind INTERNAL.
chat {model} is a CLIENT span for one LLM call, named {operation} {model}, such as chat gpt-4o-mini.
The operation is chat because IORails issues chat completions.
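For dashboards that group by operation or model, the naming rule can be applied in both directions. This is a small sketch with hypothetical helper names. It assumes the operation itself never contains a space.

```python
from typing import Tuple


def llm_span_name(operation: str, model: str) -> str:
    """Build a span name in the {operation} {model} form."""
    return f"{operation} {model}"


def split_llm_span_name(name: str) -> Tuple[str, str]:
    """Split a span name back into (operation, model) at the first space."""
    operation, model = name.split(" ", 1)
    return operation, model


name = llm_span_name("chat", "gpt-4o-mini")
print(name)
print(split_llm_span_name(name))
```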
This span carries the token-level GenAI attributes; see Token-Level Attributes for the emission semantics.
The three identifier attributes are always set:
The response and usage attributes are read from the model response. Each is set only when its source value is present, so backends can distinguish an absent value from a real zero:
The request sampling parameters are read from the request kwargs. Each is set only when present on the request:
When content capture is enabled, the LLM span also records the prompt and completion. See Capturing Message Content.
api {name} is one span per non-LLM API call, such as a call to a jailbreak-detection endpoint. Span kind CLIENT.
These endpoints are plain HTTP services rather than GenAI operations, so the span uses api.name instead of the gen_ai.* attributes. HTTP transport attributes can be added later without conflict.
Token-Level Attributes
The response, usage, and request-parameter attributes on the LLM span are non-sensitive telemetry. Both engines record them whenever the span exists. There is no content-capture gate and no metrics gate. Enabling tracing is sufficient to get them.
How the values are sourced differs by engine:
- LLMRails reconstructs them from the recorded LLM call after the request completes.
- IORails reads them inline as the call runs. For non-streaming calls the values come off the model response. For streaming calls, the values are accumulated across chunks: the model and response ID arrive on early chunks, the finish reason and token usage on the terminal chunk, and the values are written to the span once the stream ends.
For streaming IORails responses, token usage is present only when the upstream provider returns a usage field, which commonly requires forwarding stream_options.include_usage=true.
When usage is absent, the usage attributes are not set, which is deliberately distinct from recording zero tokens.
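The streaming behavior described above can be sketched as a small accumulator. This is an illustration under stated assumptions, not the IORails implementation: the flat chunk dictionaries and the accumulate_stream_attributes helper are hypothetical, and real provider chunks are shaped differently. The attribute names follow the OpenTelemetry GenAI semantic conventions.

```python
from typing import Any, Dict, Iterable


def accumulate_stream_attributes(chunks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Collect span attributes across stream chunks and return them once the stream ends."""
    model = response_id = finish_reason = usage = None
    for chunk in chunks:
        # Model and response ID arrive on early chunks; keep the first value seen.
        if model is None:
            model = chunk.get("model")
        if response_id is None:
            response_id = chunk.get("id")
        # Finish reason and usage arrive on the terminal chunk, if at all.
        if chunk.get("finish_reason") is not None:
            finish_reason = chunk["finish_reason"]
        if chunk.get("usage") is not None:
            usage = chunk["usage"]

    attributes: Dict[str, Any] = {}
    if model is not None:
        attributes["gen_ai.response.model"] = model
    if response_id is not None:
        attributes["gen_ai.response.id"] = response_id
    if finish_reason is not None:
        attributes["gen_ai.response.finish_reasons"] = [finish_reason]
    if usage is not None:
        # Compare against None, not truthiness, so a real zero is still recorded.
        for key in ("input_tokens", "output_tokens"):
            if usage.get(key) is not None:
                attributes[f"gen_ai.usage.{key}"] = usage[key]
    return attributes


with_usage = [
    {"id": "resp-1", "model": "gpt-4o-mini"},
    {"delta": "Hi"},
    {"finish_reason": "stop", "usage": {"input_tokens": 12, "output_tokens": 0}},
]
without_usage = [
    {"id": "resp-2", "model": "gpt-4o-mini"},
    {"finish_reason": "stop"},
]
print(accumulate_stream_attributes(with_usage))
print(accumulate_stream_attributes(without_usage))
```

In the first stream the zero output-token count is kept as an attribute. In the second, no usage field arrives, so no usage attribute appears at all, which lets a backend tell "not reported" apart from "zero tokens".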
gen_ai.usage.reasoning.output_tokens is a span attribute, not a metric
label. The gen_ai.client.token.usage metric’s required gen_ai.token.type
label takes only input or output; reasoning tokens are exposed here on
the span instead. See Metric Reference.
How LLMRails and IORails Tracing Differ
Both engines emit OpenTelemetry spans, and both populate the GenAI semantic-convention attributes on the LLM span. They differ in how the spans are produced and how tracing is configured.
The tracing.adapters and tracing.span_format configuration fields apply
only to LLMRails. IORails reads tracing.enabled and otherwise ignores
them. There is no adapter to select because spans are emitted straight to
the OpenTelemetry API.
LLMRails and IORails Attribute Differences
The emitted attributes are the same across both engines except for the following.
All other attributes, including the identifiers (gen_ai.operation.name, gen_ai.request.model, gen_ai.provider.name), gen_ai.usage.input_tokens / output_tokens, the gen_ai.response.* attributes, and the gen_ai.request.* sampling parameters, are emitted by both engines.
gen_ai.response.model is emitted by both, but LLMRails always sets it while IORails sets it only when the provider returns it.
Capturing Message Content
Prompt and completion content is gated and off by default, because it can contain sensitive data.
Both engines gate it on tracing.enable_content_capture (default false).
LLMRails. When enabled, LLMRails records the prompt and completion as span events on the LLM span (currently the gen_ai.content.prompt and gen_ai.content.completion events), along with conversation events.
IORails. Enable with tracing.enable_content_capture: true or the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable, which overrides the config value.
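The precedence between the two switches can be sketched as follows. The helper name is hypothetical, and the accepted environment values are an assumption: this sketch treats a case-insensitive "true" as enabled and any other set value as disabled, which may not match the library exactly.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def content_capture_enabled(
    config_value: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve content capture: the environment variable, when set, overrides the config."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR)
    if raw is None:
        # Variable not set: fall back to tracing.enable_content_capture.
        return config_value
    # Assumption: only "true" (any case) enables capture.
    return raw.strip().lower() == "true"


print(content_capture_enabled(False, {}))
print(content_capture_enabled(False, {ENV_VAR: "true"}))
print(content_capture_enabled(True, {ENV_VAR: "false"}))
```

Passing the environment as a parameter keeps the function easy to exercise without mutating the process environment.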
When enabled, IORails records content on the request span (guardrails.request.input / guardrails.request.output), the rail spans (guardrails.rail.input, and guardrails.rail.reason on a block), and the LLM span.
On the LLM span the format depends on the OTEL_SEMCONV_STABILITY_OPT_IN environment variable:
- When it contains gen_ai_latest_experimental, content is written as the JSON span attributes gen_ai.input.messages and gen_ai.output.messages.
- Otherwise, content is written as span events (gen_ai.user.message, gen_ai.assistant.message, gen_ai.system.message, and gen_ai.choice).
The OTEL_SEMCONV_STABILITY_OPT_IN selector controls the IORails content format and is independent of the tracing.span_format field, which selects the LLMRails adapter format.
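A minimal sketch of the format selection follows. The helper name and its return labels are hypothetical. It assumes the variable holds a comma-separated list of opt-in values, as in other OpenTelemetry instrumentations, and that the check is an exact match on one of those values.

```python
from typing import Optional

OPT_IN_VALUE = "gen_ai_latest_experimental"


def llm_content_format(opt_in: Optional[str]) -> str:
    """Return "attributes" or "events" for the given OTEL_SEMCONV_STABILITY_OPT_IN value."""
    # Assumption: the variable is a comma-separated list of opt-in values.
    values = {value.strip() for value in (opt_in or "").split(",")}
    return "attributes" if OPT_IN_VALUE in values else "events"


print(llm_content_format(None))
print(llm_content_format("gen_ai_latest_experimental"))
print(llm_content_format("http, gen_ai_latest_experimental"))
```

With the variable unset the sketch selects span events; with the opt-in value present, alone or alongside other values, it selects the JSON span attributes.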
Public API Stability
The span names and attribute names on this page are part of each engine’s observable contract, so dashboards and queries can reference them.
Attribute names follow the OpenTelemetry GenAI semantic conventions, which are still under active development and can change as the spec matures.
Pin your opentelemetry-sdk version and review release notes before upgrading.
Related Resources
- Quick Start: Minimal tracing setup with the OpenTelemetry SDK.
- OpenTelemetry: Production exporters and ecosystem compatibility.
- Metric Reference: The metrics IORails emits, including the gen_ai.client.token.usage histogram.
- OpenTelemetry GenAI spans specification: Upstream semantic conventions for span names and attributes.