Span Reference

The NeMo Guardrails library emits OpenTelemetry spans, allowing you to trace individual requests. This reference documents the spans and attributes each engine produces. It covers the default LLMRails engine first, then the opt-in IORails engine.

LLMRails

LLMRails is the default engine. It records each request in an interaction log and, after the request completes, reconstructs the spans from that log and passes them to the configured tracing adapter. With the OpenTelemetry adapter and span_format: opentelemetry (the default), the reconstructed spans carry the OpenTelemetry GenAI semantic-convention attributes, including the token-level usage and response attributes on the LLM span.

Enable Tracing for LLMRails

LLMRails is the default engine, so no opt-in flag is needed. Configure an OpenTelemetry TracerProvider in your application (see Quick Start), then enable tracing and route spans through the OpenTelemetry adapter.

    tracing:
      enabled: true
      span_format: opentelemetry  # default
      adapters:
        - name: OpenTelemetry

  • Unlike IORails, LLMRails routes spans through a tracing adapter, so OpenTelemetry must be listed under tracing.adapters.
  • span_format selects opentelemetry (GenAI semantic-convention attributes, the default) or legacy (a flat metrics dictionary, deprecated). The token-level attributes documented here require opentelemetry.

LLMRails Span Hierarchy

A single request produces one tree.

guardrails.request               SERVER    one per request
└─ guardrails.rail               INTERNAL  one per activated rail
   └─ guardrails.action          INTERNAL  one per action the rail runs
      └─ {operation} {model}     CLIENT    one per LLM call

LLMRails reconstructs these spans from the interaction log after the request completes, so the nesting reflects the rails and actions that ran. With span_format: legacy the trace is a flat metrics dictionary instead, and the GenAI attributes below are not emitted.

LLMRails Span Reference

LLMRails sets the attributes below when span_format is opentelemetry. Every span also carries a span.kind attribute mirroring the OpenTelemetry span kind.

guardrails.request is the root span. Span kind SERVER.

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| gen_ai.operation.name | string | Always | The value guardrails. |
| service.name | string | Always | The value nemo_guardrails. |
| request.id | string | When available | Request identifier. |
| user.id | string | When available | User identifier, when supplied. |
| session.id | string | When available | Session identifier, when supplied. |

service.name appears here as a span attribute, not the OpenTelemetry resource attribute of the same name. LLMRails reconstructs spans after the request and attaches no Resource, so it records service.name on the span itself. It coexists with, and does not replace, any service.name you set on your TracerProvider resource.

guardrails.rail is one span per activated rail. Span kind INTERNAL.

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| rail.type | string | Always | For example input, output, or dialog. |
| rail.name | string | Always | The rail name. |
| rail.stop | boolean | When set | Whether the rail stopped execution. |
| rail.decisions | string[] | When set | Decisions made by the rail. |

guardrails.action is one span per action. Span kind INTERNAL.

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| action.name | string | Always | The action name. |
| action.has_llm_calls | boolean | Always | Whether the action made LLM calls. |
| action.llm_calls_count | int | Always | Number of LLM calls the action made. |
| action.param.{name} | scalar | Per scalar parameter | One attribute per scalar action parameter. |
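The action.param.{name} row can be read as a flattening step: each scalar parameter becomes its own attribute, and non-scalar parameters are left out. The sketch below illustrates that reading. It assumes "scalar" means str, bool, int, or float, which may not match the library's actual filter, and the function name and parameter values are hypothetical.

```python
from typing import Any, Dict

# Assumption: "scalar" means str, bool, int, or float. The library's actual
# filter may differ.
SCALAR_TYPES = (str, bool, int, float)


def action_param_attributes(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return one action.param.{name} attribute per scalar parameter."""
    return {
        f"action.param.{name}": value
        for name, value in params.items()
        if isinstance(value, SCALAR_TYPES)
    }


# Hypothetical action parameters, for illustration only.
attrs = action_param_attributes(
    {"threshold": 0.8, "model": "content_safety", "messages": [{"role": "user"}]}
)
print(attrs)
```

Under this assumption the list-valued messages parameter produces no attribute, so a query on action.param.messages would find nothing.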

{operation} {model} is one span per LLM call, named following the GenAI convention. Span kind CLIENT.

These attributes are always set:

| Attribute | Type | Description |
| --- | --- | --- |
| gen_ai.operation.name | string | The task that issued the call, or completion when no task is set. |
| gen_ai.request.model | string | The model requested. |
| gen_ai.response.model | string | The model that responded. |
| gen_ai.provider.name | string | The provider, for example openai. |
| llm.cache.hit | boolean | Whether the response was served from the LLMRails cache. Always set, including false. |

These are read from the response and request when available:

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| gen_ai.usage.input_tokens | int | Returned by provider | Tokens in the prompt. |
| gen_ai.usage.output_tokens | int | Returned by provider | Tokens in the completion. |
| gen_ai.usage.total_tokens | int | Returned by provider | Total tokens for the call. |
| gen_ai.response.id | string | Returned by provider | The provider’s response identifier. |
| gen_ai.response.finish_reasons | string[] | Returned by provider | Finish reasons for the response. |
| gen_ai.request.temperature | double | Set on request | Sampling temperature. |
| gen_ai.request.max_tokens | int | Set on request | Maximum tokens to generate. |
| gen_ai.request.top_p | double | Set on request | Nucleus sampling probability. |
| gen_ai.request.top_k | int | Set on request | Top-k sampling cutoff. |
| gen_ai.request.frequency_penalty | double | Set on request | Frequency penalty. |
| gen_ai.request.presence_penalty | double | Set on request | Presence penalty. |
| gen_ai.request.stop_sequences | string[] | Set on request | Stop sequences. |

LLMRails does not set gen_ai.usage.reasoning.output_tokens or gen_ai.request.stream. See LLMRails and IORails Attribute Differences for the full comparison.

IORails

This section describes every span the IORails engine emits. IORails creates spans while a request executes and emits them directly to the OpenTelemetry API. To enable tracing, set tracing.enabled: true and configure a TracerProvider in your application. The LLM span carries the token-level GenAI attributes for token usage, response metadata, and request sampling parameters that follow the OpenTelemetry GenAI semantic conventions.

Experimental Feature

The spans in this section are emitted by the opt-in IORails engine. Enable it either by constructing Guardrails(config, use_iorails=True) or by setting NEMO_GUARDRAILS_IORAILS_ENGINE=1, which aliases the top-level LLMRails import to Guardrails. IORails is an early-release feature, and span names and attributes can change as the OpenTelemetry GenAI semantic conventions evolve.

Enable Tracing for IORails

Tracing needs two things: the IORails engine selected, and an OpenTelemetry TracerProvider configured by your application. The library depends on the OpenTelemetry API only; without a TracerProvider the API returns a no-op tracer and every span is silently discarded.

  1. Install the library with tracing support and the OpenTelemetry SDK.

    pip install "nemoguardrails[tracing]" opentelemetry-sdk
  2. Configure the SDK before constructing the engine, select IORails, and enable tracing in the config.

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
    from opentelemetry.sdk.resources import Resource

    from nemoguardrails import Guardrails, RailsConfig

    # Configure the TracerProvider BEFORE constructing Guardrails so the engine
    # resolves a real tracer when it creates spans.
    resource = Resource.create({"service.name": "guardrails-app"})
    tracer_provider = TracerProvider(resource=resource)
    trace.set_tracer_provider(tracer_provider)
    tracer_provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))

    config_yaml = """
    models:
      - type: main
        engine: openai
        model: gpt-4o-mini

    tracing:
      enabled: true
    """

    config = RailsConfig.from_content(yaml_content=config_yaml)

    # use_iorails=True selects IORails; require_iorails=True raises a ValueError
    # if the config is incompatible with IORails.
    rails = Guardrails(config, use_iorails=True, require_iorails=True)
    response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
    print(f"Response: {response}")

For production exporters (OTLP) and ecosystem compatibility, see OpenTelemetry. The configuration above does not set adapters or span_format; IORails does not use them.

IORails Span Hierarchy

A single request produces one tree. Span nesting follows execution, so the parent of an LLM or API span depends on where the call is made.

guardrails.request            SERVER    one per generate_async() call
├─ guardrails.rail            INTERNAL  one per rail that runs
│  └─ guardrails.action       INTERNAL  one per action the rail invokes
│     ├─ chat {model}         CLIENT    LLM call issued by the action (for example a content-safety model)
│     └─ api {name}           CLIENT    non-LLM API call issued by the action (for example jailbreak detection)
└─ chat {model}               CLIENT    the main protected-model generation call
  • The main generation call is a child of guardrails.request.
  • An LLM or API call made by a rail action is a child of that guardrails.action span.
  • Streaming requests propagate trace context across the internal task boundary, so streamed spans keep the same parent.
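One way to check these parent-child rules in a test is to rebuild the tree from exported span records and compare it with the expected nesting. The sketch below does that for a hand-written list of records. The record shape (name, span_id, parent_id), the IDs, and the model names are hypothetical; real exported spans carry these fields in whatever form your exporter uses.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical flattened span records, for example collected by an in-memory
# exporter in a test. The field names and IDs are illustrative only.
spans = [
    {"name": "guardrails.request", "span_id": "a", "parent_id": None},
    {"name": "guardrails.rail", "span_id": "b", "parent_id": "a"},
    {"name": "guardrails.action", "span_id": "c", "parent_id": "b"},
    {"name": "chat content-safety-model", "span_id": "d", "parent_id": "c"},
    {"name": "chat gpt-4o-mini", "span_id": "e", "parent_id": "a"},
]


def render_tree(records: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return one indented line per span, with parents before children."""
    children = defaultdict(list)
    for record in records:
        children[record["parent_id"]].append(record)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for record in children[parent_id]:
            lines.append("  " * depth + record["name"])
            walk(record["span_id"], depth + 1)

    walk(None, 0)  # Root spans have no parent.
    return lines


tree = render_tree(spans)
print("\n".join(tree))
```

In this sample the rail's LLM call nests under guardrails.action, while the main generation call sits directly under guardrails.request, matching the rules listed above.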

Every span sets ERROR status and records the exception when one propagates through it.

IORails Span Reference

guardrails.request is the root span for a request. Span kind SERVER.

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| gen_ai.operation.name | string | Always | The value guardrails, marking the request boundary. |
| request.id | string | Always | Request identifier derived from the OpenTelemetry trace ID. |
| guardrails.request.input | string | Content capture on | JSON-encoded input messages received from the caller. |
| guardrails.request.output | string | Content capture on | The text actually returned to the caller. On a blocked request this is the refusal message, not the raw model output. |

When speculative generation is active, the request span also carries these attributes:

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| speculative_generation.mode_active | boolean | Speculative generation on | The value true. |
| speculative_generation.first_completed | string | Speculative generation on | The branch that completed first: input_rails or generation. |
| speculative_generation.first_rejector | string | Speculative generation on | The branch that rejected the request: input_rails or none. |

guardrails.rail is one span per rail that runs, wrapping the rail’s execution. Span kind INTERNAL.

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| rail.type | string | Always | Input or Output. |
| rail.name | string | Always | The rail’s flow name. |
| rail.stop | boolean | When the rail blocks | Set to true only when this rail blocked the request. A passing rail leaves the attribute unset. |
| guardrails.rail.input | string | Content capture on | JSON-encoded snapshot of the rail’s inputs. |
| guardrails.rail.reason | string | Content capture on, block only | The human-readable reason the rail blocked the request. |
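Because a passing rail leaves rail.stop unset rather than setting it to false, a query for blocking rails should treat a missing attribute as "did not block". The sketch below shows that check on hypothetical span records. The record shape and the rail names are illustrative, not values the library guarantees.

```python
from typing import Any, Dict, List

# Hypothetical exported rail spans, reduced to a name and an attribute dict.
rail_spans = [
    {
        "name": "guardrails.rail",
        "attributes": {"rail.type": "Input", "rail.name": "example input check"},
    },
    {
        "name": "guardrails.rail",
        "attributes": {
            "rail.type": "Input",
            "rail.name": "example jailbreak check",
            "rail.stop": True,
        },
    },
]


def blocking_rails(spans: List[Dict[str, Any]]) -> List[str]:
    """Return the names of rails that blocked the request.

    A missing rail.stop attribute means the rail passed.
    """
    return [
        span["attributes"]["rail.name"]
        for span in spans
        if span["name"] == "guardrails.rail"
        and span["attributes"].get("rail.stop") is True
    ]


blocked = blocking_rails(rail_spans)
print(blocked)
```

A filter written as rail.stop = false would match no IORails spans at all, which is why the check above tests for the presence of true instead.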

guardrails.action is one span per action a rail invokes. Span kind INTERNAL.

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| action.name | string | Always | The name of the action being executed. |

chat {model} is a CLIENT span for one LLM call, named {operation} {model}, such as chat gpt-4o-mini. The operation is chat because IORails issues chat completions. This span carries the token-level GenAI attributes; see Token-Level Attributes for the emission semantics.

The three identifier attributes are always set:

| Attribute | Type | Description |
| --- | --- | --- |
| gen_ai.operation.name | string | The operation, chat. |
| gen_ai.request.model | string | The model name passed in the request. |
| gen_ai.provider.name | string | The provider, for example openai. |

The response and usage attributes are read from the model response. Each is set only when its source value is present, so backends can distinguish an absent value from a real zero:

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| gen_ai.response.model | string | Returned by provider | The model that produced the response. |
| gen_ai.response.id | string | Returned by provider | The provider’s response identifier. |
| gen_ai.response.finish_reasons | string[] | Returned by provider | The finish reason, wrapped in a single-element list to match the spec’s array shape. |
| gen_ai.usage.input_tokens | int | Provider returns usage | Tokens in the prompt. |
| gen_ai.usage.output_tokens | int | Provider returns usage | Tokens in the completion. |
| gen_ai.usage.reasoning.output_tokens | int | Provider returns reasoning tokens | Reasoning tokens, for reasoning models that report them. |
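The "set only when present" rule can be sketched as follows. This is not the library's implementation: the response dictionary shape, its key names, and the function names are hypothetical stand-ins for a provider response. The point it illustrates is that None is skipped while a real zero is kept, and that a single finish reason is wrapped in a list.

```python
from typing import Any, Dict


def set_if_present(attrs: Dict[str, Any], key: str, value: Any) -> None:
    """Record the attribute unless the source value is absent (None)."""
    if value is not None:
        attrs[key] = value


def response_attributes(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map a hypothetical provider response dict to GenAI span attributes."""
    attrs: Dict[str, Any] = {}
    usage = response.get("usage") or {}

    set_if_present(attrs, "gen_ai.response.model", response.get("model"))
    set_if_present(attrs, "gen_ai.response.id", response.get("id"))

    finish_reason = response.get("finish_reason")
    if finish_reason is not None:
        # Wrapped in a single-element list to match the array-typed attribute.
        attrs["gen_ai.response.finish_reasons"] = [finish_reason]

    set_if_present(attrs, "gen_ai.usage.input_tokens", usage.get("input_tokens"))
    set_if_present(attrs, "gen_ai.usage.output_tokens", usage.get("output_tokens"))
    set_if_present(
        attrs,
        "gen_ai.usage.reasoning.output_tokens",
        usage.get("reasoning_tokens"),
    )
    return attrs


# Hypothetical response: no id, no reasoning tokens, and a real zero for output.
example = response_attributes(
    {
        "model": "gpt-4o-mini",
        "finish_reason": "stop",
        "usage": {"input_tokens": 12, "output_tokens": 0},
    }
)
print(example)
```

Here gen_ai.usage.output_tokens is recorded as 0, while gen_ai.response.id and gen_ai.usage.reasoning.output_tokens are absent from the result.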

The request sampling parameters are read from the request kwargs. Each is set only when present on the request:

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| gen_ai.request.temperature | double | Set on request | Sampling temperature. |
| gen_ai.request.max_tokens | int | Set on request | Maximum tokens to generate. Both max_tokens and the max_completion_tokens alias map here. |
| gen_ai.request.top_p | double | Set on request | Nucleus sampling probability. |
| gen_ai.request.top_k | int | Set on request | Top-k sampling cutoff. |
| gen_ai.request.frequency_penalty | double | Set on request | Frequency penalty. |
| gen_ai.request.presence_penalty | double | Set on request | Presence penalty. |
| gen_ai.request.stop_sequences | string[] | Set on request | Stop sequences. Read from stop or stop_sequences and normalized to a list; an empty value is skipped. |
| gen_ai.request.stream | boolean | Streaming requests only | Set to true for streaming calls; omitted on non-streaming calls, per the spec’s conditionally-required rule. |
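The mapping rules in this table can be sketched as a small function over the request kwargs. It is an illustration under assumptions rather than the library's code: the function name is hypothetical, and the precedence of max_tokens over max_completion_tokens (and of stop over stop_sequences) when both are supplied is assumed, since the table does not state it.

```python
from typing import Any, Dict


def request_attributes(kwargs: Dict[str, Any], streaming: bool = False) -> Dict[str, Any]:
    """Map request kwargs to gen_ai.request.* attributes, skipping absent ones."""
    attrs: Dict[str, Any] = {}

    for key in ("temperature", "top_p", "top_k", "frequency_penalty", "presence_penalty"):
        if kwargs.get(key) is not None:
            attrs[f"gen_ai.request.{key}"] = kwargs[key]

    # Both spellings map to one attribute. Precedence is an assumption.
    max_tokens = kwargs.get("max_tokens")
    if max_tokens is None:
        max_tokens = kwargs.get("max_completion_tokens")
    if max_tokens is not None:
        attrs["gen_ai.request.max_tokens"] = max_tokens

    # Read from stop or stop_sequences, normalize to a list, skip empty values.
    stop = kwargs.get("stop")
    if stop is None:
        stop = kwargs.get("stop_sequences")
    if isinstance(stop, str):
        stop = [stop] if stop else []
    if stop:
        attrs["gen_ai.request.stop_sequences"] = list(stop)

    # Only streaming calls carry the attribute; it is never set to False.
    if streaming:
        attrs["gen_ai.request.stream"] = True
    return attrs


streamed = request_attributes(
    {"temperature": 0.0, "max_completion_tokens": 256, "stop": "END", "top_p": None},
    streaming=True,
)
plain = request_attributes({"stop": []})
print(streamed)
print(plain)
```

Note that a temperature of 0.0 is still recorded because only None counts as absent, while the empty stop list on the second call produces no attribute.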

When content capture is enabled, the LLM span also records the prompt and completion. See Capturing Message Content.

api {name} is one span per non-LLM API call, such as a call to a jailbreak-detection endpoint. Span kind CLIENT.

| Attribute | Type | When set | Description |
| --- | --- | --- | --- |
| api.name | string | Always | The name of the API being called. |

These endpoints are plain HTTP services rather than GenAI operations, so the span uses api.name instead of the gen_ai.* attributes. HTTP transport attributes can be added later without conflict.

Token-Level Attributes

The response, usage, and request-parameter attributes on the LLM span are non-sensitive telemetry, so both engines record them whenever the span exists. They are not gated on content capture or on metrics; enabling tracing is sufficient to get them.

How the values are sourced differs by engine:

  • LLMRails reconstructs them from the recorded LLM call after the request completes.
  • IORails reads them inline as the call runs. For non-streaming calls the values come off the model response. For streaming calls, the values are accumulated across chunks: the model and response ID arrive on early chunks, the finish reason and token usage on the terminal chunk, and the values are written to the span once the stream ends.

For streaming IORails responses, token usage is present only when the upstream provider returns a usage field, which commonly requires forwarding stream_options.include_usage=true. When usage is absent, the usage attributes are not set, which is deliberately distinct from recording zero tokens.
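The streaming behavior described above can be sketched as an accumulator over chunks. The chunk shape used here (plain dictionaries with optional id, model, finish_reason, and usage keys) and the function name are hypothetical; real provider chunks differ. The sketch shows the two cases that matter for dashboards: a stream whose terminal chunk carries usage, and one where usage never arrives.

```python
from typing import Any, Dict, Iterable


def accumulate_stream(chunks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Collect response metadata across chunks and return span attributes."""
    model = response_id = finish_reason = usage = None
    for chunk in chunks:
        model = model or chunk.get("model")            # arrives on early chunks
        response_id = response_id or chunk.get("id")   # arrives on early chunks
        finish_reason = chunk.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage            # terminal chunk, if sent

    # Attributes are written once, after the stream has ended.
    attrs: Dict[str, Any] = {}
    if model is not None:
        attrs["gen_ai.response.model"] = model
    if response_id is not None:
        attrs["gen_ai.response.id"] = response_id
    if finish_reason is not None:
        attrs["gen_ai.response.finish_reasons"] = [finish_reason]
    if usage is not None:
        for source, target in (
            ("input_tokens", "gen_ai.usage.input_tokens"),
            ("output_tokens", "gen_ai.usage.output_tokens"),
        ):
            if usage.get(source) is not None:
                attrs[target] = usage[source]
    return attrs


head = [{"id": "resp-1", "model": "gpt-4o-mini", "delta": "Hel"}, {"delta": "lo"}]
with_usage = accumulate_stream(
    head + [{"finish_reason": "stop", "usage": {"input_tokens": 9, "output_tokens": 2}}]
)
without_usage = accumulate_stream(head + [{"finish_reason": "stop"}])
print(with_usage)
print(without_usage)
```

In the second case the usage attributes are simply missing, so a query that sums tokens should not treat those spans as zero-token calls.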

gen_ai.usage.reasoning.output_tokens is a span attribute, not a metric label. The gen_ai.client.token.usage metric’s required gen_ai.token.type label takes only input or output; reasoning tokens are exposed here on the span instead. See Metric Reference.

How LLMRails and IORails Tracing Differ

Both engines emit OpenTelemetry spans, and both populate the GenAI semantic-convention attributes on the LLM span. They differ in how the spans are produced and how tracing is configured.

| | LLMRails | IORails |
| --- | --- | --- |
| Span production | Spans are reconstructed from the interaction log after the request completes and handed to a tracing adapter. | Spans are created while the request executes and emitted directly to the OpenTelemetry API. |
| Configuration | tracing.enabled: true plus tracing.adapters (for example OpenTelemetry) and span_format (opentelemetry or legacy). | tracing.enabled: true. The adapters and span_format fields are ignored. |
| Metrics | Not supported. | Supported. See Metrics. |
| Non-LLM API spans | No equivalent span. | Emits a dedicated api {name} span for non-LLM API calls such as jailbreak detection. |

The tracing.adapters and tracing.span_format configuration fields apply only to LLMRails. IORails reads tracing.enabled and otherwise ignores them. There is no adapter to select because spans are emitted straight to the OpenTelemetry API.

LLMRails and IORails Attribute Differences

The emitted attributes are the same across both engines except for the following.

| Attribute | LLMRails | IORails | Reason |
| --- | --- | --- | --- |
| gen_ai.usage.reasoning.output_tokens | No | Yes | IORails records reasoning tokens when the provider returns them. |
| gen_ai.request.stream | No | Yes | IORails marks streaming calls; LLMRails does not emit this attribute. |
| gen_ai.usage.total_tokens | Yes | No | This attribute was removed from the current GenAI spec. IORails does not emit it; LLMRails continues to for backward compatibility. |
| llm.cache.hit | Yes | No | LLMRails has an LLM cache layer to report on. IORails does not. |
| span.kind | Yes | No | LLMRails sets a span.kind attribute mirroring the OpenTelemetry span kind; IORails relies on the native span kind only. |
| action.has_llm_calls, action.llm_calls_count, action.param.{name} | Yes | No | LLMRails sets these on the guardrails.action span; IORails sets only action.name. |

All other attributes, including the identifiers (gen_ai.operation.name, gen_ai.request.model, gen_ai.provider.name), gen_ai.usage.input_tokens / output_tokens, the gen_ai.response.* attributes, and the gen_ai.request.* sampling parameters, are emitted by both engines. gen_ai.response.model is emitted by both, but LLMRails always sets it while IORails sets it only when the provider returns it.
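A dashboard or test that reads spans from both engines has to allow for the gen_ai.usage.total_tokens difference. The sketch below is one way to do that on the consumer side: prefer the explicit total when present, otherwise fall back to input plus output. The function name is hypothetical, and the fallback assumes the provider's total equals that sum, which may not hold for every provider.

```python
from typing import Any, Dict, Optional


def total_tokens(attrs: Dict[str, Any]) -> Optional[int]:
    """Return total tokens for an LLM span from either engine, or None."""
    total = attrs.get("gen_ai.usage.total_tokens")
    if total is not None:
        return total  # Emitted by LLMRails only.

    input_tokens = attrs.get("gen_ai.usage.input_tokens")
    output_tokens = attrs.get("gen_ai.usage.output_tokens")
    if input_tokens is None or output_tokens is None:
        return None  # Usage was not reported; do not treat it as zero.
    # Assumption: the total is the sum of input and output tokens.
    return input_tokens + output_tokens


llmrails_span = {
    "gen_ai.usage.input_tokens": 20,
    "gen_ai.usage.output_tokens": 10,
    "gen_ai.usage.total_tokens": 30,
}
iorails_span = {"gen_ai.usage.input_tokens": 10, "gen_ai.usage.output_tokens": 5}

print(total_tokens(llmrails_span), total_tokens(iorails_span), total_tokens({}))
```

Returning None when usage is missing keeps the distinction between "no usage reported" and "zero tokens" that the span attributes themselves preserve.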

Capturing Message Content

Prompt and completion content is gated and off by default, because it can contain sensitive data. Both engines gate it on tracing.enable_content_capture (default false).

LLMRails. When enabled, LLMRails records the prompt and completion as span events on the LLM span (currently the gen_ai.content.prompt and gen_ai.content.completion events), along with conversation events.

IORails. Enable with tracing.enable_content_capture: true or the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable, which overrides the config value. When enabled, IORails records content on the request span (guardrails.request.input / guardrails.request.output), the rail spans (guardrails.rail.input, and guardrails.rail.reason on a block), and the LLM span. On the LLM span the format depends on the OTEL_SEMCONV_STABILITY_OPT_IN environment variable:

  • When it contains gen_ai_latest_experimental, content is written as the JSON span attributes gen_ai.input.messages and gen_ai.output.messages.
  • Otherwise, content is written as span events (gen_ai.user.message, gen_ai.assistant.message, gen_ai.system.message, and gen_ai.choice).

The OTEL_SEMCONV_STABILITY_OPT_IN selector controls the IORails content format and is independent of the tracing.span_format field, which selects the LLMRails adapter format.
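The two IORails decisions described above, whether content is captured and in which format it lands on the LLM span, can be sketched as two small functions. This is an illustration, not the library's code. The function names are hypothetical, the parsing of OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT as a case-insensitive true is an assumption, and OTEL_SEMCONV_STABILITY_OPT_IN is treated as a comma-separated list.

```python
from typing import Mapping


def content_capture_enabled(config_value: bool, env: Mapping[str, str]) -> bool:
    """The environment variable, when set, overrides the config value."""
    raw = env.get("OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT")
    if raw is None:
        return config_value
    # Assumption: only a case-insensitive "true" enables capture.
    return raw.strip().lower() == "true"


def llm_content_format(env: Mapping[str, str]) -> str:
    """Return "attributes" or "events" for content on the LLM span."""
    opt_in = env.get("OTEL_SEMCONV_STABILITY_OPT_IN", "")
    values = {value.strip() for value in opt_in.split(",")}
    if "gen_ai_latest_experimental" in values:
        return "attributes"  # gen_ai.input.messages / gen_ai.output.messages
    return "events"  # gen_ai.user.message, gen_ai.choice, and related events


# Pass os.environ in real use; plain dicts keep the example deterministic.
print(content_capture_enabled(False, {"OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT": "true"}))
print(llm_content_format({"OTEL_SEMCONV_STABILITY_OPT_IN": "http,gen_ai_latest_experimental"}))
print(llm_content_format({}))
```

Keeping the two checks separate mirrors the text: one variable decides whether content is recorded at all, the other only decides its shape on the LLM span.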

Public API Stability

The span names and attribute names on this page are part of each engine’s observable contract, so dashboards and queries can reference them. Attribute names follow the OpenTelemetry GenAI semantic conventions, which are still under active development and can change as the spec matures. Pin your opentelemetry-sdk version and review release notes before upgrading.