Capturing Prompt and Response Content

By default, traces record metadata about each request, such as span timing, model and provider names, sampling parameters, token usage, and rail decisions. They do not record prompt or response content. Content capture is an opt-in feature that also records user, system, assistant, and tool message text on spans so you can inspect what the application sent to the model and what the model returned.

Experimental Feature

The tracing.enable_content_capture config flag works on both the IORails and LLMRails engines. The environment variable controls and inline span behavior described on this page are specific to the opt-in IORails engine. Enable IORails by constructing Guardrails(config, use_iorails=True) (the form used in this guide) or by setting NEMO_GUARDRAILS_IORAILS_ENGINE=1, which aliases the top-level LLMRails import to Guardrails. IORails is an early-release feature, and span names and attributes can change as the OpenTelemetry GenAI semantic conventions evolve. The legacy LLMRails engine supports a narrower form of content capture. For more information, refer to Differences on the LLMRails Engine.

Privacy

Captured content includes the full text of prompts and responses and can contain personally identifiable information (PII) or other sensitive data. Only enable content capture when you need it, and ensure your telemetry backend and its retention policy comply with your data-protection obligations. For more information, refer to Privacy Considerations.

What Gets Captured

When content capture is enabled, IORails records content on three types of spans:

| Span (Kind) | Attribute or Event | When Recorded |
| --- | --- | --- |
| guardrails.request (SERVER) | guardrails.request.input: JSON-encoded list of the caller’s input messages | Always, while capturing |
| guardrails.request (SERVER) | guardrails.request.output: the text actually returned to the caller | When output is produced (a blocked request records the refusal message; an empty stream records nothing) |
| chat <model> (CLIENT) | LLM input and output, in the selected format | Once per LLM call: the main generation call and every rail-action LLM call. On the streaming path, recorded when the stream ends; see Streaming |
| guardrails.rail (INTERNAL) | guardrails.rail.input: JSON-encoded {"messages": ..., "bot_response": ...} snapshot | Every rail execution, while capturing |
| guardrails.rail (INTERNAL) | guardrails.rail.reason: the human-readable block reason | Only when the rail blocks the request |

The SERVER span and the CLIENT span deliberately use different attribute names. On a blocked request, the CLIENT span records the raw model response while the SERVER span records the refusal message the caller received. Distinct names keep backends that correlate these values from conflating the two.

Sampling parameters (gen_ai.request.temperature, max_tokens, and so on) and token-usage attributes are not content, so IORails records them on the CLIENT span whenever tracing is enabled, regardless of the content-capture setting.
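
As an illustration of how a consumer might read the SERVER span attributes listed above, the following sketch decodes them from a plain dictionary. The function name caller_view and the dictionary shape are hypothetical and are not part of the library; only the two attribute names come from this page.

```python
import json
from typing import Any, Dict


def caller_view(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: decode the caller-facing content of a
    guardrails.request span from its attribute dictionary."""
    raw_input = attributes.get("guardrails.request.input")
    return {
        # The input attribute is a JSON-encoded list of messages.
        "input": json.loads(raw_input) if raw_input is not None else None,
        # The output attribute is plain text and is absent when no output
        # was produced (for example, an empty stream).
        "output": attributes.get("guardrails.request.output"),
    }


sample = {
    "guardrails.request.input": json.dumps([{"role": "user", "content": "Hello!"}]),
    "guardrails.request.output": "Hello! How can I help you today?",
}
view = caller_view(sample)
print(view["input"][0]["content"])
print(view["output"])
```

Because the output attribute is omitted rather than set to an empty string, the sketch returns None for a missing output instead of treating it as an error.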

Enabling Content Capture

Content capture is active only when all of the following conditions are true:

  1. The opentelemetry-api package is installed (pip install "nemoguardrails[tracing]").
  2. Tracing is enabled (tracing.enabled: true).
  3. Content capture is requested, either through the config field or the environment variable below.

The OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable is the primary control. It overrides the config field in both directions, so one OTEL-standard variable can control capture across services regardless of what each deployed config says.

| Source | Value | Effect |
| --- | --- | --- |
| OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT | true or 1 | Forces capture on, overriding the config field |
| OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT | false or 0 | Forces capture off, overriding the config field |
| OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT | unset or any other value | Falls through to the config field |
| tracing.enable_content_capture | true | On (when the environment variable is not decisive) |
| tracing.enable_content_capture | false (default) | Off |

The library strips surrounding whitespace and matches environment variable values case-insensitively.
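
The precedence rules above can be sketched as a small function. This is an illustration of the documented behavior, not the library's code: the name resolve_content_capture is hypothetical, and the sketch covers only the third condition (tracing must still be enabled and opentelemetry-api installed).

```python
import os
from typing import Optional

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def resolve_content_capture(env_value: Optional[str], config_flag: bool) -> bool:
    """Hypothetical helper mirroring the documented precedence:
    a decisive environment value wins, otherwise the config field applies."""
    if env_value is not None:
        # Surrounding whitespace is ignored and matching is case-insensitive.
        normalized = env_value.strip().lower()
        if normalized in ("true", "1"):
            return True
        if normalized in ("false", "0"):
            return False
    # Unset or any other value falls through to tracing.enable_content_capture.
    return config_flag


# Example: read the real environment and combine it with a config value of False.
effective = resolve_content_capture(os.environ.get(ENV_VAR), config_flag=False)
print(f"content capture requested: {effective}")
```

Passing the environment value in as an argument, rather than reading it inside the function, keeps the sketch easy to exercise with different combinations.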

Quickstart

This example enables content capture, runs a single request, and prints the resulting spans to the console so you can review the captured content before you configure a production exporter.

  1. Install the NVIDIA NeMo Guardrails library and the OpenTelemetry SDK.

    $ pip install "nemoguardrails[tracing]" opentelemetry-sdk

    The [tracing] extra installs opentelemetry-api, which is the only OpenTelemetry dependency the library itself takes.

  2. Save the following to content_capture_example.py.

    # content_capture_example.py
    import asyncio

    from opentelemetry import trace
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

    from nemoguardrails import Guardrails, RailsConfig

    # Configure the OpenTelemetry TracerProvider BEFORE constructing Guardrails so
    # the engine resolves a real tracer when it creates spans.
    resource = Resource.create({"service.name": "guardrails-content-capture"})
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    # Tracing must be enabled for content capture to take effect. The inline
    # IORails path does not use the `adapters` list (that is the LLMRails
    # post-hoc path); it exports through the TracerProvider configured above.
    config_yaml = """
    models:
      - type: main
        engine: openai
        model: gpt-4o-mini

    tracing:
      enabled: true
      enable_content_capture: true
    """

    config = RailsConfig.from_content(yaml_content=config_yaml)

    async def main() -> None:
        # use_iorails=True selects the IORails engine. require_iorails=True raises
        # a ValueError if the config is incompatible with IORails.
        async with Guardrails(config, use_iorails=True, require_iorails=True) as rails:
            response = await rails.generate_async(
                messages=[{"role": "user", "content": "Hello!"}],
            )
            print(f"Response: {response}")

    try:
        asyncio.run(main())
    finally:
        # Flush buffered spans to the console exporter before the process exits.
        provider.shutdown()

  3. Run the script.

    $ python content_capture_example.py

    The chat gpt-4o-mini CLIENT span carries the captured prompt and response as span events (the default format), and the guardrails.request SERVER span carries the caller-facing input and output. The following output is trimmed for clarity:

    {
      "name": "chat gpt-4o-mini",
      "kind": "SpanKind.CLIENT",
      "attributes": {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": "gpt-4o-mini"
      },
      "events": [
        {"name": "gen_ai.user.message", "attributes": {"role": "user", "content": "Hello!"}},
        {"name": "gen_ai.choice", "attributes": {"index": 0, "message.role": "assistant", "message.content": "Hello! How can I help you today?"}}
      ]
    }
    {
      "name": "guardrails.request",
      "kind": "SpanKind.SERVER",
      "attributes": {
        "gen_ai.operation.name": "guardrails",
        "guardrails.request.input": "[{\"role\": \"user\", \"content\": \"Hello!\"}]",
        "guardrails.request.output": "Hello! How can I help you today?"
      }
    }

The host application is responsible for configuring a TracerProvider. If you enable tracing and content capture but do not set a TracerProvider, the OpenTelemetry API returns a no-op tracer and silently discards every span and its captured content. Always set the TracerProvider before constructing Guardrails.

Output Format

The OTEL_SEMCONV_STABILITY_OPT_IN environment variable selects how LLM-call content is encoded on the CLIENT span. It holds a comma-separated list of opt-in tokens, and the library reads it on each call so changes take effect immediately.

| OTEL_SEMCONV_STABILITY_OPT_IN | Form | What Is Emitted |
| --- | --- | --- |
| Contains gen_ai_latest_experimental | JSON span attributes | gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions (JSON-encoded) |
| Token absent (default) | Legacy span events | gen_ai.system.message, gen_ai.user.message, gen_ai.assistant.message, gen_ai.tool.message, and gen_ai.choice |

This selector applies only to the gen_ai.* CLIENT spans. The guardrails.request.* and guardrails.rail.* attributes are always JSON-encoded regardless of the selector, because no GenAI semantic convention covers them.

This selector is distinct from the tracing.span_format config field. span_format chooses the span structure produced by the LLMRails post-hoc tracing adapter; OTEL_SEMCONV_STABILITY_OPT_IN chooses the encoding of captured content on the inline IORails CLIENT spans.
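
The selection rule can be sketched as a membership test on the comma-separated token list. The function name select_content_format and the returned labels are hypothetical, and the sketch assumes that whitespace around each token is ignored and that token matching is exact; this page does not state how the library treats either case.

```python
import os
from typing import Optional

OPT_IN_VAR = "OTEL_SEMCONV_STABILITY_OPT_IN"
LATEST_TOKEN = "gen_ai_latest_experimental"


def select_content_format(raw_value: Optional[str]) -> str:
    """Hypothetical helper: pick the CLIENT-span content encoding from the
    comma-separated opt-in list."""
    # Assumption: tokens are trimmed before comparison.
    tokens = {token.strip() for token in (raw_value or "").split(",")}
    return "json_attributes" if LATEST_TOKEN in tokens else "legacy_events"


# Reading the variable at call time mirrors the "read on each call" behavior.
print(select_content_format(os.environ.get(OPT_IN_VAR)))
```

Other opt-in tokens in the same variable, such as those for unrelated semantic-convention groups, do not affect the result in this sketch.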

To emit the JSON-attribute form instead of the default events, set the variable before you run the application:

$ export OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental

When you set this variable, the chat gpt-4o-mini span carries JSON-encoded attributes instead of events:

{
  "name": "chat gpt-4o-mini",
  "kind": "SpanKind.CLIENT",
  "attributes": {
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.input.messages": "[{\"role\": \"user\", \"parts\": [{\"type\": \"text\", \"content\": \"Hello!\"}]}]",
    "gen_ai.output.messages": "[{\"role\": \"assistant\", \"parts\": [{\"type\": \"text\", \"content\": \"Hello! How can I help you today?\"}]}]"
  }
}

The library captures system messages separately as gen_ai.system_instructions, a flat list of parts with no role wrapper, per the GenAI specification. The library sets attributes only when they are non-empty, so a backend can distinguish “no system instructions” from an empty string.
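
For readers who post-process these attributes, the following sketch flattens a JSON-encoded gen_ai.input.messages or gen_ai.output.messages value into role and text pairs. The helper name text_from_messages is hypothetical, and the sketch handles only the text parts shown in the example above; other part types are skipped.

```python
import json
from typing import List, Tuple


def text_from_messages(encoded: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: decode a JSON-encoded message list and join the
    text parts of each message."""
    pairs = []
    for message in json.loads(encoded):
        text = "".join(
            part.get("content", "")
            for part in message.get("parts", [])
            if part.get("type") == "text"
        )
        pairs.append((message["role"], text))
    return pairs


# Same shape as the gen_ai.input.messages value in the example above.
sample = json.dumps(
    [{"role": "user", "parts": [{"type": "text", "content": "Hello!"}]}]
)
print(text_from_messages(sample))
```

Because empty attributes are not set at all, a consumer would typically check for the attribute's presence before calling a helper like this one.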

Streaming

On the streaming path, both the guardrails.request SERVER span and the chat <model> CLIENT span accumulate streamed chunks and record content once, when the stream ends, rather than per chunk. They differ in what they record and how they behave when the stream exits abnormally.

guardrails.request SERVER span. Accumulates the chunks the consumer actually receives and records them as guardrails.request.output, so the captured value is exactly what reached the caller, including any output-rail error payload injected on a block. Capture runs in a finally block, so it executes even when the stream exits early through a provider error or consumer cancellation, and an interrupted stream still records whatever was delivered.

chat <model> CLIENT span. Accumulates the model’s response deltas and records the LLM input plus the accumulated response in the selected format. The CLIENT span captures content only when the stream completes naturally. If the consumer cancels the stream or the provider raises an error, IORails intentionally does not record the partial model output on the CLIENT span.

In both cases, if no content was produced, IORails omits the output rather than recording an empty string.
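
The difference between the two spans can be sketched with an async generator wrapper. This is a simplified model of the behavior described above, not the IORails implementation: all names are hypothetical, a dictionary stands in for span attributes, and both records accumulate the same chunks, whereas the real CLIENT span accumulates model deltas and the SERVER span accumulates what the caller receives.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Optional


async def fake_model_stream(chunks: List[str]) -> AsyncIterator[str]:
    """Stand-in for a provider stream; yields the given chunks in order."""
    for chunk in chunks:
        yield chunk


async def traced_stream(
    source: AsyncIterator[str], recorded: Dict[str, str]
) -> AsyncIterator[str]:
    """Pass chunks through and record the accumulated text once, at the end."""
    delivered: List[str] = []
    completed = False
    try:
        async for chunk in source:
            delivered.append(chunk)
            yield chunk
        completed = True
    finally:
        text = "".join(delivered)
        if text:  # omit the output entirely when nothing was produced
            # SERVER-style record: written even if the stream exits early.
            recorded["server_output"] = text
            # CLIENT-style record: written only on natural completion.
            if completed:
                recorded["client_output"] = text


async def consume(chunks: List[str], stop_after: Optional[int] = None) -> Dict[str, str]:
    """Read the stream, optionally cancelling after stop_after chunks."""
    recorded: Dict[str, str] = {}
    stream = traced_stream(fake_model_stream(chunks), recorded)
    received = 0
    try:
        async for _ in stream:
            received += 1
            if stop_after is not None and received >= stop_after:
                break
    finally:
        # Closing the generator explicitly runs its finally block deterministically.
        await stream.aclose()
    return recorded


full = asyncio.run(consume(["Hel", "lo", "!"]))
cancelled = asyncio.run(consume(["Hel", "lo", "!"], stop_after=1))
print(full)
print(cancelled)
```

In the cancelled run only the SERVER-style record is present, which matches the rule that partial model output is not recorded on the CLIENT span.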

Privacy Considerations

  • Content capture is off by default. Prompts and responses are never written to spans unless you explicitly enable it.
  • Captured content can include PII or other sensitive data. Treat your tracing backend as a system that stores user content, and apply the same access controls and retention limits you would apply to any store of conversation data.
  • Use OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=false to force capture off across every service from the environment, regardless of what individual configs request. This is useful as a deployment-wide guardrail for regulated environments.
  • The OpenTelemetry GenAI semantic conventions recommend against capturing message content by default because of these privacy risks.

Differences on the LLMRails Engine

The legacy LLMRails engine also honors tracing.enable_content_capture, but through a different mechanism with a narrower contract. Instead of instrumenting live spans, it captures content after the request completes, when its tracing adapter extracts spans from the interaction log.

| Dimension | IORails (this page) | LLMRails |
| --- | --- | --- |
| Mechanism | Inline live spans during execution | Post-hoc extraction from the interaction log via the tracing adapter |
| Enable control | enable_content_capture and the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable | enable_content_capture config field only |
| Format selector | OTEL_SEMCONV_STABILITY_OPT_IN (JSON attributes or current legacy events) | None (one fixed form) |
| LLM content form | gen_ai.input.messages / gen_ai.output.messages / gen_ai.system_instructions, or the current gen_ai.user.message / gen_ai.assistant.message / gen_ai.system.message / gen_ai.choice events | Deprecated gen_ai.content.prompt and gen_ai.content.completion events |
| Request and rail content | guardrails.request.input / .output, guardrails.rail.input / .reason | No equivalent; instead emits guardrails-internal guardrails.user_message and guardrails.utterance.* events |
| Streaming | Accumulates and records the delivered text at stream end | Not applicable (post-hoc from the log) |

Neither OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT nor OTEL_SEMCONV_STABILITY_OPT_IN is consulted on the LLMRails path; the config field is the only control there.