Capturing Prompt and Response Content
By default, traces record metadata about each request, such as span timing, model and provider names, sampling parameters, token usage, and rail decisions. They do not record prompt or response content. Content capture is an opt-in feature that also records user, system, assistant, and tool message text on spans so you can inspect what the application sent to the model and what the model returned.
Experimental Feature
The tracing.enable_content_capture config flag works on both the IORails and LLMRails engines.
The environment variable controls and inline span behavior described on this page are specific to the opt-in IORails engine.
Enable IORails by constructing Guardrails(config, use_iorails=True) (the form used in this guide) or by setting NEMO_GUARDRAILS_IORAILS_ENGINE=1, which aliases the top-level LLMRails import to Guardrails.
IORails is an early-release feature, and span names and attributes can change as the OpenTelemetry GenAI semantic conventions evolve.
The legacy LLMRails engine supports a narrower form of content capture. For more information, refer to Differences on the LLMRails Engine.
Privacy
Captured content includes the full text of prompts and responses and can contain personally identifiable information (PII) or other sensitive data. Only enable content capture when you need it, and ensure your telemetry backend and its retention policy comply with your data-protection obligations. For more information, refer to Privacy Considerations.
What Gets Captured
When content capture is enabled, IORails records content on three types of spans:
The SERVER span and the CLIENT span deliberately use different attribute names. On a blocked request, the CLIENT span records the raw model response while the SERVER span records the refusal message the caller received. Distinct names help avoid confusing backends that correlate these values.
Sampling parameters (gen_ai.request.temperature, max_tokens, and so on) and token-usage attributes are not content, so IORails records them on the CLIENT span whenever tracing is enabled, regardless of the content-capture setting.
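The split between metadata and content on the CLIENT span can be sketched as a small Python function. This is an illustration under stated assumptions, not the library's implementation. `client_span_attributes` is hypothetical, and it writes content as a JSON attribute for simplicity even though the default format is span events. The `gen_ai.request.*` and `gen_ai.usage.*` names follow the OpenTelemetry GenAI semantic conventions.

```python
import json
from typing import Any, Dict, List, Optional


def client_span_attributes(
    model: str,
    temperature: Optional[float],
    max_tokens: Optional[int],
    input_tokens: int,
    output_tokens: int,
    messages: List[Dict[str, str]],
    capture_content: bool,
) -> Dict[str, Any]:
    """Hypothetical sketch: metadata is always recorded, content only when capture is on."""
    attrs: Dict[str, Any] = {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    # Sampling parameters are metadata, so they do not depend on capture_content.
    if temperature is not None:
        attrs["gen_ai.request.temperature"] = temperature
    if max_tokens is not None:
        attrs["gen_ai.request.max_tokens"] = max_tokens
    # Message text is content and is gated on the capture setting.
    if capture_content and messages:
        attrs["gen_ai.input.messages"] = json.dumps(messages)
    return attrs
```

Turning capture on adds the content attribute and leaves every metadata attribute unchanged.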
Enabling Content Capture
Content capture is active only when all of the following conditions are true:
- The opentelemetry-api package is installed (pip install "nemoguardrails[tracing]").
- Tracing is enabled (tracing.enabled: true).
- Content capture is requested, either through the tracing.enable_content_capture config field or the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable described below.
The OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable is the primary control. It overrides the config field in both directions, so one OTEL-standard variable can control capture across services regardless of what each deployed config says.
The library strips surrounding whitespace and matches environment variable values case-insensitively.
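The conditions and precedence rules above can be summarized in a short Python function. This is a sketch, not the library's code. `content_capture_active` and its parameters are hypothetical. Treating only `true` and `false` as recognized values, and falling back to the config field for anything else, is an assumption. The override direction, whitespace stripping, and case-insensitive matching follow this page.

```python
import os
from typing import Mapping, Optional

# Environment variable named on this page; the accepted values below are an assumption.
CAPTURE_ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def content_capture_active(
    otel_api_installed: bool,
    tracing_enabled: bool,
    config_flag: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper modeling the documented conditions for content capture."""
    # Both prerequisites must hold before any capture request is considered.
    if not (otel_api_installed and tracing_enabled):
        return False
    env = os.environ if env is None else env
    raw = env.get(CAPTURE_ENV_VAR)
    if raw is not None:
        # Documented: surrounding whitespace is stripped and matching is case-insensitive.
        value = raw.strip().lower()
        # The environment variable overrides the config field in both directions.
        if value == "true":
            return True
        if value == "false":
            return False
        # Assumption: any other value falls back to the config field.
    return config_flag
```

For example, `content_capture_active(True, True, True, {CAPTURE_ENV_VAR: "false"})` returns `False` even though the config requests capture.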
Quickstart
This example enables content capture, runs a single request, and prints the resulting spans to the console so you can review the captured content before you configure a production exporter.
- Install the NVIDIA NeMo Guardrails library and the OpenTelemetry SDK.
  The [tracing] extra installs opentelemetry-api, which is the only OpenTelemetry dependency the library itself takes. The SDK (opentelemetry-sdk) is a separate package.
- Save the following to content_capture_example.py.
- Run the script.
  The chat gpt-4o-mini CLIENT span carries the captured prompt and response as span events (the default format), and the guardrails.request SERVER span carries the caller-facing input and output. The following output is trimmed for clarity:
The host application is responsible for configuring a TracerProvider.
If you enable tracing and content capture but do not set a TracerProvider, the OpenTelemetry API returns a no-op tracer and silently discards every span and its captured content.
Always set the TracerProvider before constructing Guardrails.
Output Format
The OTEL_SEMCONV_STABILITY_OPT_IN environment variable selects how LLM-call content is encoded on the CLIENT span.
It holds a comma-separated list of opt-in tokens, and the library reads it on each call so changes take effect immediately.
This selector applies only to the gen_ai.* CLIENT spans.
The guardrails.request.* and guardrails.rail.* attributes are always JSON-encoded regardless of the selector, because no GenAI semantic convention covers them.
This selector is distinct from the tracing.span_format config field.
span_format chooses the span structure produced by the LLMRails post-hoc tracing adapter; OTEL_SEMCONV_STABILITY_OPT_IN chooses the encoding of captured content on the inline IORails CLIENT spans.
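The comma-separated, read-on-every-call behavior of OTEL_SEMCONV_STABILITY_OPT_IN can be sketched as follows. This is not the library's code. `opt_in_tokens` and `client_content_format` are hypothetical. The token that selects the JSON-attribute form is not reproduced here, so the sketch takes it as a parameter, and `example_token` in the usage is a placeholder. Stripping whitespace around each token is also an assumption.

```python
import os
from typing import Mapping, Optional, Set

OPT_IN_ENV_VAR = "OTEL_SEMCONV_STABILITY_OPT_IN"


def opt_in_tokens(env: Optional[Mapping[str, str]] = None) -> Set[str]:
    """Parse the opt-in list on every call, so environment changes apply immediately."""
    env = os.environ if env is None else env
    raw = env.get(OPT_IN_ENV_VAR, "")
    # Assumption: whitespace around tokens is ignored and empty entries are dropped.
    return {token.strip() for token in raw.split(",") if token.strip()}


def client_content_format(json_token: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the CLIENT-span content encoding; json_token is a hypothetical placeholder."""
    return "json_attributes" if json_token in opt_in_tokens(env) else "events"
```

Because the variable is read on each call rather than cached at startup, changing `os.environ` between calls changes the result.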
To emit the JSON-attribute form instead of the default events, set the variable before you run the application:
When you set this variable, the chat gpt-4o-mini span carries JSON-encoded attributes instead of events:
The library captures system messages separately as gen_ai.system_instructions, a flat list of parts with no role wrapper, per the GenAI specification.
The library sets each content attribute only when it is non-empty, so a backend can tell “no system instructions” (the attribute is absent) apart from an empty string.
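The system-instruction split can be illustrated with a small encoder. This is a sketch only. `encode_content_attributes` is hypothetical. The part shape (`{"type": "text", "content": ...}`) and the role-plus-parts message shape follow the OpenTelemetry GenAI semantic conventions, but the library's exact serialization may differ. Excluding system messages from `gen_ai.input.messages` is an assumption based on the separate capture described above.

```python
import json
from typing import Dict, List


def encode_content_attributes(messages: List[Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical encoder for the JSON-attribute form of captured input content."""
    # System messages become a flat list of parts with no role wrapper.
    system_parts = [
        {"type": "text", "content": m["content"]}
        for m in messages
        if m["role"] == "system" and m["content"]
    ]
    # Assumption: all other messages keep their role and wrap text in a parts list.
    input_messages = [
        {"role": m["role"], "parts": [{"type": "text", "content": m["content"]}]}
        for m in messages
        if m["role"] != "system"
    ]
    attrs: Dict[str, str] = {}
    # Attributes are set only when non-empty, so absence means "no such content".
    if system_parts:
        attrs["gen_ai.system_instructions"] = json.dumps(system_parts)
    if input_messages:
        attrs["gen_ai.input.messages"] = json.dumps(input_messages)
    return attrs
```

A request with no system message produces no `gen_ai.system_instructions` attribute at all, rather than an empty JSON list.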
Streaming
On the streaming path, both the guardrails.request SERVER span and the chat <model> CLIENT span accumulate streamed chunks and record content once, when the stream ends, rather than per chunk.
They differ in what they record and how they behave when the stream exits abnormally.
guardrails.request SERVER span. Accumulates the chunks the consumer actually receives and records them as guardrails.request.output, so the captured value is exactly what reached the caller, including any output-rail error payload injected on a block.
Capture runs in a finally block, so it executes even when the stream exits early through a provider error or consumer cancellation, and an interrupted stream still records whatever was delivered.
chat <model> CLIENT span. Accumulates the model’s response deltas and records the LLM input plus the accumulated response in the selected format.
The CLIENT span captures content only when the stream completes naturally. If the consumer cancels the stream or the provider raises an error, IORails intentionally does not record the partial model output on the CLIENT span.
In both cases, if no content was produced, IORails omits the output rather than recording an empty string.
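The contrast between the two streaming behaviors can be sketched with a generator wrapper. This is not the library's implementation. `traced_stream` and the `client.response` key are hypothetical, and plain dicts stand in for span attributes. For simplicity the sketch feeds the same chunks to both stand-ins, whereas the real CLIENT span accumulates model deltas that can differ from what an output rail lets through to the caller.

```python
from typing import Dict, Iterable, Iterator


def traced_stream(
    chunks: Iterable[str],
    server_attrs: Dict[str, str],
    client_attrs: Dict[str, str],
) -> Iterator[str]:
    """Hypothetical wrapper contrasting SERVER-span and CLIENT-span capture on a stream."""
    delivered = []  # chunks handed to the consumer so far
    completed = False
    try:
        for chunk in chunks:
            delivered.append(chunk)
            yield chunk
        completed = True
    finally:
        # Runs on normal completion, provider errors, and consumer cancellation (close()).
        text = "".join(delivered)
        if text:  # omit the output entirely rather than recording ""
            server_attrs["guardrails.request.output"] = text
            # The CLIENT-span stand-in is written only when the stream ran to completion.
            if completed:
                client_attrs["client.response"] = text  # hypothetical key
```

If the consumer calls `close()` after the first chunk, the SERVER stand-in records that chunk and the CLIENT stand-in stays empty.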
Privacy Considerations
- Content capture is off by default. Prompts and responses are never written to spans unless you explicitly enable it.
- Captured content can include PII or other sensitive data. Treat your tracing backend as a system that stores user content, and apply the same access controls and retention limits you would apply to any store of conversation data.
- On the IORails engine, use OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=false to force capture off across every service from the environment, regardless of what individual configs request. This is useful as a deployment-wide guardrail for regulated environments.
- The OpenTelemetry GenAI semantic conventions recommend against capturing message content by default because of these privacy risks.
Differences on the LLMRails Engine
The legacy LLMRails engine also honors tracing.enable_content_capture, but through a different mechanism with a narrower contract.
Instead of instrumenting live spans, it captures content after the request completes, when its tracing adapter extracts spans from the interaction log.
Neither OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT nor OTEL_SEMCONV_STABILITY_OPT_IN is consulted on the LLMRails path; the config field is the only control there.
Related Topics
- Tracing Configuration: The tracing config schema, including the enable_content_capture field.
- Quick Start: Minimal setup to enable tracing.
- OpenTelemetry: Production-ready exporter setup.