Capturing Prompt and Response Content | NVIDIA NeMo Guardrails Library Developer Guide

By default, traces record metadata about each request, such as span timing, model and provider names, sampling parameters, token usage, and rail decisions. They do not record prompt or response content. Content capture is an opt-in feature that also records user, system, assistant, and tool message text on spans so you can inspect what the application sent to the model and what the model returned.

Experimental Feature

The tracing.enable_content_capture config flag works on both the IORails and LLMRails engines. The environment variable controls and inline span behavior described on this page are specific to the opt-in IORails engine. Enable IORails by constructing Guardrails(config, use_iorails=True) (the form used in this guide) or by setting NEMO_GUARDRAILS_IORAILS_ENGINE=1, which aliases the top-level LLMRails import to Guardrails. IORails is an early-release feature, and span names and attributes can change as the OpenTelemetry GenAI semantic conventions evolve. The legacy LLMRails engine supports a narrower form of content capture. For more information, refer to Differences on the LLMRails Engine.

Privacy

Captured content includes the full text of prompts and responses and can contain personally identifiable information (PII) or other sensitive data. Only enable content capture when you need it, and ensure your telemetry backend and its retention policy comply with your data-protection obligations. For more information, refer to Privacy Considerations.

What Gets Captured

When content capture is enabled, IORails records content on three types of spans:

Span (Kind)	Attribute or Event	When Recorded
`guardrails.request` (SERVER)	`guardrails.request.input`: JSON-encoded list of the caller’s input messages	Always, while capturing
`guardrails.request` (SERVER)	`guardrails.request.output`: the text actually returned to the caller	When output is produced (a blocked request records the refusal message; an empty stream records nothing)
`chat <model>` (CLIENT)	LLM input and output, in the selected format	Once per LLM call: the main generation call and every rail-action LLM call. On the streaming path, recorded when the stream ends; see Streaming
`guardrails.rail` (INTERNAL)	`guardrails.rail.input`: JSON-encoded `{"messages": ..., "bot_response": ...}` snapshot	Every rail execution, while capturing
`guardrails.rail` (INTERNAL)	`guardrails.rail.reason`: the human-readable block reason	Only when the rail blocks the request

The SERVER span and the CLIENT span deliberately use different attribute names. On a blocked request, the CLIENT span records the raw model response, while the SERVER span records the refusal message the caller received. Distinct names prevent backends that correlate attributes across spans from treating these two different values as the same field.

Sampling parameters (gen_ai.request.temperature, max_tokens, and so on) and token-usage attributes are not content, so IORails records them on the CLIENT span whenever tracing is enabled, regardless of the content-capture setting.

Enabling Content Capture

Content capture is active only when all of the following conditions are true:

The opentelemetry-api package is installed (pip install "nemoguardrails[tracing]").
Tracing is enabled (tracing.enabled: true).
Content capture is requested, either through the config field or the environment variable below.

The OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable is the primary control. It overrides the config field in both directions, so one OTEL-standard variable can control capture across services regardless of what each deployed config says.

Source	Value	Effect
`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`	`true` or `1`	Forces capture on, overriding the config field
`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`	`false` or `0`	Forces capture off, overriding the config field
`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`	unset or any other value	Falls through to the config field
`tracing.enable_content_capture`	`true`	On (when the environment variable is not decisive)
`tracing.enable_content_capture`	`false` (default)	Off

The library strips surrounding whitespace and matches environment variable values case-insensitively.
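The precedence rules above can be summarized as a small decision function. This is a minimal sketch for reasoning about a deployment, not the library's internal code: `env_override` and `content_capture_active` are hypothetical helpers, and the three boolean inputs stand in for the three preconditions listed earlier on this page.

```python
from typing import Mapping, Optional

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def env_override(env: Mapping[str, str]) -> Optional[bool]:
    """Return True/False when the environment variable is decisive, else None."""
    raw = env.get(ENV_VAR)
    if raw is None:
        return None
    # Surrounding whitespace is stripped and matching is case-insensitive.
    value = raw.strip().lower()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    return None  # any other value falls through to the config field


def content_capture_active(
    otel_api_installed: bool,
    tracing_enabled: bool,
    config_flag: bool,
    env: Mapping[str, str],
) -> bool:
    """Hypothetical helper: is content capture in effect for this process?"""
    # The environment variable only decides the "capture requested" condition;
    # the OpenTelemetry API and tracing.enabled are still required.
    if not (otel_api_installed and tracing_enabled):
        return False
    override = env_override(env)
    return config_flag if override is None else override


print(content_capture_active(True, True, False, {ENV_VAR: " TRUE "}))  # True
print(content_capture_active(True, True, True, {ENV_VAR: "0"}))  # False
```

Note that an unrecognized value such as `yes` is not treated as "on"; it simply defers to `tracing.enable_content_capture`.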

Quickstart

This example enables content capture, runs a single request, and prints the resulting spans to the console so you can review the captured content before you configure a production exporter.

Install the NVIDIA NeMo Guardrails library and the OpenTelemetry SDK.
```
$ pip install "nemoguardrails[tracing]" opentelemetry-sdk
```
The [tracing] extra installs opentelemetry-api, which is the only OpenTelemetry dependency the library itself takes.

Save the following to content_capture_example.py.

```python
# content_capture_example.py
import asyncio

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

from nemoguardrails import Guardrails, RailsConfig

# Configure the OpenTelemetry TracerProvider BEFORE constructing Guardrails so
# the engine resolves a real tracer when it creates spans.
resource = Resource.create({"service.name": "guardrails-content-capture"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Tracing must be enabled for content capture to take effect. The inline
# IORails path does not use the `adapters` list (that is the LLMRails
# post-hoc path); it exports through the TracerProvider configured above.
config_yaml = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

tracing:
  enabled: true
  enable_content_capture: true
"""

config = RailsConfig.from_content(yaml_content=config_yaml)


async def main() -> None:
    # use_iorails=True selects the IORails engine. require_iorails=True raises
    # a ValueError if the config is incompatible with IORails.
    async with Guardrails(config, use_iorails=True, require_iorails=True) as rails:
        response = await rails.generate_async(
            messages=[{"role": "user", "content": "Hello!"}],
        )
        print(f"Response: {response}")


try:
    asyncio.run(main())
finally:
    # Flush buffered spans to the console exporter before the process exits.
    provider.shutdown()
```

Run the script.

```
$ python content_capture_example.py
```

The chat gpt-4o-mini CLIENT span carries the captured prompt and response as span events (the default format), and the guardrails.request SERVER span carries the caller-facing input and output. The following output is trimmed for clarity:

```json
{
  "name": "chat gpt-4o-mini",
  "kind": "SpanKind.CLIENT",
  "attributes": {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o-mini"
  },
  "events": [
    {"name": "gen_ai.user.message", "attributes": {"role": "user", "content": "Hello!"}},
    {"name": "gen_ai.choice", "attributes": {"index": 0, "message.role": "assistant", "message.content": "Hello! How can I help you today?"}}
  ]
}
{
  "name": "guardrails.request",
  "kind": "SpanKind.SERVER",
  "attributes": {
    "gen_ai.operation.name": "guardrails",
    "guardrails.request.input": "[{\"role\": \"user\", \"content\": \"Hello!\"}]",
    "guardrails.request.output": "Hello! How can I help you today?"
  }
}
```
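Because `guardrails.request.input` is itself a JSON-encoded string inside the span's attributes, reviewing it takes a second decode. The following sketch assumes you have already loaded one span from the console output into a dict (for example with `json.loads`); `read_request_capture` is a hypothetical helper, and the sample string is copied from the trimmed output above.

```python
import json
from typing import Any, Dict, List, Optional


def read_request_capture(span: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the captured content on an exported guardrails.request span."""
    attrs = span.get("attributes", {})
    raw_input = attrs.get("guardrails.request.input")
    # The input attribute is a JSON-encoded list of messages; decode it again.
    messages: Optional[List[Dict[str, Any]]] = json.loads(raw_input) if raw_input else None
    # The output attribute is plain text and is absent when nothing was produced.
    return {"input": messages, "output": attrs.get("guardrails.request.output")}


span_json = r'''{
  "name": "guardrails.request",
  "kind": "SpanKind.SERVER",
  "attributes": {
    "gen_ai.operation.name": "guardrails",
    "guardrails.request.input": "[{\"role\": \"user\", \"content\": \"Hello!\"}]",
    "guardrails.request.output": "Hello! How can I help you today?"
  }
}'''

capture = read_request_capture(json.loads(span_json))
print(capture["input"][0]["content"])  # Hello!
```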

The host application is responsible for configuring a TracerProvider. If you enable tracing and content capture but do not set a TracerProvider, the OpenTelemetry API returns a no-op tracer and silently discards every span and its captured content. Always set the TracerProvider before constructing Guardrails.

Output Format

The OTEL_SEMCONV_STABILITY_OPT_IN environment variable selects how LLM-call content is encoded on the CLIENT span. It holds a comma-separated list of opt-in tokens, and the library reads it on each call so changes take effect immediately.

`OTEL_SEMCONV_STABILITY_OPT_IN`	Form	What Is Emitted
Contains `gen_ai_latest_experimental`	JSON span attributes	`gen_ai.input.messages`, `gen_ai.output.messages`, `gen_ai.system_instructions` (JSON-encoded)
Token absent (default)	Legacy span events	`gen_ai.system.message`, `gen_ai.user.message`, `gen_ai.assistant.message`, `gen_ai.tool.message`, and `gen_ai.choice`

This selector applies only to the gen_ai.* CLIENT spans. The guardrails.request.* and guardrails.rail.* attributes are always JSON-encoded regardless of the selector, because no GenAI semantic convention covers them.
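A minimal sketch of the selector check, assuming the value is split on commas and each token is compared after trimming whitespace. The trimming and the exact-case comparison are assumptions about the parsing, and `uses_json_attributes` is a hypothetical name, not a library function. Reading `os.environ` inside the function mirrors the per-call lookup described above.

```python
import os
from typing import Mapping, Optional

OPT_IN_VAR = "OTEL_SEMCONV_STABILITY_OPT_IN"
GENAI_TOKEN = "gen_ai_latest_experimental"


def uses_json_attributes(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if CLIENT spans should carry JSON attributes instead of legacy events."""
    # Look the variable up on every call, so a changed value applies to the next LLM call.
    env = os.environ if env is None else env
    # Assumption: tokens are comma-separated and surrounding whitespace is ignored.
    tokens = {token.strip() for token in env.get(OPT_IN_VAR, "").split(",")}
    return GENAI_TOKEN in tokens


print(uses_json_attributes({OPT_IN_VAR: "http, gen_ai_latest_experimental"}))  # True
print(uses_json_attributes({}))  # False
```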

This selector is distinct from the tracing.span_format config field. span_format chooses the span structure produced by the LLMRails post-hoc tracing adapter; OTEL_SEMCONV_STABILITY_OPT_IN chooses the encoding of captured content on the inline IORails CLIENT spans.

To emit the JSON-attribute form instead of the default events, set the variable before you run the application:

```
$ export OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental
```

When you set this variable, the chat gpt-4o-mini span carries JSON-encoded attributes instead of events:

```json
{
  "name": "chat gpt-4o-mini",
  "kind": "SpanKind.CLIENT",
  "attributes": {
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.input.messages": "[{\"role\": \"user\", \"parts\": [{\"type\": \"text\", \"content\": \"Hello!\"}]}]",
    "gen_ai.output.messages": "[{\"role\": \"assistant\", \"parts\": [{\"type\": \"text\", \"content\": \"Hello! How can I help you today?\"}]}]"
  }
}
```

The library captures system messages separately as gen_ai.system_instructions, a flat list of parts with no role wrapper, as the GenAI semantic conventions specify. It sets each of these attributes only when its content is non-empty, so an absent attribute, rather than an empty string, tells a backend that there were no system instructions.
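To make the encoding concrete, the following sketch converts simple `{"role", "content"}` chat messages into the three attributes. It is an illustration under stated assumptions, not the library's encoder: `to_genai_attributes` is hypothetical, it handles text content only, and it mirrors the trimmed example output above, so it omits any other fields the GenAI conventions define (such as finish reasons).

```python
import json
from typing import Dict, List


def to_genai_attributes(messages: List[Dict[str, str]], completion: str) -> Dict[str, str]:
    """Hypothetical encoder for the JSON-attribute form (text content only)."""
    # System messages become a flat list of parts with no role wrapper.
    system_parts = [
        {"type": "text", "content": m["content"]}
        for m in messages
        if m["role"] == "system" and m["content"]
    ]
    # All other messages keep their role and wrap their text in a parts list.
    input_messages = [
        {"role": m["role"], "parts": [{"type": "text", "content": m["content"]}]}
        for m in messages
        if m["role"] != "system"
    ]
    output_messages = (
        [{"role": "assistant", "parts": [{"type": "text", "content": completion}]}]
        if completion
        else []
    )
    attrs: Dict[str, str] = {}
    # Set an attribute only when there is something to record.
    if system_parts:
        attrs["gen_ai.system_instructions"] = json.dumps(system_parts)
    if input_messages:
        attrs["gen_ai.input.messages"] = json.dumps(input_messages)
    if output_messages:
        attrs["gen_ai.output.messages"] = json.dumps(output_messages)
    return attrs


attrs = to_genai_attributes(
    [{"role": "system", "content": "Be brief."}, {"role": "user", "content": "Hello!"}],
    "Hello! How can I help you today?",
)
print(attrs["gen_ai.system_instructions"])  # [{"type": "text", "content": "Be brief."}]
```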

Streaming

On the streaming path, both the guardrails.request SERVER span and the chat <model> CLIENT span accumulate streamed chunks and record content once, when the stream ends, rather than per chunk. They differ in what they record and how they behave when the stream exits abnormally.

guardrails.request SERVER span. Accumulates the chunks the consumer actually receives and records them as guardrails.request.output, so the captured value is exactly what reached the caller, including any output-rail error payload injected on a block. Capture runs in a finally block, so it executes even when the stream exits early through a provider error or consumer cancellation, and an interrupted stream still records whatever was delivered.
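The SERVER-span pattern can be modeled with a plain async generator. This is a sketch, not IORails code: `capture_delivered` is a hypothetical wrapper, and its `record_output` callback stands in for setting `guardrails.request.output` on the span. A chunk counts as delivered once it is yielded, and the `finally` block records the accumulated text on every exit path while skipping the record when nothing was delivered.

```python
import asyncio
from typing import AsyncGenerator, AsyncIterator, Callable, List


async def capture_delivered(
    stream: AsyncIterator[str],
    record_output: Callable[[str], None],
) -> AsyncGenerator[str, None]:
    """Forward chunks to the consumer and record exactly what it received."""
    delivered: List[str] = []
    try:
        async for chunk in stream:
            # A yielded chunk reaches the consumer, so count it before yielding.
            delivered.append(chunk)
            yield chunk
    finally:
        # Runs on normal completion, on provider errors, and when the consumer
        # closes the stream early, so an interrupted stream still records output.
        text = "".join(delivered)
        if text:  # nothing delivered: omit the output instead of recording ""
            record_output(text)


async def provider_stream():
    for piece in ["Hel", "lo", "!"]:
        yield piece


async def demo() -> List[str]:
    recorded: List[str] = []
    # Full consumption records the whole response.
    async for _ in capture_delivered(provider_stream(), recorded.append):
        pass
    # Early exit: the consumer takes one chunk, then closes the stream.
    gen = capture_delivered(provider_stream(), recorded.append)
    await gen.__anext__()
    await gen.aclose()
    return recorded


recorded = asyncio.run(demo())
print(recorded)  # ['Hello!', 'Hel']
```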

chat <model> CLIENT span. Accumulates the model’s response deltas and records the LLM input plus the accumulated response in the selected format. The CLIENT span captures content only when the stream completes naturally. If the consumer cancels the stream or the provider raises an error, IORails intentionally does not record the partial model output on the CLIENT span.

In both cases, if no content was produced, IORails omits the output rather than recording an empty string.
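The CLIENT-span behavior differs in one structural way: recording happens after the loop rather than in a `finally` block, so it runs only when the stream finishes normally. The sketch below illustrates that contrast under stated assumptions: `capture_on_completion` and `record_response` are hypothetical, and the real span also records the LLM input, which is left out here for brevity.

```python
import asyncio
from typing import AsyncGenerator, AsyncIterator, Callable, List


async def capture_on_completion(
    deltas: AsyncIterator[str],
    record_response: Callable[[str], None],
) -> AsyncGenerator[str, None]:
    """Forward model deltas; record the accumulated response only on natural completion."""
    parts: List[str] = []
    async for delta in deltas:
        parts.append(delta)
        yield delta
    # Reached only when the stream finishes normally. A provider error or an
    # early close exits the loop by raising, so partial output is never recorded.
    response = "".join(parts)
    if response:  # no content produced: omit the output instead of recording ""
        record_response(response)


async def steady_provider():
    for piece in ["Hel", "lo", "!"]:
        yield piece


async def flaky_provider():
    yield "Hel"
    raise ConnectionError("provider dropped the stream")


async def demo() -> List[str]:
    recorded: List[str] = []
    async for _ in capture_on_completion(steady_provider(), recorded.append):
        pass
    try:
        async for _ in capture_on_completion(flaky_provider(), recorded.append):
            pass
    except ConnectionError:
        pass  # the partial "Hel" is not recorded
    return recorded


recorded = asyncio.run(demo())
print(recorded)  # ['Hello!']
```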

Privacy Considerations

Content capture is off by default. Prompts and responses are never written to spans unless you explicitly enable it.
Captured content can include PII or other sensitive data. Treat your tracing backend as a system that stores user content, and apply the same access controls and retention limits you would apply to any store of conversation data.
Use OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=false to force capture off across every service from the environment, regardless of what individual configs request. This is useful as a deployment-wide guardrail for regulated environments.
The OpenTelemetry GenAI semantic conventions recommend against capturing message content by default because of these privacy risks.
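For regulated deployments, one option is a startup check that refuses to run unless the environment pins capture off. This is a hypothetical sketch, not a library feature: `capture_forced_off` and `require_capture_forced_off` are illustrative names, and the check applies the same normalization (trimmed, case-insensitive `false` or `0`) described for the environment variable.

```python
import os
import sys
from typing import Mapping, Optional

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def capture_forced_off(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only if the environment variable forces content capture off."""
    env = os.environ if env is None else env
    return env.get(ENV_VAR, "").strip().lower() in ("false", "0")


def require_capture_forced_off() -> None:
    # Hypothetical guard to call at process startup: exit rather than risk
    # a config that sets enable_content_capture: true.
    if not capture_forced_off():
        sys.exit(f"{ENV_VAR} must be set to false or 0 in this environment")


print(capture_forced_off({ENV_VAR: " FALSE "}))  # True
```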

Differences on the LLMRails Engine

The legacy LLMRails engine also honors tracing.enable_content_capture, but through a different mechanism with a narrower contract. Instead of instrumenting live spans, it captures content after the request completes, when its tracing adapter extracts spans from the interaction log.

Dimension	IORails (this page)	LLMRails
Mechanism	Inline live spans during execution	Post-hoc extraction from the interaction log via the tracing adapter
Enable control	`enable_content_capture` and the `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` environment variable	`enable_content_capture` config field only
Format selector	`OTEL_SEMCONV_STABILITY_OPT_IN` (JSON attributes or legacy span events)	None (one fixed form)
LLM content form	`gen_ai.input.messages` / `gen_ai.output.messages` / `gen_ai.system_instructions`, or the current `gen_ai.user.message` / `gen_ai.assistant.message` / `gen_ai.system.message` / `gen_ai.choice` events	Deprecated `gen_ai.content.prompt` and `gen_ai.content.completion` events
Request and rail content	`guardrails.request.input` / `.output`, `guardrails.rail.input` / `.reason`	No equivalent; instead emits guardrails-internal `guardrails.user_message` and `guardrails.utterance.*` events
Streaming	Accumulates and records the delivered text at stream end	Not applicable (post-hoc from the log)

Neither OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT nor OTEL_SEMCONV_STABILITY_OPT_IN is consulted on the LLMRails path; the config field is the only control there.

Tracing Configuration: The tracing config schema, including the enable_content_capture field.
Quick Start: Minimal setup to enable tracing.
OpenTelemetry: Production-ready exporter setup.