nemoguardrails.tracing.constants
OpenTelemetry constants, semantic conventions, and engine-agnostic GenAI client-side metric instruments for NeMo Guardrails.
The OTEL GenAI client-side metric helpers (LLMInstruments,
record_token_usage, llm_operation_duration,
record_time_to_first_chunk, record_time_per_output_chunk) live
here next to the metric-name and attribute constants they emit. They
are engine-agnostic — any caller that issues an LLM call can use them
to satisfy the OTEL GenAI semantic conventions.
Module Contents
Classes
Functions
Data
API
Common OpenTelemetry attributes used across spans.
Standard event names for OpenTelemetry GenAI semantic conventions.
Based on official spec at: https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/events.yaml
GenAI semantic convention attributes following the draft specification.
Note: These are based on the experimental OpenTelemetry GenAI semantic conventions since they are not yet available in the stable semantic conventions package.
See: https://opentelemetry.io/docs/specs/semconv/gen-ai/
NeMo Guardrails-specific attributes for spans.
NeMo Guardrails-specific event names (not OTel GenAI conventions).
These events represent internal guardrails state changes, not LLM API calls. They use a guardrails-specific namespace to avoid confusion with OTel GenAI semantic conventions.
NeMo Guardrails internal event type constants.
These are the type values from internal guardrails events.
LLM-call-scope OTEL instruments for downstream model calls.
These metrics fire once per LLM call (not once per IORails request)
and follow the OTEL GenAI semantic conventions exactly: the field
names mirror the metric names with the gen_ai.client. prefix
stripped, and all of them are Histograms (per spec).
token_usage: gen_ai.client.token.usage Histogram, unit {token}. Records input and output tokens as separate observations distinguished by the required gen_ai.token.type label (input or output).
operation_duration: gen_ai.client.operation.duration Histogram, unit s. Records the wall-clock time of each LLM call from request issue to response completion.
time_to_first_chunk: gen_ai.client.operation.time_to_first_chunk Histogram, unit s. Streaming-only. Time from request issue to the first content-bearing chunk yielded.
time_per_output_chunk: gen_ai.client.operation.time_per_output_chunk Histogram, unit s. Streaming-only. Inter-chunk gap; one observation per content-bearing chunk after the first.
OTEL metric names emitted by the IORails engine.
These names are part of the library’s public API — customers point dashboards and alerts at them. Tests deliberately assert on the raw strings rather than these constants so the assertions verify the wire contract instead of re-referencing the same symbol the production code uses.
Standard operation names for GenAI semantic conventions.
Note: This only defines standard LLM operations. Custom actions and tasks should be passed through as-is since they are dynamic and user-defined.
OTEL environment-variable names and tokens for content-capture gating.
Two independent OTEL-standard env vars control content capture:
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT: fallback enable switch when config.tracing.enable_content_capture is unset/False. Truthy values ("true", "1") turn capture on.
OTEL_SEMCONV_STABILITY_OPT_IN: comma-separated stability opt-in list. When "gen_ai_latest_experimental" is present, content is emitted as new-form span attributes (gen_ai.input.messages etc.); otherwise as legacy span events (gen_ai.user.message etc.).
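The gating described above can be sketched with the standard library alone. This is an illustration, not the module's implementation: the function names `content_capture_enabled` and `use_new_form_attributes` are hypothetical, and lowercasing and trimming the env values before comparison is an assumption.

```python
import os

CAPTURE_ENV = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"
STABILITY_ENV = "OTEL_SEMCONV_STABILITY_OPT_IN"
LATEST_EXPERIMENTAL = "gen_ai_latest_experimental"


def content_capture_enabled(config_flag, environ=None):
    """Hypothetical helper: config flag first, env var as the fallback."""
    env = os.environ if environ is None else environ
    if config_flag:
        return True
    # Assumption: values are compared case-insensitively after trimming.
    return env.get(CAPTURE_ENV, "").strip().lower() in {"true", "1"}


def use_new_form_attributes(environ=None):
    """Hypothetical helper: True when the opt-in list names the latest form."""
    env = os.environ if environ is None else environ
    tokens = [t.strip() for t in env.get(STABILITY_ENV, "").split(",")]
    return LATEST_EXPERIMENTAL in tokens
```

The two checks are kept separate because the variables are independent: one decides whether content is captured at all, the other decides only which shape it takes.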
String constants for span kinds.
Patterns used for identifying span types from span names.
Standard span names following OpenTelemetry GenAI semantic conventions.
Based on: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
IMPORTANT: Span names must be low cardinality to avoid performance issues. Variable/high cardinality data (like specific rail types, model names, etc.) should go in attributes instead of the span name.
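As a rough illustration of the low-cardinality rule, the sketch below keeps the span name fixed and moves the variable parts into attributes. The function `span_for_rail`, the span name, and the attribute keys are all hypothetical placeholders, not the constants this module defines.

```python
def span_for_rail(rail_type, rail_name):
    """Hypothetical helper: constant span name, variable data in attributes."""
    span_name = "guardrails.rail"  # placeholder name; same for every rail
    attributes = {
        "rail.type": rail_type,  # placeholder key, e.g. "input" or "output"
        "rail.name": rail_name,  # placeholder key; user-defined, unbounded
    }
    return span_name, attributes
```

Two different rails then share one span name, so a backend groups them under a single operation while the attributes still allow filtering.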
Internal span type identifiers used in span mapping.
These are internal identifiers used to categorize spans before mapping to actual span names. They represent the type of operation being traced.
Note: ‘llm_call’ maps to various GenAI semantic convention span types like inference (gen_ai.inference.client), embeddings, etc.
System-level constants for NeMo Guardrails.
Allowed values for the gen_ai.token.type metric label.
Per OTEL GenAI semconv, only input and output are valid.
Reasoning and cached tokens are exposed as span attributes
(gen_ai.usage.reasoning.output_tokens etc.), not as additional
token.type values on the gen_ai.client.token.usage metric.
Lazily create the LLM-call-scope instruments and return them as
an LLMInstruments instance. Returns None when the OTEL API is
not installed.
Bucket boundaries on every histogram are exact matches to the OTEL
GenAI semantic-conventions spec recommendations:
https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/
See _LLM_DURATION_BUCKETS and _LLM_TOKEN_BUCKETS.
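A minimal sketch of the lazy-creation pattern follows. It assumes the `opentelemetry-api` package may or may not be installed; the `Instruments` dataclass, the `get_instruments` function, and the meter name are hypothetical stand-ins for the real ones. Bucket boundaries are deliberately left at the SDK defaults here rather than guessing the spec values.

```python
from dataclasses import dataclass
from typing import Any, Optional

_INSTRUMENTS = None  # module-level cache, filled on the first successful call


@dataclass(frozen=True)
class Instruments:  # hypothetical stand-in for LLMInstruments
    token_usage: Any
    operation_duration: Any
    time_to_first_chunk: Any
    time_per_output_chunk: Any


def get_instruments() -> Optional[Instruments]:
    """Create the four histograms once; return None without the OTEL API."""
    global _INSTRUMENTS
    if _INSTRUMENTS is not None:
        return _INSTRUMENTS
    try:
        from opentelemetry import metrics  # imported lazily on purpose
    except ImportError:
        return None
    meter = metrics.get_meter("example.llm")  # hypothetical meter name
    prefix = "gen_ai.client."
    _INSTRUMENTS = Instruments(
        token_usage=meter.create_histogram(prefix + "token.usage", unit="{token}"),
        operation_duration=meter.create_histogram(
            prefix + "operation.duration", unit="s"
        ),
        time_to_first_chunk=meter.create_histogram(
            prefix + "operation.time_to_first_chunk", unit="s"
        ),
        time_per_output_chunk=meter.create_histogram(
            prefix + "operation.time_per_output_chunk", unit="s"
        ),
    )
    return _INSTRUMENTS
```

Deferring the import keeps the module importable in environments without OpenTelemetry, which is what lets every emission helper degrade to a no-op.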
Return the standard OTEL GenAI label set shared by every
gen_ai.client.* Histogram emission.
These three are the lowest-cardinality labels the spec mandates as
Required (operation.name, provider.name) or Conditionally
Required (request.model). Per-metric labels (token.type,
error.type) are added by individual emission helpers.
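The shared label set might be built as below. The function name `base_labels` and its parameters are hypothetical; omitting `gen_ai.request.model` when no model is known is an assumption drawn from its Conditionally Required status.

```python
from typing import Dict, Optional


def base_labels(
    operation_name: str,
    provider_name: str,
    request_model: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper returning the labels common to every emission."""
    labels = {
        "gen_ai.operation.name": operation_name,  # Required
        "gen_ai.provider.name": provider_name,  # Required
    }
    if request_model:
        # Conditionally Required: attached only when the model is known.
        labels["gen_ai.request.model"] = request_model
    return labels
```

Per-metric labels such as `gen_ai.token.type` would then be merged into a copy of this dictionary at the emission site.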
Context manager that records the wrapped block’s wall-clock
duration into gen_ai.client.operation.duration.
On exception, adds the error.type label (per spec, conditionally
required on the duration metric only — token usage carries no
error.type even on failed calls) and re-raises. No-op when the
OTEL API is unavailable.
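The behaviour described above can be approximated with a small context manager. This is a sketch under assumptions: `timed_operation` is a hypothetical name, `record` is any callable taking a value and an attribute dict (a histogram's `record` method has that shape), and using the exception class name as the `error.type` value is an assumption.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_operation(record, labels, clock=time.perf_counter):
    """Hypothetical helper: time the wrapped block and record the duration."""
    start = clock()
    attrs = dict(labels)  # copy so the caller's labels are not mutated
    try:
        yield
    except Exception as exc:
        # Assumption: the exception class name serves as error.type.
        attrs["error.type"] = type(exc).__qualname__
        raise  # the failure still propagates to the caller
    finally:
        # Runs on both success and failure, so every call is observed once.
        record(clock() - start, attrs)
```

Recording in the `finally` clause means failed calls still contribute a duration observation, which keeps latency percentiles from being biased toward successful requests.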
Emit a gen_ai.client.operation.time_per_output_chunk observation.
Records the inter-chunk interval for one content-bearing chunk after the first. Each chunk produces one observation; aggregates show p50/p95/p99 for chunk-arrival pacing across the stream.
Caller is responsible for skipping the first chunk (covered by
record_time_to_first_chunk instead) and for skipping
non-content frames (terminal usage chunk, role-only frames) that
would skew the distribution.
No-op when the OTEL API is unavailable.
Emit a gen_ai.client.operation.time_to_first_chunk observation.
Records the elapsed seconds from request issue to the first content-bearing chunk yielded by the streaming response. Caller is responsible for the timing — this helper just records the value onto the histogram with the standard label set.
Per OTEL semconv, “first chunk” is the first chunk carrying actual output (content or reasoning delta) — not the role-only or other cosmetic SSE frames that don’t carry data.
No-op when the OTEL API is unavailable.
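Both streaming helpers leave the timing to the caller. The loop below is one way a caller might derive the values; `stream_timings` and the `has_content` predicate are hypothetical, and the clock is injectable only so the logic can be checked deterministically.

```python
import time


def stream_timings(chunks, has_content, clock=time.perf_counter):
    """Hypothetical helper: return (time_to_first_chunk, inter_chunk_gaps)."""
    start = clock()  # request issue time
    first = None  # seconds until the first content-bearing chunk
    gaps = []  # one entry per content-bearing chunk after the first
    last = None
    for chunk in chunks:
        if not has_content(chunk):
            continue  # role-only or usage-only frames are not timed
        now = clock()
        if first is None:
            first = now - start
        else:
            gaps.append(now - last)
        last = now
    return first, gaps
```

In real use the first value would go to `record_time_to_first_chunk` and each gap to `record_time_per_output_chunk`, recorded as chunks arrive rather than collected in a list.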
Emit two gen_ai.client.token.usage observations (one input,
one output) for a completed LLM call.
Per spec only input and output are valid
gen_ai.token.type values — reasoning and cached tokens are
span-only attributes, not metric labels.
No-op when usage is None (the upstream provider didn’t
return a usage field — common for streaming when
stream_options.include_usage is suppressed) or the OTEL API is
unavailable. Skipping emission rather than recording zeros keeps
the histogram honest: “no observation” is distinct from “0 tokens”.
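A sketch of the emit-or-skip rule follows. The name `emit_token_usage`, the `record` callable, and the shape of `usage` (a dict with `input_tokens` and `output_tokens` keys) are assumptions made for illustration.

```python
VALID_TOKEN_TYPES = ("input", "output")  # the only values the spec allows


def emit_token_usage(record, usage, labels):
    """Hypothetical helper: emit one observation per token type, or nothing."""
    if usage is None:
        return 0  # no usage reported: skip rather than record zeros
    counts = {
        "input": usage["input_tokens"],  # assumed key name
        "output": usage["output_tokens"],  # assumed key name
    }
    for token_type in VALID_TOKEN_TYPES:
        attrs = {**labels, "gen_ai.token.type": token_type}
        record(counts[token_type], attrs)
    return len(VALID_TOKEN_TYPES)
```

Returning early on `None` keeps a missing usage field from appearing in the histogram as a call that consumed zero tokens.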