nemoguardrails.tracing.constants

OpenTelemetry constants, semantic conventions, and engine-agnostic GenAI client-side metric instruments for NeMo Guardrails.

The OTEL GenAI client-side metric helpers (LLMInstruments, record_token_usage, llm_operation_duration, record_time_to_first_chunk, record_time_per_output_chunk) live here next to the metric-name and attribute constants they emit. They are engine-agnostic — any caller that issues an LLM call can use them to satisfy the OTEL GenAI semantic conventions.

Module Contents

Classes

| Name | Description |
| --- | --- |
| CommonAttributes | Common OpenTelemetry attributes used across spans. |
| EventNames | Standard event names for OpenTelemetry GenAI semantic conventions. |
| GenAIAttributes | GenAI semantic convention attributes following the draft specification. |
| GuardrailsAttributes | NeMo Guardrails-specific attributes for spans. |
| GuardrailsEventNames | NeMo Guardrails-specific event names (not OTel GenAI conventions). |
| GuardrailsEventTypes | NeMo Guardrails internal event type constants. |
| LLMInstruments | LLM-call-scope OTEL instruments for downstream model calls. |
| MetricNames | OTEL metric names emitted by the IORails engine. |
| OperationNames | Standard operation names for GenAI semantic conventions. |
| OtelContentCapture | OTEL environment-variable names and tokens for content-capture gating. |
| SpanKind | String constants for span kinds. |
| SpanNamePatterns | Patterns used for identifying span types from span names. |
| SpanNames | Standard span names following OpenTelemetry GenAI semantic conventions. |
| SpanTypes | Internal span type identifiers used in span mapping. |
| SystemConstants | System-level constants for NeMo Guardrails. |
| TokenType | Allowed values for the gen_ai.token.type metric label. |

Functions

| Name | Description |
| --- | --- |
| _ensure_llm_instruments | Lazily create the LLM-call-scope instruments and return them as an LLMInstruments. |
| _llm_call_attributes | Return the standard OTEL GenAI label set shared by every gen_ai.client.* Histogram emission. |
| llm_operation_duration | Context manager that records the wrapped block's wall-clock duration into gen_ai.client.operation.duration. |
| record_time_per_output_chunk | Emit a gen_ai.client.operation.time_per_output_chunk observation. |
| record_time_to_first_chunk | Emit a gen_ai.client.operation.time_to_first_chunk observation. |
| record_token_usage | Emit two gen_ai.client.token.usage observations (one input, one output) for a completed LLM call. |

Data

_LLM_DURATION_BUCKETS

_LLM_TOKEN_BUCKETS

_llm_instruments

API

class nemoguardrails.tracing.constants.CommonAttributes()

Common OpenTelemetry attributes used across spans.

SPAN_KIND
= 'span.kind'
class nemoguardrails.tracing.constants.EventNames()

Standard event names for OpenTelemetry GenAI semantic conventions.

Based on official spec at: https://github.com/open-telemetry/semantic-conventions/blob/main/model/gen-ai/events.yaml

GEN_AI_ASSISTANT_MESSAGE
= 'gen_ai.assistant.message'
GEN_AI_CHOICE
= 'gen_ai.choice'
GEN_AI_CONTENT_COMPLETION
= 'gen_ai.content.completion'
GEN_AI_CONTENT_PROMPT
= 'gen_ai.content.prompt'
GEN_AI_SYSTEM_MESSAGE
= 'gen_ai.system.message'
GEN_AI_TOOL_MESSAGE
= 'gen_ai.tool.message'
GEN_AI_USER_MESSAGE
= 'gen_ai.user.message'
class nemoguardrails.tracing.constants.GenAIAttributes()

GenAI semantic convention attributes following the draft specification.

Note: These are based on the experimental OpenTelemetry GenAI semantic conventions since they are not yet available in the stable semantic conventions package.

See: https://opentelemetry.io/docs/specs/semconv/gen-ai/

GEN_AI_INPUT_MESSAGES
= 'gen_ai.input.messages'
GEN_AI_OPERATION_NAME
= 'gen_ai.operation.name'
GEN_AI_OUTPUT_MESSAGES
= 'gen_ai.output.messages'
GEN_AI_PROVIDER_NAME
= 'gen_ai.provider.name'
GEN_AI_REQUEST_FREQUENCY_PENALTY
= 'gen_ai.request.frequency_penalty'
GEN_AI_REQUEST_MAX_TOKENS
= 'gen_ai.request.max_tokens'
GEN_AI_REQUEST_MODEL
= 'gen_ai.request.model'
GEN_AI_REQUEST_PRESENCE_PENALTY
= 'gen_ai.request.presence_penalty'
GEN_AI_REQUEST_STOP_SEQUENCES
= 'gen_ai.request.stop_sequences'
GEN_AI_REQUEST_TEMPERATURE
= 'gen_ai.request.temperature'
GEN_AI_REQUEST_TOP_K
= 'gen_ai.request.top_k'
GEN_AI_REQUEST_TOP_P
= 'gen_ai.request.top_p'
GEN_AI_RESPONSE_FINISH_REASONS
= 'gen_ai.response.finish_reasons'
GEN_AI_RESPONSE_ID
= 'gen_ai.response.id'
GEN_AI_RESPONSE_MODEL
= 'gen_ai.response.model'
GEN_AI_SYSTEM
= 'gen_ai.system'
GEN_AI_SYSTEM_INSTRUCTIONS
= 'gen_ai.system_instructions'
GEN_AI_TOKEN_TYPE
= 'gen_ai.token.type'
GEN_AI_USAGE_INPUT_TOKENS
= 'gen_ai.usage.input_tokens'
GEN_AI_USAGE_OUTPUT_TOKENS
= 'gen_ai.usage.output_tokens'
GEN_AI_USAGE_TOTAL_TOKENS
= 'gen_ai.usage.total_tokens'
class nemoguardrails.tracing.constants.GuardrailsAttributes()

NeMo Guardrails-specific attributes for spans.

ACTION_HAS_LLM_CALLS
= 'action.has_llm_calls'
ACTION_LLM_CALLS_COUNT
= 'action.llm_calls_count'
ACTION_NAME
= 'action.name'
ACTION_PARAM_PREFIX
= 'action.param.'
API_NAME
= 'api.name'
LLM_CACHE_HIT
= 'llm.cache.hit'
RAIL_DECISIONS
= 'rail.decisions'
RAIL_INPUT
= 'guardrails.rail.input'
RAIL_NAME
= 'rail.name'
RAIL_REASON
= 'guardrails.rail.reason'
RAIL_STOP
= 'rail.stop'
RAIL_TYPE
= 'rail.type'
REQUEST_INPUT
= 'guardrails.request.input'
REQUEST_OUTPUT
= 'guardrails.request.output'
SPECULATIVE_FIRST_COMPLETED
= 'speculative_generation.first_completed'
SPECULATIVE_FIRST_COMPLETED_GENERATION
= 'generation'
SPECULATIVE_FIRST_COMPLETED_INPUT_RAILS
= 'input_rails'
SPECULATIVE_FIRST_REJECTOR
= 'speculative_generation.first_rejector'
SPECULATIVE_MODE_ACTIVE
= 'speculative_generation.mode_active'
class nemoguardrails.tracing.constants.GuardrailsEventNames()

NeMo Guardrails-specific event names (not OTel GenAI conventions).

These events represent internal guardrails state changes, not LLM API calls. They use a guardrails-specific namespace to avoid confusion with OTel GenAI semantic conventions.

USER_MESSAGE
= 'guardrails.user_message'
UTTERANCE_BOT_FINISHED
= 'guardrails.utterance.bot.finished'
UTTERANCE_BOT_STARTED
= 'guardrails.utterance.bot.started'
UTTERANCE_USER_FINISHED
= 'guardrails.utterance.user.finished'
class nemoguardrails.tracing.constants.GuardrailsEventTypes()

NeMo Guardrails internal event type constants.

These are the type values from internal guardrails events.

START_UTTERANCE_BOT_ACTION
= 'StartUtteranceBotAction'
SYSTEM_MESSAGE
= 'SystemMessage'
USER_MESSAGE
= 'UserMessage'
UTTERANCE_BOT_ACTION_FINISHED
= 'UtteranceBotActionFinished'
UTTERANCE_USER_ACTION_FINISHED
= 'UtteranceUserActionFinished'
class nemoguardrails.tracing.constants.LLMInstruments(
token_usage: opentelemetry.metrics.Histogram,
operation_duration: opentelemetry.metrics.Histogram,
time_to_first_chunk: opentelemetry.metrics.Histogram,
time_per_output_chunk: opentelemetry.metrics.Histogram
)
Dataclass

LLM-call-scope OTEL instruments for downstream model calls.

These metrics fire once per LLM call (not once per IORails request) and follow the OTEL GenAI semantic conventions exactly: the field names mirror the metric names with the gen_ai.client. prefix stripped, and all four are Histograms (per spec).

  • token_usage: gen_ai.client.token.usage Histogram, unit {token}. Records input and output tokens as separate observations distinguished by the required gen_ai.token.type label (input or output).
  • operation_duration: gen_ai.client.operation.duration Histogram, unit s. Records the wall-clock time of each LLM call from request issue to response completion.
  • time_to_first_chunk: gen_ai.client.operation.time_to_first_chunk Histogram, unit s. Streaming-only. Time from request issue to the first content-bearing chunk yielded.
  • time_per_output_chunk: gen_ai.client.operation.time_per_output_chunk Histogram, unit s. Streaming-only. Inter-chunk gap; one observation per content-bearing chunk after the first.

operation_duration
Histogram
time_per_output_chunk
Histogram
time_to_first_chunk
Histogram
token_usage
Histogram
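
As a rough illustration of the container's shape, the sketch below groups four histograms in a dataclass. It is not the library's implementation: FakeHistogram, LLMInstrumentsSketch, and make_instruments are hypothetical names, and FakeHistogram stands in for opentelemetry.metrics.Histogram so the example runs without OTEL installed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeHistogram:
    """Hypothetical stand-in for opentelemetry.metrics.Histogram."""

    name: str
    unit: str
    observations: List[Tuple[float, Dict[str, str]]] = field(default_factory=list)

    def record(self, amount: float, attributes: Optional[Dict[str, str]] = None) -> None:
        # Keep each observation together with a copy of its label set.
        self.observations.append((amount, dict(attributes or {})))


@dataclass
class LLMInstrumentsSketch:
    """Same four fields as the documented LLMInstruments dataclass."""

    token_usage: FakeHistogram
    operation_duration: FakeHistogram
    time_to_first_chunk: FakeHistogram
    time_per_output_chunk: FakeHistogram


def make_instruments() -> LLMInstrumentsSketch:
    # Metric names and units are the ones listed in the bullets above.
    return LLMInstrumentsSketch(
        token_usage=FakeHistogram("gen_ai.client.token.usage", "{token}"),
        operation_duration=FakeHistogram("gen_ai.client.operation.duration", "s"),
        time_to_first_chunk=FakeHistogram("gen_ai.client.operation.time_to_first_chunk", "s"),
        time_per_output_chunk=FakeHistogram("gen_ai.client.operation.time_per_output_chunk", "s"),
    )


instruments = make_instruments()
instruments.token_usage.record(42, {"gen_ai.token.type": "input"})
```
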
class nemoguardrails.tracing.constants.MetricNames()

OTEL metric names emitted by the IORails engine.

These names are part of the library’s public API — customers point dashboards and alerts at them. Tests deliberately assert on the raw strings rather than these constants so the assertions verify the wire contract instead of re-referencing the same symbol the production code uses.

GEN_AI_CLIENT_OPERATION_DURATION
= 'gen_ai.client.operation.duration'
GEN_AI_CLIENT_OPERATION_TIME_PER_OUTPUT_CHUNK
= 'gen_ai.client.operation.time_per_output_chunk'
GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK
= 'gen_ai.client.operation.time_to_first_chunk'
GEN_AI_CLIENT_TOKEN_USAGE
= 'gen_ai.client.token.usage'
NONSTREAM_ACTIVE
= 'guardrails.nonstream.active'
NONSTREAM_QUEUED
= 'guardrails.nonstream.queued'
NONSTREAM_REJECTIONS
= 'guardrails.nonstream.rejections'
REQUESTS
= 'guardrails.requests'
REQUESTS_ACTIVE
= 'guardrails.requests.active'
REQUESTS_BLOCKED
= 'guardrails.requests.blocked'
REQUESTS_ERRORS
= 'guardrails.requests.errors'
REQUEST_DURATION
= 'guardrails.request.duration'
STREAM_ACTIVE
= 'guardrails.stream.active'
STREAM_REJECTIONS
= 'guardrails.stream.rejections'
class nemoguardrails.tracing.constants.OperationNames()

Standard operation names for GenAI semantic conventions.

Note: This only defines standard LLM operations. Custom actions and tasks should be passed through as-is since they are dynamic and user-defined.

CHAT
= 'chat'
COMPLETION
= 'completion'
EMBEDDING
= 'embedding'
GUARDRAILS
= 'guardrails'
class nemoguardrails.tracing.constants.OtelContentCapture()

OTEL environment-variable names and tokens for content-capture gating.

Two independent OTEL-standard env vars control content capture:

  • OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT — fallback enable switch when config.tracing.enable_content_capture is unset/False. Truthy values ("true", "1") turn capture on.
  • OTEL_SEMCONV_STABILITY_OPT_IN — comma-separated stability opt-in list. When "gen_ai_latest_experimental" is present, content is emitted as new-form span attributes (gen_ai.input.messages etc.); otherwise as legacy span events (gen_ai.user.message etc.).
CAPTURE_CONTENT_ENV
= 'OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT'
STABILITY_OPT_IN_ENV
= 'OTEL_SEMCONV_STABILITY_OPT_IN'
STABILITY_OPT_IN_LATEST
= 'gen_ai_latest_experimental'
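
The gating described above could be sketched roughly as follows. This is an assumption-laden illustration, not the library's code: resolve_content_capture is a hypothetical helper, the value of CAPTURE_CONTENT_ENV is assumed to be the variable name given in the first bullet, and the case-insensitive, whitespace-tolerant parsing is a guess.

```python
from typing import Mapping, Optional, Tuple

# Assumed value: the env var named in the class description.
CAPTURE_CONTENT_ENV = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"
STABILITY_OPT_IN_ENV = "OTEL_SEMCONV_STABILITY_OPT_IN"
STABILITY_OPT_IN_LATEST = "gen_ai_latest_experimental"


def resolve_content_capture(
    config_enabled: Optional[bool], environ: Mapping[str, str]
) -> Tuple[bool, bool]:
    """Return (capture_enabled, use_new_form_attributes)."""
    # The env var is only a fallback for an unset/False config flag.
    env_value = environ.get(CAPTURE_CONTENT_ENV, "").strip().lower()
    enabled = bool(config_enabled) or env_value in ("true", "1")

    # The opt-in variable holds a comma-separated list of tokens.
    tokens = [t.strip() for t in environ.get(STABILITY_OPT_IN_ENV, "").split(",")]
    use_new_form = STABILITY_OPT_IN_LATEST in tokens
    return enabled, use_new_form
```

In real use the second argument would be os.environ; passing a plain mapping keeps the two switches easy to test independently.
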
class nemoguardrails.tracing.constants.SpanKind()

String constants for span kinds.

CLIENT
= 'client'
INTERNAL
= 'internal'
SERVER
= 'server'
class nemoguardrails.tracing.constants.SpanNamePatterns()

Patterns used for identifying span types from span names.

COMPLETION
= 'completion'
GEN_AI_PREFIX
= 'gen_ai.'
GUARDRAILS_REQUEST_PATTERN
= 'guardrails.request'
INTERACTION
= 'interaction'
LLM
= 'llm'
class nemoguardrails.tracing.constants.SpanNames()

Standard span names following OpenTelemetry GenAI semantic conventions.

Based on: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/

IMPORTANT: Span names must be low cardinality to avoid performance issues. Variable/high cardinality data (like specific rail types, model names, etc.) should go in attributes instead of the span name.

GEN_AI_CHAT
= 'chat'
GEN_AI_COMPLETION
= 'completion'
GEN_AI_EMBEDDING
= 'embedding'
GUARDRAILS_ACTION
= 'guardrails.action'
GUARDRAILS_RAIL
= 'guardrails.rail'
GUARDRAILS_REQUEST
= 'guardrails.request'
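
To illustrate the low-cardinality rule, the sketch below keeps the span name fixed and moves the variable data into attributes. The rail_span helper and the rail names are hypothetical; the attribute keys are the rail.name and rail.type values from GuardrailsAttributes.

```python
from typing import Dict, Tuple

GUARDRAILS_RAIL = "guardrails.rail"  # SpanNames.GUARDRAILS_RAIL


def rail_span(rail_name: str, rail_type: str) -> Tuple[str, Dict[str, str]]:
    """Return a (span name, attributes) pair for a rail execution."""
    # The span name never changes; per-rail details live in attributes.
    return GUARDRAILS_RAIL, {"rail.name": rail_name, "rail.type": rail_type}


spans = [
    rail_span("example input check", "input"),
    rail_span("example output check", "output"),
]
distinct_names = {name for name, _ in spans}
```

However many rails run, distinct_names holds a single entry, which is what keeps span-name cardinality bounded.
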
class nemoguardrails.tracing.constants.SpanTypes()

Internal span type identifiers used in span mapping.

These are internal identifiers used to categorize spans before mapping to actual span names. They represent the type of operation being traced.

Note: ‘llm_call’ maps to various GenAI semantic convention span types like inference (gen_ai.inference.client), embeddings, etc.

ACTION
= 'action'
INTERACTION
= 'interaction'
LLM_CALL
= 'llm_call'
RAIL
= 'rail'
class nemoguardrails.tracing.constants.SystemConstants()

System-level constants for NeMo Guardrails.

SYSTEM_NAME
= 'nemo-guardrails'
UNKNOWN
= 'unknown'
class nemoguardrails.tracing.constants.TokenType()

Allowed values for the gen_ai.token.type metric label.

Per OTEL GenAI semconv, only input and output are valid. Reasoning and cached tokens are exposed as span attributes (gen_ai.usage.reasoning.output_tokens etc.), not as additional token.type values on the gen_ai.client.token.usage metric.

INPUT
= 'input'
OUTPUT
= 'output'
nemoguardrails.tracing.constants._ensure_llm_instruments() -> typing.Optional[nemoguardrails.tracing.constants.LLMInstruments]

Lazily create the LLM-call-scope instruments and return them as an LLMInstruments instance. Returns None when the OTEL API is not installed.

Bucket boundaries on every histogram are exact matches to the OTEL GenAI semantic-conventions spec recommendations: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/ See _LLM_DURATION_BUCKETS and _LLM_TOKEN_BUCKETS.

nemoguardrails.tracing.constants._llm_call_attributes(
model_name: str,
provider_name: str,
operation_name: str
) -> dict

Return the standard OTEL GenAI label set shared by every gen_ai.client.* Histogram emission.

These three are the lowest-cardinality labels the spec mandates as Required (operation.name, provider.name) or Conditionally Required (request.model). Per-metric labels (token.type, error.type) are added by individual emission helpers.
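
A minimal sketch of such a label set, using the attribute keys listed under GenAIAttributes. The function name and the example model and provider values are placeholders, not the library's code.

```python
from typing import Dict


def llm_call_attributes(
    model_name: str, provider_name: str, operation_name: str
) -> Dict[str, str]:
    """Build the three labels shared by every gen_ai.client.* emission."""
    return {
        "gen_ai.operation.name": operation_name,
        "gen_ai.provider.name": provider_name,
        "gen_ai.request.model": model_name,
    }


base = llm_call_attributes("example-model", "example-provider", "chat")

# A per-metric label is layered on top by the individual emission helper.
token_labels = {**base, "gen_ai.token.type": "input"}
```
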

nemoguardrails.tracing.constants.llm_operation_duration(
model_name: str,
provider_name: str,
operation_name: str
) -> typing.Generator[None, None, None]

Context manager that records the wrapped block’s wall-clock duration into gen_ai.client.operation.duration.

On exception, adds the error.type label (per spec, conditionally required on the duration metric only — token usage carries no error.type even on failed calls) and re-raises. No-op when the OTEL API is unavailable.
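
The behavior described above might look roughly like this. It is a sketch, not the library's implementation: a plain list stands in for the histogram, operation_duration_sketch is a hypothetical name, and using the exception class name as the error.type value is an assumption.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple

# Stand-in for the gen_ai.client.operation.duration histogram (hypothetical).
recorded: List[Tuple[float, Dict[str, str]]] = []


@contextmanager
def operation_duration_sketch(
    model_name: str, provider_name: str, operation_name: str
) -> Iterator[None]:
    attributes = {
        "gen_ai.operation.name": operation_name,
        "gen_ai.provider.name": provider_name,
        "gen_ai.request.model": model_name,
    }
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        # Assumption: the exception class name serves as the error.type value.
        attributes["error.type"] = type(exc).__qualname__
        raise
    finally:
        # Runs on both paths, so failed calls are timed too.
        recorded.append((time.perf_counter() - start, attributes))


with operation_duration_sketch("example-model", "example-provider", "chat"):
    pass  # the LLM call would go here

try:
    with operation_duration_sketch("example-model", "example-provider", "chat"):
        raise TimeoutError("simulated provider timeout")
except TimeoutError:
    pass  # the context manager re-raised after labeling the observation
```
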

nemoguardrails.tracing.constants.record_time_per_output_chunk(
model_name: str,
provider_name: str,
operation_name: str,
duration_s: float
) -> None

Emit a gen_ai.client.operation.time_per_output_chunk observation.

Records the inter-chunk interval for one content-bearing chunk after the first. Each chunk produces one observation; aggregates show p50/p95/p99 for chunk-arrival pacing across the stream.

Caller is responsible for skipping the first chunk (covered by record_time_to_first_chunk instead) and for skipping non-content frames (terminal usage chunk, role-only frames) that would skew the distribution.

No-op when the OTEL API is unavailable.

nemoguardrails.tracing.constants.record_time_to_first_chunk(
model_name: str,
provider_name: str,
operation_name: str,
duration_s: float
) -> None

Emit a gen_ai.client.operation.time_to_first_chunk observation.

Records the elapsed seconds from request issue to the first content-bearing chunk yielded by the streaming response. Caller is responsible for the timing — this helper just records the value onto the histogram with the standard label set.

Per OTEL semconv, “first chunk” is the first chunk carrying actual output (content or reasoning delta) — not the role-only or other cosmetic SSE frames that don’t carry data.

No-op when the OTEL API is unavailable.
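
Because both streaming helpers leave timing and frame filtering to the caller, a caller-side loop could be sketched as follows. The chunk shape (a dict with an optional "content" key) and the time_stream helper are hypothetical; in real use the two callbacks would wrap record_time_to_first_chunk and record_time_per_output_chunk with the label arguments bound.

```python
import time
from typing import Callable, Iterable, List, Optional


def time_stream(
    chunks: Iterable[dict],
    on_first_chunk: Callable[[float], None],
    on_output_chunk: Callable[[float], None],
    clock: Callable[[], float] = time.perf_counter,
) -> List[str]:
    """Consume a stream, timing only content-bearing chunks."""
    request_start = clock()
    last_content_at: Optional[float] = None
    pieces: List[str] = []
    for chunk in chunks:
        text = chunk.get("content")
        if not text:
            # Role-only frames and the terminal usage chunk carry no output.
            continue
        now = clock()
        if last_content_at is None:
            # First content-bearing chunk: time since the request was issued.
            on_first_chunk(now - request_start)
        else:
            # Later chunks: gap since the previous content-bearing chunk.
            on_output_chunk(now - last_content_at)
        last_content_at = now
        pieces.append(text)
    return pieces
```

The injectable clock is a testing convenience; it lets the intervals be checked deterministically without sleeping.
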

nemoguardrails.tracing.constants.record_token_usage(
model_name: str,
provider_name: str,
operation_name: str,
usage: typing.Optional[nemoguardrails.types.UsageInfo]
) -> None

Emit two gen_ai.client.token.usage observations (one input, one output) for a completed LLM call.

Per spec only input and output are valid gen_ai.token.type values — reasoning and cached tokens are span-only attributes, not metric labels.

No-op when usage is None (the upstream provider didn’t return a usage field — common for streaming when stream_options.include_usage is suppressed) or the OTEL API is unavailable. Skipping emission rather than recording zeros keeps the histogram honest: “no observation” is distinct from “0 tokens”.
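
The skip-versus-zero distinction can be sketched as below. UsageSketch is a hypothetical stand-in for nemoguardrails.types.UsageInfo (its real field names are not shown on this page, so input_tokens and output_tokens are assumptions), and a list stands in for the histogram.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class UsageSketch:
    """Hypothetical stand-in for UsageInfo; field names are assumed."""

    input_tokens: int
    output_tokens: int


# Stand-in for the gen_ai.client.token.usage histogram.
observations: List[Tuple[int, Dict[str, str]]] = []


def record_token_usage_sketch(
    model_name: str,
    provider_name: str,
    operation_name: str,
    usage: Optional[UsageSketch],
) -> None:
    if usage is None:
        # No usage reported: emit nothing rather than a misleading zero.
        return
    base = {
        "gen_ai.operation.name": operation_name,
        "gen_ai.provider.name": provider_name,
        "gen_ai.request.model": model_name,
    }
    # One observation per token type, distinguished by gen_ai.token.type.
    observations.append((usage.input_tokens, {**base, "gen_ai.token.type": "input"}))
    observations.append((usage.output_tokens, {**base, "gen_ai.token.type": "output"}))


record_token_usage_sketch("example-model", "example-provider", "chat", None)
record_token_usage_sketch("example-model", "example-provider", "chat", UsageSketch(12, 0))
```

The second call still records its zero output tokens, since a reported 0 is a real observation, while the first call records nothing at all.
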

nemoguardrails.tracing.constants._LLM_DURATION_BUCKETS = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96...
nemoguardrails.tracing.constants._LLM_TOKEN_BUCKETS = [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216...
nemoguardrails.tracing.constants._llm_instruments: Optional[LLMInstruments] = None