Metric Reference
This page lists every metric the IORails engine emits when metrics.enabled: true and a MeterProvider is configured.
Metrics fall into two families:
- Request-level metrics (guardrails.*) describe IORails request flow: volume, errors, blocks, latency, and saturation of the streaming and non-streaming admission paths.
- LLM client-side metrics (gen_ai.client.*) describe downstream LLM calls IORails issues. These follow the OpenTelemetry GenAI semantic conventions and use the bucket boundaries recommended by that spec.
Request-Level Metrics
Bucket Boundaries: guardrails.request.duration
The duration histogram buckets use seconds:
Saturation Metrics
These metrics expose the internal admission paths so you can detect overload before users encounter errors.
Non-Streaming Path (Admission Queue)
queued and active are read live at collection time, so dashboards show the state as of the most recent collection rather than a cached value.
After IORails.stop() is called, both gauges return no observations rather than stale values.
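The read-at-collection behavior described above can be illustrated with a minimal sketch. It does not use the IORails or OpenTelemetry APIs: `Observation`, `AdmissionQueue`, and `observe_queued` are hypothetical stand-ins that only mimic the shape of an observable-gauge callback.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Stand-in for an SDK observation: one gauge value at collection time."""
    value: int


class AdmissionQueue:
    """Hypothetical non-streaming admission queue; not the IORails class."""

    def __init__(self) -> None:
        self.queued = 0
        self.running = True

    def stop(self) -> None:
        self.running = False

    def observe_queued(self) -> List[Observation]:
        # Invoked by the metrics reader at collection time, so the value is live.
        if not self.running:
            return []  # no observation, rather than a stale last value
        return [Observation(self.queued)]


queue = AdmissionQueue()
queue.queued = 3
live = queue.observe_queued()        # one observation with the current depth
queue.stop()
after_stop = queue.observe_queued()  # empty: nothing is reported after stop
```

Returning an empty list after stop means the series simply ends on a dashboard instead of flat-lining at its last value.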
Streaming Path (Concurrency Semaphore)
Cross-Checking Saturation Metrics
At any collection instant, the sum of the per-path saturation gauges should approximately equal guardrails.requests.active:
A persistent drift between the two is a useful integrity check during dashboard development.
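One way to run this integrity check is sketched below. It assumes you have already scraped the per-path gauge values and guardrails.requests.active at the same collection instant; the function names and the tolerance are hypothetical and not part of the library.

```python
from typing import Sequence


def saturation_drift(per_path_values: Sequence[float], requests_active: float) -> float:
    """Return the sum of the per-path gauges minus the request-level active gauge."""
    return sum(per_path_values) - requests_active


def is_consistent(
    per_path_values: Sequence[float],
    requests_active: float,
    tolerance: float = 2.0,  # hypothetical slack for requests in transit
) -> bool:
    """True when the two views of in-flight work agree within the tolerance."""
    return abs(saturation_drift(per_path_values, requests_active)) <= tolerance
```

A single out-of-tolerance sample is expected because the gauges are read independently; only a drift that persists across many collections suggests a dashboard query or instrumentation problem.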
Dual-Counted Rejections
A QueueFull rejection on the non-streaming path increments both:
- guardrails.nonstream.rejections (saturation signal)
- guardrails.requests.errors{error.type=QueueFull} (error signal)
This is intentional: dashboards built around either signal alone still reflect the rejection.
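A practical consequence is that the two counters should not be added together when computing a total failure count. The toy model below is only an illustration: `ToyCounters` is hypothetical, and "Timeout" is a made-up error type used to show a non-rejection error.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyCounters:
    """Hypothetical model of the two counters; not the IORails implementation."""
    errors: Counter = field(default_factory=Counter)  # keyed by error.type
    nonstream_rejections: int = 0

    def record_queue_full(self) -> None:
        # A QueueFull rejection increments both signals.
        self.nonstream_rejections += 1
        self.errors["QueueFull"] += 1

    def record_error(self, error_type: str) -> None:
        self.errors[error_type] += 1

    def failed_requests(self) -> int:
        # The error counter already includes every QueueFull rejection,
        # so the rejection counter must not be added on top.
        return sum(self.errors.values())


counters = ToyCounters()
counters.record_queue_full()
counters.record_queue_full()
counters.record_error("Timeout")  # made-up error type for illustration
```

Here `counters.failed_requests()` is 3, while adding `counters.nonstream_rejections` to it would report 5 and count each rejection twice.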
LLM Client-Side Metrics
These metrics are recorded once per downstream LLM call, not once per IORails request, and follow the OpenTelemetry GenAI semantic conventions.
Per the spec, gen_ai.token.type takes only the values input and output.
Reasoning and cached tokens are exposed as span attributes (gen_ai.usage.reasoning.output_tokens and so on), not as additional metric label values.
Bucket Boundaries
Per the OpenTelemetry GenAI spec, durations use powers-of-two boundaries up to ~82 s:
Token counts use powers-of-four boundaries up to ~67M tokens:
Both match the spec exactly so backends auto-render the distributions correctly.
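The two boundary series can be regenerated with a few lines of Python, which is handy when configuring explicit-bucket views or checking a backend's rendering. This sketch assumes the series start at 0.01 s and 1 token respectively and contain 14 boundaries each, which is consistent with the stated maxima; verify against the upstream spec before relying on it.

```python
# Duration boundaries in seconds: 0.01 doubled repeatedly (14 boundaries).
# Assumption: the series starts at 0.01 s.
DURATION_BOUNDARIES = [0.01 * 2**i for i in range(14)]

# Token-count boundaries: powers of four from 4**0 through 4**13.
# Assumption: the series starts at 1 token.
TOKEN_BOUNDARIES = [4**i for i in range(14)]

largest_duration = round(DURATION_BOUNDARIES[-1], 2)  # 81.92 seconds
largest_token_count = TOKEN_BOUNDARIES[-1]            # 67108864 tokens
```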
Streaming vs. Non-Streaming Emission
For streaming responses, gen_ai.client.token.usage is emitted only when the upstream provider returns a usage field. Providers commonly return it when stream_options.include_usage=true is forwarded.
When usage is absent, no observation is recorded; “no observation” is deliberately distinct from “0 tokens”.
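The distinction between a missing usage field and a zero count can be sketched as follows. The helper `token_observations` is hypothetical, and the `prompt_tokens` and `completion_tokens` keys assume an OpenAI-style usage payload; other providers may name these fields differently.

```python
from typing import Dict, List, Optional, Tuple


def token_observations(usage: Optional[Dict[str, int]]) -> List[Tuple[str, int]]:
    """Return (gen_ai.token.type, count) pairs to record for one LLM call.

    An absent usage payload yields no pairs at all, whereas a reported
    count of zero still yields a pair with value 0.
    """
    if usage is None:
        return []  # nothing recorded: the histogram is left untouched
    return [
        ("input", usage["prompt_tokens"]),        # assumed field name
        ("output", usage["completion_tokens"]),   # assumed field name
    ]


missing = token_observations(None)
reported = token_observations({"prompt_tokens": 12, "completion_tokens": 0})
```

Keeping the two cases apart prevents calls without usage data from dragging token averages toward zero.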
Common Label Reference
Public API Stability
The metric names listed on this page are part of the library’s public API, so dashboards and alerts can reference them. The library tests assert on the raw strings for this reason. Bucket boundaries follow the OpenTelemetry GenAI spec and can change if the spec changes.
Related Resources
- Enable Guardrails Metrics — Minimal SDK setup with console output.
- OpenTelemetry Metrics Integration — Production exporters: OTLP, Prometheus.
- OpenTelemetry GenAI metrics specification — Upstream semantic conventions.