Observability

Use the Observability plugin when you need to inspect NeMo Relay lifecycle events in process or export agent activity to tracing, trajectory, or analysis systems from one plugin configuration document.

Observability in NeMo Relay starts with events. Scopes, marks, managed tool calls, managed LLM calls, middleware, and manual lifecycle APIs emit the canonical Agent Trajectory Observability Format (ATOF) event stream. Subscribers consume that stream in process, and exporter-oriented subscribers write raw ATOF JSONL or translate events into Agent Trajectory Interchange Format (ATIF), OpenTelemetry, or OpenInference.
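The fan-out described above can be pictured with a toy model. This is a minimal sketch, not the NeMo Relay API: `ToyEventStream`, `jsonl_exporter`, the subscriber names, and the event fields are all hypothetical and exist only to show one event stream feeding both an in-process inspector and an exporter-style subscriber.

```python
import json
from typing import Callable, Dict, List

Event = Dict[str, object]
Subscriber = Callable[[Event], None]


class ToyEventStream:
    """Hypothetical stand-in for an in-process event stream (not the NeMo Relay API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Subscriber] = {}

    def subscribe(self, name: str, callback: Subscriber) -> None:
        self._subscribers[name] = callback

    def unsubscribe(self, name: str) -> None:
        self._subscribers.pop(name, None)

    def emit(self, event: Event) -> None:
        # Every registered subscriber sees every event, in registration order.
        for callback in list(self._subscribers.values()):
            callback(event)


def jsonl_exporter(sink: List[str]) -> Subscriber:
    """Return a subscriber that serializes each event as one JSON line."""
    def write(event: Event) -> None:
        sink.append(json.dumps(event, sort_keys=True))
    return write


stream = ToyEventStream()
inspected: List[Event] = []   # in-process inspection
jsonl_lines: List[str] = []   # exporter-style output

stream.subscribe("inspect", inspected.append)
stream.subscribe("jsonl", jsonl_exporter(jsonl_lines))

# Illustrative event fields only; real ATOF events define their own schema.
stream.emit({"kind": "scope_start", "name": "agent", "uuid": "u-1"})
stream.emit({"kind": "scope_end", "name": "agent", "uuid": "u-1"})

# After teardown of one subscriber, the other keeps receiving events.
stream.unsubscribe("jsonl")
stream.emit({"kind": "mark", "name": "after-teardown", "uuid": "u-2"})

print(len(inspected), len(jsonl_lines))
```

Keeping subscribers keyed by name mirrors the idea that a registration can be torn down independently of the others.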

The first-party plugin component has the kind observability. It can install:

  • Agent Trajectory Observability Format (ATOF) JSONL export for raw lifecycle events.
  • Agent Trajectory Interchange Format (ATIF) trajectory export for each top-level agent scope.
  • OpenTelemetry OTLP trace export.
  • OpenInference-oriented OTLP trace export.

Plugin-Managed Versus Manual Export

Use the Observability plugin for process-level exporter setup that should be activated from config, plugins.toml, or a shared plugin document.

Use manual subscriber or exporter APIs when a test, script, or application needs direct control over registration names, collection windows, explicit flush timing, or per-run exporter objects. The plugin owns subscriber names and teardown for the sections it enables.

Use Observability When

Start here when you need to:

  • Verify that instrumentation is attached to the right scope.
  • Inspect tool and LLM inputs and outputs after sanitization.
  • Correlate concurrent agent runs by root scope.
  • Export traces to OTLP-compatible infrastructure.
  • Produce trajectory data for analysis, replay, or evaluation workflows.

If you have not instrumented scopes, tools, or LLM calls yet, start with Instrument Applications.
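As an illustration of correlating concurrent agent runs by root scope, the sketch below groups interleaved events under the top-level scope they descend from. It assumes each event carries a `uuid` and a `parent_uuid`; those field names, the `name` field, and the helper functions are hypothetical and are not taken from the ATOF schema.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def root_of(uuid: str, parents: Dict[str, Optional[str]]) -> str:
    """Walk parent links until reaching an event with no known parent."""
    seen = set()
    while parents.get(uuid) is not None and uuid not in seen:
        seen.add(uuid)  # guard against accidental cycles in the input
        uuid = parents[uuid]
    return uuid


def group_by_root(events: List[dict]) -> Dict[str, List[str]]:
    """Group event names by the root scope UUID they belong to."""
    parents = {e["uuid"]: e.get("parent_uuid") for e in events}
    groups: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        groups[root_of(event["uuid"], parents)].append(event["name"])
    return dict(groups)


# Two agent runs whose events arrive interleaved (hypothetical data).
events = [
    {"uuid": "run-1", "parent_uuid": None, "name": "agent-1"},
    {"uuid": "run-2", "parent_uuid": None, "name": "agent-2"},
    {"uuid": "t1", "parent_uuid": "run-1", "name": "search"},
    {"uuid": "l1", "parent_uuid": "run-2", "name": "chat"},
    {"uuid": "t2", "parent_uuid": "t1", "name": "fetch"},
]

groups = group_by_root(events)
for root, names in groups.items():
    print(root, names)
```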

Exporter Selection

Choose the exporter based on the downstream system:

| Need | Use |
| --- | --- |
| Raw canonical event stream | Agent Trajectory Observability Format (ATOF) |
| Offline analysis, replay, or evaluation trajectories | Agent Trajectory Interchange Format (ATIF) |
| Generic OTLP traces | OpenTelemetry |
| OpenInference-oriented agent and LLM spans | OpenInference |

Start with in-process event inspection before exporting externally. Add sanitize guardrails before exporters receive sensitive payloads.
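The ordering matters: redaction has to run before an exporter sees the payload. The following is a minimal sketch of that idea and does not use NeMo Relay's guardrail API. The `redact` and `with_sanitizer` helpers, the key list, and the event shape are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical key names; choose the ones that matter for your payloads.
SENSITIVE_KEYS = {"api_key", "authorization", "password"}
REDACTED = "[REDACTED]"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dict entries replaced."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def with_sanitizer(exporter: Callable[[Dict[str, Any]], None]) -> Callable[[Dict[str, Any]], None]:
    """Wrap an exporter so it only ever receives redacted events."""
    def guarded(event: Dict[str, Any]) -> None:
        exporter(redact(event))
    return guarded


exported: List[Dict[str, Any]] = []
export = with_sanitizer(exported.append)

original = {
    "kind": "tool_call",
    "input": {"query": "weather", "api_key": "sk-123"},
    "headers": [{"Authorization": "Bearer abc"}],
}
export(original)

print(exported[0])
```

Because `redact` builds new containers instead of mutating its argument, the application keeps its original payload while the exporter receives the sanitized copy.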

For trace incidents involving missing traces, wrong scope attachment, export failures, duplicate events, or sensitive telemetry, use the Trace Incident Runbook.

Correlating Trajectories And Traces

When ATIF and trace exporters observe the same NeMo Relay events, they share NeMo Relay UUIDs for cross-format joins. Plugin-managed ATIF uses the top-level agent scope UUID as the trajectory session_id. ATIF step lineage stores the event UUID as step.extra.ancestry.function_id and the parent UUID as step.extra.ancestry.parent_id.

OpenTelemetry and OpenInference spans carry the same values as nemo_relay.uuid and nemo_relay.parent_uuid span attributes. Mark events use nemo_relay.mark.uuid and nemo_relay.mark.parent_uuid. Native backend trace_id and span_id values are still generated by the tracing backend and are not written into ATIF.
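A cross-format join on these identifiers might look like the sketch below. The field path `extra.ancestry.function_id` and the attribute name `nemo_relay.uuid` come from the description above. The in-memory shapes of the step and span records, the sample values, and the `join_steps_to_spans` helper are assumptions for illustration, not an exporter output format.

```python
from typing import Dict, List, Tuple

# Hypothetical, already-parsed ATIF steps (only the lineage fields matter here).
steps = [
    {"step_id": 1, "extra": {"ancestry": {"function_id": "uuid-tool", "parent_id": "uuid-root"}}},
    {"step_id": 2, "extra": {"ancestry": {"function_id": "uuid-llm", "parent_id": "uuid-root"}}},
    {"step_id": 3, "extra": {}},  # a step without lineage is skipped
]

# Hypothetical, already-parsed spans; span_id values are backend-generated.
spans = [
    {"span_id": "a1", "attributes": {"nemo_relay.uuid": "uuid-llm", "nemo_relay.parent_uuid": "uuid-root"}},
    {"span_id": "b2", "attributes": {"nemo_relay.uuid": "uuid-tool", "nemo_relay.parent_uuid": "uuid-root"}},
    {"span_id": "c3", "attributes": {}},  # a span without the attribute is ignored
]


def join_steps_to_spans(steps: List[dict], spans: List[dict]) -> List[Tuple[int, str]]:
    """Pair each ATIF step with the span that carries the same NeMo Relay UUID."""
    spans_by_uuid: Dict[str, dict] = {
        span["attributes"]["nemo_relay.uuid"]: span
        for span in spans
        if "nemo_relay.uuid" in span.get("attributes", {})
    }
    pairs = []
    for step in steps:
        function_id = step.get("extra", {}).get("ancestry", {}).get("function_id")
        span = spans_by_uuid.get(function_id)
        if span is not None:
            pairs.append((step["step_id"], span["span_id"]))
    return pairs


pairs = join_steps_to_spans(steps, spans)
print(pairs)
```

The join key is the NeMo Relay UUID rather than the backend `span_id`, since native trace identifiers are not written into ATIF.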

Pages