Overview
NVIDIA NeMo Relay helps you observe and control what happens inside agent runs without rewriting the agent stack you already have. It gives coding agents, applications, framework integrations, middleware, and observability backends a shared runtime for scopes, policy, plugins, and lifecycle events.
Agent systems usually cross several boundaries in one request: an entrypoint starts work, a model is called, tools run, subagents may branch off, and observability or policy systems need to understand what happened. Relay gives those boundaries one runtime contract instead of asking each layer to invent its own wrappers, trace vocabulary, and cleanup rules.
Integrating With Relay
Relay sits around the work you want to observe or control. Work can be a local coding-agent session, a request, a turn, an LLM call, a tool call, a subagent run, or a framework-specific lifecycle unit.
Relay does not replace your agent framework, model provider, application logic, observability backend, or guardrail authoring system. It gives those systems a common runtime boundary to meet at.
The first design question is simple: where can Relay observe or control the real work? The answer determines whether you should use a CLI sidecar, direct SDK instrumentation, a maintained integration, a framework wrapper, or a plugin.
Choose Your First Path
Pick the row closest to what you are trying to do.
If you are unsure how much Relay you need, capture one boundary first. Confirm that Relay emits raw lifecycle events, then add normalized exports, middleware, guardrails, or adaptive behavior.
Validate Raw Capture First
Start with Agent Trajectory Observability Format (ATOF) JSONL, the raw canonical event stream. It shows the lifecycle events Relay actually captured before anything is translated into Agent Trajectory Interchange Format (ATIF), OpenTelemetry, or OpenInference output.
A good first integration workflow is as follows:
- Create or identify one scope boundary.
- Capture one LLM, tool, session, or turn boundary.
- Export ATOF JSONL and inspect the raw event stream.
- Add ATIF, OpenTelemetry, or OpenInference when the raw events are trustworthy.
- Add middleware only when Relay must block, sanitize, rewrite, route, or replace real execution.
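The first inspection step can be made concrete with a small Python check on a raw event stream, run before any translation. This is a minimal sketch under assumptions. It assumes a hypothetical JSONL shape in which each line carries `kind` (`"start"` or `"end"`), `scope_id`, `parent_id` and `name`. The real ATOF schema is not described here and may use different fields, so adapt the keys to what Relay actually writes.

```python
import json
from typing import Iterable


# Hypothetical field names (kind, scope_id, parent_id, name); the real ATOF schema may differ.
def check_raw_events(lines: Iterable[str]) -> list[str]:
    """Return problems found in a JSONL stream of start/end lifecycle events."""
    problems: list[str] = []
    open_scopes: dict[str, str] = {}  # scope_id -> name for scopes started but not yet ended
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        scope_id = event["scope_id"]
        if event["kind"] == "start":
            parent = event.get("parent_id")
            # A child should only start while its parent scope is still open.
            if parent is not None and parent not in open_scopes:
                problems.append(f"line {lineno}: {scope_id} starts under unknown or closed parent {parent}")
            open_scopes[scope_id] = event.get("name", "")
        elif event["kind"] == "end":
            if scope_id not in open_scopes:
                problems.append(f"line {lineno}: end for {scope_id} without a matching start")
            else:
                del open_scopes[scope_id]
    # Anything still open at the end of the stream was never cleaned up.
    for scope_id, name in open_scopes.items():
        problems.append(f"scope {scope_id} ({name}) never ended")
    return problems


sample = [
    '{"kind": "start", "scope_id": "run-1", "parent_id": null, "name": "agent.run"}',
    '{"kind": "start", "scope_id": "llm-1", "parent_id": "run-1", "name": "llm.call"}',
    '{"kind": "end", "scope_id": "llm-1"}',
]
print(check_raw_events(sample))  # ['scope run-1 (agent.run) never ended']
```

To use it on a real export, pass an open file handle, for example `check_raw_events(open("events.jsonl"))`. The path is a placeholder. An empty result is a reasonable signal that the raw stream is consistent enough to start adding normalized exports.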
Key Features
NeMo Relay offers the following features when you use it with your agent stacks:
- Scopes so runs, turns, tools, LLM calls, and subagents have clear ownership, parent-child lineage, cleanup boundaries, and request isolation.
- Managed LLM and tool calls so the same lifecycle and middleware rules apply around each callback.
- Middleware for the places where Relay must block, sanitize, transform, route, retry, or replace execution.
- Plugins so reusable observability, guardrail, adaptive, and exporter behavior can be turned on from configuration.
- Events and subscribers so raw ATOF, normalized ATIF, OpenTelemetry, and OpenInference output all come from the same runtime stream.
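A small, self-contained Python sketch can illustrate the managed-call and middleware ideas in the list above. It does not use Relay's API. The names `run_managed`, the `(args, next_handler)` middleware signature and the event dictionaries are hypothetical stand-ins. They show only how a chain can sanitize or block a call while the same start and end events are recorded around every callback.

```python
from typing import Any, Callable

# Hypothetical shapes for illustration; these are not Relay's types.
Handler = Callable[[dict[str, Any]], Any]
Middleware = Callable[[dict[str, Any], Handler], Any]


def run_managed(name: str, args: dict[str, Any], callback: Handler,
                middleware: list[Middleware], events: list[dict[str, Any]]) -> Any:
    """Run `callback` through a middleware chain, recording start/end events."""
    def build(index: int) -> Handler:
        # The last link in the chain is the real callback.
        if index == len(middleware):
            return callback
        current = middleware[index]
        next_handler = build(index + 1)
        return lambda call_args: current(call_args, next_handler)

    events.append({"kind": "start", "name": name})
    try:
        result = build(0)(args)
        events.append({"kind": "end", "name": name, "status": "ok"})
        return result
    except Exception as exc:
        # The end event is still emitted when middleware blocks or the callback fails.
        events.append({"kind": "end", "name": name, "status": "error", "error": str(exc)})
        raise


def redact_secrets(call_args: dict[str, Any], next_handler: Handler) -> Any:
    # Sanitize: replace a sensitive field before the real tool sees it.
    if "api_key" in call_args:
        call_args = {**call_args, "api_key": "[redacted]"}
    return next_handler(call_args)


def block_destructive_commands(call_args: dict[str, Any], next_handler: Handler) -> Any:
    # Block: refuse the call entirely; the real callback never runs.
    if str(call_args.get("command", "")).startswith("rm "):
        raise PermissionError("blocked by policy")
    return next_handler(call_args)


def echo_tool(call_args: dict[str, Any]) -> dict[str, Any]:
    return call_args


events: list[dict[str, Any]] = []
result = run_managed("tool.echo", {"api_key": "s3cret", "text": "hi"}, echo_tool,
                     [block_destructive_commands, redact_secrets], events)

blocked_events: list[dict[str, Any]] = []
try:
    run_managed("tool.shell", {"command": "rm -rf /tmp/demo"}, echo_tool,
                [block_destructive_commands], blocked_events)
except PermissionError as exc:
    print("blocked:", exc)  # blocked: blocked by policy
```

The first middleware in the list sees the call first. That ordering is a choice made for this sketch and says nothing about how Relay orders its registries.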
See Concepts for a deeper explanation of scopes, events, middleware, subscribers, and plugins.
Developer Background
Rust is the source of truth for runtime behavior. The Python and Node.js bindings expose the same core model for primary application use. Go, WebAssembly, and raw C FFI are experimental and source-first surfaces.
Developers building integrations should start by identifying who owns the real LLM or tool call. If Relay can wrap the real callback, use managed execution. If the framework exposes lifecycle hooks but not execution control, use explicit lifecycle APIs or hook replay. If provider-shaped payload capture matters, consider the gateway/provider path and treat it as production traffic.
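The rules of thumb in the paragraph above can be written as a small decision helper. This can serve as a checklist when reviewing a new integration. `IntegrationFacts` and `choose_paths` are hypothetical names for illustration only and are not part of any Relay binding.

```python
from dataclasses import dataclass


@dataclass
class IntegrationFacts:
    """Hypothetical checklist describing what your stack exposes."""
    can_wrap_real_callback: bool   # Relay can invoke the real LLM or tool function itself
    has_lifecycle_hooks: bool      # the framework emits hooks but keeps execution control
    needs_provider_payloads: bool  # provider-shaped request/response capture matters


def choose_paths(facts: IntegrationFacts) -> list[str]:
    """Map the rules of thumb onto a list of suggested integration paths."""
    paths: list[str] = []
    if facts.can_wrap_real_callback:
        paths.append("managed execution")
    elif facts.has_lifecycle_hooks:
        paths.append("explicit lifecycle APIs or hook replay")
    # Payload capture is an additional concern, independent of the choice above.
    if facts.needs_provider_payloads:
        paths.append("gateway/provider path (treat as production traffic)")
    return paths


print(choose_paths(IntegrationFacts(False, True, False)))
```

An empty result means the stack exposes neither a callback nor a hook. In that case there is no boundary yet where Relay can observe the real work.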
Do not add behavior to one primary binding without checking Rust, Python, and Node.js parity. Public behavior should stay consistent across the supported runtime surfaces.
Documentation
Use the tasks below to build your understanding and set up Relay:
Conceptual Diagram
The diagram below shows how applications, runtime components, and exporters relate to each other. Scopes define where work belongs, middleware registries define what runs around that work, and subscribers consume the lifecycle events that the core emits.