Overview

NVIDIA NeMo Relay helps you observe and control what happens inside agent runs without rewriting the agent stack you already have. It gives coding agents, applications, framework integrations, middleware, and observability backends a shared runtime for scopes, policy, plugins, and lifecycle events.

Agent systems usually cross several boundaries in one request: an entrypoint starts work, a model is called, tools run, subagents may branch off, and observability or policy systems need to understand what happened. Relay gives those boundaries one runtime contract instead of asking each layer to invent its own wrappers, trace vocabulary, and cleanup rules.

Integrating With Relay

Relay sits around the work you want to observe or control. Work can be a local coding-agent session, a request, a turn, an LLM call, a tool call, a subagent run, or a framework-specific lifecycle unit.

Relay does not replace your agent framework, model provider, application logic, observability backend, or guardrail authoring system. It gives those systems a common runtime boundary to meet at.

The first design question is simple: where can Relay observe or control the real work? The answer determines whether you should use a CLI sidecar, direct SDK instrumentation, a maintained integration, a framework wrapper, or a plugin.

Choose Your First Path

Pick the row closest to what you are trying to do.

| Goal | Start With | Why |
| --- | --- | --- |
| Observe Codex, Claude Code, Cursor, or Hermes locally | NeMo Relay CLI and Basic Usage | Relay runs as a local sidecar, forwards hooks, routes provider traffic when configured, and writes observability artifacts without changing application code. |
| Run the smallest binding-specific example | Quick Start | Use this when you want a minimal Rust, Python, or Node.js workflow before adding Relay to real application code. |
| Instrument application-owned LLM or tool calls | Instrument Applications | Direct SDK instrumentation gives Relay full managed-call semantics around callbacks your code owns. |
| Use LangChain, LangGraph, Deep Agents, or OpenClaw | Supported Integrations | Maintained integrations use public framework or plugin APIs where they preserve enough lifecycle fidelity. |
| Build a framework, host, or provider integration | Integrate into Frameworks | Integration guidance helps you choose managed wrappers, explicit lifecycle APIs, hook replay, provider codecs, or upstream support. |
| Package reusable exporters, middleware, or policy | Build Plugins, Observability, and NeMo Guardrails Plugin | Plugins are the configuration-driven path for behavior that should be shared across applications or teams. |
| Develop or validate the repository itself | Development Setup and Testing and Docs | Use the contributor workflow when you are changing Relay source, docs, examples, bindings, or integration patches. |

If you are unsure how much Relay you need, capture one boundary first. Confirm that Relay emits raw lifecycle events, then add normalized exports, middleware, guardrails, or adaptive behavior.

Validate Raw Capture First

Start with Agent Trajectory Observability Format (ATOF) JSONL, the raw canonical event stream. It shows the lifecycle events Relay actually captured before anything is translated into Agent Trajectory Interchange Format (ATIF), OpenTelemetry, or OpenInference output.

A good first integration workflow is as follows:

  1. Create or identify one scope boundary.
  2. Capture one LLM, tool, session, or turn boundary.
  3. Export ATOF JSONL and inspect the raw event stream.
  4. Add ATIF, OpenTelemetry, or OpenInference when the raw events are trustworthy.
  5. Add middleware only when Relay must block, sanitize, rewrite, route, or replace real execution.
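
Step 3 can be done with a few lines of scripting. The sketch below is a minimal, hypothetical example of that inspection. It assumes each JSONL line is a JSON object and that the event type lives in a field named `kind`. Neither that field name nor the sample event names come from the ATOF format itself, so replace them with whatever your export actually contains.

```python
import json
from collections import Counter


def summarize_events(lines, type_field="kind"):
    """Count events per type in an iterable of JSONL lines.

    `type_field` is an assumed field name, not taken from the ATOF format.
    Blank lines are skipped; events without the field count as "<missing>".
    """
    counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        counts[event.get(type_field, "<missing>")] += 1
    return counts


# Hypothetical sample lines standing in for an exported file.
sample = [
    '{"kind": "scope_start", "name": "turn"}',
    '{"kind": "llm_start", "name": "chat"}',
    '{"kind": "llm_end", "name": "chat"}',
    '{"kind": "scope_end", "name": "turn"}',
]

if __name__ == "__main__":
    for kind, n in sorted(summarize_events(sample).items()):
        print(f"{kind}: {n}")
```

To run it against a real export, pass an open file object instead of `sample`. A start event with no matching end event, or a large `<missing>` count, is a sign that the raw capture is not yet trustworthy enough for step 4.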

Key Features

NeMo Relay offers the following features when you use it with your agent stacks:

  • Scopes so runs, turns, tools, LLM calls, and subagents have clear ownership, parent-child lineage, cleanup boundaries, and request isolation.
  • Managed LLM and tool calls so the same lifecycle and middleware rules apply around each callback.
  • Middleware for the places where Relay must block, sanitize, transform, route, retry, or replace execution.
  • Plugins so reusable observability, guardrail, adaptive, and exporter behavior can be turned on from configuration.
  • Events and subscribers so raw ATOF, normalized ATIF, OpenTelemetry, and OpenInference output all come from the same runtime stream.
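
To make the relationship between scopes, lineage, cleanup, and subscribers concrete, here is a toy model in plain Python. It is not the Relay API: `ToyRuntime`, `scope`, `subscribe`, and the event dictionary fields are all hypothetical names invented for this illustration. It only shows the idea that nested scopes record a parent, always emit an end event, and feed every subscriber from one stream.

```python
from contextlib import contextmanager


class ToyRuntime:
    """Illustrative only: NOT the Relay API."""

    def __init__(self):
        self._next_id = 1
        self._open = []         # ids of open scopes, innermost last
        self._subscribers = []  # callables that receive each event dict

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def _emit(self, event):
        # Every subscriber sees the same event stream.
        for callback in self._subscribers:
            callback(event)

    @contextmanager
    def scope(self, name):
        scope_id = self._next_id
        self._next_id += 1
        parent = self._open[-1] if self._open else None
        self._open.append(scope_id)
        self._emit({"type": "start", "id": scope_id, "parent": parent, "name": name})
        try:
            yield scope_id
        finally:
            # Cleanup runs even if the wrapped work raises.
            self._open.pop()
            self._emit({"type": "end", "id": scope_id, "parent": parent, "name": name})


runtime = ToyRuntime()
events = []
runtime.subscribe(events.append)

with runtime.scope("turn"):
    with runtime.scope("tool_call"):
        pass  # the real work would happen here
```

After this runs, `events` holds four entries in start, start, end, end order, and the `tool_call` events carry the `turn` scope's id as their parent. This toy keeps one stack per runtime object, so it does not model request isolation.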

Use Concepts when you want the deeper model for scopes, events, middleware, subscribers, and plugins.

Developer Background

Rust is the source of truth for runtime behavior. The Python and Node.js bindings expose the same core model for primary application use. Go, WebAssembly, and raw C FFI are experimental and source-first surfaces.

Developers building integrations should start by identifying who owns the real LLM or tool call. If Relay can wrap the real callback, use managed execution. If the framework exposes lifecycle hooks but not execution control, use explicit lifecycle APIs or hook replay. If provider-shaped payload capture matters, consider the gateway/provider path and treat it as production traffic.
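
The ownership questions above can be written down as a small decision helper. This is only a sketch of the reasoning, not part of Relay: the function name, its parameters, and the returned strings are hypothetical. It also assumes that managed execution takes precedence over lifecycle hooks and that the gateway/provider path is an additional option, not an exclusive one.

```python
def suggest_paths(can_wrap_callback, has_lifecycle_hooks, needs_provider_payloads):
    """Return suggested integration paths for one LLM or tool boundary.

    Hypothetical helper: the precedence encoded here is an assumption
    drawn from the guidance text, not a rule enforced by Relay.
    """
    paths = []
    if can_wrap_callback:
        # Relay can wrap the real callback.
        paths.append("managed execution")
    elif has_lifecycle_hooks:
        # Hooks exist, but execution control does not.
        paths.append("explicit lifecycle APIs or hook replay")
    if needs_provider_payloads:
        # Treat this path as production traffic.
        paths.append("gateway/provider path")
    return paths


if __name__ == "__main__":
    print(suggest_paths(False, True, True))
```

An empty result means none of the three conditions holds, which usually indicates that the boundary owning the real call has not been identified yet.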

Do not add behavior to one primary binding without checking Rust, Python, and Node.js parity. Public behavior should stay consistent across the supported runtime surfaces.

Documentation

Use the tasks below to build your understanding and set up Relay:

| Task | Start With |
| --- | --- |
| Install packages | Installation |
| Understand the mental model | Agent Runtime Primer |
| Configure plugin files | Plugin Configuration Files |
| Export traces or trajectories | Observability |
| Tune performance with adaptive behavior | Adaptive |
| Debug trace incidents | Trace Incident Runbook |
| Look up symbols | APIs |

Conceptual Diagram

The diagram below shows how applications, runtime components, and exporters relate to each other. Scopes define where work belongs, middleware registries define what runs around that work, and subscribers consume the lifecycle events that the core emits.