NVIDIA NeMo Relay is a portable execution runtime for agent systems that already have a framework, model provider, policy layer, or observability backend. It gives those systems one consistent way to describe what is happening when an agent crosses a request, tool, or LLM boundary.
That layer is useful because agent applications rarely live inside one clean abstraction. A production stack might combine NeMo Agent Toolkit, LangChain, LangGraph, provider SDKs, custom harness code, NeMo Guardrails, tracing systems, and evaluation pipelines. NeMo Relay sits underneath those choices as the shared runtime contract for scopes, middleware, plugins, lifecycle events, adaptive behavior, and observability. Within the scope stack and middleware, NeMo Relay refers to a scoped execution path as work.
The result is a framework-neutral substrate for agent execution. Applications keep their orchestration model, providers keep their native clients, and middleware authors get one place to package policy, interception, telemetry, and adaptive behavior across Rust, Python, and Node.js.
NeMo Relay is designed for teams that need agent runtime behavior to stay consistent as applications grow across frameworks, languages, and deployment targets.
Use the reading path that matches your task:
The diagram below shows how applications, runtime components, and exporters relate to each other. Scopes define where work belongs, middleware registries define what runs around that work, and subscribers consume the lifecycle events that the core emits.
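The relationship between scopes, middleware, and subscribers can also be illustrated with a small, self-contained Python sketch. It is a toy model of the pattern only and does not use the NeMo Relay API: `ToyRuntime`, `scope`, `call_tool`, and the event field names are hypothetical names chosen for illustration, and the ordering of middleware and events is an assumption of this sketch.

```python
from contextlib import contextmanager
from typing import Callable, Dict, List

# Hypothetical types for illustration; none of these come from NeMo Relay.
Event = Dict[str, str]
Handler = Callable[[str], str]
Middleware = Callable[[str, Handler], str]


def wrap(middleware: Middleware, next_handler: Handler) -> Handler:
    """Return a handler that runs `middleware` around `next_handler`."""
    return lambda payload: middleware(payload, next_handler)


class ToyRuntime:
    def __init__(self) -> None:
        self.scope_stack: List[str] = []   # where the current work belongs
        self.middleware: List[Middleware] = []  # what runs around the work
        self.subscribers: List[Callable[[Event], None]] = []  # event consumers

    def emit(self, kind: str, name: str) -> None:
        # Every lifecycle event records the scope path it was emitted in.
        event = {"kind": kind, "name": name, "scope": "/".join(self.scope_stack)}
        for subscriber in self.subscribers:
            subscriber(event)

    @contextmanager
    def scope(self, name: str):
        self.scope_stack.append(name)
        self.emit("scope_start", name)
        try:
            yield
        finally:
            self.emit("scope_end", name)
            self.scope_stack.pop()

    def call_tool(self, name: str, handler: Handler, payload: str) -> str:
        # Build the chain so the first registered middleware is outermost.
        wrapped = handler
        for mw in reversed(self.middleware):
            wrapped = wrap(mw, wrapped)
        with self.scope(name):
            return wrapped(payload)


runtime = ToyRuntime()
events: List[Event] = []
runtime.subscribers.append(events.append)  # subscriber: collect events
runtime.middleware.append(lambda payload, nxt: nxt(payload.strip()))  # normalize input

with runtime.scope("request"):
    result = runtime.call_tool("echo_tool", lambda p: p.upper(), "  hello ")
```

In this sketch the subscriber never touches the tool call itself; it only sees the start and end events, each tagged with the scope path in which the work ran. That separation is the point being illustrated, not the specific method names.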