This page explains how NeMo Relay connects scopes, middleware, plugins, events, subscribers, and exporters.
This diagram maps the runtime pieces to the layers they belong to.
Adaptive appears here as a built-in plugin component rather than a separate runtime model because it activates through the same plugin lifecycle.
NeMo Relay combines a small number of runtime pieces into one shared execution model:
Every emitted scope, tool, LLM, or mark event attaches to the active scope stack. Every managed tool or LLM call resolves the currently visible middleware before it executes.
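The following minimal Python model makes the attachment rule concrete. It is a sketch under stated assumptions, not the NeMo Relay API. `ScopeStack`, `scope`, `emit`, and `Event` are hypothetical names. Each event records the names of the scopes that were active when it was emitted.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Event:
    kind: str              # "scope", "tool", "llm", or "mark"
    name: str
    scope_path: List[str]  # active scope names, outermost first


class ScopeStack:
    """Hypothetical model of the active scope stack (not the NeMo Relay API)."""

    def __init__(self) -> None:
        self._stack: List[str] = []
        self.events: List[Event] = []

    def emit(self, kind: str, name: str) -> Event:
        # Each event records a snapshot of the scopes active when it was emitted.
        event = Event(kind, name, list(self._stack))
        self.events.append(event)
        return event

    @contextmanager
    def scope(self, name: str) -> Iterator[None]:
        # Opening a scope emits a scope event under its parents,
        # then makes the new scope the owner of nested work.
        self.emit("scope", name)
        self._stack.append(name)
        try:
            yield
        finally:
            self._stack.pop()


stack = ScopeStack()
with stack.scope("request"):
    with stack.scope("agent"):
        stack.emit("tool", "search")
    stack.emit("mark", "done")

for event in stack.events:
    print(event.kind, event.name, event.scope_path)
# scope request []
# scope agent ['request']
# tool search ['request', 'agent']
# mark done ['request']
```

The `search` tool event is emitted inside both scopes, so its recorded path is `['request', 'agent']`. The later `done` mark attaches only to `request`.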
These components are the primary building blocks that make up the runtime model.
The active scope stack defines the ownership tree for runtime work. It establishes:
The middleware registries hold the active intercepts and guardrails for tool and LLM execution. Managed helpers read those registries before invoking the real callback.
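The following minimal Python sketch shows a managed helper that consults middleware registries before it invokes the real callback. Everything in it is hypothetical: `MiddlewareRegistry`, `managed_tool_call`, and `GuardrailViolation` are illustrative names. Running intercepts before guardrails is an order chosen only for this example, and the runtime's actual order may differ.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Intercept = Callable[[Request], Request]  # may rewrite the request
Guardrail = Callable[[Request], None]     # raises to block the call


class GuardrailViolation(Exception):
    """Raised by a guardrail to stop a managed call before it runs."""


class MiddlewareRegistry:
    """Hypothetical registry of active intercepts and guardrails."""

    def __init__(self) -> None:
        self.intercepts: List[Intercept] = []
        self.guardrails: List[Guardrail] = []


def managed_tool_call(registry: MiddlewareRegistry,
                      callback: Callable[..., Any],
                      request: Request) -> Any:
    # Read the registries at call time so the call sees whatever is registered now.
    for intercept in list(registry.intercepts):
        request = intercept(request)
    for guardrail in list(registry.guardrails):
        guardrail(request)  # may raise GuardrailViolation
    # Only after all middleware has run is the real callback invoked.
    return callback(**request)


registry = MiddlewareRegistry()
registry.intercepts.append(lambda req: {**req, "query": req["query"].strip()})


def block_empty_query(req: Request) -> None:
    if not req["query"]:
        raise GuardrailViolation("empty query")


registry.guardrails.append(block_empty_query)


def search(query: str) -> str:
    return f"results for {query!r}"


print(managed_tool_call(registry, search, {"query": "  relay  "}))
# results for 'relay'
```

The tool function `search` never sees the raw input. It receives the intercepted request, and only when every guardrail has passed.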
The plugin system installs reusable runtime components from configuration. A plugin can register middleware, subscribers, or related behavior, so individual application call sites do not have to wire that behavior up manually.
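The following hypothetical sketch shows config-driven installation. Each plugin is a function that receives the runtime and its options, then registers middleware or subscribers. The config shape, the `PLUGINS` table, and the `redact` and `log` plugins are assumptions for illustration. They are not NeMo Relay's plugin format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Runtime:
    """Hypothetical runtime holding registered middleware and subscribers."""
    intercepts: List[Callable[[Dict[str, Any]], Dict[str, Any]]] = field(default_factory=list)
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)


LOG_LINES: List[str] = []


def redact_plugin(runtime: Runtime, options: Dict[str, Any]) -> None:
    # Registers an intercept that masks the configured request fields.
    hidden = set(options.get("fields", []))

    def redact(request: Dict[str, Any]) -> Dict[str, Any]:
        return {k: ("***" if k in hidden else v) for k, v in request.items()}

    runtime.intercepts.append(redact)


def log_plugin(runtime: Runtime, options: Dict[str, Any]) -> None:
    # Registers a subscriber that records one line per event.
    prefix = options.get("prefix", "[relay]")
    runtime.subscribers.append(lambda event: LOG_LINES.append(f"{prefix} {event['kind']}"))


PLUGINS: Dict[str, Callable[[Runtime, Dict[str, Any]], None]] = {
    "redact": redact_plugin,
    "log": log_plugin,
}


def install_plugins(runtime: Runtime, config: Dict[str, Any]) -> None:
    # Each config entry names a plugin and passes it its options.
    for entry in config.get("plugins", []):
        PLUGINS[entry["name"]](runtime, entry.get("options", {}))


runtime = Runtime()
install_plugins(runtime, {
    "plugins": [
        {"name": "redact", "options": {"fields": ["api_key"]}},
        {"name": "log", "options": {"prefix": "[audit]"}},
    ]
})
print(runtime.intercepts[0]({"query": "x", "api_key": "secret"}))
# {'query': 'x', 'api_key': '***'}
```

The code that calls `install_plugins` never touches the redaction or logging logic directly. The configuration decides what gets registered.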
The runtime emits structured events for scopes, tools, LLMs, and named marks. Those events are the canonical record of runtime behavior. The native Rust, Python, Node.js, and FFI event-producing APIs enqueue subscriber work and return without waiting for subscriber callbacks or exporters to finish.
Subscribers consume the event stream through the background dispatcher. Some subscribers stay in-process, and others export the stream to files or tracing systems. When a test or shutdown path must wait for subscriber work that is already queued, call the binding's flush API.
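A queue and one background worker thread are enough to model the non-blocking emit and the flush behavior. This Python sketch assumes a single dispatcher thread and hypothetical `emit` and `flush` methods. It illustrates the pattern, not the bindings' actual implementation.

```python
import queue
import threading
import time
from typing import Callable, Dict, List

Event = Dict[str, str]


class Dispatcher:
    """Hypothetical background dispatcher: emit() enqueues, a worker delivers."""

    def __init__(self, subscribers: List[Callable[[Event], None]]) -> None:
        self._subscribers = subscribers
        self._queue: "queue.Queue[Event]" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def emit(self, event: Event) -> None:
        # Returns as soon as the event is queued; subscribers run on the worker thread.
        self._queue.put(event)

    def flush(self) -> None:
        # Blocks until every queued event has been delivered to all subscribers.
        self._queue.join()

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            for subscriber in self._subscribers:
                try:
                    subscriber(event)
                except Exception:
                    pass  # a failing subscriber must not stop delivery to the others
            self._queue.task_done()


received: List[Event] = []


def slow_exporter(event: Event) -> None:
    time.sleep(0.05)  # simulate exporter I/O
    received.append(event)


dispatcher = Dispatcher([slow_exporter])
start = time.perf_counter()
for i in range(3):
    dispatcher.emit({"kind": "mark", "name": f"step-{i}"})
emit_seconds = time.perf_counter() - start  # emit did not wait for the exporter
dispatcher.flush()
print(len(received))  # 3
```

`emit` returns well before the simulated exporter finishes. Calling `flush` is what makes it safe to inspect `received` afterwards.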
Runtime state is easiest to understand when you separate scope ownership from process-wide registration.
The scope stack defines:
Middleware exists at two levels:
That split lets long-lived defaults coexist with request-specific or task-specific behavior.
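One way to picture the split is a process-wide list of defaults combined with scope-local additions that apply only inside a given request or task. This is an assumption about what the two levels look like. `GLOBAL_INTERCEPTS`, `scoped_intercept`, and `managed_llm_call` are hypothetical names. Running scope-local intercepts after the defaults is also an assumption. `contextvars` keeps the scope-local layer from leaking into other threads or asyncio tasks.

```python
import contextvars
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple

Intercept = Callable[[str], str]

# Level 1 (hypothetical): process-wide defaults, registered once at startup.
GLOBAL_INTERCEPTS: List[Intercept] = []

# Level 2 (hypothetical): intercepts visible only in the current scope context.
_scope_intercepts: "contextvars.ContextVar[Tuple[Intercept, ...]]" = contextvars.ContextVar(
    "scope_intercepts", default=()
)


@contextmanager
def scoped_intercept(intercept: Intercept) -> Iterator[None]:
    # Adds an intercept for this block and anything nested inside it.
    token = _scope_intercepts.set(_scope_intercepts.get() + (intercept,))
    try:
        yield
    finally:
        _scope_intercepts.reset(token)


def visible_intercepts() -> List[Intercept]:
    # Assumed order: process-wide defaults first, then scope-local additions.
    return GLOBAL_INTERCEPTS + list(_scope_intercepts.get())


def managed_llm_call(prompt: str) -> str:
    for intercept in visible_intercepts():
        prompt = intercept(prompt)
    return f"LLM({prompt})"  # stand-in for the real model call


GLOBAL_INTERCEPTS.append(str.strip)
print(managed_llm_call("  hi  "))      # LLM(hi)
with scoped_intercept(str.upper):
    print(managed_llm_call("  hi  "))  # LLM(HI)
print(managed_llm_call("  hi  "))      # LLM(hi)
```

The process-wide default applies to every call. The scope-local intercept disappears as soon as its block exits.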
Managed tool and LLM execution follows the same high-level order:
Two distinctions matter:
For the expanded request-to-response runtime path, including streaming and subscriber handoff, see Middleware.
From bottom to top, NeMo Relay is organized as:
The details of a binding can vary, but the conceptual model stays the same across those layers.
NeMo Relay is designed so that application developers, framework integrators, plugin authors, and observability consumers all reason about the same runtime semantics. One conceptual model should remain stable even when the binding or integration style changes.
The following concepts are related to this architecture: