NeMo Relay is a portable runtime layer for agent systems that already have an application, framework, or model provider. Use this primer to understand what NeMo Relay adds before you work through the Quick Start.
Agent applications usually cross several boundaries in one request: an entry point starts work, the agent calls a model, the model asks for tools, tools call services, and tracing or policy systems need to understand the result. Without a shared runtime layer, each boundary tends to grow its own wrappers, callback shape, trace vocabulary, and cleanup rules.
NeMo Relay gives those boundaries one execution model.
NeMo Relay does not decide what your agent should do. It describes and manages what happens when your agent crosses runtime boundaries.
The core runtime model has five parts:
The simplest mental model is:
NeMo Relay sits below the choices your application already makes.
It does not replace:
Instead, it gives those systems a shared runtime contract for call boundaries, policy hooks, event emission, and export.
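To make the idea of a shared runtime contract concrete, here is a minimal sketch in plain Python. It is not NeMo Relay's API: `Runtime`, `CallEvent`, `call`, `export`, and `deny_shell` are hypothetical names chosen for illustration. The sketch shows one object handling a policy hook, event emission, and export for every call boundary, while the decision about which call to make stays with the caller.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CallEvent:
    """Hypothetical event record emitted at each call boundary."""
    boundary: str  # for example "tool" or "llm"
    name: str
    status: str    # "allowed" or "blocked"


@dataclass
class Runtime:
    """Hypothetical shared contract: policy hooks, events, and export."""
    policies: list = field(default_factory=list)  # callables (boundary, name) -> bool
    events: list = field(default_factory=list)

    def call(self, boundary, name, fn, *args):
        # Policy hooks run before the call; any hook can veto it.
        allowed = all(policy(boundary, name) for policy in self.policies)
        self.events.append(
            CallEvent(boundary, name, "allowed" if allowed else "blocked")
        )
        if not allowed:
            raise PermissionError(f"{boundary}:{name} blocked by policy")
        return fn(*args)

    def export(self):
        # Export events as JSON Lines so any backend can consume them.
        return "\n".join(
            json.dumps(asdict(event), sort_keys=True) for event in self.events
        )


def deny_shell(boundary, name):
    """Example policy (an assumption): block a tool named 'shell'."""
    return not (boundary == "tool" and name == "shell")
```

In this sketch the application still chooses which tool to call. The runtime only records the boundary crossing, applies policy, and exports the result in one vocabulary.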
Where you start depends on who owns the call boundary.
If your application directly calls tools or model providers, start by instrumenting the application boundary. Add scopes first, then wrap the tool and LLM calls your code owns.
If a framework owns scheduling, retries, callbacks, or provider payloads, use a framework integration. The integration should preserve framework behavior while adding NeMo Relay scopes, managed calls, codecs, middleware, and events at stable framework boundaries.
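As a rough illustration of preserving framework behavior, the sketch below uses a toy framework that owns retries and callbacks, plus an adapter that only observes. Both classes are assumptions made for this example: `ToyFramework` is not a real framework, and `EventAdapter` is not a NeMo Relay integration.

```python
class ToyFramework:
    """Stand-in for a framework that owns retries and callbacks."""

    def __init__(self, callbacks=None, max_retries=2):
        self.callbacks = callbacks or []
        self.max_retries = max_retries

    def run_tool(self, tool, arg):
        last_error = None
        for attempt in range(1, self.max_retries + 1):
            for cb in self.callbacks:
                cb.on_tool_start(tool.__name__, attempt)
            try:
                result = tool(arg)
            except Exception as exc:
                last_error = exc
                for cb in self.callbacks:
                    cb.on_tool_end(tool.__name__, attempt, ok=False)
                continue  # the framework, not the adapter, decides to retry
            for cb in self.callbacks:
                cb.on_tool_end(tool.__name__, attempt, ok=True)
            return result
        raise last_error


class EventAdapter:
    """Hypothetical integration: hooks a stable callback boundary, emits events."""

    def __init__(self):
        self.events = []

    def on_tool_start(self, name, attempt):
        self.events.append({"type": "tool.start", "name": name, "attempt": attempt})

    def on_tool_end(self, name, attempt, ok):
        self.events.append(
            {"type": "tool.end", "name": name, "attempt": attempt, "ok": ok}
        )


def demo():
    attempts = []

    def flaky_tool(arg):
        attempts.append(arg)
        if len(attempts) == 1:
            raise RuntimeError("transient failure")
        return arg.upper()

    adapter = EventAdapter()
    framework = ToyFramework(callbacks=[adapter])
    result = framework.run_tool(flaky_tool, "ping")
    return result, adapter.events
```

The adapter never retries or reschedules anything itself. It records each attempt the framework makes, which is the sense in which an integration adds events without changing framework behavior.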
If you need the same behavior across multiple services or teams, package it as a plugin. Plugins are the configuration-driven path for reusable middleware, subscribers, exporters, and adaptive components.
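One way to picture the configuration-driven path is a registry of named component factories that a configuration document selects and parameterizes. This is a sketch under stated assumptions, not NeMo Relay's plugin format: `REGISTRY`, `register`, `load_plugin`, `run_call`, the `redact` and `collect` components, and the shape of `CONFIG` are all hypothetical.

```python
REGISTRY = {}  # (kind, name) -> factory; hypothetical plugin registry


def register(kind, name):
    """Register a component factory under a kind and name."""
    def decorate(factory):
        REGISTRY[(kind, name)] = factory
        return factory
    return decorate


@register("middleware", "redact")
def make_redactor(pattern="secret"):
    def middleware(payload):
        return payload.replace(pattern, "[redacted]")
    return middleware


@register("subscriber", "collect")
def make_collector():
    seen = []

    def subscriber(event):
        seen.append(event)

    subscriber.seen = seen  # exposed so callers can inspect what was received
    return subscriber


def load_plugin(config):
    """Build components from configuration instead of application code."""
    built = {"middleware": [], "subscriber": []}
    for entry in config["components"]:
        factory = REGISTRY[(entry["kind"], entry["name"])]
        built[entry["kind"]].append(factory(**entry.get("options", {})))
    return built


def run_call(built, payload):
    # Middleware transforms the payload; subscribers observe the result.
    for middleware in built["middleware"]:
        payload = middleware(payload)
    for subscriber in built["subscriber"]:
        subscriber(payload)
    return payload


# The same configuration could be shared by several services or teams.
CONFIG = {
    "components": [
        {"kind": "middleware", "name": "redact", "options": {"pattern": "hunter2"}},
        {"kind": "subscriber", "name": "collect"},
    ]
}
```

Because behavior is selected by configuration rather than written into each service, two teams that load the same `CONFIG` get the same redaction and the same subscribers without sharing application code.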
The following pages help you choose the next step for your integration.