
# Agent Runtime Primer

NeMo Relay is a portable runtime layer for agent systems that already have an
application, framework, or model provider. Use this primer when you need to
understand what NeMo Relay adds before running [Quick Start](/getting-started/quick-start).

Agent applications usually cross several boundaries in one request: an entry
point starts work, the agent calls a model, the model asks for tools, tools call
services, and tracing or policy systems need to understand the result. Without a
shared runtime layer, each boundary tends to grow its own wrappers, callback
shape, trace vocabulary, and cleanup rules.

NeMo Relay gives those boundaries one execution model.

## What NeMo Relay Adds

NeMo Relay does not decide what your agent should do. It describes and manages
what happens when your agent crosses runtime boundaries.

The core runtime model has five parts:

* **Scopes** describe where work belongs. They preserve parent-child
  relationships across requests, agent runs, tools, LLM calls, background work,
  and nested functions.
* **Managed tool and LLM calls** attach work to the active scope, run middleware
  in a consistent order, and emit lifecycle events. The application result is
  preserved unless registered intercepts or guardrails intentionally change the
  execution path.
* **Middleware** runs around managed execution. Intercepts can transform or wrap
  real calls. Guardrails can block execution or sanitize emitted observability
  payloads.
* **Events** record what happened. NeMo Relay emits Agent Trajectory
  Observability Format (ATOF) lifecycle records that subscribers and exporters
  can consume.
* **Plugins** package reusable runtime behavior so teams can install middleware,
  subscribers, exporters, or adaptive behavior from configuration instead of
  repeating setup code in every application.

The simplest mental model is:

```mermaid
flowchart LR
  boundary["App or framework boundary"]
  scope["NeMo Relay scope"]
  managedCall["Managed tool or LLM call"]
  middleware["Middleware"]
  event["Lifecycle event"]
  sink["Subscriber or exporter"]

  boundary --> scope --> managedCall --> middleware --> event --> sink
```
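
The flow in the diagram can be sketched in plain Python. This is a simplified model under stated assumptions, not NeMo Relay's implementation: `managed_call`, `emit`, `Blocked`, and the intercept signature `(args, call_next)` are hypothetical names, and the `events` list stands in for a subscriber or exporter.

```python
from typing import Any, Callable, Dict, List

# Hypothetical names throughout; this is not the NeMo Relay API.
events: List[Dict[str, Any]] = []  # stand-in for a subscriber or exporter


class Blocked(Exception):
    """Raised by a guardrail to stop a managed call before it runs."""


def emit(kind: str, name: str, **payload: Any) -> None:
    """Record one lifecycle event."""
    events.append({"kind": kind, "name": name, **payload})


def _bind(intercept: Callable, call_next: Callable) -> Callable:
    """Return a handler that runs one intercept around the next handler."""
    return lambda args: intercept(args, call_next)


def managed_call(name: str, func: Callable, args: dict, intercepts: list) -> Any:
    """Run func(args) through the intercepts, emitting start/end/error events."""
    handler = func
    # Wrap in reverse so the first registered intercept runs outermost.
    for intercept in reversed(intercepts):
        handler = _bind(intercept, handler)
    emit("start", name, args=args)
    try:
        result = handler(args)
    except Exception as exc:
        emit("error", name, error=type(exc).__name__)
        raise
    emit("end", name, result=result)
    return result


def normalize_query(args: dict, call_next: Callable) -> Any:
    """Intercept: transform the arguments before the real call."""
    return call_next({**args, "query": args["query"].strip().lower()})


def block_secrets(args: dict, call_next: Callable) -> Any:
    """Guardrail: refuse to run when the query looks sensitive."""
    if "password" in args["query"]:
        raise Blocked("query rejected by guardrail")
    return call_next(args)


def search(args: dict) -> str:
    """The application's real tool; its result is returned unchanged."""
    return f"results for {args['query']}"


result = managed_call(
    "tool:search", search, {"query": "  GPU Pricing "},
    [normalize_query, block_secrets],
)
```

In this sketch the tool's return value passes through untouched unless an intercept rewrites the arguments or a guardrail raises, which mirrors the rule that the application result is preserved unless middleware intentionally changes the execution path.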

## What NeMo Relay Does Not Replace

NeMo Relay sits beneath the framework, provider, and backend choices your
application already makes.

It does not replace:

* your agent framework or orchestration logic
* your model provider or provider SDK
* your application business logic
* your production observability backend
* NeMo Agent Toolkit

Instead, it gives those systems a shared runtime contract for call boundaries,
policy hooks, event emission, and export.

## Choose The Boundary You Own

Where you start depends on who owns the call boundary.

If your application directly calls tools or model providers, start by
instrumenting the application boundary. Add scopes first, then wrap the tool and
LLM calls your code owns.
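
A rough sketch of that two-step adoption, assuming a hypothetical `scope` context manager and `managed` decorator (neither is a NeMo Relay name), might look like this:

```python
import functools
from contextlib import contextmanager

# Hypothetical in-memory record of what ran, in order.
trace = []


@contextmanager
def scope(name):
    """Mark the start and end of a unit of work."""
    trace.append(("enter", name))
    try:
        yield
    finally:
        trace.append(("exit", name))


def managed(name):
    """Decorator: run an application-owned function inside a named scope."""
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            with scope(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


# Step 2: wrap a tool call that the application owns.
@managed("tool:lookup_order")
def lookup_order(order_id):
    return {"order_id": order_id, "status": "shipped"}


# Step 1: add a scope at the entry point that the application owns.
def handle_request(order_id):
    with scope("request"):
        return lookup_order(order_id)


response = handle_request("A-17")
```

The wrapped function keeps its signature and return value, so call sites do not change when instrumentation is added.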

If a framework owns scheduling, retries, callbacks, or provider payloads, use a
framework integration. The integration should preserve framework behavior while
adding NeMo Relay scopes, managed calls, codecs, middleware, and events at stable
framework boundaries.

If you need the same behavior across multiple services or teams, package it as a
plugin. Plugins are the configuration-driven path for reusable middleware,
subscribers, exporters, and adaptive components.
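
To make the configuration-driven idea concrete, the following sketch builds subscribers from a JSON document through a small registry. The registry, the `kind` and `options` keys, and both plugin names are assumptions made for this example; NeMo Relay's actual plugin configuration format may differ.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry and config shape; not NeMo Relay's plugin format.
PLUGIN_FACTORIES: Dict[str, Callable] = {}
log: List[str] = []  # stand-in for wherever subscribers send output


def register_plugin(kind: str):
    """Register a factory under the name used in configuration."""
    def decorator(factory):
        PLUGIN_FACTORIES[kind] = factory
        return factory
    return decorator


@register_plugin("prefix_logger")
def make_prefix_logger(prefix: str = ""):
    def subscriber(event: dict) -> None:
        log.append(f"{prefix}{event['kind']} {event['name']}")
    return subscriber


@register_plugin("error_alert")
def make_error_alert(channel: str):
    def subscriber(event: dict) -> None:
        if event["kind"] == "error":
            log.append(f"alert {channel}: {event['name']}")
    return subscriber


def load_plugins(config: dict) -> list:
    """Build one subscriber per config entry; unknown kinds raise KeyError."""
    subscribers = []
    for entry in config["plugins"]:
        factory = PLUGIN_FACTORIES[entry["kind"]]
        subscribers.append(factory(**entry.get("options", {})))
    return subscribers


CONFIG = json.loads("""
{"plugins": [
  {"kind": "prefix_logger", "options": {"prefix": "[relay] "}},
  {"kind": "error_alert", "options": {"channel": "oncall"}}
]}
""")

subscribers = load_plugins(CONFIG)
for event in [
    {"kind": "start", "name": "tool:search"},
    {"kind": "error", "name": "tool:search"},
]:
    for subscriber in subscribers:
        subscriber(event)
```

Each service can then share the same factories and differ only in the configuration it loads.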

## Read Next

The following pages help you choose the next step for your integration.

* Use [Quick Start](/getting-started/quick-start) for the smallest binding-specific example.
* Use [Instrument Applications](/instrument-applications/about) when you
  own the tool or LLM call site.
* Use [Integrate into Frameworks](/integrate-into-frameworks/about) when a
  framework owns invocation, scheduling, retries, callbacks, or provider
  payloads.
* Use [Build Plugins](/build-plugins/about) when behavior should be
  reusable and activated from configuration.
* Use [Observability](/observability-plugin/about) when you need to export
  runtime events to ATIF, OpenTelemetry, or OpenInference.
* Use [Adaptive](/adaptive-plugin/about) after baseline instrumentation is
  working and you want to tune behavior from observed runtime signals.