
Copyright © 2026, NVIDIA Corporation.

About NVIDIA NeMo Relay

Overview


NVIDIA NeMo Relay is a portable execution runtime for agent systems that already have a framework, model provider, policy layer, or observability backend. It gives those systems one consistent way to describe what is happening when an agent crosses a request, tool, or LLM boundary.

This runtime layer is useful because agent applications rarely live inside one clean abstraction. A production stack might combine NeMo Agent Toolkit, LangChain, LangGraph, provider SDKs, custom harness code, NeMo Guardrails, tracing systems, and evaluation pipelines. NeMo Relay sits underneath those choices as the shared runtime contract for scopes, middleware, plugins, lifecycle events, adaptive behavior, and observability. In NeMo Relay terms, the scoped execution path that runs under the scope stack and middleware is called work.

The result is a framework-neutral substrate for agent execution. Applications keep their orchestration model, providers keep their native clients, and middleware authors get one place to package policy, interception, telemetry, and adaptive behavior across Rust, Python, and Node.js.
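To make the idea of "one place to package policy, interception, and telemetry" more concrete, the following is a minimal toy sketch in plain Python. It is not the NeMo Relay API: `Runtime`, `Blocked`, `call_tool`, `deny_destructive`, and `trim_command` are hypothetical names invented for illustration. It assumes only that middleware wraps a call and that subscribers receive lifecycle events.

```python
from typing import Callable

# Hypothetical toy model; none of these names come from the NeMo Relay API.
Handler = Callable[[dict], dict]
Middleware = Callable[[dict, Handler], dict]


class Blocked(Exception):
    """Raised by a guardrail that refuses to let work proceed."""


def _wrap(mw: Middleware, nxt: Handler) -> Handler:
    # Bind one middleware to the handler that follows it in the chain.
    return lambda req: mw(req, nxt)


class Runtime:
    def __init__(self) -> None:
        self.middleware: list[Middleware] = []
        self.subscribers: list[Callable[[str, dict], None]] = []

    def emit(self, kind: str, payload: dict) -> None:
        # Fan each lifecycle event out to every subscriber.
        for subscriber in self.subscribers:
            subscriber(kind, payload)

    def call_tool(self, name: str, request: dict, tool: Handler) -> dict:
        handler = tool
        # Build the chain so the first registered middleware runs outermost.
        for mw in reversed(self.middleware):
            handler = _wrap(mw, handler)
        self.emit("tool_start", {"tool": name})
        try:
            result = handler(request)
        except Blocked as exc:
            self.emit("tool_blocked", {"tool": name, "reason": str(exc)})
            raise
        self.emit("tool_end", {"tool": name})
        return result


def deny_destructive(req: dict, nxt: Handler) -> dict:
    # Guardrail: block the work before it reaches the tool.
    if "rm -rf" in req["cmd"]:
        raise Blocked("destructive command")
    return nxt(req)


def trim_command(req: dict, nxt: Handler) -> dict:
    # Intercept: transform the request, then continue down the chain.
    return nxt({**req, "cmd": req["cmd"].strip()})


runtime = Runtime()
runtime.middleware += [deny_destructive, trim_command]

log: list[tuple[str, str]] = []
counts: dict[str, int] = {}
runtime.subscribers.append(lambda kind, payload: log.append((kind, payload["tool"])))
runtime.subscribers.append(lambda kind, payload: counts.update({kind: counts.get(kind, 0) + 1}))


def shell_tool(req: dict) -> dict:
    # Stand-in tool that only echoes the command it would have run.
    return {"ran": req["cmd"]}


result = runtime.call_tool("shell", {"cmd": "  ls  "}, shell_tool)
try:
    runtime.call_tool("shell", {"cmd": "rm -rf /"}, shell_tool)
    blocked = False
except Blocked:
    blocked = True
```

In this sketch the call site never mentions the guardrail, the intercept, or either subscriber. All of them are registered once on the runtime object, which is the property the paragraph above describes.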

Benefits

NeMo Relay is designed for teams that need agent runtime behavior to stay consistent as applications grow across frameworks, languages, and deployment targets.

  • Instrument once at the execution boundary: Managed tool and LLM helpers attach work to the active scope, emit lifecycle events, and run the same middleware pipeline without scattering custom wrappers through every call site.
  • Keep concurrent agents isolated: Hierarchical scopes preserve parent-child event relationships, expose request-local middleware and subscribers, and clean up scope-owned registrations when work finishes.
  • Turn policy into reusable runtime components: Guardrails and intercepts can block work, sanitize observability payloads, transform requests, or wrap execution. Plugins package that behavior so applications and framework integrations can install it from configuration.
  • Export one event stream to many backends: Subscribers consume the canonical lifecycle stream in-process or translate it to Agent Trajectory Interchange Format (ATIF) trajectories, OpenTelemetry traces, and OpenInference-compatible traces for debugging, evaluation, and production observability.
  • Adopt without replacing the stack: NeMo Relay can sit below NeMo ecosystem components, third-party agent frameworks, provider adapters, or direct application code, so teams can add shared runtime semantics without a framework migration.
  • Share semantics across primary bindings: The Rust core, Python wrapper, and Node.js binding expose the same execution model, which helps framework authors, plugin authors, and application teams reason about behavior consistently.

What Should I Read First?

Use the reading path that matches your task:

| Task | Start With |
|---|---|
| Understand what NeMo Relay adds | Agent Runtime Primer |
| Run a minimal example | Quick Start |
| Install packages | Installation |
| Develop from source | Development Setup |
| Understand the runtime model | Concepts |
| Instrument an application | Instrument Applications |
| Use a maintained integration | Supported Integrations |
| Integrate a framework | Integrate into Frameworks |
| Observe a local coding-agent CLI | NeMo Relay CLI |
| Package reusable behavior | Build Plugins |
| Export traces or trajectories | Observability |
| Debug trace incidents | Trace Incident Runbook |
| Tune performance with adaptive behavior | Adaptive |
| Look up symbols | APIs |

Conceptual Diagram

The diagram below shows how applications, runtime components, and exporters relate to each other. Scopes define where work belongs, middleware registries define what runs around that work, and subscribers consume the lifecycle events that the core emits.