For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
  • About NVIDIA NeMo Relay
    • Overview
    • Architecture
    • Ecosystem
    • Concepts
    • Release Notes
  • Getting Started
    • Agent Runtime Primer
    • Prerequisites
    • Installation
    • Configuration / Setup
    • Quick Start
  • NVIDIA NeMo Relay CLI
    • About
    • Basic Usage
    • Claude Code
    • Codex
    • Cursor
    • Hermes Agent
  • Supported Integrations
    • About
    • OpenClaw Plugin Guide
    • LangChain Integration Guide
    • LangGraph Integration Guide
    • Deep Agents Integration Guide
  • Instrument Applications
    • About
    • Adding Scopes and Marks
    • Instrument a Tool Call
    • Instrument an LLM Call
    • Add Middleware
    • Code Examples
  • Observability Plugin
    • About
    • Configuration
    • Agent Trajectory Interchange Format (ATIF)
    • Agent Trajectory Observability Format (ATOF)
    • OpenTelemetry
    • OpenInference
  • Adaptive Plugin
    • About
    • Configuration
    • Adaptive Cache Governor (ACG)
    • Adaptive Hints
  • NeMo Guardrails Plugin
    • About
    • Configuration
  • Integrate into Frameworks
    • About
    • Adding Scopes
    • Wrap Tool Calls
    • Wrap LLM Calls
    • Handle Non-Serializable Data
    • Using Codecs
    • Provider Codecs
    • Provider Response Codecs
    • Code Examples
  • Build Plugins
    • About
    • Define a Plugin
    • Validate Plugin Configuration
    • Plugin Configuration Files
    • Register Plugin Behavior
    • Design Plugin Configuration
    • NeMo Guardrails Example Plugin
    • Code Examples
  • Contribute
    • About
    • Development Setup
    • Workflow and Reviews
    • Testing and Documentation
  • Reference
    • APIs
    • Performance
  • Resources
    • Support and FAQs
    • Glossary
    • Troubleshooting Guide
    • Community
    • Legal
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogo
On this page
  • How NeMo Relay Fits In The NVIDIA NeMo Ecosystem
  • How NeMo Relay Fits Agent Frameworks And Harnesses
  • Related Topics
About NVIDIA NeMo Relay

Ecosystem

||View as Markdown|
Previous

Architecture

Next

Concepts

NeMo Relay is the agent execution runtime layer in the NVIDIA NeMo ecosystem. It does not replace an agent framework, model provider, guardrail authoring system, or deployment platform. Instead, it gives those systems one shared way to model execution scopes, lifecycle events, middleware, plugins, adaptive behavior, and observability around tool and LLM calls.

Use this page to understand where NeMo Relay fits:

  • Inside the NVIDIA NeMo software stack
  • Inside agent frameworks, harnesses, and provider adapters
  • Across the Rust, Python, Node.js, Go, WebAssembly, and C FFI surfaces in this repository

How NeMo Relay Fits In The NVIDIA NeMo Ecosystem

The NVIDIA NeMo ecosystem spans model development, agent construction, guardrailing, inference, optimization, and runtime operations. NeMo Relay has a narrower responsibility: it is the portable execution substrate that agent systems can call when actual work crosses a scope, tool, or model boundary.

LayerTypical ResponsibilityNeMo Relay Relationship
NeMo model, inference, and deployment componentsProvide or serve the models an agent uses.NeMo Relay records and controls LLM execution boundaries, but it does not train, host, or route model inference by itself.
NeMo Agent Toolkit and agent application frameworksBuild, run, profile, and optimize agent workflows across tools, data sources, and framework choices.NeMo Relay can sit below these systems as the shared runtime contract for scopes, middleware, lifecycle events, subscribers, and plugins.
NeMo Guardrails and policy systemsDefine safety, control, and compliance behavior for LLM applications.NeMo Relay can host runtime guardrails and intercepts around managed tool and LLM calls, while higher-level guardrail systems can still own policy authoring and orchestration.
Application harnesses and workflow codeDecide the agent pattern, planner, memory, retries, scheduling, and user-facing behavior.NeMo Relay instruments the execution boundaries that the harness already owns.
Observability and evaluation backendsStore traces, trajectories, metrics, and analysis data.NeMo Relay emits lifecycle events and exports them to in-process subscribers, Agent Trajectory Observability Format (ATOF), Agent Trajectory Interchange Format (ATIF), OpenTelemetry, OpenInference-compatible traces, or other backends.

In practical terms, NeMo Relay answers a different question than higher-level agent products. A framework asks, “What should the agent do next?” NeMo Relay asks, “When the agent does work, which scope owns it, which middleware applies, what events are emitted, and which subscribers can consume the result?”

The dotted path matters. An application or custom harness can call NeMo Relay directly without adopting a higher-level framework. A framework integration can also call NeMo Relay on behalf of application code when the framework owns the tool or provider boundary.

How NeMo Relay Fits Agent Frameworks And Harnesses

The agent framework and harness landscape is intentionally mixed. A team might use NeMo Agent Toolkit, LangChain, LangGraph, an internal orchestration layer, a provider SDK, or direct application code. NeMo Relay is designed to meet those systems at stable execution boundaries instead of requiring one framework shape.

Integration PointUse NeMo Relay ForKeep In The Framework Or Harness
Request, run, workflow, or agent lifecycle hooksCreate scopes, emit scope start and end events, and isolate concurrent work.Scheduling, routing, retry policy, planner choice, memory, and user session state.
Tool invocation callbacksRun managed tool execution, apply tool middleware, emit tool lifecycle events, and preserve parent scope context.Tool discovery, tool schema presentation, framework-specific callback signatures, and application-visible result handling.
LLM or provider adapter callsRun managed LLM execution, attach model metadata, apply LLM middleware, handle stream lifecycle events, and emit normalized observability payloads.Provider clients, authentication, transport, provider-native request objects, and provider-specific response types.
Framework internals that cannot hand over a callbackUse explicit lifecycle APIs, request-intercept helpers, guardrail helpers, or mark events.The actual invocation path when the framework must retain control.
Cross-cutting behaviorPackage middleware, subscribers, adaptive behavior, and reusable policy as plugins.Framework configuration, agent definitions, deployment topology, and business logic.

Prefer a managed execution wrapper when a framework exposes a stable callback that NeMo Relay can own. Use explicit lifecycle calls or standalone helpers when the framework owns the callback internally but exposes reliable start, finish, or request transformation hooks.

This lets NeMo Relay provide consistent runtime semantics without forcing a framework migration:

  • Applications keep their existing agent orchestration model
  • Framework adapters preserve public behavior and callback signatures
  • Non-serializable provider objects stay in framework-owned storage
  • NeMo Relay receives JSON-compatible payloads for middleware and events
  • Subscribers see a consistent scope, tool, and LLM event stream across integrations

Related Topics

Use these links to continue into adjacent concepts and workflows.

  • NVIDIA NeMo documentation
  • NVIDIA NeMo Agent Toolkit documentation
  • NVIDIA NeMo Guardrails documentation
  • Integrate into Frameworks
  • Adding Framework Scopes
  • Wrapping Tool Calls
  • Wrapping LLM Calls
  • Plugin Model