Copyright © 2026, NVIDIA Corporation.

Performance

NeMo Relay limits runtime overhead to the work that is active for the current scope and call.

Runtime Model

These points summarize the runtime behaviors that matter most for performance-sensitive paths.

  • Scope stacks define active ownership and scope-local visibility.
  • Middleware registries are priority ordered and lazily sorted.
  • Managed tool and LLM helpers resolve visible middleware before executing the user callback.
  • Subscribers receive emitted events after runtime work creates them.
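
The "priority ordered and lazily sorted" behavior can be illustrated with a small sketch. This is not NeMo Relay's implementation or API: `LazyRegistry`, `register`, `resolve`, and `run_chain` are hypothetical names, and the sketch assumes that lower priority values run first. It only shows the general technique, where registration stays cheap and the sort cost is paid once, on the first lookup after a change.

```python
from typing import Callable, List, Tuple

Middleware = Callable[[str], str]


class LazyRegistry:
    """Hypothetical priority-ordered registry that sorts only when read."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, int, Middleware]] = []
        self._dirty = False
        self._next_seq = 0   # tie-breaker: keeps registration order stable
        self.sort_count = 0  # exposed only to make the laziness observable

    def register(self, priority: int, fn: Middleware) -> None:
        # Appending is O(1); sorting is deferred until the next resolve().
        self._entries.append((priority, self._next_seq, fn))
        self._next_seq += 1
        self._dirty = True

    def resolve(self) -> List[Middleware]:
        # Sort only if something was registered since the last lookup.
        if self._dirty:
            self._entries.sort(key=lambda e: (e[0], e[1]))
            self._dirty = False
            self.sort_count += 1
        return [fn for _, _, fn in self._entries]


def run_chain(registry: LazyRegistry, payload: str) -> str:
    # Resolve the ordered middleware first, then apply each in turn.
    for fn in registry.resolve():
        payload = fn(payload)
    return payload


registry = LazyRegistry()
registry.register(20, lambda s: s + "!")  # registered first, runs second
registry.register(10, str.upper)          # lower value, runs first

first = run_chain(registry, "hi")
second = run_chain(registry, "ok")  # reuses the already sorted entries
```

In this sketch the two calls to `run_chain` trigger a single sort, because no middleware is registered between them.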

Practical Guidance

Apply these practices in application or integration code.

  • Prefer scope-local middleware for request-specific behavior so cleanup happens when the scope closes.
  • Keep subscriber callbacks lightweight or move expensive export work out of the hot path.
  • Use execution intercepts when you need to wrap real execution, and use sanitize guardrails when you only need to change emitted observability payloads.
  • Use binding-native typed wrappers and codecs when provider payload conversion would otherwise be repeated at many call sites.
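
One way to keep a subscriber callback lightweight is to have it only enqueue the event and let a background worker do the expensive export. The sketch below uses only the Python standard library. `QueueingSubscriber`, `on_event`, and the event shape are hypothetical and are not part of the NeMo Relay API. The `time.sleep` call stands in for slow serialization or network I/O.

```python
import queue
import threading
import time
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


class QueueingSubscriber:
    """Hypothetical subscriber: the callback enqueues, a worker exports."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()  # unbounded, so puts never block
        self.exported: List[Event] = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def on_event(self, event: Event) -> None:
        # Hot path: a single enqueue, no serialization or I/O.
        self._queue.put_nowait(event)

    def _drain(self) -> None:
        # Background thread: pull events in order until the sentinel arrives.
        while True:
            event: Optional[Event] = self._queue.get()
            if event is None:
                break
            self._export(event)

    def _export(self, event: Event) -> None:
        time.sleep(0.01)  # stand-in for the expensive export work
        self.exported.append(event)

    def close(self) -> None:
        # Flush: the sentinel is queued after all pending events.
        self._queue.put(None)
        self._worker.join()


subscriber = QueueingSubscriber()
for i in range(5):
    subscriber.on_event({"kind": "tool_call", "index": i})
subscriber.close()
```

An unbounded queue keeps the callback non-blocking but can grow without limit if the exporter falls behind. A bounded queue with an explicit drop or backpressure policy is a reasonable alternative, depending on how much event loss the application can tolerate.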

Related Topics

Use these links to continue into adjacent concepts and workflows.

  • Architecture
  • Middleware
  • Subscribers
  • Typed Wrappers And Codecs