
Copyright © 2026, NVIDIA Corporation.

Build Plugins

Code Examples


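This page collects concrete plugin examples that complement the Build Plugins guides: dynamic header injection, OpenInference export, multi-surface policy bundles, and framework-facing plugins.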

Dynamic Header Injection

Use an LLM request intercept when a plugin needs to inject tenant or routing metadata into every provider request.

LLM request intercepts receive three arguments: name, request, and annotated. The request object is immutable; to change it, return a new instance of the request with the edits applied. Intercepts written in Rust are the exception to this.

```python
from typing import Any

import nemo_relay


class HeaderPlugin:
    def validate(self, plugin_config: dict[str, Any]) -> list[dict[str, str]]:
        # Return a list of diagnostics; an empty list means the config is valid.
        if "header_name" not in plugin_config or "value" not in plugin_config:
            return [{
                "level": "error",
                "code": "header-plugin.invalid_config",
                "message": "header_name and value are required",
            }]
        return []

    def register(self, plugin_config: dict[str, Any], context: nemo_relay.plugin.PluginContext):
        def add_header(
            name: str,
            request: nemo_relay.LLMRequest,
            annotated: nemo_relay.AnnotatedLLMRequest | None
        ) -> tuple[nemo_relay.LLMRequest, nemo_relay.AnnotatedLLMRequest | None]:
            # The request object is immutable, so return a new instance with updated headers.
            headers = request.headers.copy()
            headers[plugin_config["header_name"]] = plugin_config["value"]
            return nemo_relay.LLMRequest(headers=headers, content=request.content), annotated

        context.register_llm_request_intercept("inject-header", 100, False, add_header)


nemo_relay.plugin.register("header-plugin", HeaderPlugin())
```

This pattern is useful for:

  • Tenant identity
  • Trace correlation
  • Region or deployment routing
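
The copy-and-return behavior of the intercept can be checked without a running runtime. The sketch below is a minimal illustration and not the NeMo Relay API: `FakeLLMRequest` is a hypothetical frozen dataclass standing in for `nemo_relay.LLMRequest`, `make_add_header` is a hypothetical helper, and the header name and value are made up. The intercept it builds has the same three-argument shape as `add_header` above.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FakeLLMRequest:
    """Hypothetical stand-in for nemo_relay.LLMRequest (not the real class)."""
    headers: dict[str, str] = field(default_factory=dict)
    content: Any = None


def make_add_header(header_name: str, value: str) -> Callable:
    """Build an intercept with the (name, request, annotated) shape used above."""
    def add_header(name: str, request: FakeLLMRequest, annotated: Optional[Any]):
        # Copy the headers so the original request is left untouched.
        headers = dict(request.headers)
        headers[header_name] = value
        # replace() creates a new frozen instance with only `headers` changed.
        return replace(request, headers=headers), annotated
    return add_header


original = FakeLLMRequest(headers={"accept": "application/json"}, content="hello")
intercept = make_add_header("x-tenant-id", "tenant-42")
updated, annotated = intercept("example-llm", original, None)

print(updated.headers)   # includes the injected header
print(original.headers)  # unchanged
```

Building the intercept from a factory keeps the configuration values in a closure, which mirrors how `register` captures `plugin_config` in the example above.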

OpenInference Export

Use a subscriber-oriented plugin when the component should watch the full lifecycle rather than rewrite requests.

```python
import nemo_relay


class OpenInferencePlugin:
    def validate(self, plugin_config):
        # Return a list of diagnostics; an empty list means the config is valid.
        if "endpoint" not in plugin_config:
            return [{
                "level": "error",
                "code": "openinference-export.invalid_config",
                "message": "endpoint is required",
            }]
        return []

    def register(self, plugin_config, context):
        endpoint = plugin_config["endpoint"]

        def on_event(event):
            # Placeholder: a real exporter would send the event to `endpoint`.
            print("export", endpoint, event.kind, event.name)

        context.register_subscriber("openinference-export", on_event)


nemo_relay.plugin.register("openinference-export", OpenInferencePlugin())
```

This is the right pattern when the component:

  • Exports traces or metrics
  • Aggregates events across tools and LLMs
  • Should not change execution behavior
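
Because a subscriber should not change execution behavior, one possible design is to buffer events and absorb export failures instead of letting them propagate. The sketch below illustrates that idea under stated assumptions: `Event` is a hypothetical stand-in carrying only the `kind` and `name` fields read in the example above, the `"start"` and `"end"` kind values are invented for the demo, and `send` is a hypothetical callable representing whatever transport an exporter would use. None of these names come from the NeMo Relay API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in exposing the two fields the example above reads."""
    kind: str
    name: str


class BufferedExporter:
    """Collects events and hands them to `send` in batches."""

    def __init__(self, send: Callable[[list[dict]], None], batch_size: int = 2):
        self._send = send
        self._batch_size = batch_size
        self._buffer: list[dict] = []
        self.dropped = 0  # Count of events lost to export failures.

    def on_event(self, event: Event) -> None:
        self._buffer.append({"kind": event.kind, "name": event.name})
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        batch, self._buffer = self._buffer, []
        if not batch:
            return
        try:
            self._send(batch)
        except Exception:
            # Never let an export failure reach the instrumented code path.
            self.dropped += len(batch)


sent: list[list[dict]] = []
exporter = BufferedExporter(sent.append, batch_size=2)
for event in [Event("start", "llm"), Event("end", "llm"), Event("start", "tool")]:
    exporter.on_event(event)
exporter.flush()  # Flush the final partial batch.
print(sent)
```

In a plugin shaped like the one above, `exporter.on_event` would presumably be the callback handed to `context.register_subscriber`.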

Multi-Surface Policy Bundle

A plugin can register more than one runtime surface when one configuration document controls a related behavior bundle.

For example, a policy bundle can install:

  • A telemetry subscriber
  • LLM request intercepts for request metadata
  • Tool guardrails for policy enforcement
  • Sanitize guardrails for exported payloads
  • Shared component-local state used by those hooks

Use this pattern when the configured behavior is easier to reason about as one component than as several unrelated plugin components. Keep each registered surface small and make the component config explicit about which surfaces are enabled.
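
A minimal sketch of a bundle whose configuration names each surface explicitly might look like the following. It is an illustration under assumptions, not the library's implementation: `RecordingContext` is a hypothetical stand-in that only records registrations, the `surfaces` config key and the `policy-bundle.*` names are invented, and the `100, False` arguments are copied from the header example above without assuming what they mean. Only the two registration calls that appear on this page are used; guardrail surfaces are left out because their registration calls are not shown here.

```python
from typing import Any, Callable


class RecordingContext:
    """Hypothetical stand-in for the plugin context; it only records calls."""

    def __init__(self) -> None:
        self.registered: list[tuple[str, str]] = []

    def register_subscriber(self, name: str, callback: Callable) -> None:
        self.registered.append(("subscriber", name))

    def register_llm_request_intercept(self, name: str, *args: Any) -> None:
        self.registered.append(("llm_request_intercept", name))


class PolicyBundle:
    """Hypothetical bundle that installs only the surfaces its config enables."""

    SURFACES = ("telemetry", "request_metadata")

    def __init__(self) -> None:
        # Shared component-local state used by the registered hooks.
        self.state = {"events_seen": 0}

    @staticmethod
    def _error(message: str) -> dict[str, str]:
        return {"level": "error", "code": "policy-bundle.invalid_config", "message": message}

    def validate(self, plugin_config: dict[str, Any]) -> list[dict[str, str]]:
        surfaces = plugin_config.get("surfaces")
        if not isinstance(surfaces, dict):
            return [self._error("surfaces must map surface names to booleans")]
        unknown = sorted(set(surfaces) - set(self.SURFACES))
        return [self._error(f"unknown surface: {name}") for name in unknown]

    def register(self, plugin_config: dict[str, Any], context: Any) -> None:
        surfaces = plugin_config["surfaces"]

        def count_event(event: Any) -> None:
            self.state["events_seen"] += 1

        def pass_through(name: str, request: Any, annotated: Any):
            # Placeholder intercept: returns the request unchanged.
            return request, annotated

        if surfaces.get("telemetry", False):
            context.register_subscriber("policy-bundle.telemetry", count_event)
        if surfaces.get("request_metadata", False):
            context.register_llm_request_intercept(
                "policy-bundle.metadata", 100, False, pass_through
            )


config = {"surfaces": {"telemetry": True, "request_metadata": False}}
bundle = PolicyBundle()
context = RecordingContext()
if not bundle.validate(config):
    bundle.register(config, context)
print(context.registered)
```

Rejecting unknown surface names in `validate` keeps a typo in the configuration from silently disabling part of the bundle.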

Framework-Facing Plugins

Plugins can stay framework-agnostic if they operate on the normalized runtime data rather than framework-specific objects.

Good examples:

  • Rewrite provider headers
  • Emit tracing data
  • Attach scheduling hints
  • Apply cross-framework safety policies
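
As a rough illustration of the last item, a policy written against plain normalized data needs no framework imports at all. Everything in this sketch is an assumption made for the example: `NormalizedToolCall` is a hypothetical record (the actual runtime schema may differ), and the tool names and the `allow`/`deny` result shape are invented.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NormalizedToolCall:
    """Hypothetical normalized record; the real runtime schema may differ."""
    tool_name: str
    arguments: dict[str, Any]


def check_tool_call(call: NormalizedToolCall, blocked_tools: frozenset[str]) -> dict[str, str]:
    """Apply one policy to any tool call, whichever framework produced it."""
    if call.tool_name in blocked_tools:
        return {"action": "deny", "reason": f"tool '{call.tool_name}' is blocked"}
    return {"action": "allow", "reason": "no policy matched"}


blocked = frozenset({"shell.exec"})
denied = check_tool_call(NormalizedToolCall("shell.exec", {"cmd": "ls"}), blocked)
allowed = check_tool_call(NormalizedToolCall("search.web", {"query": "nemo"}), blocked)
print(denied["action"], allowed["action"])
```

Because the function depends only on the record's fields, the same check applies regardless of which integration produced the call.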