
Copyright © 2026, NVIDIA Corporation.


Code Examples


Use these examples when you need the direct runtime surfaces behind the application instrumentation guides.

Invocation API Selection

The following table shows which API to use based on your integration need:

| Need | Preferred API | Use when |
| --- | --- | --- |
| Run a tool with full instrumentation | `tools.execute`, `toolCallExecute`, `tool_call_execute` | Application code owns the callback. |
| Run an LLM call with full instrumentation | `llm.execute`, `llmCallExecute`, `llm_call_execute` | Application code owns the provider call. |
| Run a streaming LLM call | `llm_stream_execute`, `typedLlmStreamExecute`, `llm_stream_call_execute` | You need chunk collection and one final aggregate end event. |
| Emit start/end manually | `call` and `call_end` helpers | A framework owns the real invocation boundary. |
| Emit a checkpoint | `scope.event`, `event` | You need milestone visibility inside an active scope. |
| Attach work to one request | Scope-local registration helpers | Middleware or subscribers should disappear when that scope closes. |

Manual Tool Lifecycle

Use manual lifecycle calls only when the surrounding code owns the real tool invocation and exposes nothing more than reliable start and finish hooks. If you are replaying events or bridging a framework clock, pass an explicit timestamp to the manual start, end, or mark helpers. Python accepts timezone-aware datetime values, Node.js and WebAssembly accept microseconds since the Unix epoch, Rust accepts DateTime<Utc>, and Go accepts time.Time.

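```python
import nemo_relay

# Emit the start event for a tool call that the surrounding code runs itself.
handle = nemo_relay.tools.call("search", {"query": "weather"}, data={"attempt": 1})
result = None  # defined up front so the finally block never raises NameError
try:
    # Replace with the real tool invocation owned by your framework.
    result = {"hits": 2}
finally:
    # Always close the handle so the end event is emitted, even on failure.
    nemo_relay.tools.call_end(handle, result)
```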
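Because Python takes timezone-aware datetime values while Node.js and WebAssembly take microseconds since the Unix epoch, code that replays the same events across runtimes needs a lossless conversion between the two. The following is a minimal stdlib-only sketch; `to_unix_micros` and `from_unix_micros` are hypothetical helper names, not part of NeMo Relay.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers for illustration; not part of the NeMo Relay API.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_ONE_MICROSECOND = timedelta(microseconds=1)


def to_unix_micros(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer microseconds since the Unix epoch."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        # Naive datetimes are ambiguous, so reject them instead of guessing a zone.
        raise ValueError("timestamp must be timezone-aware")
    # timedelta // timedelta yields an exact int, avoiding float rounding.
    return (moment - _EPOCH) // _ONE_MICROSECOND


def from_unix_micros(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch back to a UTC datetime."""
    return _EPOCH + timedelta(microseconds=micros)


replayed = datetime(2024, 5, 1, 12, 0, 0, 250, tzinfo=timezone.utc)
micros = to_unix_micros(replayed)
print(micros, from_unix_micros(micros) == replayed)
```

Integer floor division of two `timedelta` values keeps microsecond precision exact, which matters when start and end timestamps must order correctly.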

Managed LLM Execution

Use managed execution when NeMo Relay should run the full middleware pipeline around the provider call.

```python
import asyncio

import nemo_relay
from nemo_relay import LLMRequest

request = LLMRequest({}, {"messages": [{"role": "user", "content": "hello"}]})


async def invoke(req: LLMRequest):
    # Provider callback: replace with the real provider call.
    return {"text": "hi", "request": req.content}


async def main():
    # NeMo Relay runs the middleware pipeline around `invoke`.
    return await nemo_relay.llm.execute(
        "demo-provider",
        request,
        invoke,
        model_name="demo-model",
    )


response = asyncio.run(main())
```
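The practical difference from the manual helpers is who owns the start and end boundary. This stdlib-only sketch illustrates that idea, assuming the runtime records an end event even when the provider raises. `managed_execute` and the `events` list are hypothetical stand-ins; the real `nemo_relay.llm.execute` also runs the middleware pipeline, which this sketch leaves out.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical event log; a real runtime would dispatch events to subscribers.
events: list[tuple[str, str, Any]] = []


async def managed_execute(
    provider: str,
    request: dict,
    invoke: Callable[[dict], Awaitable[Any]],
) -> Any:
    """Record a start event, await the provider callback, and always record an end event."""
    events.append(("start", provider, request))
    try:
        response = await invoke(request)
    except Exception as exc:
        # Record the failure before re-raising so the end event is not lost.
        events.append(("end", provider, {"error": repr(exc)}))
        raise
    events.append(("end", provider, response))
    return response


async def invoke(req: dict) -> dict:
    return {"text": "hi"}


response = asyncio.run(managed_execute("demo-provider", {"messages": []}, invoke))
print(events)
```

With managed execution, your code cannot forget the end event, which is the main risk of the manual `call`/`call_end` pattern.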

Streaming LLM Execution

Use the streaming helper when subscribers need chunk collection plus one final response payload.

```python
import asyncio
from dataclasses import dataclass

from nemo_relay import LLMRequest
from nemo_relay.typed import DataclassCodec, llm_stream_execute


@dataclass
class Chunk:
    delta: str


@dataclass
class FinalResponse:
    text: str


request = LLMRequest({}, {"messages": [{"role": "user", "content": "hello"}]})
collected: list[Chunk] = []


async def stream_impl(_request: LLMRequest):
    # Provider stream: yield chunks as the provider produces them.
    yield Chunk(delta="hi")


async def main() -> None:
    stream = await llm_stream_execute(
        "demo-provider",
        request,
        stream_impl,
        # Receives each chunk.
        collector=collected.append,
        # Builds the single aggregate payload from the collected chunks.
        finalizer=lambda: FinalResponse(text="".join(chunk.delta for chunk in collected)),
        chunk_json_codec=DataclassCodec(Chunk),
        response_json_codec=DataclassCodec(FinalResponse),
    )
    # Use `stream` here, inside the running event loop.


asyncio.run(main())
```
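The collector and finalizer split follows a common aggregation pattern: the collector sees every chunk as it arrives, and the finalizer runs once after the stream ends to build the single payload for the end event. The sketch below reproduces that pattern with plain asyncio. `collect_stream` is a hypothetical name, and the sketch makes no claim about how `llm_stream_execute` orders these calls internally.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, TypeVar

C = TypeVar("C")
R = TypeVar("R")


@dataclass
class Chunk:
    delta: str


@dataclass
class FinalResponse:
    text: str


async def collect_stream(
    source: AsyncIterator[C],
    collector: Callable[[C], None],
    finalizer: Callable[[], R],
) -> tuple[list[C], R]:
    """Forward every chunk to the collector, then build one aggregate with the finalizer."""
    forwarded: list[C] = []
    async for chunk in source:
        collector(chunk)         # per-chunk hook
        forwarded.append(chunk)  # what the caller would see
    # The finalizer runs exactly once, after the stream is exhausted.
    return forwarded, finalizer()


async def provider() -> AsyncIterator[Chunk]:
    for piece in ("hel", "lo"):
        yield Chunk(delta=piece)


async def main() -> FinalResponse:
    collected: list[Chunk] = []
    _, final = await collect_stream(
        provider(),
        collector=collected.append,
        finalizer=lambda: FinalResponse(text="".join(c.delta for c in collected)),
    )
    return final


print(asyncio.run(main()))
```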

Partial Middleware Calls

These helpers are useful when framework code cannot use managed execution but still wants a request rewrite or block decision.

```python
import nemo_relay
from nemo_relay import LLMRequest

# Tool call: apply request intercepts, then ask conditional-execution guardrails for a decision.
tool_args = nemo_relay.tools.request_intercepts("search", {"query": "weather"})
nemo_relay.tools.conditional_execution("search", tool_args)

# LLM call: the same two steps against an LLMRequest.
llm_request = LLMRequest({}, {"messages": [{"role": "user", "content": "hello"}]})
llm_request = nemo_relay.llm.request_intercepts("demo-provider", llm_request)
nemo_relay.llm.conditional_execution(llm_request)
```
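The two partial calls separate rewriting a request from deciding whether it may run. The following stdlib-only sketch models that split, assuming intercepts run in registration order with each one receiving the previous output, and assuming the first blocking guardrail wins. All names here (`apply_request_intercepts`, `check_conditional_execution`, `Decision`) are hypothetical and do not mirror NeMo Relay's return types; check the API Reference for what the real helpers return on a block.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical hook shapes for illustration; NeMo Relay's real types differ.
Intercept = Callable[[dict], dict]
Guardrail = Callable[[dict], bool]  # True means "allow"


@dataclass
class Decision:
    allowed: bool
    blocked_by: Optional[str] = None


def apply_request_intercepts(args: dict, intercepts: list[Intercept]) -> dict:
    """Feed each intercept the previous one's output and return the final request."""
    for intercept in intercepts:
        args = intercept(args)
    return args


def check_conditional_execution(args: dict, guardrails: dict[str, Guardrail]) -> Decision:
    """Return a block decision from the first failing guardrail, else allow."""
    for name, guardrail in guardrails.items():
        if not guardrail(args):
            return Decision(allowed=False, blocked_by=name)
    return Decision(allowed=True)


def add_locale(args: dict) -> dict:
    return {**args, "locale": "en-US"}


def no_empty_query(args: dict) -> bool:
    return bool(args.get("query"))


args = apply_request_intercepts({"query": "weather"}, [add_locale])
decision = check_conditional_execution(args, {"no_empty_query": no_empty_query})
if decision.allowed:
    print("run search with", args)
else:
    print("blocked by", decision.blocked_by)
```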

Scope and Context Helpers

Use normal scope helpers first. Reach for explicit stack helpers only when work crosses thread, task, worker, or request boundaries.

```python
from concurrent.futures import ThreadPoolExecutor

import nemo_relay

with nemo_relay.scope.scope("request", nemo_relay.ScopeType.Agent):
    nemo_relay.scope.event("started", data={"ok": True})
    # Capture the active scope stack so another thread can adopt it.
    shared = nemo_relay.propagate_scope_to_thread()

    def worker() -> None:
        # Install the captured stack before emitting events from this thread.
        nemo_relay.set_thread_scope_stack(shared)
        nemo_relay.scope.event("worker-ran")

    with ThreadPoolExecutor() as pool:
        pool.submit(worker).result()
```
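Explicit propagation is needed because per-task state in Python is commonly held in `contextvars`, and `ThreadPoolExecutor.submit` does not copy the caller's context into the task at submit time. The stdlib-only sketch below shows the general mechanism with `contextvars.copy_context()`. `current_stack` is a hypothetical stand-in, and this is not a description of how NeMo Relay stores its scope stack internally.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a runtime's per-task scope stack.
current_stack = contextvars.ContextVar("current_stack", default=())


def read_stack() -> tuple:
    return current_stack.get()


def run_in_thread_with_context() -> tuple:
    # Push a "request" scope onto the caller's stack.
    token = current_stack.set(current_stack.get() + ("request",))
    try:
        # Snapshot the caller's context and run the worker inside that snapshot.
        snapshot = contextvars.copy_context()
        with ThreadPoolExecutor() as pool:
            return pool.submit(snapshot.run, read_stack).result()
    finally:
        # Restore the caller's previous stack when the "scope" ends.
        current_stack.reset(token)


print(run_in_thread_with_context())
```

Capturing the stack once and installing it in the worker, as `propagate_scope_to_thread` and `set_thread_scope_stack` do in the example above, follows the same snapshot-then-adopt shape.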

Middleware Registration Families

The runtime exposes the same registration families for tool and LLM calls:

  • Sanitize-request guardrails change emitted start-event payloads only
  • Sanitize-response guardrails change emitted end-event payloads only
  • Conditional-execution guardrails return an allow-or-block decision
  • Request intercepts change the real request before execution
  • Execution intercepts wrap the callback and may post-process or short-circuit
  • LLM stream execution intercepts wrap streaming provider callbacks
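The key contrast in this list is that sanitizers edit only what is emitted, while intercepts edit what actually runs. The stdlib-only sketch below puts the non-streaming families into one hypothetical tool pipeline. Its ordering (request intercept, start event, guardrail, execution intercept or callback, end event) and its blocked-call end payload are illustrative assumptions, not NeMo Relay's documented behavior, and `run_tool` and `emitted` are hypothetical names.

```python
from typing import Callable, Optional

# Hypothetical hook shapes; the real registration APIs differ by language.
Sanitizer = Callable[[dict], dict]
Intercept = Callable[[dict], dict]
Guardrail = Callable[[dict], bool]
ExecutionIntercept = Callable[[dict, Callable[[dict], dict]], dict]

emitted: list[tuple[str, dict]] = []  # stand-in for start/end events


def run_tool(
    args: dict,
    callback: Callable[[dict], dict],
    sanitize_request: Sanitizer = lambda a: a,
    sanitize_response: Sanitizer = lambda r: r,
    request_intercept: Intercept = lambda a: a,
    guardrail: Guardrail = lambda a: True,
    execution_intercept: Optional[ExecutionIntercept] = None,
) -> Optional[dict]:
    # Request intercepts change the real request the callback receives.
    args = request_intercept(args)
    # Sanitize-request changes only the emitted start payload, never `args`.
    emitted.append(("start", sanitize_request(dict(args))))
    # Conditional execution returns an allow-or-block decision.
    if not guardrail(args):
        emitted.append(("end", {"blocked": True}))  # assumed blocked payload
        return None
    # Execution intercepts wrap the callback and may short-circuit it.
    if execution_intercept is not None:
        result = execution_intercept(args, callback)
    else:
        result = callback(args)
    # Sanitize-response changes only the emitted end payload.
    emitted.append(("end", sanitize_response(dict(result))))
    return result


def search(a: dict) -> dict:
    return {"hits": 2, "api_key_used": a["api_key"]}


result = run_tool(
    {"query": "weather", "api_key": "secret"},
    search,
    sanitize_request=lambda a: {**a, "api_key": "***"},
    sanitize_response=lambda r: {**r, "api_key_used": "***"},
)
# The callback saw the real key; the emitted events saw the redacted one.
print(result["api_key_used"], emitted)
```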

Every family also has a scope-local surface:

  • Python: nemo_relay.scope_local.register_*
  • Node.js: scopeRegister*
  • Rust: middleware scope_register_* functions under nemo_relay::api::registry; subscriber scope registration under nemo_relay::api::subscriber
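Scope-local registration ties a hook's lifetime to one scope. The following stdlib-only sketch models that behavior with a stack of per-scope frames that is popped when the scope exits. `Registry`, `scope_register`, and `apply` are hypothetical names, and the sketch is single-threaded for brevity, whereas a real runtime must track scopes per thread or task.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

Hook = Callable[[dict], dict]


class Registry:
    """Hypothetical registry with global hooks and scope-local hooks."""

    def __init__(self) -> None:
        self._global: list[Hook] = []
        self._frames: list[list[Hook]] = []  # one frame per open scope

    def register(self, hook: Hook) -> None:
        self._global.append(hook)

    def scope_register(self, hook: Hook) -> None:
        if not self._frames:
            raise RuntimeError("no active scope")
        self._frames[-1].append(hook)

    @contextmanager
    def scope(self) -> Iterator["Registry"]:
        self._frames.append([])
        try:
            yield self
        finally:
            # Dropping the frame removes every hook registered inside this scope.
            self._frames.pop()

    def apply(self, payload: dict) -> dict:
        hooks = list(self._global)
        for frame in self._frames:
            hooks.extend(frame)
        for hook in hooks:
            payload = hook(payload)
        return payload


registry = Registry()
registry.register(lambda p: {**p, "global": True})
with registry.scope():
    registry.scope_register(lambda p: {**p, "request_only": True})
    inside = registry.apply({})
outside = registry.apply({})
print(inside, outside)
```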

Use Add Middleware for an end-to-end policy example and API Reference for symbol-level details.