Copyright © 2026, NVIDIA Corporation.

Instrument an LLM Call

Use this guide when you own the model-provider callback and want NeMo Relay to emit lifecycle events, apply LLM middleware, and preserve the active agent scope around the call.

What You Build

You will wrap one existing LLM provider invocation with the managed LLM execution API. The result is an LLM call that:

  • Receives an LLM request object such as LLMRequest in Python or LlmRequest in Node.js and Rust.
  • Runs LLM request intercepts, guardrails, execution intercepts, and response guardrails.
  • Emits LLM start and LLM end events.
  • Records model metadata for observability and trajectory export.
  • Keeps the LLM span attached to the current agent or request scope.
  • Returns the original provider result to the application.

Before You Start

Complete the Quick Start guide for one binding first:

  • Python Quick Start
  • Node.js Quick Start
  • Rust Quick Start

Create a scope for the active request or agent run before adding LLM instrumentation. If you have not done that yet, start with Adding Scopes and Marks.

The request and response payloads must be JSON-compatible. If your provider SDK uses clients, streams, callbacks, or other opaque objects, keep those objects in the provider callback and pass only a serializable request projection into NeMo Relay.
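One way to keep opaque objects out of the payload is to build the request content from plain values and check it with json.dumps before handing it to NeMo Relay. The sketch below uses only the Python standard library. The helper names build_request_content and ensure_json_compatible are hypothetical and are not part of the NeMo Relay API.

```python
import json
from typing import Any


def ensure_json_compatible(payload: Any) -> Any:
    """Return payload unchanged if it serializes to JSON, else raise TypeError."""
    try:
        # allow_nan=False rejects NaN and Infinity, which are not valid JSON.
        json.dumps(payload, allow_nan=False)
    except (TypeError, ValueError) as exc:
        raise TypeError(f"payload is not JSON-compatible: {exc}") from exc
    return payload


def build_request_content(model: str, messages: list[dict[str, str]]) -> dict[str, Any]:
    """Project provider call arguments into a plain dict (hypothetical helper)."""
    content = {
        "model": model,
        # Copy only the fields we want to expose; drop anything SDK-specific.
        "messages": [{"role": m["role"], "content": m["content"]} for m in messages],
    }
    return ensure_json_compatible(content)


content = build_request_content("demo-model", [{"role": "user", "content": "hello"}])
print(json.dumps(content))
```

Running the check at the boundary surfaces a stray client or stream object as an immediate TypeError rather than as a failure later in an exporter.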

Integration Pattern

Follow these steps to route the provider invocation through NeMo Relay:

  1. Identify the stable provider invocation boundary in your application.
  2. Create or inherit a scope for the current agent run, request, or workflow.
  3. Register a temporary subscriber while validating the integration.
  4. Build an LLM request object with provider headers and content.
  5. Replace the direct provider invocation with the managed LLM execute helper.
  6. Pass the active scope handle and a stable model_name.
  7. Check that the provider result is unchanged and lifecycle events are emitted.

Minimal Example

The Python example below wraps a demo provider callback and prints emitted events.

Python

import asyncio

import nemo_relay


def log_event(event) -> None:
    # Temporary subscriber used only while validating the integration.
    print(f"{event.kind} {event.name}")


async def call_provider(request: nemo_relay.LLMRequest):
    # Demo provider callback: a real one would call the provider SDK here.
    return {
        "text": "hello",
        "messages": request.content["messages"],
    }


async def main() -> None:
    nemo_relay.subscribers.register("llm-check", log_event)

    try:
        # Open a scope so the LLM span attaches to the current agent run.
        with nemo_relay.scope.scope("agent-run", nemo_relay.ScopeType.Agent) as handle:
            # Empty provider headers, followed by JSON-compatible content.
            request = nemo_relay.LLMRequest(
                {},
                {"messages": [{"role": "user", "content": "hello"}]},
            )
            result = await nemo_relay.llm.execute(
                "demo-provider",
                request,
                call_provider,
                handle=handle,
                model_name="demo-model",
            )
            print(result)
    finally:
        # Delivery is asynchronous, so flush before removing the subscriber.
        nemo_relay.subscribers.flush()
        nemo_relay.subscribers.deregister("llm-check")


asyncio.run(main())

Validate the Integration

Check both behavior and instrumentation:

  • The provider result matches what the application returned before the wrapper was added.
  • The subscriber prints an agent or request scope event.
  • The subscriber prints LLM start and LLM end events for demo-provider.
  • LLM start input contains the request after request intercepts and sanitize-request guardrails.
  • LLM end output contains the provider response after response guardrails.
  • The LLM event includes the normalized model_name when you provide one.

Native subscriber delivery is asynchronous. Flush subscribers before validating printed output. In Node.js, also wait one event-loop tick after flushSubscribers() so JavaScript callbacks can run.
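These checks can be automated with a subscriber that collects events instead of printing them. The sketch below assumes only that events expose kind and name attributes, as the Minimal Example does. The kind strings "llm_start" and "llm_end" are placeholders, not confirmed NeMo Relay values, so substitute whatever your subscriber actually prints. It uses SimpleNamespace stand-ins so that it runs without NeMo Relay installed.

```python
from types import SimpleNamespace


class EventCollector:
    """Subscriber callback that stores events for later assertions."""

    def __init__(self) -> None:
        self.events = []

    def __call__(self, event) -> None:
        self.events.append(event)

    def names_for(self, kind: str) -> list[str]:
        """Return the names of recorded events with the given kind, in order."""
        return [e.name for e in self.events if e.kind == kind]


def has_llm_lifecycle(collector, provider, start_kind="llm_start", end_kind="llm_end") -> bool:
    # start_kind and end_kind are assumed placeholder values (see the note above).
    return (
        provider in collector.names_for(start_kind)
        and provider in collector.names_for(end_kind)
    )


collector = EventCollector()
# Simulated events standing in for what a real subscriber would receive.
for kind, name in [
    ("scope_start", "agent-run"),
    ("llm_start", "demo-provider"),
    ("llm_end", "demo-provider"),
]:
    collector(SimpleNamespace(kind=kind, name=name))

print(has_llm_lifecycle(collector, "demo-provider"))
```

In a real check, register the collector in place of log_event and flush subscribers before calling has_llm_lifecycle.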

If only the business result appears, the callback ran but instrumentation did not run. Confirm that the call goes through llm.execute, llmCallExecute, or llm_call_execute.

Production Checklist

Before deploying to production, complete the following checklist:

  • Keep provider names stable. Subscribers and exporters use names for filtering and dashboards.
  • Pass model_name separately when the model should be easy to filter or export.
  • Keep request and response payloads JSON-compatible.
  • Keep SDK clients and transport objects inside the provider callback.
  • Use codecs when middleware needs normalized provider request or response semantics.
  • Use sanitize guardrails before exporting prompts or model responses in production.
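One way to keep SDK clients and transport objects inside the provider callback is to close over the client, so that only the request object crosses the boundary. The sketch below is illustrative: FakeClient and make_provider_callback are hypothetical names, and SimpleNamespace stands in for a request object with a content attribute like the one used in the Minimal Example.

```python
import asyncio
from types import SimpleNamespace


class FakeClient:
    """Stand-in for a provider SDK client; it is never placed in a payload."""

    async def complete(self, messages):
        return {"text": f"echo: {messages[-1]['content']}"}


def make_provider_callback(client):
    """Return an async callback that closes over the client."""

    async def call_provider(request):
        # Only JSON-compatible data is read from the request and returned.
        return await client.complete(request.content["messages"])

    return call_provider


call_provider = make_provider_callback(FakeClient())
# SimpleNamespace mimics a request object with a content attribute.
request = SimpleNamespace(content={"messages": [{"role": "user", "content": "hello"}]})
result = asyncio.run(call_provider(request))
print(result)
```

The callback returned by make_provider_callback has the same single-argument shape as call_provider in the Minimal Example, so the client never needs to appear in the request or response payload.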

Common Issues

Check these symptoms first when the workflow does not behave as expected.

  • No LLM events appear: The application is still calling the provider directly. Route the call through llm.execute, llmCallExecute, or llm_call_execute.
  • The LLM span appears outside the agent scope: Pass the current scope handle into the managed execute helper.
  • Middleware sees provider-specific shapes: Add a codec so request intercepts can work with normalized annotated data.
  • Sensitive prompt data appears in traces: Add LLM sanitize-request and sanitize-response guardrails before registering production exporters.
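As an illustration of what a sanitize step might do to a request payload, the sketch below masks email addresses in message text. It is plain Python under stated assumptions: redact_messages is a hypothetical function, the messages shape follows the Minimal Example, and how such a function is registered as a sanitize guardrail is not shown here (see Add Middleware for the actual registration API).

```python
import copy
import re

# Deliberately simple pattern for demonstration; real redaction rules are usually broader.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_messages(content: dict) -> dict:
    """Return a copy of content with email addresses masked in message text."""
    sanitized = copy.deepcopy(content)  # leave the caller's payload untouched
    for message in sanitized.get("messages", []):
        if isinstance(message.get("content"), str):
            message["content"] = EMAIL_PATTERN.sub("[redacted-email]", message["content"])
    return sanitized


original = {"messages": [{"role": "user", "content": "Mail alice@example.com today"}]}
sanitized = redact_messages(original)
print(sanitized["messages"][0]["content"])
```

Because the function returns a copy, the unredacted payload handled by the provider callback is never mutated.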

Next Steps

Use these links to continue from this workflow into the next related task.

  • Instrument tools with Instrument a Tool Call.
  • Add policy or transformation with Add Middleware.
  • Use Provider Codecs when middleware needs normalized LLM request and response data.
  • Export events with Observability.