For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
  • About NVIDIA NeMo Relay
    • Overview
    • Architecture
    • Ecosystem
    • Concepts
    • Release Notes
  • Getting Started
    • Agent Runtime Primer
    • Prerequisites
    • Installation
    • Configuration / Setup
    • Quick Start
  • NVIDIA NeMo Relay CLI
    • About
    • Basic Usage
    • Claude Code
    • Codex
    • Cursor
    • Hermes Agent
  • Supported Integrations
    • About
    • OpenClaw Plugin Guide
    • LangChain Integration Guide
    • LangGraph Integration Guide
    • Deep Agents Integration Guide
  • Instrument Applications
    • About
    • Adding Scopes and Marks
    • Instrument a Tool Call
    • Instrument an LLM Call
    • Add Middleware
    • Code Examples
  • Observability Plugin
    • About
    • Configuration
    • Agent Trajectory Interchange Format (ATIF)
    • Agent Trajectory Observability Format (ATOF)
    • OpenTelemetry
    • OpenInference
  • Adaptive Plugin
    • About
    • Configuration
    • Adaptive Cache Governor (ACG)
    • Adaptive Hints
  • NeMo Guardrails Plugin
    • About
    • Configuration
  • Integrate into Frameworks
    • About
    • Adding Scopes
    • Wrap Tool Calls
    • Wrap LLM Calls
    • Handle Non-Serializable Data
    • Using Codecs
    • Provider Codecs
    • Provider Response Codecs
    • Code Examples
  • Build Plugins
    • About
    • Define a Plugin
    • Validate Plugin Configuration
    • Plugin Configuration Files
    • Register Plugin Behavior
    • Design Plugin Configuration
    • NeMo Guardrails Example Plugin
    • Code Examples
  • Contribute
    • About
    • Development Setup
    • Workflow and Reviews
    • Testing and Documentation
  • Reference
    • APIs
    • Performance
  • Resources
    • Support and FAQs
    • Glossary
    • Troubleshooting Guide
    • Community
    • Legal
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogo
On this page
  • What You Build
  • Before You Start
  • Integration Pattern
  • Concrete LLM Example
  • Streaming Providers
  • Validate the LLM Wrapper
  • Common Issues
  • Next Steps
Integrate into Frameworks

Wrap LLM Calls

||View as Markdown|
Previous

Wrap Tool Calls

Next

Handle Non-Serializable Data

Use this guide when a framework, SDK, or provider adapter owns model invocation and you need NeMo Relay to observe and control those provider calls.

What You Build

You will place a managed NeMo Relay LLM execution wrapper at the provider boundary. The wrapper emits LLM lifecycle events, runs LLM middleware, attaches the call to the active scope, records the model_name, and returns the provider response to the framework.

Before You Start

You need:

  • A framework request or run scope. If the framework does not create one yet, start with Adding Scopes.
  • A stable model-provider boundary, such as a provider adapter or client dispatch method.
  • A JSON-compatible request projection inside LLMRequest.
  • A JSON-compatible response projection for subscribers and exporters.

Integration Pattern

Follow this sequence to keep framework work attached to the expected runtime context.

  1. Enter or inherit the active framework scope.
  2. Convert the framework provider payload into LLMRequest.
  3. Route the real provider callback through the managed LLM execute helper.
  4. Pass a stable provider name and model_name.
  5. Keep provider clients, streams, callbacks, and retry state outside emitted JSON payloads.

Use a request or response codec when provider payloads need normalization before middleware or events see them. Use Provider Codecs for those cases.

Concrete LLM Example

The examples below wrap one provider call and attach it to the active parent scope.

Python
Node.js
Rust
1from typing import TypedDict
2
3import nemo_relay
4from nemo_relay import LLMRequest
5
6class LlmResponse(TypedDict):
7 text: str
8 request: object
9
10async def framework_llm(provider_name: str, payload: object) -> LlmResponse:
11 parent = nemo_relay.scope.get_handle()
12 request = LLMRequest({}, payload)
13
14 async def invoke(req: LLMRequest) -> LlmResponse:
15 return {"text": "hi", "request": req.content}
16
17 return await nemo_relay.llm.execute(
18 provider_name,
19 request,
20 invoke,
21 handle=parent,
22 model_name="demo-model",
23 )

Streaming Providers

Use the LLM stream execute helper when the framework exposes a stream boundary that NeMo Relay can own. Stream wrappers preserve the same scope and middleware model while letting subscribers observe the completed response after chunks are collected.

If the framework owns the stream internally, emit explicit start and end lifecycle events around the provider stream and use mark events for retry, queue, and partial-output milestones.

Validate the LLM Wrapper

Run one provider path and check:

  • The application receives the same provider response as before.
  • Subscribers see one LLM start event and one matching LLM end event.
  • The event includes the expected provider name and model_name.
  • LLM middleware runs exactly once.
  • Provider-owned clients, streams, and callbacks stay outside emitted JSON payloads.

Common Issues

Check these symptoms first when the workflow does not behave as expected.

  • The LLM appears outside the request trace: Pass the active scope handle or run the provider call inside the framework request scope.
  • The model name is missing: Pass model_name from the provider payload, model client, or framework run configuration.
  • Request middleware receives provider objects: Convert provider payloads into LLMRequest with JSON-compatible content before calling NeMo Relay.
  • Stream output is incomplete: Use the stream execute helper when NeMo Relay owns the stream boundary, or emit explicit lifecycle events when it does not.

Next Steps

Use these links to continue from this workflow into the next related task.

  • Add tool integration with Wrap Tool Calls.
  • Normalize provider payloads with Provider Codecs.
  • Use Handle Non-Serializable Data for provider clients, streams, and callback objects.