Wrap LLM Calls | NVIDIA NeMo Relay

Use this guide when a framework, SDK, or provider adapter owns model invocation and you need NeMo Relay to observe and control those provider calls.

What You Build

You will place a managed NeMo Relay LLM execution wrapper at the provider boundary. The wrapper emits LLM lifecycle events, runs LLM middleware, attaches the call to the active scope, records the model_name, and returns the provider response to the framework.

Before You Start

You need:

A framework request or run scope. If the framework does not create one yet, start with Adding Scopes.
A stable model-provider boundary, such as a provider adapter or client dispatch method.
A JSON-compatible request projection inside LLMRequest.
A JSON-compatible response projection for subscribers and exporters.

Integration Pattern

Follow this sequence to keep framework work attached to the expected runtime context.

Enter or inherit the active framework scope.
Convert the framework provider payload into LLMRequest.
Route the real provider callback through the managed LLM execute helper.
Pass a stable provider name and model_name.
Keep provider clients, streams, callbacks, and retry state outside emitted JSON payloads.

Use a request codec when provider requests need normalization before request intercepts or request-side middleware run. Use a response codec when provider responses need normalized LLM end-event annotations for subscribers or exporters. Use Provider Codecs for those cases.

Concrete LLM Example

The examples below wrap one provider call and attach it to the active parent scope.

Python

Node.js

Rust

1 from typing import TypedDict
2 
3 import nemo_relay
4 from nemo_relay import LLMRequest
5 
6 class LlmResponse(TypedDict):
7     text: str
8     request: object
9 
10 async def framework_llm(provider_name: str, payload: object) -> LlmResponse:
11     parent = nemo_relay.scope.get_handle()
12     request = LLMRequest({}, payload)
13 
14     async def invoke(req: LLMRequest) -> LlmResponse:
15         return {"text": "hi", "request": req.content}
16 
17     return await nemo_relay.llm.execute(
18         provider_name,
19         request,
20         invoke,
21         handle=parent,
22         model_name="demo-model",
23     )

Streaming Providers

Use the LLM stream execute helper when the framework exposes a stream boundary that NeMo Relay can own. Stream wrappers preserve the same scope and middleware model while letting subscribers observe the completed response after chunks are collected.

If the framework owns the stream internally, emit explicit start and end lifecycle events around the provider stream and use mark events for retry, queue, and partial-output milestones.

Validate the LLM Wrapper

Run one provider path and check:

The application receives the same provider response as before.
Subscribers see one LLM start event and one matching LLM end event.
The event includes the expected provider name and model_name.
LLM middleware runs exactly once.
Provider-owned clients, streams, and callbacks stay outside emitted JSON payloads.

Common Issues

Check these symptoms first when the workflow does not behave as expected.

The LLM appears outside the request trace: Pass the active scope handle or run the provider call inside the framework request scope.
The model name is missing: Pass model_name from the provider payload, model client, or framework run configuration.
Request middleware receives provider objects: Convert provider payloads into LLMRequest with JSON-compatible content before calling NeMo Relay.
Stream output is incomplete: Use the stream execute helper when NeMo Relay owns the stream boundary, or emit explicit lifecycle events when it does not.

Next Steps

Use these links to continue from this workflow into the next related task.

Add tool integration with Wrap Tool Calls.
Normalize provider payloads with Provider Codecs.
Use Handle Non-Serializable Data for provider clients, streams, and callback objects.