Instrument an LLM Call

Use this guide when you own the model-provider callback and want NeMo Relay to emit lifecycle events, apply LLM middleware, and preserve the active agent scope around the call.

What You Build

You will wrap one existing LLM provider invocation with the managed LLM execution API. The result is an LLM call that:

Receives an LLM request object such as LLMRequest in Python or LlmRequest in Node.js and Rust.
Runs LLM request intercepts, guardrails, execution intercepts, and response guardrails.
Emits LLM start and LLM end events.
Records model metadata for observability and trajectory export.
Keeps the LLM span attached to the current agent or request scope.
Returns the original provider result to the application.

Before You Start

Complete the Quick Start guide for one language binding (Python, Node.js, or Rust) first.

Create a scope for the active request or agent run before adding LLM instrumentation. If you have not done that yet, start with Adding Scopes and Marks.

The request and response payloads must be JSON-compatible. If your provider SDK uses clients, streams, callbacks, or other opaque objects, keep those objects in the provider callback and pass only a serializable request projection into NeMo Relay.
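The sketch below shows one way to split a provider call so that only plain data crosses into NeMo Relay. It does not use Relay at all. FakeProviderClient, project_request, and is_json_compatible are hypothetical names for illustration, and the round-trip check is only a quick local test for JSON compatibility, not Relay's own validation.

```python
import json
from typing import Any


class FakeProviderClient:
    """Hypothetical stand-in for an SDK client; not JSON-serializable."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def complete(self, model: str, messages: list[dict[str, str]]) -> dict[str, Any]:
        # Pretend network call: echo the last message back as plain data.
        return {"model": model, "text": messages[-1]["content"]}


def project_request(model: str, messages: list[dict[str, str]]) -> dict[str, Any]:
    """Copy only plain, serializable fields into the request content."""
    return {"model": model, "messages": [dict(m) for m in messages]}


def is_json_compatible(value: Any) -> bool:
    """True if `value` survives a json.dumps/json.loads round trip unchanged."""
    try:
        return json.loads(json.dumps(value)) == value
    except (TypeError, ValueError):
        return False


# The client stays in application code (for example, captured by the provider
# callback); only the projection and the plain-data result cross into Relay.
client = FakeProviderClient(api_key="example-key")
content = project_request("demo-model", [{"role": "user", "content": "hello"}])
result = client.complete(content["model"], content["messages"])
```

A dictionary that holds the client object fails the check, which is the signal to keep that object inside the callback.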

For every managed LLM request, Relay automatically propagates agent lineage to Dynamo using x-dynamo-session-id and, when a parent agent scope is active, x-dynamo-parent-session-id. The current and parent IDs come from the most recent explicit Agent scopes in the active scope stack; the implicit root scope is ignored. Relay uses harness session metadata for these IDs when it is present and falls back to application scope names otherwise. No plugin or configuration is required.
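To make the selection rule concrete, here is a small, library-free model of it. It is not Relay's implementation. The Scope dataclass, the "Agent" and "Function" type strings, and the assumption that the IDs travel as request headers are all illustrative. The model treats the innermost explicit Agent scope as current and the next one out as the parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    """Hypothetical model of one entry in the active scope stack."""

    name: str
    scope_type: str  # illustrative values such as "Agent" or "Function"
    is_root: bool = False
    session_id: Optional[str] = None  # harness session metadata, if any


def scope_id(scope: Scope) -> str:
    # Prefer harness session metadata; fall back to the application scope name.
    return scope.session_id if scope.session_id is not None else scope.name


def lineage_headers(stack: list[Scope]) -> dict[str, str]:
    """Derive lineage IDs from a scope stack ordered root -> innermost."""
    agents = [s for s in stack if s.scope_type == "Agent" and not s.is_root]
    headers: dict[str, str] = {}
    if agents:
        headers["x-dynamo-session-id"] = scope_id(agents[-1])
    if len(agents) >= 2:
        # Parent ID is only set when an enclosing Agent scope exists.
        headers["x-dynamo-parent-session-id"] = scope_id(agents[-2])
    return headers
```

Non-Agent scopes between two agents, such as a tool or function scope, do not change which IDs are chosen in this model.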

If you want Relay to add cost estimates, initialize the built-in pricing plugin before the LLM call and attach a response codec that decodes model and token usage from the provider response. Provider- or framework-reported cost is preserved when present. Otherwise, Relay estimates cost only when a configured model pricing source matches the response model and usage fields. For catalog setup and embedded plugin examples, refer to Provider Response Codecs and Model Pricing.
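The precedence described above can be sketched as a plain function. This is a model of the documented rule, not the pricing plugin's code: the PRICING table, its per-million-token unit, and the decoded field names ("model", "usage", "input_tokens", "output_tokens", "cost") are assumptions chosen for illustration.

```python
from typing import Any, Optional

# Hypothetical pricing source: USD per one million tokens, keyed by model name.
PRICING: dict[str, dict[str, float]] = {
    "demo-model": {"input": 0.50, "output": 1.50},
}


def resolve_cost(
    decoded: dict[str, Any], pricing: dict[str, dict[str, float]]
) -> Optional[float]:
    """Model the documented cost precedence for one decoded response.

    `decoded` is assumed to hold what a response codec extracted: "model",
    "usage" with "input_tokens"/"output_tokens", and an optional
    provider-reported "cost".
    """
    # 1. Provider- or framework-reported cost is preserved when present.
    if decoded.get("cost") is not None:
        return decoded["cost"]

    # 2. Otherwise, estimate only if model and usage were decoded...
    model = decoded.get("model")
    usage = decoded.get("usage") or {}
    if model is None or any(
        usage.get(k) is None for k in ("input_tokens", "output_tokens")
    ):
        return None

    # ...and a pricing source matches the response model.
    prices = pricing.get(model)
    if prices is None:
        return None
    return (
        usage["input_tokens"] * prices["input"]
        + usage["output_tokens"] * prices["output"]
    ) / 1_000_000
```

Returning None rather than zero keeps "no estimate" distinguishable from "free call" in this sketch.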

Integration Pattern

Follow these steps to route the provider invocation through NeMo Relay:

Identify the stable provider invocation boundary in your application.
Create or inherit a scope for the current agent run, request, or workflow.
Register a temporary subscriber while validating the integration.
Build an LLM request object with provider headers and content.
Replace the direct provider invocation with the managed LLM execute helper.
Pass the active scope handle and a stable model_name.
Attach a response codec when subscribers or exporters need normalized response usage, tool calls, or cost annotations.
Check that the provider result is unchanged and lifecycle events are emitted.
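Step 7 hinges on what a response codec extracts. The sketch below shows that kind of normalization for an OpenAI Chat Completions-style response dictionary, reading model, usage, and tool calls. Relay's codec registration interface is covered in Provider Codecs and is not reproduced here; decode_chat_completion and its output field names are hypothetical.

```python
import json
from typing import Any


def decode_chat_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Normalize an OpenAI Chat Completions-style response dict.

    The output field names ("model", "usage", "tool_calls") are illustrative
    and are not taken from NeMo Relay's codec interface.
    """
    message = response["choices"][0]["message"]
    usage = response.get("usage")
    tool_calls = [
        {
            "name": call["function"]["name"],
            # Chat Completions encodes tool arguments as a JSON string.
            "arguments": json.loads(call["function"]["arguments"]),
        }
        for call in message.get("tool_calls") or []
    ]
    return {
        "model": response.get("model"),
        # Leave usage unset when the provider did not report it.
        "usage": (
            {
                "input_tokens": usage.get("prompt_tokens"),
                "output_tokens": usage.get("completion_tokens"),
            }
            if usage
            else None
        ),
        "tool_calls": tool_calls,
    }
```

A response without a usage block decodes to usage None, which is the case where no cost estimate can be made.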

Minimal Example

The example below wraps a demo provider callback and prints emitted events.

Python

```python
import asyncio

import nemo_relay


def log_event(event) -> None:
    # Temporary subscriber: print each event's kind and name.
    print(f"{event.kind} {event.name}")


async def call_provider(request: nemo_relay.LLMRequest):
    # Demo provider callback. A real callback would call the provider SDK
    # here and return its JSON-compatible result.
    return {
        "text": "hello",
        "messages": request.content["messages"],
    }


async def main() -> None:
    nemo_relay.subscribers.register("llm-check", log_event)

    try:
        # Open an explicit Agent scope so the LLM span attaches to this run.
        with nemo_relay.scope.scope("agent-run", nemo_relay.ScopeType.Agent) as handle:
            # Provider headers (empty here), then the request content.
            request = nemo_relay.LLMRequest(
                {},
                {"messages": [{"role": "user", "content": "hello"}]},
            )
            # Route the provider call through the managed LLM execution API.
            result = await nemo_relay.llm.execute(
                "demo-provider",
                request,
                call_provider,
                handle=handle,
                model_name="demo-model",
            )
            print(result)
    finally:
        # Delivery is asynchronous: flush before deregistering the subscriber.
        nemo_relay.subscribers.flush()
        nemo_relay.subscribers.deregister("llm-check")


asyncio.run(main())
```

Validate the Integration

Check both behavior and instrumentation:

The provider result matches what the application returned before the wrapper was added.
The subscriber prints an agent or request scope event.
The subscriber prints LLM start and LLM end events for demo-provider.
If model pricing is configured, LLM end events include annotated_response.usage.cost only when a response codec decoded the model and usage fields and a pricing source matched the model.

Native subscriber delivery is asynchronous. Flush subscribers before validating printed output. In Node.js, also wait one event-loop tick after flushSubscribers() so JavaScript callbacks can run.

LLM start input contains the request after request intercepts and sanitize-request guardrails.
LLM end output contains the provider response after response guardrails.
The LLM event includes the normalized model_name when you provide one.
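Printed output is easy to miss in a larger run, so a recording subscriber can turn the checks above into assertions. RecordingSubscriber and has_llm_lifecycle are hypothetical helpers, and "llm_start" and "llm_end" are placeholders: pass whatever event.kind values your binding actually emits. The offline demo feeds stand-in events so the helper can be exercised without Relay.

```python
from types import SimpleNamespace
from typing import Any


class RecordingSubscriber:
    """Subscriber callable that records (kind, name) pairs instead of printing.

    Like log_event in the Minimal Example, it reads event.kind and event.name.
    """

    def __init__(self) -> None:
        self.records: list[tuple[Any, str]] = []

    def __call__(self, event: Any) -> None:
        self.records.append((event.kind, event.name))


def has_llm_lifecycle(
    records: list[tuple[Any, str]], provider: str, start_kind: Any, end_kind: Any
) -> bool:
    """True if a start record for `provider` is followed by an end record."""
    try:
        start = records.index((start_kind, provider))
    except ValueError:
        return False
    return (end_kind, provider) in records[start + 1:]


# Offline check with stand-in events; the kind strings are placeholders.
recorder = RecordingSubscriber()
for kind, name in [
    ("scope_start", "agent-run"),
    ("llm_start", "demo-provider"),
    ("llm_end", "demo-provider"),
]:
    recorder(SimpleNamespace(kind=kind, name=name))
```

Against a real run, register the recorder in place of log_event with nemo_relay.subscribers.register("llm-check", recorder), call nemo_relay.subscribers.flush() after the LLM call, and then pass recorder.records to has_llm_lifecycle.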

If only the business result appears, the provider callback ran but instrumentation did not. Confirm that the call goes through llm.execute (Python), llmCallExecute (Node.js), or llm_call_execute (Rust).

Production Checklist

Before deploying to production, confirm each of the following:

Keep provider names stable. Subscribers and exporters use names for filtering and dashboards.
Pass model_name separately when the model should be easy to filter or export.
Keep request and response payloads JSON-compatible.
Keep SDK clients and transport objects inside the provider callback.
Use request codecs when request intercepts or request-side middleware need normalized provider request semantics.
Use response codecs when LLM end events, subscribers, or exporters need normalized provider response annotations.
Use response codecs and the pricing plugin when exporters need cost estimates from model pricing.
Use sanitize guardrails before exporting prompts or model responses in production.
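As an illustration of the last item, the function below is the kind of pure transformation a sanitize guardrail might apply to chat-style content before export. Relay's guardrail registration API is not shown in this guide, so redact_messages and the email-only EMAIL pattern are hypothetical; a production policy would cover far more than email addresses.

```python
import copy
import re
from typing import Any

# Hypothetical redaction rule: mask email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_messages(
    content: dict[str, Any], mask: str = "[REDACTED_EMAIL]"
) -> dict[str, Any]:
    """Return a redacted copy of chat-style content; the input is not modified."""
    redacted = copy.deepcopy(content)
    for message in redacted.get("messages", []):
        # Only plain-string content is rewritten; other shapes pass through.
        if isinstance(message.get("content"), str):
            message["content"] = EMAIL.sub(mask, message["content"])
    return redacted
```

Working on a deep copy keeps the redaction out of the payload the provider actually receives.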

Common Issues

Check these symptoms first when the workflow does not behave as expected.

No LLM events appear: The application is still calling the provider directly.
The LLM span appears outside the agent scope: Pass the current scope handle into the managed execute helper.
Middleware sees provider-specific shapes: Add a codec so request intercepts can work with normalized annotated data.
Sensitive prompt data appears in traces: Add LLM sanitize-request and sanitize-response guardrails before registering production exporters.

Next Steps

Use these links to continue from this workflow into the next related task.

Instrument tools with Instrument a Tool Call.
Add policy or transformation with Add Middleware.
Use Provider Codecs when request intercepts need normalized LLM request data or downstream consumers need normalized response annotations.
Export events with Observability.