> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/relay/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/relay/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/relay/_mcp/server.

# Wrap LLM Calls

Use this guide when a framework, SDK, or provider adapter owns model invocation and you need NeMo Relay to observe and control those provider calls.

## What You Build

You will place a managed NeMo Relay LLM execution wrapper at the provider boundary. The wrapper emits LLM lifecycle events, runs LLM middleware, attaches the call to the active scope, records the `model_name`, and returns the provider response to the framework.

## Before You Start

You need:

* A framework request or run scope. If the framework does not create one yet, start with [Adding Scopes](/integrate-into-frameworks/adding-scopes).
* A stable model-provider boundary, such as a provider adapter or client dispatch method.
* A JSON-compatible request projection inside `LLMRequest`.
* A JSON-compatible response projection for subscribers and exporters.

## Integration Pattern

Follow this sequence to keep framework work attached to the expected runtime context.

1. Enter or inherit the active framework scope.
2. Convert the framework provider payload into `LLMRequest`.
3. Route the real provider callback through the managed LLM execute helper.
4. Pass a stable provider name and `model_name`.
5. Keep provider clients, streams, callbacks, and retry state outside emitted JSON payloads.

Use a request or response codec when provider payloads need normalization before middleware or events see them. Use [Provider Codecs](/integrate-into-frameworks/provider-codecs) for those cases.

## Concrete LLM Example

The examples below wrap one provider call and attach it to the active parent scope.

```python
from typing import TypedDict

import nemo_relay
from nemo_relay import LLMRequest

class LlmResponse(TypedDict):
    text: str
    request: object

async def framework_llm(provider_name: str, payload: object) -> LlmResponse:
    parent = nemo_relay.scope.get_handle()
    request = LLMRequest({}, payload)

    async def invoke(req: LLMRequest) -> LlmResponse:
        return {"text": "hi", "request": req.content}

    return await nemo_relay.llm.execute(
        provider_name,
        request,
        invoke,
        handle=parent,
        model_name="demo-model",
    )
```

```ts
import { getHandle, LlmRequest, llmCallExecute, type ScopeHandle } from 'nemo-relay-node';

type LlmResponse = { text: string; request: unknown };

export async function frameworkLlm(providerName: string, payload: unknown): Promise<LlmResponse> {
  const parent: ScopeHandle = getHandle();
  const request = new LlmRequest({}, payload);

  return await llmCallExecute(
    providerName,
    request,
    async (req: LlmRequest): Promise<LlmResponse> => ({ text: 'hi', request: req.content }),
    parent,
    null,
    null,
    null,
    'demo-model',
  ) as Promise<LlmResponse>;
}
```

```rust
use nemo_relay::api::llm::{llm_call_execute, LlmCallExecuteParams, LlmRequest};
use nemo_relay::api::scope::get_handle;
use serde_json::json;
use std::sync::Arc;

async fn run_provider_call() -> anyhow::Result<serde_json::Value> {
    let parent = get_handle()?;
    let request = LlmRequest {
        headers: Default::default(),
        content: json!({"messages": [{"role": "user", "content": "hello"}]}),
    };

    let response = llm_call_execute(
        LlmCallExecuteParams::builder()
            .name("demo-provider")
            .request(request)
            .func(Arc::new(|req| Box::pin(async move {
                Ok(json!({"text": "hi", "request": req.content}))
            })))
            .parent(parent)
            .model_name("demo-model")
            .build(),
    )
    .await?;

    Ok(response)
}
```

## Streaming Providers

Use the LLM stream execute helper when the framework exposes a stream boundary that NeMo Relay can own. Stream wrappers preserve the same scope and middleware model while letting subscribers observe the completed response after chunks are collected.

If the framework owns the stream internally, emit explicit start and end lifecycle events around the provider stream and use mark events for retry, queue, and partial-output milestones.

## Validate the LLM Wrapper

Run one provider path and check:

* The application receives the same provider response as before.
* Subscribers see one LLM start event and one matching LLM end event.
* The event includes the expected provider name and `model_name`.
* LLM middleware runs exactly once.
* Provider-owned clients, streams, and callbacks stay outside emitted JSON payloads.

## Common Issues

Check these symptoms first when the workflow does not behave as expected.

* **The LLM appears outside the request trace**: Pass the active scope handle or run the provider call inside the framework request scope.
* **The model name is missing**: Pass `model_name` from the provider payload, model client, or framework run configuration.
* **Request middleware receives provider objects**: Convert provider payloads into `LLMRequest` with JSON-compatible content before calling NeMo Relay.
* **Stream output is incomplete**: Use the stream execute helper when NeMo Relay owns the stream boundary, or emit explicit lifecycle events when it does not.

## Next Steps

Use these links to continue from this workflow into the next related task.

* Add tool integration with [Wrap Tool Calls](/integrate-into-frameworks/wrap-tool-calls).
* Normalize provider payloads with [Provider Codecs](/integrate-into-frameworks/provider-codecs).
* Use [Handle Non-Serializable Data](/integrate-into-frameworks/non-serializable-data) for provider clients, streams, and callback objects.