> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/relay/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/relay/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/relay/_mcp/server.

# Code Examples

Use these examples when you need the direct runtime surfaces behind the application instrumentation guides.

## Invocation API Selection

The following table shows which API to use based on your integration need:

| Need                                      | Preferred API                                                            | Use When                                                           |
| ----------------------------------------- | ------------------------------------------------------------------------ | ------------------------------------------------------------------ |
| Run a tool with full instrumentation      | `tools.execute`, `toolCallExecute`, `tool_call_execute`                  | Application code owns the callback.                                |
| Run an LLM call with full instrumentation | `llm.execute`, `llmCallExecute`, `llm_call_execute`                      | Application code owns the provider call.                           |
| Run a streaming LLM call                  | `llm_stream_execute`, `typedLlmStreamExecute`, `llm_stream_call_execute` | You need chunk collection and one final aggregate end event.       |
| Emit start/end manually                   | `call` and `call_end` helpers                                            | A framework owns the real invocation boundary.                     |
| Emit a checkpoint                         | `scope.event`, `event`                                                   | You need milestone visibility inside an active scope.              |
| Attach work to one request                | Scope-local registration helpers                                         | Middleware or subscribers should disappear when that scope closes. |

## Manual Tool Lifecycle

Use manual lifecycle calls only when the surrounding code owns the real tool invocation and only exposes reliable start and finish hooks.
If you are replaying events or bridging a framework clock, pass an explicit timestamp to the manual start, end, or mark helpers.
Python accepts timezone-aware `datetime` values, Node.js and WebAssembly accept Unix microseconds since epoch, Rust accepts `DateTime<Utc>`, and Go accepts `time.Time`.

```python
import nemo_relay

handle = nemo_relay.tools.call("search", {"query": "weather"}, data={"attempt": 1})
try:
    result = {"hits": 2}
finally:
    nemo_relay.tools.call_end(handle, result)
```

```ts
import { toolCall, toolCallEnd } from 'nemo-relay-node';

const handle = toolCall('search', { query: 'weather' }, null, null, { attempt: 1 }, null, null);
const result = { hits: 2 };
toolCallEnd(handle, result, null, null);
```

```rust
use nemo_relay::api::tool::{tool_call, tool_call_end, ToolCallEndParams, ToolCallParams};
use serde_json::json;

let handle = tool_call(
    ToolCallParams::builder()
        .name("search")
        .args(json!({"query": "weather"}))
        .data(json!({"attempt": 1}))
        .build(),
)?;

tool_call_end(
    ToolCallEndParams::builder()
        .handle(&handle)
        .result(json!({"hits": 2}))
        .build(),
)?;
```

## Managed LLM Execution

Use managed execution when NeMo Relay should run the full middleware pipeline around the provider call.

```python
import nemo_relay
from nemo_relay import LLMRequest

request = LLMRequest({}, {"messages": [{"role": "user", "content": "hello"}]})

async def invoke(req: LLMRequest):
    return {"text": "hi", "request": req.content}

response = await nemo_relay.llm.execute(
    "demo-provider",
    request,
    invoke,
    model_name="demo-model",
)
```

```ts
import { LlmRequest, llmCallExecute } from 'nemo-relay-node';

const request = new LlmRequest({}, { messages: [{ role: 'user', content: 'hello' }] });

const response = await llmCallExecute(
  'demo-provider',
  request,
  async (req: LlmRequest) => ({ text: 'hi', request: req.content }),
  null,
  null,
  null,
  null,
  'demo-model',
);
```

```rust
use nemo_relay::api::llm::{llm_call_execute, LlmCallExecuteParams, LlmRequest};
use serde_json::json;
use std::sync::Arc;

let request = LlmRequest {
    headers: Default::default(),
    content: json!({"messages": [{"role": "user", "content": "hello"}]}),
};

let response = llm_call_execute(
    LlmCallExecuteParams::builder()
        .name("demo-provider")
        .request(request)
        .func(Arc::new(|req| Box::pin(async move {
            Ok(json!({"text": "hi", "request": req.content}))
        })))
        .model_name("demo-model")
        .build(),
).await?;
```

## Streaming LLM Execution

Use the streaming helper when subscribers need chunk collection plus one final response payload.

```python
from dataclasses import dataclass

from nemo_relay import LLMRequest
from nemo_relay.typed import DataclassCodec, llm_stream_execute

@dataclass
class Chunk:
    delta: str

@dataclass
class FinalResponse:
    text: str

request = LLMRequest({}, {"messages": [{"role": "user", "content": "hello"}]})
collected: list[Chunk] = []

async def stream_impl(_request: LLMRequest):
    yield Chunk(delta="hi")

stream = await llm_stream_execute(
    "demo-provider",
    request,
    stream_impl,
    collector=collected.append,
    finalizer=lambda: FinalResponse(text="".join(chunk.delta for chunk in collected)),
    chunk_json_codec=DataclassCodec(Chunk),
    response_json_codec=DataclassCodec(FinalResponse),
)
```

```ts
import { LlmRequest } from 'nemo-relay-node';
import { typedLlmStreamExecute, type Codec } from 'nemo-relay-node/typed';

type Chunk = { delta: string };
type FinalResponse = { text: string };

const chunkCodec: Codec<Chunk> = {
  toJson: (value) => value,
  fromJson: (json) => json as Chunk,
};
const responseCodec: Codec<FinalResponse> = {
  toJson: (value) => value,
  fromJson: (json) => json as FinalResponse,
};

const request = new LlmRequest({}, { messages: [{ role: 'user', content: 'hello' }] });
const collected: Chunk[] = [];

const stream = await typedLlmStreamExecute(
  'demo-provider',
  request,
  async function* () {
    yield { delta: 'hi' };
  },
  (chunk) => collected.push(chunk),
  () => ({ text: collected.map((chunk) => chunk.delta).join('') }),
  chunkCodec,
  responseCodec,
);
```

```rust
use nemo_relay::api::llm::{
    llm_stream_call_execute, LlmAttributes, LlmRequest, LlmStreamCallExecuteParams,
};
use serde_json::json;

let request = LlmRequest {
    headers: Default::default(),
    content: json!({"messages": [{"role": "user", "content": "hello"}]}),
};

let stream = llm_stream_call_execute(
    LlmStreamCallExecuteParams::builder()
        .name("demo-provider")
        .request(request)
        .func(std::sync::Arc::new(|_req| Box::pin(async move {
            Ok(Box::pin(tokio_stream::iter(vec![Ok(json!({"delta": "hi"}))])))
        })))
        .collector(Box::new(|_chunk| Ok(())))
        .finalizer(Box::new(|| json!({"text": "hi"})))
        .attributes(LlmAttributes::STREAMING)
        .model_name("demo-model")
        .build(),
).await?;
```

## Partial Middleware Calls

These helpers are useful when framework code cannot use managed execution but still wants a request rewrite or block decision.

```python
import nemo_relay
from nemo_relay import LLMRequest

tool_args = nemo_relay.tools.request_intercepts("search", {"query": "weather"})
nemo_relay.tools.conditional_execution("search", tool_args)

llm_request = LLMRequest({}, {"messages": [{"role": "user", "content": "hello"}]})
llm_request = nemo_relay.llm.request_intercepts("demo-provider", llm_request)
nemo_relay.llm.conditional_execution(llm_request)
```

```ts
import {
  LlmRequest,
  llmConditionalExecution,
  llmRequestIntercepts,
  toolConditionalExecution,
  toolRequestIntercepts,
} from 'nemo-relay-node';

const toolArgs = await toolRequestIntercepts('search', { query: 'weather' });
await toolConditionalExecution('search', toolArgs);

const request = new LlmRequest({}, { messages: [{ role: 'user', content: 'hello' }] });
const rewritten = await llmRequestIntercepts('demo-provider', request);
await llmConditionalExecution(rewritten);
```

```rust
use nemo_relay::api::llm::{llm_conditional_execution, llm_request_intercepts, LlmRequest};
use nemo_relay::api::tool::{tool_conditional_execution, tool_request_intercepts};
use serde_json::json;

let tool_args = tool_request_intercepts("search", json!({"query": "weather"}))?;
tool_conditional_execution("search", &tool_args)?;

let request = LlmRequest {
    headers: Default::default(),
    content: json!({"messages": [{"role": "user", "content": "hello"}]}),
};
let rewritten = llm_request_intercepts("demo-provider", request)?;
llm_conditional_execution(&rewritten)?;
```

## Scope and Context Helpers

Use normal scope helpers first. Reach for explicit stack helpers only when work crosses thread, task, worker, or request boundaries.

```python
from concurrent.futures import ThreadPoolExecutor

import nemo_relay

with nemo_relay.scope.scope("request", nemo_relay.ScopeType.Agent):
    nemo_relay.scope.event("started", data={"ok": True})
    shared = nemo_relay.propagate_scope_to_thread()

    def worker() -> None:
        nemo_relay.set_thread_scope_stack(shared)
        nemo_relay.scope.event("worker-ran")

    with ThreadPoolExecutor() as pool:
        pool.submit(worker).result()
```

```ts
import { ScopeType, createScopeStack, event, setThreadScopeStack, withScope } from 'nemo-relay-node';

const workerStack = createScopeStack();
setThreadScopeStack(workerStack);

await withScope('request', ScopeType.Agent, async (handle) => {
  event('started', handle, { ok: true }, null);
});
```

```rust
use nemo_relay::api::runtime::{create_scope_stack, set_thread_scope_stack, TASK_SCOPE_STACK};
use nemo_relay::api::scope::{event, EmitMarkEventParams};
use serde_json::json;

let stack = create_scope_stack();
TASK_SCOPE_STACK
    .scope(stack.clone(), async {
        event(EmitMarkEventParams::builder().name("started").data(json!({"ok": true})).build())
    })
    .await?;

std::thread::spawn(move || {
    set_thread_scope_stack(stack);
    // NeMo Relay calls in this thread attach to the same explicit stack.
})
.join()
.unwrap();
```

## Middleware Registration Families

The runtime exposes the same registration families for tool and LLM calls:

* Sanitize-request guardrails change emitted start-event payloads only
* Sanitize-response guardrails change emitted end-event payloads only
* Conditional-execution guardrails return an allow-or-block decision
* Request intercepts change the real request before execution
* Execution intercepts wrap the callback and may post-process or short-circuit
* LLM stream execution intercepts wrap streaming provider callbacks

Every family also has a scope-local surface:

* Python: `nemo_relay.scope_local.register_*`
* Node.js: `scopeRegister*`
* Rust: middleware `scope_register_*` functions under
  `nemo_relay::api::registry`; subscriber scope registration under
  `nemo_relay::api::subscriber`

Use [Add Middleware](/instrument-applications/advanced-guide) for an end-to-end policy example and [API Reference](/reference/api) for symbol-level details.