For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
  • About NVIDIA NeMo Relay
    • Overview
    • Architecture
    • Ecosystem
    • Concepts
    • Release Notes
  • Getting Started
    • Agent Runtime Primer
    • Prerequisites
    • Installation
    • Configuration / Setup
    • Quick Start
  • NVIDIA NeMo Relay CLI
    • About
    • Basic Usage
    • Claude Code
    • Codex
    • Cursor
    • Hermes Agent
  • Supported Integrations
    • About
    • OpenClaw Plugin Guide
    • LangChain Integration Guide
    • LangGraph Integration Guide
    • Deep Agents Integration Guide
  • Instrument Applications
    • About
    • Adding Scopes and Marks
    • Instrument a Tool Call
    • Instrument an LLM Call
    • Add Middleware
    • Code Examples
  • Observability Plugin
    • About
    • Configuration
    • Agent Trajectory Interchange Format (ATIF)
    • Agent Trajectory Observability Format (ATOF)
    • OpenTelemetry
    • OpenInference
  • Adaptive Plugin
    • About
    • Configuration
    • Adaptive Cache Governor (ACG)
    • Adaptive Hints
  • NeMo Guardrails Plugin
    • About
    • Configuration
  • Integrate into Frameworks
    • About
    • Adding Scopes
    • Wrap Tool Calls
    • Wrap LLM Calls
    • Handle Non-Serializable Data
    • Using Codecs
    • Provider Codecs
    • Provider Response Codecs
    • Code Examples
  • Build Plugins
    • About
    • Define a Plugin
    • Validate Plugin Configuration
    • Plugin Configuration Files
    • Register Plugin Behavior
    • Design Plugin Configuration
    • NeMo Guardrails Example Plugin
    • Code Examples
  • Contribute
    • About
    • Development Setup
    • Workflow and Reviews
    • Testing and Documentation
  • Reference
    • APIs
      • Python Library Reference
      • Node.js Library Reference
      • Rust Library Reference
        • nemo-relay
          • api
          • codec
          • config_editor
          • error
          • json
          • observability
          • plugin
          • plugins
          • stream
            • LlmStreamWrapper
          • editor_config
        • nemo-relay-adaptive
        • nemo-relay-ffi
    • Performance
  • Resources
    • Support and FAQs
    • Glossary
    • Troubleshooting Guide
    • Community
    • Legal
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogo
On this page
  • Implementations
  • impl LlmStreamWrapper
  • new
  • Parameters
  • Returns
  • scope_stack
  • Returns
  • Trait Implementations
  • impl Drop for LlmStreamWrapper
  • drop
  • impl Stream for LlmStreamWrapper
  • Item
  • poll_next
  • size_hint
ReferenceAPIsRust Library Referencenemo-relaystream

Struct LlmStream Wrapper

||View as Markdown|
Previous

Module stream

Next

Macro editor_config

Generated from cargo doc --no-deps -p nemo-relay -p nemo-relay-adaptive -p nemo-relay-ffi.

1pub struct LlmStreamWrapper { /* private fields */ }

Wraps an inner Stream<Item = Result<Json>> of raw chunks and:

  1. Passes each chunk to the user-supplied collector closure. If the collector returns Err, the stream terminates with that error.
  2. On stream exhaustion, calls the finalizer to produce an aggregated Json response, runs sanitize response guardrails on it, then emits the LLM END event.

This type is returned by crate::api::llm::llm_stream_call_execute and is usually consumed as an ordinary async stream. The wrapper preserves the originating scope stack so end-of-stream bookkeeping still uses the correct scope-local middleware and subscribers even when polling happens elsewhere.

Implementations

impl LlmStreamWrapper

impl LlmStreamWrapper

new

pub fn new(
    inner: Pin<Box<dyn Stream<Item = Result<Json>> + Send>>,
    handle: LlmHandle,
    collector: Box<dyn FnMut(Json) -> Result<()> + Send>,
    finalizer: Box<dyn FnOnce() -> Json + Send>,
    _data: Option<Json>,
    metadata: Option<Json>,
    response_codec: Option<Arc<dyn LlmResponseCodec>>,
) -> Self

Create a new LlmStreamWrapper around the given raw stream.

Captures the current ScopeStackHandle at creation time so the correct scope stack is used when the stream is later polled, even if polling happens on a different task or thread.

Parameters
  • inner: Raw stream of JSON chunks from the provider callback.
  • handle: LlmHandle identifying the managed LLM span.
  • collector: Per-chunk callback used to accumulate stream state or forward chunks elsewhere. Returning Err terminates the stream.
  • finalizer: One-shot callback invoked when the stream finishes to synthesize the aggregated response payload.
  • data: Retained compatibility payload; Agent Trajectory Observability Format (ATOF) end data is the finalized response.
  • metadata: Optional event metadata merged into the emitted LLM-end event.
  • response_codec: Optional codec used to derive annotated response metadata from the aggregated final payload.
Returns

A new LlmStreamWrapper ready to be polled.

scope_stack

pub fn scope_stack(&self) -> &ScopeStackHandle

Return the captured scope stack handle for this stream.

Callers can use this to bind the correct scope stack when spawning the stream on a different task via TASK_SCOPE_STACK.scope(...).

Returns

A shared reference to the ScopeStackHandle captured when the stream wrapper was created.

Trait Implementations

impl Drop for LlmStreamWrapper

impl Drop for LlmStreamWrapper

drop

fn drop(&mut self)

impl Stream for LlmStreamWrapper

impl Stream for LlmStreamWrapper

Item

type Item = Result<Value, FlowError>

poll_next

fn poll_next(
    self: Pin<&mut Self>,
    cx: &mut Context<'_>,
) -> Poll<Option<Self::Item>>

size_hint

fn size_hint(&self) -> (usize, Option<usize>)