
Copyright © 2026, NVIDIA Corporation.

Function llm_stream_call_execute

Generated from cargo doc --no-deps -p nemo-relay -p nemo-relay-adaptive -p nemo-relay-ffi.

pub async fn llm_stream_call_execute(
    params: LlmStreamCallExecuteParams,
) -> Result<LlmJsonStream>

Execute a streaming LLM call through the managed middleware pipeline.

This runs the same pre-execution middleware as llm_call_execute, emits the LLM-start event, and then wraps the provider stream so chunk callbacks and finalization can emit a single LLM-end event when streaming completes.

Parameters

  • name: Logical provider or model family name recorded on emitted events.
  • request: Raw LlmRequest passed into the managed pipeline.
  • func: Streaming provider callback or execution continuation.
  • collector: Per-chunk collector callback used to accumulate stream state.
  • finalizer: Finalizer callback used to construct the completed response.
  • parent: Optional explicit parent scope for the emitted LLM span.
  • attributes: LLM attribute bitflags applied to the managed span.
  • data: Optional application payload stored on the managed LLM handle. It may be used on failure end events that have no output payload.
  • metadata: Optional JSON metadata recorded on emitted events.
  • model_name: Optional normalized model name for observability output.
  • codec: Optional request codec used to produce annotated request data for intercepts and events.
  • response_codec: Optional response codec used to attach annotated response data to the end event.
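The division of labor between collector and finalizer can be sketched with the standard library alone. This is a minimal illustration, not the nemo-relay API: the `Collected` type and the `collect`, `finalize`, and `drive` functions are hypothetical names, and plain strings stand in for JSON chunks. The real callbacks' signatures are defined by LlmStreamCallExecuteParams.

```rust
// Illustrative stand-in only: every name here is hypothetical and is NOT
// part of the nemo-relay API. Strings stand in for JSON chunks.

/// Accumulated stream state, updated once per chunk by the collector.
#[derive(Debug, Default, PartialEq)]
pub struct Collected {
    pub text: String,
    pub chunks: usize,
}

/// Collector role: fold one chunk into the accumulated state.
pub fn collect(state: &mut Collected, chunk: &str) {
    state.text.push_str(chunk);
    state.chunks += 1;
}

/// Finalizer role: build the completed response after the last chunk.
pub fn finalize(state: Collected) -> String {
    format!("{} ({} chunks)", state.text, state.chunks)
}

/// Passes every chunk through the collector while still forwarding it to
/// the caller, then runs the finalizer exactly once.
pub fn drive<I>(chunks: I) -> (Vec<String>, String)
where
    I: IntoIterator<Item = String>,
{
    let mut state = Collected::default();
    let mut forwarded = Vec::new();
    for chunk in chunks {
        collect(&mut state, &chunk);
        forwarded.push(chunk);
    }
    (forwarded, finalize(state))
}
```

Calling `drive(vec!["Hel".to_string(), "lo".to_string()])` in this sketch forwards both chunks unchanged and produces one completed response built from the accumulated state.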

Returns

A Result containing a boxed stream of JSON chunks.

Errors

Returns FlowError::GuardrailRejected when conditional-execution guardrails block the call, or any error raised by request intercepts, execution intercepts, stream callbacks, codecs, or the provider callback.
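A caller will usually want to treat a guardrail rejection differently from other failures. The sketch below shows that branching under stated assumptions: `DemoFlowError` is a hypothetical stand-in enum defined locally, its payloads are invented for illustration, and the real FlowError may carry different variants and data.

```rust
// Hypothetical stand-in for illustration. The real FlowError type lives in
// nemo-relay and its variant payloads are not shown in this page.
#[derive(Debug)]
pub enum DemoFlowError {
    /// Mirrors the documented GuardrailRejected case (payload is assumed).
    GuardrailRejected(String),
    /// Stands in for intercept, callback, codec, or provider errors.
    Other(String),
}

/// Maps the outcome of a (simulated) streaming call to a user-facing message.
pub fn describe(result: Result<Vec<String>, DemoFlowError>) -> String {
    match result {
        Ok(chunks) => format!("streamed {} chunks", chunks.len()),
        Err(DemoFlowError::GuardrailRejected(reason)) => {
            format!("blocked by guardrail: {}", reason)
        }
        Err(DemoFlowError::Other(message)) => format!("call failed: {}", message),
    }
}
```

Separating the rejection branch keeps policy decisions (for example, showing a refusal message) apart from transport or callback failures that might be retried.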

Notes

The LLM-start event is emitted before stream execution intercepts run.

The returned stream emits chunk-level results while the runtime defers the LLM-end event until the collector and finalizer complete.
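The event ordering described in these notes can be modeled with a small synchronous iterator adapter. This is a simplified sketch, not the runtime's implementation: `Event` and `ManagedStream` are hypothetical names, a blocking `Iterator` stands in for the async stream, and strings stand in for JSON chunks.

```rust
// Simplified model of the event ordering. All names are hypothetical and
// none of them come from nemo-relay.

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    LlmStart,
    Chunk(String),
    LlmEnd(String),
}

/// Wraps a chunk source, records events, and defers the end event until
/// the source is exhausted.
pub struct ManagedStream<I: Iterator<Item = String>> {
    inner: I,
    collected: String,
    ended: bool,
    events: Vec<Event>,
}

impl<I: Iterator<Item = String>> ManagedStream<I> {
    /// Records the start event before any chunk is pulled from the source.
    pub fn start(inner: I) -> Self {
        ManagedStream {
            inner,
            collected: String::new(),
            ended: false,
            events: vec![Event::LlmStart],
        }
    }

    /// Events recorded so far, in emission order.
    pub fn events(&self) -> &[Event] {
        &self.events
    }
}

impl<I: Iterator<Item = String>> Iterator for ManagedStream<I> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        match self.inner.next() {
            Some(chunk) => {
                // Collector step: accumulate, then forward the chunk.
                self.collected.push_str(&chunk);
                self.events.push(Event::Chunk(chunk.clone()));
                Some(chunk)
            }
            None => {
                // Finalizer step: emit the end event once, even if the
                // consumer polls again after exhaustion.
                if !self.ended {
                    self.ended = true;
                    self.events.push(Event::LlmEnd(self.collected.clone()));
                }
                None
            }
        }
    }
}
```

In this model the consumer sees each chunk as soon as it arrives, while the end event appears only after the final poll, which mirrors the deferred LLM-end behavior described above.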