
Copyright © 2026, NVIDIA Corporation.

Provider Response Codecs

Use this guide when subscribers, exporters, or diagnostics need a provider-neutral view of raw LLM responses.

What You Build

You will attach a response codec to a managed LLM wrapper so NeMo Relay can decode provider responses into AnnotatedLLMResponse data for LLM end events.

Response codecs are observability-only:

  • They do not rewrite the value returned to the application.
  • They do not run response middleware.
  • They attach normalized response data to lifecycle events for subscribers and exporters.
  • Decode failures are non-fatal; the LLM call still returns the provider response and the end event is emitted without an annotation.
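
The observability-only contract in the list above can be sketched in plain Python. This is a minimal illustration, not NeMo Relay's implementation: call_with_response_codec, strict_decode, and the end-event dictionary layout are hypothetical names chosen for this sketch.

```python
from typing import Any, Callable, Optional

JsonDict = dict[str, Any]


def call_with_response_codec(
    invoke: Callable[[], JsonDict],
    decode: Callable[[JsonDict], JsonDict],
) -> tuple[JsonDict, JsonDict]:
    """Hypothetical wrapper: returns (provider_response, end_event)."""
    response = invoke()
    annotated: Optional[JsonDict]
    try:
        annotated = decode(response)
    except Exception:
        # Decode failures are non-fatal: the end event carries no annotation.
        annotated = None
    end_event = {"kind": "llm_end", "annotated_response": annotated}
    # The caller receives the provider response itself, never the annotation.
    return response, end_event


def strict_decode(response: JsonDict) -> JsonDict:
    # Raises KeyError when the payload is not in the expected shape.
    return {"id": response["id"], "message": response["text"]}


ok_response, ok_event = call_with_response_codec(
    lambda: {"id": "r-1", "text": "hi"}, strict_decode
)
bad_response, bad_event = call_with_response_codec(
    lambda: {"unexpected": True}, strict_decode
)
```

In the second call the decoder raises, yet the caller still receives the provider payload and the end event is produced with annotated_response set to None.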

Before You Start

You need:

  • A managed LLM boundary from Wrap LLM Calls.
  • A raw provider response that is JSON-compatible.
  • A built-in response codec or a custom response codec for the provider response shape.
  • A subscriber or exporter that consumes annotated_response from LLM end events.

What Response Codecs Decode

Response codecs normalize provider output into fields that subscribers can inspect consistently:

Field          Purpose
id             Provider response identifier.
model          Model that served the request, when the provider returns it.
message        Primary assistant message content.
tool_calls     Tool calls requested by the model.
finish_reason  Normalized completion reason, such as complete, length, tool_use, or content_filter.
usage          Token accounting, including cache-read and cache-write counts when available.
api_specific   Provider-specific fields that do not fit the common model.
extra          Additional unmodeled response fields.

Use these annotations for observability, export, and debugging. Keep business logic that changes the caller-visible response in the framework or provider adapter, not in the response codec.
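
As an illustration of what a normalized finish_reason involves, the sketch below maps provider stop values onto the normalized values named in the table. The mapping tables, the provider keys, and normalize_finish_reason are assumptions made for this example. The guide does not state how the built-in codecs map each value.

```python
from typing import Optional

# Assumed mappings from provider stop values to normalized finish_reason values.
OPENAI_CHAT_FINISH_REASONS = {
    "stop": "complete",
    "length": "length",
    "tool_calls": "tool_use",
    "content_filter": "content_filter",
}
ANTHROPIC_STOP_REASONS = {
    "end_turn": "complete",
    "stop_sequence": "complete",
    "max_tokens": "length",
    "tool_use": "tool_use",
}
# Hypothetical provider keys used only by this sketch.
TABLES = {
    "openai-chat": OPENAI_CHAT_FINISH_REASONS,
    "anthropic-messages": ANTHROPIC_STOP_REASONS,
}


def normalize_finish_reason(provider: str, raw: Optional[str]) -> Optional[str]:
    """Return the normalized reason, or None for unknown providers or values."""
    if raw is None:
        return None
    return TABLES.get(provider, {}).get(raw)
```

Returning None for unrecognized values keeps the annotation honest. The original provider value can still be preserved in api_specific or extra.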

Built-in Response Codecs

The built-in provider codecs also implement response decoding:

  • OpenAIChatCodec
  • OpenAIResponsesCodec
  • AnthropicMessagesCodec

Choose the codec that matches the actual provider response shape. For example, do not use OpenAIChatCodec for an OpenAI Responses API payload just because both payloads come from an OpenAI-compatible provider.
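
One way to check which shape a payload has before picking a codec is to look at its top-level fields. The helper below is a hypothetical sketch, not part of NeMo Relay. It relies on the publicly documented payload shapes: Chat Completions responses carry a choices list, Responses API payloads carry object set to "response" with an output list, and Anthropic Messages payloads carry type set to "message" with a content list.

```python
from typing import Any, Optional


def guess_response_codec_name(response: dict[str, Any]) -> Optional[str]:
    """Return the name of the built-in codec whose shape matches, else None."""
    if response.get("object") == "response" and "output" in response:
        return "OpenAIResponsesCodec"
    if response.get("object") == "chat.completion" or "choices" in response:
        return "OpenAIChatCodec"
    if response.get("type") == "message" and isinstance(response.get("content"), list):
        return "AnthropicMessagesCodec"
    # Unknown shape: a custom response codec is probably needed.
    return None
```

A helper like this is useful in tests and diagnostics. In production code, prefer configuring the codec explicitly for each managed LLM boundary.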

Attach a Built-in Response Codec

The Python example below attaches the built-in OpenAIChatCodec to a managed LLM call. The provider function is a stub that returns an OpenAI Chat Completions-shaped payload.

import asyncio

import nemo_relay
from nemo_relay import LLMRequest
from nemo_relay.codecs import OpenAIChatCodec


async def invoke_provider(request: LLMRequest):
    # Stand-in for the real provider call.
    return {
        "id": "chatcmpl-demo",
        "model": request.content["model"],
        "choices": [
            {
                "finish_reason": "stop",
                "message": {"role": "assistant", "content": "Hello from the provider."},
            }
        ],
        "usage": {"prompt_tokens": 8, "completion_tokens": 5, "total_tokens": 13},
    }


async def main():
    # The codec only annotates the LLM end event; the returned value is unchanged.
    codec = OpenAIChatCodec()
    response = await nemo_relay.llm.execute(
        "openai-chat",
        LLMRequest({}, {"model": "gpt-4o-mini", "messages": []}),
        invoke_provider,
        model_name="gpt-4o-mini",
        response_codec=codec,
    )
    return response


asyncio.run(main())

Read Annotated Responses

Subscribers can inspect annotated_response on LLM end events. The exact event category fields are supplied by each language binding, and an event whose decode failed carries no annotation, so check that the annotation exists before reading it. The Python example:

import nemo_relay


def on_event(event):
    # Skip events that carry no annotation (other event types or failed decodes).
    annotated = getattr(event, "annotated_response", None)
    if annotated is None:
        return

    print("model", annotated.model)
    print("text", annotated.response_text())
    print("usage", annotated.usage)


nemo_relay.subscribers.register("response-debugger", on_event)
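
The same defensive pattern extends to subscribers that aggregate data. The sketch below tallies token usage across events and runs without NeMo Relay installed. UsageTally is a hypothetical class, the events are simulated with SimpleNamespace, and the prompt_tokens and completion_tokens attributes on usage are an assumption about the Python binding based on the field names used elsewhere in this guide.

```python
from types import SimpleNamespace


class UsageTally:
    """Hypothetical subscriber that sums token usage from annotated end events."""

    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.skipped = 0

    def on_event(self, event):
        annotated = getattr(event, "annotated_response", None)
        usage = getattr(annotated, "usage", None)
        if usage is None:
            # No annotation or no usage data: count it and move on.
            self.skipped += 1
            return
        # Individual counters may be missing or None, so default them to 0.
        self.prompt_tokens += getattr(usage, "prompt_tokens", None) or 0
        self.completion_tokens += getattr(usage, "completion_tokens", None) or 0


tally = UsageTally()
events = [
    SimpleNamespace(
        annotated_response=SimpleNamespace(
            usage=SimpleNamespace(prompt_tokens=8, completion_tokens=5)
        )
    ),
    SimpleNamespace(annotated_response=None),  # decode failed
    SimpleNamespace(
        annotated_response=SimpleNamespace(
            usage=SimpleNamespace(prompt_tokens=3, completion_tokens=None)
        )
    ),
]
for event in events:
    tally.on_event(event)
```

Tracking skipped events alongside the totals makes it easier to notice when a codec silently stops matching the provider response.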

Custom Response Codecs

Use a custom response codec when the provider or framework response does not match a built-in shape.

In Python, a custom response codec can route to built-in codecs and return their native AnnotatedLLMResponse values:

from nemo_relay.codecs import OpenAIChatCodec, OpenAIResponsesCodec


class OpenAIRoutingResponseCodec:
    def __init__(self):
        self.chat = OpenAIChatCodec()
        self.responses = OpenAIResponsesCodec()

    def decode_response(self, response):
        # Responses API payloads identify themselves with object == "response".
        if response.get("object") == "response":
            return self.responses.decode_response(response)
        # Everything else is treated as a Chat Completions payload.
        return self.chat.decode_response(response)

In Node.js, implement decodeResponse and return the normalized response JSON shape:

import type { JsonValue, LlmResponseCodec } from 'nemo-relay-node/typed';

const frameworkResponseCodec: LlmResponseCodec = {
  decodeResponse(response: JsonValue): JsonValue {
    // Shape of the framework-specific payload this codec understands.
    const raw = response as {
      id?: string;
      model_name?: string;
      text?: string;
      stop_reason?: string;
      token_usage?: {
        input?: number;
        output?: number;
      };
    };

    return {
      id: raw.id ?? null,
      model: raw.model_name ?? null,
      message: raw.text ?? '',
      finish_reason: raw.stop_reason === 'max_tokens' ? 'length' : 'complete',
      usage: {
        prompt_tokens: raw.token_usage?.input ?? null,
        completion_tokens: raw.token_usage?.output ?? null,
        // Report a total only when both counts are present.
        total_tokens:
          raw.token_usage?.input === undefined || raw.token_usage?.output === undefined
            ? null
            : raw.token_usage.input + raw.token_usage.output,
      },
      provider_stop_reason: raw.stop_reason ?? null,
    };
  },
};

In Rust, implement LlmResponseCodec directly:

use nemo_relay::codec::request::MessageContent;
use nemo_relay::codec::response::{AnnotatedLlmResponse, FinishReason, Usage};
use nemo_relay::codec::traits::LlmResponseCodec;
use nemo_relay::error::{FlowError, Result};
use serde::Deserialize;
use serde_json::{Map, Value as Json};

// Shape of the framework-specific payload this codec understands.
#[derive(Deserialize)]
struct FrameworkResponse {
    id: Option<String>,
    model_name: Option<String>,
    text: Option<String>,
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
}

struct FrameworkResponseCodec;

impl LlmResponseCodec for FrameworkResponseCodec {
    fn decode_response(&self, response: &Json) -> Result<AnnotatedLlmResponse> {
        let raw: FrameworkResponse = serde_json::from_value(response.clone())
            .map_err(|error| FlowError::Internal(error.to_string()))?;
        // Report a total only when both counts are present.
        let total_tokens = match (raw.input_tokens, raw.output_tokens) {
            (Some(input), Some(output)) => Some(input + output),
            _ => None,
        };

        Ok(AnnotatedLlmResponse {
            id: raw.id,
            model: raw.model_name,
            message: raw.text.map(MessageContent::Text),
            tool_calls: None,
            finish_reason: Some(FinishReason::Complete),
            usage: Some(Usage {
                prompt_tokens: raw.input_tokens,
                completion_tokens: raw.output_tokens,
                total_tokens,
                cache_read_tokens: None,
                cache_write_tokens: None,
            }),
            api_specific: None,
            extra: Map::new(),
        })
    }
}

Streaming Responses

Streaming LLM wrappers decode the aggregated response produced by the stream finalizer. The response codec does not see each token or chunk. Use stream collectors for chunk-level behavior, and use response codecs for the final normalized end-event annotation.
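
The split between chunk-level collection and final decoding can be sketched as follows. This is an illustration only: finalize_chat_stream and recording_decode are hypothetical stand-ins for a stream finalizer and a response codec, and the chunks follow the OpenAI Chat Completions streaming shape (a delta per choice, with finish_reason set on the last content chunk).

```python
from typing import Any, Iterable


def finalize_chat_stream(chunks: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical finalizer: fold chat stream chunks into one response."""
    response_id = None
    model = None
    finish_reason = None
    usage = None
    text_parts: list[str] = []

    for chunk in chunks:
        response_id = response_id or chunk.get("id")
        model = model or chunk.get("model")
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage

    # Emit the same shape a non-streaming Chat Completions call would return.
    return {
        "id": response_id,
        "model": model,
        "choices": [
            {
                "finish_reason": finish_reason,
                "message": {"role": "assistant", "content": "".join(text_parts)},
            }
        ],
        "usage": usage,
    }


decode_calls = []


def recording_decode(response: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for a response codec; records every payload it decodes.
    decode_calls.append(response)
    choice = response["choices"][0]
    return {
        "message": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
    }


chunks = [
    {"id": "chatcmpl-demo", "model": "gpt-4o-mini",
     "choices": [{"delta": {"content": "Hello"}, "finish_reason": None}]},
    {"id": "chatcmpl-demo", "model": "gpt-4o-mini",
     "choices": [{"delta": {"content": " world"}, "finish_reason": None}]},
    {"id": "chatcmpl-demo", "model": "gpt-4o-mini",
     "choices": [{"delta": {}, "finish_reason": "stop"}]},
]
final_response = finalize_chat_stream(chunks)
annotation = recording_decode(final_response)
```

The decoder runs once, on the aggregated payload, which is why the finalizer has to emit the shape the codec expects.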

Validation Checklist

Use this checklist to confirm the implementation preserves the expected runtime contract.

  • The response codec matches the actual provider response shape.
  • decode_response returns a normalized response with safe, JSON-compatible fields.
  • The provider response returned to the application is unchanged.
  • Subscribers see annotated_response only on LLM end events where decode succeeds.
  • Decode errors are tested and do not break the LLM call.
  • Streaming finalizers produce the same shape the response codec expects.
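
Some of the checklist items can be automated in a unit test. The helper below is a hypothetical sketch that assumes the codec under test is exposed as a plain decode function returning a dictionary-shaped annotation. It checks that decoding does not raise, does not mutate the provider response, and yields JSON-compatible output.

```python
import copy
import json
from typing import Any, Callable


def check_codec_contract(
    decode: Callable[[dict[str, Any]], Any],
    raw_response: dict[str, Any],
) -> list[str]:
    """Return a list of contract problems; an empty list means all checks passed."""
    before = copy.deepcopy(raw_response)
    try:
        annotation = decode(raw_response)
    except Exception as error:
        return [f"decode raised {type(error).__name__}"]

    problems = []
    if raw_response != before:
        problems.append("decode mutated the provider response")
    try:
        json.dumps(annotation)
    except (TypeError, ValueError):
        problems.append("annotation is not JSON-compatible")
    return problems


def good_decode(response):
    return {"id": response.get("id"), "message": response.get("text", "")}


def mutating_decode(response):
    # Bug: pop() removes the field from the caller-visible response.
    return {"message": response.pop("text", "")}


def non_json_decode(response):
    # Bug: a set cannot be serialized to JSON.
    return {"message": response.get("text", ""), "tags": {"a", "b"}}
```

Running check_codec_contract against recorded provider payloads, including malformed ones, covers both the success path and the decode-error path.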

Common Issues

Check these symptoms first when the workflow does not behave as expected.

  • No annotation appears: The response codec returned an error or the raw provider response did not match the codec.
  • Returned response changed unexpectedly: Response codecs are not the right place to mutate caller-visible output.
  • Tool calls are missing: The codec did not map the provider’s tool-call structure into tool_calls.
  • Usage is inconsistent across providers: Normalize known token fields and preserve provider-specific usage details in api_specific or extra.
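
For the usage case, normalization might look like the sketch below. The function names are hypothetical, and the output keys follow the usage fields shown in the Rust example on this page. The input field names are the ones OpenAI Chat Completions and Anthropic Messages document for their usage objects. Whether prompt_tokens should include cached tokens is a policy choice this sketch leaves untouched.

```python
from typing import Any, Optional


def _sum_or_none(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # Report a total only when both counts are known.
    return None if a is None or b is None else a + b


def normalize_openai_chat_usage(usage: dict[str, Any]) -> dict[str, Optional[int]]:
    prompt = usage.get("prompt_tokens")
    completion = usage.get("completion_tokens")
    total = usage.get("total_tokens")
    if total is None:
        total = _sum_or_none(prompt, completion)
    details = usage.get("prompt_tokens_details") or {}
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
        "cache_read_tokens": details.get("cached_tokens"),
        # No cache-write count is read from this usage shape in this sketch.
        "cache_write_tokens": None,
    }


def normalize_anthropic_usage(usage: dict[str, Any]) -> dict[str, Optional[int]]:
    prompt = usage.get("input_tokens")
    completion = usage.get("output_tokens")
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": _sum_or_none(prompt, completion),
        "cache_read_tokens": usage.get("cache_read_input_tokens"),
        "cache_write_tokens": usage.get("cache_creation_input_tokens"),
    }
```

Fields that have no normalized home can be copied into api_specific or extra so that exporters do not lose provider detail.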

Next Steps

Use these links to continue from this workflow into the next related task.

  • Use Provider Codecs for request-side provider codecs and full request/response examples.
  • Use Wrap LLM Calls to add the managed LLM boundary first.
  • Use Observability after annotations are visible in local subscribers.