Provider Response Codecs

Use this guide when subscribers, exporters, or diagnostics need a provider-neutral view of raw LLM responses.

What You Build

You will attach a response codec to a managed LLM wrapper so NeMo Relay can decode provider responses into AnnotatedLLMResponse data for LLM end events.

Response codecs are observability-only:

They do not rewrite the value returned to the application.
They do not run response middleware.
They attach normalized response data to lifecycle events for subscribers and exporters.
Decode failures are non-fatal; the LLM call still returns the provider response and the end event is emitted without an annotation.
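
The contract above can be pictured as a thin wrapper around the provider call. The sketch below is not NeMo Relay's implementation; `call_with_annotation`, `EndEvent`, and both codec classes are hypothetical names used only to illustrate the observability-only behavior. The raw response is returned untouched, and a decode failure leaves the annotation empty.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EndEvent:
    """Hypothetical stand-in for an LLM end event."""
    raw_response: Any
    annotated_response: Optional[dict] = None


def call_with_annotation(invoke: Callable[[], Any], codec, emit: Callable[[EndEvent], None]) -> Any:
    """Call the provider, try to annotate the end event, return the raw response."""
    response = invoke()
    try:
        annotation = codec.decode_response(response)
    except Exception:
        # Decode failures are non-fatal: emit the event without an annotation.
        annotation = None
    emit(EndEvent(raw_response=response, annotated_response=annotation))
    return response  # the caller always receives the provider response as-is


class FailingCodec:
    """Hypothetical codec that always rejects the payload."""
    def decode_response(self, response):
        raise ValueError("unrecognized response shape")


class EchoModelCodec:
    """Hypothetical codec that only extracts the model name."""
    def decode_response(self, response):
        return {"model": response["model"]}


events = []
raw = {"model": "demo-model", "text": "hi"}
returned = call_with_annotation(lambda: raw, FailingCodec(), events.append)
call_with_annotation(lambda: raw, EchoModelCodec(), events.append)
```

In this sketch both calls emit an end event, but only the second one carries an annotation, and the caller gets the same `raw` object back either way.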

Before You Start

You need:

A managed LLM boundary from Wrap LLM Calls.
A raw provider response that is JSON-compatible.
A built-in response codec or a custom response codec for the provider response shape.
A subscriber or exporter that consumes annotated_response from LLM end events.

What Response Codecs Decode

Response codecs normalize provider output into fields that subscribers can inspect consistently:

Field	Purpose
`id`	Provider response identifier.
`model`	Model that served the request, when the provider returns it.
`message`	Primary assistant message content.
`tool_calls`	Tool calls requested by the model.
`finish_reason`	Normalized completion reason, such as `complete`, `length`, `tool_use`, or `content_filter`.
`usage`	Token accounting, including cache-read and cache-write counts when available.
`api_specific`	Provider-specific fields that do not fit the common model.
`extra`	Additional unmodeled response fields.

Use these annotations for observability, export, and debugging. Keep business logic that changes the caller-visible response in the framework or provider adapter, not in the response codec.
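
To make the `finish_reason` row concrete, the sketch below maps provider-native stop values onto the normalized names from the table. The provider values are the ones OpenAI Chat Completions and Anthropic Messages return; the mapping itself, the fallback to `None`, and the `normalize_finish_reason` helper are assumptions for illustration and may differ from what the built-in codecs do.

```python
from typing import Optional

# Assumed mapping from provider-native values to the normalized names in the table.
_FINISH_REASONS = {
    # OpenAI Chat Completions `finish_reason`
    "stop": "complete",
    "length": "length",
    "tool_calls": "tool_use",
    "content_filter": "content_filter",
    # Anthropic Messages `stop_reason`
    "end_turn": "complete",
    "stop_sequence": "complete",
    "max_tokens": "length",
    "tool_use": "tool_use",
}


def normalize_finish_reason(provider_value: Optional[str]) -> Optional[str]:
    """Return the normalized reason, or None when the value is missing or unknown."""
    if provider_value is None:
        return None
    return _FINISH_REASONS.get(provider_value)
```

Returning `None` for an unrecognized value, rather than guessing, keeps exporters from reporting a misleading reason; the original provider value can still be preserved in `api_specific` or `extra`.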

Built-in Response Codecs

The built-in provider codecs also implement response decoding:

OpenAIChatCodec
OpenAIResponsesCodec
AnthropicMessagesCodec

Choose the codec that matches the actual provider response shape. For example, do not use OpenAIChatCodec for an OpenAI Responses API payload just because both come from an OpenAI-compatible provider.
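
One way to guard against a mismatched codec is to inspect the payload's discriminator field before decoding. The sketch below relies on the `object` and `type` values that the OpenAI and Anthropic APIs return; `detect_response_shape` and the labels it returns are hypothetical and not part of NeMo Relay.

```python
from typing import Any, Mapping, Optional


def detect_response_shape(payload: Mapping[str, Any]) -> Optional[str]:
    """Guess which response codec fits a raw payload; None means unknown."""
    if payload.get("object") == "chat.completion":
        return "openai-chat"
    if payload.get("object") == "response":
        return "openai-responses"
    if payload.get("type") == "message" and "content" in payload:
        return "anthropic-messages"
    return None
```

Some OpenAI-compatible providers omit `object`, so it is safer to treat `None` as "fall back to the configured codec" than as an error.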

Attach a Built-in Response Codec

The example below attaches the built-in OpenAIChatCodec to a managed LLM call whose provider returns a Chat Completions-shaped response.

Python

import asyncio

import nemo_relay
from nemo_relay import LLMRequest
from nemo_relay.codecs import OpenAIChatCodec

# Stand-in provider call that returns a Chat Completions-shaped payload.
async def invoke_provider(request: LLMRequest):
    return {
        "id": "chatcmpl-demo",
        "model": request.content["model"],
        "choices": [
            {
                "finish_reason": "stop",
                "message": {"role": "assistant", "content": "Hello from the provider."},
            }
        ],
        "usage": {"prompt_tokens": 8, "completion_tokens": 5, "total_tokens": 13},
    }

async def main():
    codec = OpenAIChatCodec()
    # The codec only annotates the end event; `response` is the raw provider payload.
    response = await nemo_relay.llm.execute(
        "openai-chat",
        LLMRequest({}, {"model": "gpt-4o-mini", "messages": []}),
        invoke_provider,
        model_name="gpt-4o-mini",
        response_codec=codec,
    )
    return response

# `await` is only valid inside a coroutine, so run the call through asyncio.
asyncio.run(main())

Read Annotated Responses

Subscribers can inspect annotated_response on LLM end events. The exact event fields depend on the language binding, and the annotation is absent when decoding fails, so confirm that it exists before reading it.

Python

import nemo_relay

def on_event(event):
    # Events other than LLM end events, or failed decodes, carry no annotation.
    annotated = getattr(event, "annotated_response", None)
    if annotated is None:
        return

    print("model", annotated.model)
    print("text", annotated.response_text())
    print("usage", annotated.usage)

nemo_relay.subscribers.register("response-debugger", on_event)

Custom Response Codecs

Use a custom response codec when the provider or framework response does not match a built-in shape.

In Python, a custom response codec can route to built-in codecs and return their native AnnotatedLLMResponse values:

from nemo_relay.codecs import OpenAIChatCodec, OpenAIResponsesCodec

class OpenAIRoutingResponseCodec:
    def __init__(self):
        self.chat = OpenAIChatCodec()
        self.responses = OpenAIResponsesCodec()

    def decode_response(self, response):
        # Responses API payloads identify themselves with "object": "response".
        if response.get("object") == "response":
            return self.responses.decode_response(response)
        return self.chat.decode_response(response)

In Node.js, implement decodeResponse and return the normalized response JSON shape:

import type { JsonValue, LlmResponseCodec } from 'nemo-relay-node/typed';

const frameworkResponseCodec: LlmResponseCodec = {
  decodeResponse(response: JsonValue): JsonValue {
    const raw = response as {
      id?: string;
      model_name?: string;
      text?: string;
      stop_reason?: string;
      token_usage?: {
        input?: number;
        output?: number;
      };
    };

    return {
      id: raw.id ?? null,
      model: raw.model_name ?? null,
      message: raw.text ?? '',
      finish_reason: raw.stop_reason === 'max_tokens' ? 'length' : 'complete',
      usage: {
        prompt_tokens: raw.token_usage?.input ?? null,
        completion_tokens: raw.token_usage?.output ?? null,
        total_tokens:
          raw.token_usage?.input === undefined || raw.token_usage?.output === undefined
            ? null
            : raw.token_usage.input + raw.token_usage.output,
      },
      provider_stop_reason: raw.stop_reason ?? null,
    };
  },
};

In Rust, implement LlmResponseCodec directly:

use nemo_relay::codec::request::MessageContent;
use nemo_relay::codec::response::{AnnotatedLlmResponse, FinishReason, Usage};
use nemo_relay::codec::traits::LlmResponseCodec;
use nemo_relay::error::{FlowError, Result};
use serde::Deserialize;
use serde_json::{Map, Value as Json};

#[derive(Deserialize)]
struct FrameworkResponse {
    id: Option<String>,
    model_name: Option<String>,
    text: Option<String>,
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
}

struct FrameworkResponseCodec;

impl LlmResponseCodec for FrameworkResponseCodec {
    fn decode_response(&self, response: &Json) -> Result<AnnotatedLlmResponse> {
        let raw: FrameworkResponse = serde_json::from_value(response.clone())
            .map_err(|error| FlowError::Internal(error.to_string()))?;
        // Report a total only when both counts are present.
        let total_tokens = match (raw.input_tokens, raw.output_tokens) {
            (Some(input), Some(output)) => Some(input + output),
            _ => None,
        };

        Ok(AnnotatedLlmResponse {
            id: raw.id,
            model: raw.model_name,
            message: raw.text.map(MessageContent::Text),
            tool_calls: None,
            finish_reason: Some(FinishReason::Complete),
            usage: Some(Usage {
                prompt_tokens: raw.input_tokens,
                completion_tokens: raw.output_tokens,
                total_tokens,
                cache_read_tokens: None,
                cache_write_tokens: None,
            }),
            api_specific: None,
            extra: Map::new(),
        })
    }
}

Streaming Responses

Streaming LLM wrappers decode the aggregated response produced by the stream finalizer. The response codec does not see each token or chunk. Use stream collectors for chunk-level behavior, and use response codecs for the final normalized end-event annotation.
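
As an illustration of what a finalizer hands to the codec, the sketch below folds OpenAI-style chat streaming chunks into one chat-completion-shaped payload. `finalize_chat_stream` is a hypothetical helper, not a NeMo Relay API, and it handles only text deltas; tool-call deltas are left out for brevity.

```python
from typing import Any, Dict, Iterable


def finalize_chat_stream(chunks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate chat streaming chunks into a single non-streaming-shaped response."""
    response_id = None
    model = None
    parts = []
    finish_reason = None
    usage = None

    for chunk in chunks:
        response_id = chunk.get("id", response_id)
        model = chunk.get("model", model)
        usage = chunk.get("usage") or usage  # usage often arrives only on the last chunk
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
            finish_reason = choice.get("finish_reason") or finish_reason

    return {
        "id": response_id,
        "model": model,
        "choices": [
            {
                "finish_reason": finish_reason,
                "message": {"role": "assistant", "content": "".join(parts)},
            }
        ],
        "usage": usage,
    }


chunks = [
    {"id": "chatcmpl-demo", "model": "demo-model",
     "choices": [{"delta": {"role": "assistant", "content": "Hel"}, "finish_reason": None}]},
    {"id": "chatcmpl-demo", "model": "demo-model",
     "choices": [{"delta": {"content": "lo"}, "finish_reason": "stop"}]},
    {"id": "chatcmpl-demo", "model": "demo-model", "choices": [],
     "usage": {"prompt_tokens": 8, "completion_tokens": 2, "total_tokens": 10}},
]
final = finalize_chat_stream(chunks)
```

Because `final` has the same layout as a non-streaming chat completion, a chat-shaped response codec could decode it without knowing that the call was streamed.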

Validation Checklist

Use this checklist to confirm the implementation preserves the expected runtime contract.

The response codec matches the actual provider response shape.
decode_response returns a normalized response with safe, JSON-compatible fields.
The provider response returned to the application is unchanged.
Subscribers see annotated_response only on LLM end events where decode succeeds.
Decode errors are tested and do not break the LLM call.
Streaming finalizers produce the same shape the response codec expects.
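
The JSON-compatibility item can be checked mechanically. This is a minimal sketch, assuming the codec output is a plain dict (as in the Node.js example) rather than a native AnnotatedLLMResponse object; `is_json_compatible` is a hypothetical helper built on the standard `json` module.

```python
import json
from typing import Any


def is_json_compatible(value: Any) -> bool:
    """True when value serializes to strict JSON (no NaN/Infinity, no custom objects)."""
    try:
        json.dumps(value, allow_nan=False)
    except (TypeError, ValueError):
        return False
    return True
```

A unit test for a custom codec could assert `is_json_compatible(decoded)` on a few representative payloads, including ones with missing usage or empty tool calls.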

Common Issues

Check these symptoms first when the workflow does not behave as expected.

No annotation appears: The response codec returned an error or the raw provider response did not match the codec.
Returned response changed unexpectedly: Response codecs are not the right place to mutate caller-visible output.
Tool calls are missing: The codec did not map the provider’s tool-call structure into tool_calls.
Usage is inconsistent across providers: Normalize known token fields and preserve provider-specific usage details in api_specific or extra.
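
For the usage case, one possible approach is to map the token fields you recognize and keep everything else. The sketch assumes the field names that OpenAI Chat Completions (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`) and Anthropic Messages (`input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`) report; `normalize_usage` and its return layout are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Raw keys that are folded into the normalized counts below.
_MAPPED_KEYS = {
    "prompt_tokens", "input_tokens",
    "completion_tokens", "output_tokens",
    "total_tokens",
    "cache_read_input_tokens", "cache_creation_input_tokens",
}


def _first_int(raw: Dict[str, Any], *keys: str) -> Optional[int]:
    """Return the first integer value found under any of the given keys."""
    for key in keys:
        value = raw.get(key)
        if isinstance(value, int) and not isinstance(value, bool):
            return value
    return None


def normalize_usage(raw: Dict[str, Any]) -> Tuple[Dict[str, Optional[int]], Dict[str, Any]]:
    """Split raw provider usage into (normalized counts, unmodeled leftovers)."""
    prompt = _first_int(raw, "prompt_tokens", "input_tokens")
    completion = _first_int(raw, "completion_tokens", "output_tokens")
    total = _first_int(raw, "total_tokens")
    if total is None and prompt is not None and completion is not None:
        total = prompt + completion  # derive a total when the provider omits it

    cache_read = _first_int(raw, "cache_read_input_tokens")
    details = raw.get("prompt_tokens_details")
    if cache_read is None and isinstance(details, dict):
        cache_read = _first_int(details, "cached_tokens")

    usage = {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
        "cache_read_tokens": cache_read,
        "cache_write_tokens": _first_int(raw, "cache_creation_input_tokens"),
    }
    # Anything not mapped above is preserved for api_specific or extra.
    extra = {key: value for key, value in raw.items() if key not in _MAPPED_KEYS}
    return usage, extra
```

Providers do not necessarily count cached tokens the same way inside their prompt totals, so treat cross-provider sums as approximate and keep the raw details available for anyone who needs exact accounting.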

Next Steps

Use these links to continue from this workflow into the next related task.

Use Provider Codecs for request-side provider codecs and full request/response examples.
Use Wrap LLM Calls to add the managed LLM boundary first.
Use Observability after annotations are visible in local subscribers.