Provider Response Codecs
Use this guide when subscribers, exporters, or diagnostics need a provider-neutral view of raw LLM responses.
What You Build
You will attach a response codec to a managed LLM wrapper so NeMo Relay can decode provider responses into AnnotatedLLMResponse data for LLM end events.
Response codecs are observability-only:
- They do not rewrite the value returned to the application.
- They do not run response middleware.
- They attach normalized response data to lifecycle events for subscribers and exporters.
- Decode failures are non-fatal; the LLM call still returns the provider response and the end event is emitted without an annotation.
Before You Start
You need:
- A managed LLM boundary from Wrap LLM Calls.
- A raw provider response that is JSON-compatible.
- A built-in response codec or a custom response codec for the provider response shape.
- A subscriber or exporter that consumes
annotated_responsefrom LLM end events.
What Response Codecs Decode
Response codecs normalize provider output into fields that subscribers can inspect consistently:
Use these annotations for observability, export, and debugging. Keep business logic that changes the caller-visible response in the framework or provider adapter, not in the response codec.
Built-in Response Codecs
The built-in provider codecs also implement response decoding:
OpenAIChatCodecOpenAIResponsesCodecAnthropicMessagesCodec
Choose the codec that matches the actual provider response shape. For example, do not use OpenAIChatCodec for an OpenAI Responses API payload only because both came from an OpenAI-compatible provider.
Attach a Built-in Response Codec
The examples below attach built-in response codecs for supported provider response shapes.
Python
Node.js
Rust
Read Annotated Responses
Subscribers can inspect annotated_response on LLM end events. The exact event category fields are binding-provided, so defensive checks should confirm the annotation exists before reading it.
Python
Node.js
Custom Response Codecs
Use a custom response codec when the provider or framework response does not match a built-in shape.
In Python, a custom response codec can route to built-in codecs and return their native AnnotatedLLMResponse values:
In Node.js, implement decodeResponse and return the normalized response JSON shape:
In Rust, implement LlmResponseCodec directly:
Streaming Responses
Streaming LLM wrappers decode the aggregated response produced by the stream finalizer. The response codec does not see each token or chunk. Use stream collectors for chunk-level behavior, and use response codecs for the final normalized end-event annotation.
Validation Checklist
Use this checklist to confirm the implementation preserves the expected runtime contract.
- The response codec matches the actual provider response shape.
decode_responsereturns a normalized response with safe, JSON-compatible fields.- The provider response returned to the application is unchanged.
- Subscribers see
annotated_responseonly on LLM end events where decode succeeds. - Decode errors are tested and do not break the LLM call.
- Streaming finalizers produce the same shape the response codec expects.
Common Issues
Check these symptoms first when the workflow does not behave as expected.
- No annotation appears: The response codec returned an error or the raw provider response did not match the codec.
- Returned response changed unexpectedly: Response codecs are not the right place to mutate caller-visible output.
- Tool calls are missing: The codec did not map the provider’s tool-call structure into
tool_calls. - Usage is inconsistent across providers: Normalize known token fields and preserve provider-specific usage details in
api_specificorextra.
Next Steps
Use these links to continue from this workflow into the next related task.
- Use Provider Codecs for request-side provider codecs and full request/response examples.
- Use Wrap LLM Calls to add the managed LLM boundary first.
- Use Observability after annotations are visible in local subscribers.