Highlights

This page focuses on the changes that are most likely to affect how you run, instrument, or extend NeMo Relay.

NVIDIA NeMo Relay 0.6

NVIDIA NeMo Relay 0.6 focuses on reliable evidence. It starts Relay before a coding agent needs it, keeps sensitive data out of emitted events, preserves the right context in each export, and gives embedding hosts direct control of dynamic plugins.

Coding-Agent Gateway and Skills

Coding agents now connect through nemo-relay mcp, which starts or adopts the authenticated gateway before it reads MCP protocol frames. Codex, Claude Code, and Hermes clients can share the gateway, keep it alive while they are connected, and coordinate a single recovery attempt before idle shutdown.
Installed integrations now use agent-owned hooks, generated MCP entries, generation fencing, transactional rollback, and user-scoped configuration. Generated configuration carries approved environment variable names, not their secret values.
Relay now emits tool-parented skill.load marks when an agent uses a first-class skill tool or requests a complete read of a SKILL.md file. The behavior covers coding agents, direct tools, LangChain, LangGraph, and Deep Agents.
The public skill catalog now starts from the task at hand: install Relay, reach a first useful result, instrument an application, configure a plugin, migrate, or debug. Its guidance favors installers pinned to immutable release tags, synthetic non-sensitive exporter checks, runtime-injected secrets, and tightly scoped migration writes.
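
The rule that generated configuration carries variable names rather than secret values can be sketched in a few lines. This is an illustration only: `APPROVED_ENV_NAMES`, `generate_config`, and `resolve_at_runtime` are hypothetical names, and Relay's actual allowlist and configuration layout are not described in these notes.

```python
import os

# Hypothetical allowlist; the names Relay actually approves are not listed here.
APPROVED_ENV_NAMES = {"EXAMPLE_API_KEY", "EXAMPLE_GATEWAY_TOKEN"}


def generate_config(requested_names):
    """Return a config that records variable names only, never their values."""
    names = set(requested_names)
    rejected = sorted(names - APPROVED_ENV_NAMES)
    if rejected:
        raise ValueError(f"unapproved environment variables: {rejected}")
    return {"env": sorted(names)}


def resolve_at_runtime(config, environ=None):
    """Look up values only when the process starts, from its own environment."""
    environ = os.environ if environ is None else environ
    missing = [name for name in config["env"] if name not in environ]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    return {name: environ[name] for name in config["env"]}
```

Because the written file holds only names, it can be shared or committed without exposing a secret; the values exist only in the environment of the process that resolves them.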

Event Security and Runtime Semantics

You can now sanitize mark, scope-start, and scope-end events globally, within a scope, or from a plugin context. The same lifecycle is available in Rust, Python, and Node.js, as well as in the experimental Go and C bindings, native plugins, and grpc-v1 workers.
The built-in PII redaction plugin now cleans mark data, category profiles, metadata, and generic scope inputs and outputs before subscribers or exporters receive them.
Relay now tracks LLM event-history freshness per agent. The first LLM start for a new agent, and the first LLM start after a compaction mark, retain the complete sanitized history. Later starts keep the system instructions, the latest user turn, and the messages that follow it. The provider request itself does not change.
A central provider-codec factory now resolves canonical provider names and the built-in request, response, and streaming codecs.
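
The per-agent history freshness rule described above can be modeled roughly as follows. This is a sketch under stated assumptions, not Relay's implementation: `FreshnessTracker`, `trim_history`, and the message dictionaries with a `role` key are hypothetical, and the sketch only shapes the history recorded on the event, never the request sent to the provider.

```python
def trim_history(messages):
    """Keep system messages, the latest user turn, and everything after it."""
    user_indexes = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if not user_indexes:
        return list(messages)
    last_user = user_indexes[-1]
    system = [m for m in messages[:last_user] if m["role"] == "system"]
    return system + list(messages[last_user:])


class FreshnessTracker:
    """Hypothetical tracker that remembers which agents already emitted a full history."""

    def __init__(self):
        self._fresh = set()

    def on_compaction(self, agent_id):
        # After a compaction mark, the next LLM start carries full history again.
        self._fresh.discard(agent_id)

    def history_for_start(self, agent_id, messages):
        if agent_id not in self._fresh:
            self._fresh.add(agent_id)
            return list(messages)  # new agent, or first start after compaction
        return trim_history(messages)
```

A consumer that assumes every LLM start event carries the complete request history would see the shorter form on the second and later starts, which is why the migration notes ask you to check for that assumption.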

Observability and Optimization

Observability configuration version 2 gives each ATOF file or stream sink its own settings. One component can send every event to several destinations, including named HTTP sinks whose credentials come from environment variables.
OpenTelemetry and OpenInference now project top-level event data into typed OTLP attributes instead of raw JSON payload attributes. Optional attribute_mappings keep backend-specific aliases available during migration.
Trace exporters can show selected marks as visible, zero-duration tool spans. OpenInference also exposes the projected tool name, output, and metadata in its standard fields.
A single session.start mark now carries the IDs that correlate startup sessions across ATOF, ATIF, OpenTelemetry, and OpenInference.
Relay now accepts plugin-neutral LLM optimization evidence and combines it into bounded summaries of baseline and effective models, token effects, estimated cost, evidence quality, and pricing provenance.
ATIF now attributes each LLM step to the effective normalized response model. ATOF remains the canonical mark stream, while ATIF remains a step-oriented trajectory projection.
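
To make the move from raw JSON payload attributes to typed attributes concrete, here is a minimal sketch of the idea. The function name, the `event.` prefix, the JSON fallback for nested values, and the shape of `attribute_mappings` (source attribute to alias) are all assumptions for illustration; the actual attribute paths are defined by the exporter documentation, not by this example.

```python
import json


def project_attributes(event, prefix="event", attribute_mappings=None):
    """Flatten top-level event fields into attributes that keep their scalar types."""
    attrs = {}
    for key, value in event.items():
        if value is None:
            continue  # nothing to project
        name = f"{prefix}.{key}"
        if isinstance(value, (str, bool, int, float)):
            attrs[name] = value  # stays a string, bool, int, or float
        else:
            # Assumption: nested values fall back to a JSON string in this sketch.
            attrs[name] = json.dumps(value, sort_keys=True)
    # Optional aliases keep older, backend-specific names queryable during migration.
    for source, alias in (attribute_mappings or {}).items():
        if source in attrs:
            attrs[alias] = attrs[source]
    return attrs
```

With typed values, a backend query can compare `event.tokens` numerically instead of parsing a JSON string, while an alias such as a hypothetical `legacy.tool_name` keeps existing dashboards working until they are updated.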

Dynamic Plugin Hosts

Embedding hosts can now own a PluginHostActivation for explicit native and worker plugins. Static configuration initializes first, dynamic components are added in the same transaction, and teardown clears callbacks and flushes subscribers before releasing resources that it can unload safely.
Python and Node.js can now activate explicitly specified dynamic plugins and clean them up asynchronously. Experimental C and Go entry points expose the same owned lifecycle for source-first evaluation.
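
The ordering guarantees of an owned activation (static components first, dynamic components in the same all-or-nothing step, teardown in reverse) can be modeled with the standard library. This sketch does not use or reproduce the PluginHostActivation API: `activate`, `component`, and `log` are hypothetical, and each "starter" is simply a callable that returns its own cleanup callable.

```python
from contextlib import ExitStack


def activate(static_starters, dynamic_starters):
    """Start static components, then dynamic ones, rolling back on any failure."""
    stack = ExitStack()
    try:
        for start in [*static_starters, *dynamic_starters]:
            cleanup = start()
            stack.callback(cleanup)  # cleanups run in reverse order on close()
    except Exception:
        stack.close()  # undo whatever already started
        raise
    return stack  # the caller owns the activation and must close it


log = []  # ordered record of lifecycle steps, for demonstration only


def component(name, fail=False):
    """Build a hypothetical starter that logs its start and stop."""
    def start():
        if fail:
            raise RuntimeError(f"{name} failed to start")
        log.append(f"start {name}")
        return lambda: log.append(f"stop {name}")
    return start
```

Calling `activate([component("static")], [component("dynamic")])` and then `close()` on the result stops the dynamic component before the static one. If a dynamic component fails to start, the static component that already started is stopped before the error propagates.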

CLI Configuration and Operations

nemo-relay plugins edit now understands recursive lists, string maps, and tagged unions. You can edit nested Adaptive, NeMo Guardrails, PII, Observability, and other built-in configuration without editing raw JSON.
Operational commands now initialize process-wide logging before they run. Logs always go to stderr and can also flow to bounded asynchronous file sinks in human-readable or JSONL form. nemo-relay config and nemo-relay plugins edit skip this initialization so that an invalid logging configuration cannot block its own repair.
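
The logging shape described above (always stderr, plus an optional bounded asynchronous file sink in JSONL form) resembles a pattern available in Python's standard logging module. The sketch below is an analogy, not Relay's logging code: `setup_logging`, `JsonlFormatter`, `DroppingQueueHandler`, the logger name, and the drop-when-full policy are all assumptions.

```python
import json
import logging
import logging.handlers
import queue
import sys


class JsonlFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Drop records instead of blocking when the bounded queue is full."""

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            pass  # assumption: losing a log line is better than stalling the caller


def setup_logging(path, max_queued=1000):
    """Send records to stderr directly and to a JSONL file through a bounded queue."""
    logger = logging.getLogger("relay_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False

    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

    file_handler = logging.FileHandler(path, encoding="utf-8")
    file_handler.setFormatter(JsonlFormatter())

    bounded = queue.Queue(maxsize=max_queued)
    listener = logging.handlers.QueueListener(bounded, file_handler)
    listener.start()  # a background thread writes the file

    logger.handlers = [stderr_handler, DroppingQueueHandler(bounded)]
    return logger, listener
```

Call `listener.stop()` before the process exits so that queued records are written. The bounded queue caps memory use if the file system stalls, at the cost of dropped lines in this sketch.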

Switchyard Decision Routing (Experimental)

The published nemo-relay-switchyard crate and optional CLI feature connect Relay to a separately running Switchyard Decision API. Relay validates the selected backend, translates supported provider protocols in process, and dispatches to targets that Relay owns.
You can run in enforce or observe-only mode, use bounded retry-aware routing, use trusted same-protocol fallbacks, and route buffered or streaming calls. Relay records the resulting model-routing optimization evidence.
Startup health checks validate the Switchyard service, and named ATOF stream-sink checks protect history-backed profiles. When every configured target uses the inbound protocol, Relay preserves supported provider extensions. Mixed-protocol configurations still enforce portability checks.

Compatibility and Migration

Reinstall persistent coding-agent integrations so they use the MCP-owned gateway lifecycle.
Move ATOF plugin configuration to Observability configuration version 2, and use typed sinks for manually constructed exporters. Move OTLP queries to the typed attribute paths at the same time.
Check ATIF and event consumers for assumptions about marks, requested model names, or complete request histories.
Update Rust exhaustive matches, direct struct literals, and editor metadata consumers for the new public variants and fields.
Migrate Rust code that directly constructs LlmJsonStream, and update Go call sites to handle the error returned by LlmStream.Close. Call an explicit close method when a consumer stops reading an LLM stream early.
Point any automation that opens public skills by directory name at the new task-oriented catalog.
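
The advice to call an explicit close method when a consumer stops reading a stream early applies in any language. The following Python analogy uses a generator as a stand-in for an LLM stream; `llm_stream`, `first_chunk`, and `released` are hypothetical names, and the Rust and Go types named above are not shown here.

```python
from contextlib import closing

released = []  # records that the stream's resource was released


def llm_stream(chunks):
    """Stand-in for a streaming LLM response that holds a resource while open."""
    try:
        for chunk in chunks:
            yield chunk
    finally:
        released.append(True)  # runs on exhaustion or on an explicit close()


def first_chunk(chunks):
    """Read only the first chunk, then close the stream explicitly."""
    with closing(llm_stream(chunks)) as stream:
        for chunk in stream:
            return chunk  # stop early; closing() still calls stream.close()
    return None
```

Without the explicit close, cleanup of an abandoned stream would depend on garbage collection timing, which is the kind of implicit behavior the migration note asks you to avoid.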

For the exact migration actions, documentation, and originating PRs, refer to Known Issues and Compatibility Notes. The complete PR-by-PR history and release artifacts are available on GitHub Releases.