# Performance

NeMo Relay limits runtime overhead to the work that is active for the current scope and call.

## Runtime Model

These points summarize the runtime behaviors that matter most for performance-sensitive
paths.

* Scope stacks define active ownership and scope-local visibility.
* Middleware registries are priority-ordered and sorted lazily.
* Managed tool and LLM helpers resolve visible middleware before executing the user callback.
* Subscribers receive emitted events after runtime work creates them.
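
To illustrate what "priority-ordered and sorted lazily" can mean in practice, the following is a minimal sketch in plain Python. It is not the NeMo Relay implementation or API: the class `LazyPriorityRegistry`, its methods, and the convention that lower priority values run first are all assumptions made for illustration. The idea is that registration only appends and marks the registry dirty, and the sort cost is paid once, on the next resolve.

```python
from typing import Callable, List, Tuple


class LazyPriorityRegistry:
    """Hypothetical registry: registration is cheap, sorting is deferred."""

    def __init__(self) -> None:
        # Each entry is (priority, insertion_index, name, callback).
        self._entries: List[Tuple[int, int, str, Callable]] = []
        self._dirty = False
        self._next_index = 0
        self.sort_count = 0  # Exposed only to make the lazy behavior visible.

    def register(self, priority: int, name: str, fn: Callable) -> None:
        # Append without sorting; just remember that order is stale.
        self._entries.append((priority, self._next_index, name, fn))
        self._next_index += 1
        self._dirty = True

    def resolve(self) -> List[Tuple[str, Callable]]:
        # Sort only if something changed since the last resolve.
        # Assumption: lower priority values run first; ties keep
        # registration order.
        if self._dirty:
            self._entries.sort(key=lambda e: (e[0], e[1]))
            self._dirty = False
            self.sort_count += 1
        return [(name, fn) for _, _, name, fn in self._entries]


registry = LazyPriorityRegistry()
registry.register(20, "logging", lambda call: call)
registry.register(10, "auth", lambda call: call)

names = [name for name, _ in registry.resolve()]  # sorts here
registry.resolve()                                # reuses the sorted order
```

With this shape, many registrations followed by many calls cost one sort in total, rather than one sort per registration or per call.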

## Practical Guidance

Apply these practices in application or integration code where performance matters.

* Prefer scope-local middleware for request-specific behavior so cleanup happens when the scope closes.
* Keep subscriber callbacks lightweight or move expensive export work out of the hot path.
* Use execution intercepts when you need to wrap real execution, and use sanitize guardrails when you only need to change emitted observability payloads.
* Use binding-native typed wrappers and codecs when provider payload conversion would otherwise be repeated at many call sites.
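
One way to keep a subscriber callback lightweight is to have it only enqueue the event and let a background worker do the expensive export. The sketch below uses only the Python standard library. The `QueueingSubscriber` class and the `export` callable are hypothetical names, and how a callable is actually registered as a subscriber in NeMo Relay is not shown here. Dropping events when the queue is full is also an assumption; a real integration might block or sample instead.

```python
import queue
import threading
from typing import Any, Callable


class QueueingSubscriber:
    """Hypothetical subscriber that moves export work off the hot path."""

    def __init__(self, export: Callable[[Any], None], maxsize: int = 1024) -> None:
        self._export = export
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=maxsize)
        self.dropped = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def __call__(self, event: Any) -> None:
        # Hot path: a non-blocking enqueue and nothing else.
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            # Assumption: losing an event is preferable to stalling the caller.
            self.dropped += 1

    def _drain(self) -> None:
        # Background thread: perform the slow export one event at a time.
        while True:
            event = self._queue.get()
            if event is None:  # Sentinel placed by close().
                break
            self._export(event)

    def close(self) -> None:
        # Flush remaining events, then stop the worker.
        self._queue.put(None)
        self._worker.join()


exported = []
subscriber = QueueingSubscriber(export=exported.append)
for event in ({"kind": "tool.start"}, {"kind": "tool.end"}):
    subscriber(event)
subscriber.close()
```

Calling `close()` when the owning scope or application shuts down ensures queued events are exported before the worker exits.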

## Related Topics

Use these links to continue into adjacent concepts and workflows.

* [Architecture](/about-nemo-relay/architecture)
* [Middleware](/about-nemo-relay/concepts/middleware)
* [Subscribers](/about-nemo-relay/concepts/subscribers)
* [Typed Wrappers And Codecs](/integrate-into-frameworks/using-codecs)