> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/relay/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/relay/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/relay/_mcp/server.

# Basic Usage

The `nemo-relay` binary observes coding agents that do not expose every
LLM call site directly. It combines agent-specific hook endpoints with a
passthrough LLM gateway so NeMo Relay owns both the agent lifecycle and the model
request lifecycle.

Use the gateway when you need one observability boundary for OpenAI Codex,
Claude Code, Cursor, and Hermes without replacing each agent's canonical hook
payload.

## Hook Endpoints

Each hook endpoint accepts the agent's native hook JSON directly. Do not wrap
the payload in a shared gateway envelope.

* `POST /hooks/codex` accepts Codex hook JSON and returns the Codex-compatible
  hook response object.
* `POST /hooks/claude-code` accepts Claude Code hook JSON and returns
  Claude-compatible fields such as `continue` and permission decisions when the
  hook event supports them.
* `POST /hooks/cursor` accepts Cursor hook JSON and returns Cursor-compatible
  fields such as `continue`, `permission`, `user_message`, and `agent_message`
  when the hook event supports them.
* `POST /hooks/hermes` accepts Hermes shell hook JSON and returns the empty JSON
  object expected by Hermes hook commands.

The adapters preserve vendor fields such as session IDs, working directories,
transcript paths, model names, tool payloads, shell payloads, MCP payloads, file
payloads, user identity, and subagent metadata in NeMo Relay event metadata.

## Gateway Routes

Route all coding-agent LLM traffic through the gateway when full LLM lifecycle
observability is required.

* `POST /v1/responses`
* `POST /v1/chat/completions`
* `POST /v1/messages`
* `POST /v1/messages/count_tokens`
* `GET /v1/models`

The gateway forwards raw provider JSON without rewriting OpenAI or Anthropic
payload schemas. It removes only hop-by-hop transport headers, forwards
streaming responses as streams, and emits NeMo Relay LLM start and end events
under the active session scope.

## Transparent Run

Use the agent shortcuts for no-install local observability. The wrapper starts
a gateway on a dynamic `127.0.0.1` port, injects the resolved hook and gateway
configuration into the launched coding agent, and stops the gateway when the
agent exits.

```bash
nemo-relay codex
nemo-relay claude
nemo-relay cursor
nemo-relay hermes
```

Use `nemo-relay run -- <command>` when you want to launch an explicit command
instead of the built-in shortcut:

```bash
nemo-relay run -- codex
```

If a launcher or wrapper hides the real agent name, set that wrapper as the
configured command and pass `--agent`. The same pattern applies to Claude Code,
Codex, Cursor, and Hermes:

```toml
[agents.codex]
command = "my-codex-wrapper"
```

```bash
nemo-relay run --agent codex
```

Hermes is different from the other transparent modes: `run --agent hermes`
starts the gateway and exports the dynamic `NEMO_RELAY_GATEWAY_URL`, but Hermes
shell hooks still need to be installed or otherwise approved in Hermes config.

Use `--dry-run --print` to inspect the generated hook config, gateway
environment, gateway URL, and final command without launching the agent.

## Shared Configuration

Shared TOML config is optional. The gateway loads defaults, then system config,
then project config, then user config. User config takes priority over system
and project config. CLI flags and environment variables override file config.

Config file locations are:

* `/etc/nemo-relay/config.toml`
* `.nemo-relay/config.toml`
* `$XDG_CONFIG_HOME/nemo-relay/config.toml`
* `~/.config/nemo-relay/config.toml`

Example:

```toml
[upstream]
openai_base_url = "https://api.openai.com/v1"
anthropic_base_url = "https://api.anthropic.com"

[agents.claude]
command = "claude"

[agents.codex]
command = "codex"

[agents.cursor]
command = "cursor-agent"
patch_restore_hooks = true

[agents.hermes]
command = "hermes"
```

Observability exporters are configured in `plugins.toml`. Use
`nemo-relay plugins edit` for the user file, `nemo-relay plugins edit --project`
for `.nemo-relay/plugins.toml`, or write the plugin config directly:

```toml
version = 1

[[components]]
kind = "observability"
enabled = true

[components.config.atif]
enabled = true
output_directory = ".nemo-relay/atif"

[components.config.openinference]
enabled = true
endpoint = "http://127.0.0.1:4318/v1/traces"
```

Transparent runs always bind the managed gateway to `127.0.0.1:0`. The selected
port is discovered by the wrapper and exposed to hooks through
`NEMO_RELAY_GATEWAY_URL`.

Common environment variables for direct gateway server use are:

* `NEMO_RELAY_GATEWAY_BIND`
* `NEMO_RELAY_OPENAI_BASE_URL`
* `NEMO_RELAY_ANTHROPIC_BASE_URL`

Plugin configuration controls process-level Observability exporters. Per-session
configuration controls structured metadata on the top-level agent begin event
and the plugin configuration metadata associated with the session.

`hook-forward` can also pass per-session configuration through headers:

* `x-nemo-relay-config-profile`
* `x-nemo-relay-session-metadata`
* `x-nemo-relay-plugin-config`
* `x-nemo-relay-gateway-mode`

The accepted gateway mode values are `hook-only`, `passthrough`, and
`required`. The gateway records this value as session metadata so downstream
exporters and review tooling can distinguish hook-only traces from sessions
where provider traffic was expected to pass through the gateway.

## Runtime Mapping

The gateway normalizes vendor hook payloads into private internal events before
calling NeMo Relay APIs.

* Agent start opens a top-level `ScopeType::Agent` scope on a dedicated
  `ScopeStackHandle`.
* Subagent start opens a child `ScopeType::Agent` scope. Subagent stop closes
  that scope when it is still active.
* Tool pre-use starts a NeMo Relay tool span. Tool post-use, denial, or failure
  closes it.
* Prompt, response, agent-thought, and Hermes LLM hooks are retained as
  private correlation hints. They are not emitted as NeMo Relay events.
* Compaction, notification, and unknown hook events become mark events under
  the active session scope.
* Gateway requests emit NeMo Relay LLM start and end events under the active
  session scope. Before each LLM start, the gateway uses explicit subagent
  headers, pending hints, shared conversation/generation/request identifiers,
  and the previous correlated owner to choose the parent scope.
* LLM responses that contain future tool-use suggestions are retained as
  private tool-call hints. The next matching tool hook can then inherit the
  subagent scope that owned the LLM response, even when the hook payload does
  not include a subagent id.

Gateway requests can provide explicit correlation identifiers with these
headers:

* `x-nemo-relay-session-id`
* `x-nemo-relay-subagent-id`
* `x-nemo-relay-conversation-id`
* `x-nemo-relay-generation-id`
* `x-nemo-relay-request-id`

When those headers are absent, the gateway also looks for
`conversation_id`/`conversationId`/`conversation.id`,
`generation_id`/`generationId`/`generation.id`, and
`request_id`/`requestId`/`request.id` fields in the provider request body.
Correlation hints expire after five minutes. If the gateway cannot select one
unambiguous hint, it falls back to the previous LLM owner, then to the only
active subagent, then to the top-level agent scope.

Every gateway LLM event includes `llm_correlation_status` metadata. Possible
values are `explicit`, `single_hint`, `matched_hint`, `sticky_last_owner`,
`active_subagent`, `agent_fallback`, and `ambiguous_fallback`. Matched hints can
also add `llm_correlation_source`, `llm_correlation_subagent_id`,
`llm_correlation_conversation_id`, `llm_correlation_generation_id`,
`llm_correlation_request_id`, and `llm_correlation_agent_type`.

Generated hook bundles subscribe to the events needed for that mapping:

| Agent       | Correlation hint hooks                                                | Scope, tool, and mark hooks                                                                                                                                                                                |
| ----------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Claude Code | `UserPromptSubmit`, `AfterAgentResponse`, `AfterAgentThought`, `Stop` | `SessionStart`, `SessionEnd`, `SubagentStart`, `SubagentStop`, `PreToolUse`, `PostToolUse`, `PostToolUseFailure`, `Notification`, `PreCompact`                                                             |
| Codex       | `UserPromptSubmit`, `AfterAgentResponse`, `AfterAgentThought`, `Stop` | `SessionStart`, `SessionEnd`, `SubagentStart`, `SubagentStop`, `PreToolUse`, `PostToolUse`, `PostToolUseFailure`, `Notification`, `PreCompact`                                                             |
| Cursor      | `beforeSubmitPrompt`, `afterAgentResponse`, `afterAgentThought`       | `sessionStart`, `sessionEnd`, `subagentStart`, `subagentStop`, `preToolUse`, `postToolUse`, `beforeShellExecution`, `afterShellExecution`, `beforeMCPExecution`, `afterMCPExecution`, `preCompact`, `stop` |
| Hermes      | `pre_llm_call`, `post_llm_call`                                       | `on_session_start`, `on_session_end`, `on_session_finalize`, `on_session_reset`, `subagent_start`, `subagent_stop`, `pre_tool_call`, `post_tool_call`                                                      |

Cursor hook-only mode observes agent, subagent, and tool lifecycle. To observe
Cursor LLM lifecycle completely, configure Cursor model traffic to use the
gateway.

## Hook Forwarding

Hooks generated by the wrapper (Claude/Codex/Cursor ephemeral, Hermes via
setup) invoke `nemo-relay hook-forward <agent>` from stdin. Inside the wrapper
the gateway URL comes from `NEMO_RELAY_GATEWAY_URL` injected on every run;
outside the wrapper (Hermes standalone, IDE-launched Claude/Codex) the hook
command falls back to its embedded `--gateway-url`.

`hook-forward` reads the canonical hook payload from standard input, sends it
to the matching endpoint, and prints the endpoint response. It fails open by
default so observability outages do not block the coding agent. Add
`--fail-closed` only when policy requires hook delivery to block the agent.

Optional flags map to gateway headers:

* `--session-metadata` sets `x-nemo-relay-session-metadata`.
* `--plugin-config` sets `x-nemo-relay-plugin-config`.
* `--profile` sets `x-nemo-relay-config-profile`.
* `--gateway-mode` sets `x-nemo-relay-gateway-mode`.

## Agent Guides

Use the per-agent guide for end-to-end setup, smoke tests, and GUI or
application-mode caveats.

* [Claude Code](/nemo-relay-cli/claude-code)
* [Codex](/nemo-relay-cli/codex)
* [Cursor](/nemo-relay-cli/cursor)
* [Hermes Agent](/nemo-relay-cli/hermes)

Each guide covers transparent run setup, gateway routing, hook smoke tests,
Agent Trajectory Interchange Format (ATIF) export verification on session end,
and troubleshooting missing LLM lifecycle data.