Trusted Computing Base for Gateway Lifecycle Control
NemoClaw uses a small set of host and sandbox components to restart built-in gateways and change shields posture without granting lifecycle authority to the sandbox agent. This page defines the trusted computing base for those operations and the evidence required when the boundary changes.
Security Boundary
The operator, NemoClaw host CLI, OpenShell control plane, container runtime, and immutable image contents are trusted. The agent process, agent-writable configuration and state, mutable environment variables, status files written by the sandbox user, and network responses are untrusted. Host root compromise and replacement of root-owned image files are outside this boundary because either condition already controls the sandbox runtime.
The lifecycle boundary maintains these invariants.
- Only a registry-selected sandbox can receive a host lifecycle request.
- Only root-owned installed helpers can perform privileged lifecycle or filesystem transitions.
- A mutable path, status file, process ID, command line, or listener alone never grants authority.
- Process decisions bind the observed process ID to its start identity, parent chain, user identity, PID namespace, executable shape, and listener ownership where the topology exposes those signals.
- Filesystem transitions open trusted parents by descriptor, reject symlinks and unsafe hard links, bound traversal and input size, and verify the resulting inode state.
- A failed or ambiguous proof stops the operation without reporting recovery or a locked shields posture.
- The OpenShell-managed topology authenticates the host action but does not isolate the gateway and agent under distinct UIDs.
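The fail-closed invariant can be sketched as a tri-state proof result. Only an explicit success lets an operation report recovery. Empty, failed, and ambiguous results all stop it. The `Proof` enum, the proof names, and `finish_restart` are hypothetical illustrations, not NemoClaw identifiers.

```python
from enum import Enum


class Proof(Enum):
    PROVEN = "proven"
    FAILED = "failed"
    AMBIGUOUS = "ambiguous"


class LifecycleStopped(Exception):
    """Raised instead of reporting recovery or a locked shields posture."""


def require_proofs(proofs: dict[str, Proof]) -> None:
    # An empty proof set is treated as ambiguous, never as success.
    if not proofs:
        raise LifecycleStopped("no proof was collected")
    unproven = sorted(name for name, result in proofs.items() if result is not Proof.PROVEN)
    if unproven:
        raise LifecycleStopped("unproven: " + ", ".join(unproven))


def finish_restart(proofs: dict[str, Proof]) -> str:
    # Hypothetical final step: success is reported only after every proof passed.
    require_proofs(proofs)
    return "recovered"
```

Because the enum has no "probably fine" member, a caller cannot report success from a partial proof.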
Changes to a component or invariant on this page require sensitive-path review and focused regression coverage before merge. A successful build does not replace review of privilege, process identity, descriptor safety, rollback, and fail-closed behavior.
Component Map
Two adjacent trusted entry points receive lifecycle requests: the root-owned /usr/local/bin/nemoclaw-gateway-control entry point and the sourced gateway-supervisor.sh library.
The entry point validates the action and nonce, rejects nonroot callers, classifies the direct and OpenShell-managed topologies, and forwards only to the matching controller.
The supervisor owns the direct PID 1 request channel and publishes bounded status for the matching nonce.
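The entry-point checks can be sketched as validation followed by a fixed routing table. The request is refused unless the caller is root, the action is known, the nonce is well formed, and the topology has been classified. The action set, the 32-character lowercase hex nonce format, and the controller labels are hypothetical. Topology classification is passed in rather than shown.

```python
import os
import re
from typing import Optional

# Hypothetical values: the real action set and nonce format are not specified here.
ALLOWED_ACTIONS = frozenset({"restart"})
NONCE_PATTERN = re.compile(r"[0-9a-f]{32}")
CONTROLLERS = {"direct": "pid1-request", "openshell-managed": "managed-gateway-control.py"}


class Refused(Exception):
    pass


def select_controller(action: str, nonce: str, topology: str, euid: Optional[int] = None) -> str:
    """Validate a request and return the only controller allowed to handle it."""
    euid = os.geteuid() if euid is None else euid
    if euid != 0:
        raise Refused("nonroot caller")
    if action not in ALLOWED_ACTIONS:
        raise Refused(f"unknown action {action!r}")
    if not NONCE_PATTERN.fullmatch(nonce):
        raise Refused("malformed nonce")
    try:
        # Each topology maps to exactly one controller; there is no fallback.
        return CONTROLLERS[topology]
    except KeyError:
        raise Refused(f"unclassified topology {topology!r}") from None
```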
Interaction Model
The host CLI first resolves the sandbox from host-owned registry state and selects the built-in agent topology.
Gateway restart generates a fresh nonce and enters nemoclaw-gateway-control as root with injection-capable environment variables cleared.
The direct topology publishes a root-owned request to PID 1, while the OpenShell-managed topology executes managed-gateway-control.py directly.
Both paths prove the exact replacement gateway and health state before the host repairs port forwards or reports success.
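The host side of a restart request can be sketched as two steps: generate a fresh nonce, and build the environment from a fixed allowlist. The allowlist drops injection-capable variables such as LD_PRELOAD or NODE_OPTIONS by construction. This sketch does not show how the host enters the sandbox as root. The action name, argument order, and allowlist contents are assumptions.

```python
import secrets

ENTRY_POINT = "/usr/local/bin/nemoclaw-gateway-control"
# Hypothetical allowlist: nothing from the caller's environment is copied.
SAFE_ENV = {"PATH": "/usr/sbin:/usr/bin:/sbin:/bin", "LC_ALL": "C"}


def build_restart_request() -> tuple[list[str], dict[str, str], str]:
    # A fresh 128-bit nonce ties later status output to this request only.
    nonce = secrets.token_hex(16)
    # Hypothetical argument order for the entry point.
    argv = [ENTRY_POINT, "restart", nonce]
    # Return a copy so callers cannot mutate the shared allowlist.
    return argv, dict(SAFE_ENV), nonce
```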
Shields mutations acquire the host transition lock before changing network policy, config posture, timer authority, or host state.
The host shields coordinator invokes the agent-specific config guard and state-dir-guard.py, verifies the resulting posture, then commits host state and audit output.
Rollback keeps the same lock and transaction token so a stale callback cannot adopt the transition.
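The transition ordering can be sketched with an in-process stand-in for the host lock. Every step and every rollback action re-checks the same transaction token, and commit runs only after verification. A callback holding an older token is rejected. `HostTransitionLock`, `run_transition`, and the `threading.Lock` stand-in are assumptions. The real host lock and state layout are not shown.

```python
import secrets
import threading
from typing import Callable, Optional


class TransitionError(Exception):
    pass


class HostTransitionLock:
    """Hypothetical in-process stand-in for the host transition lock."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._token: Optional[str] = None

    def acquire(self) -> str:
        if not self._mutex.acquire(blocking=False):
            raise TransitionError("another shields transition is in progress")
        self._token = secrets.token_hex(16)
        return self._token

    def check(self, token: str) -> None:
        # A callback holding an older token cannot act on the current transition.
        if self._token is None or token != self._token:
            raise TransitionError("stale or foreign transaction token")

    def release(self, token: str) -> None:
        self.check(token)
        self._token = None
        self._mutex.release()


Step = tuple[Callable[[], None], Callable[[], None]]  # (apply, revert)


def run_transition(lock: HostTransitionLock, steps: list[Step],
                   verify: Callable[[], bool], commit: Callable[[], None]) -> None:
    token = lock.acquire()
    applied: list[Callable[[], None]] = []
    try:
        for apply, revert in steps:
            lock.check(token)
            apply()
            applied.append(revert)
        if not verify():
            raise TransitionError("resulting posture did not verify")
        lock.check(token)
        commit()  # host state and audit output are written only after verification
    except Exception:
        # Roll back in reverse order while still holding the same lock and token.
        for revert in reversed(applied):
            lock.check(token)
            revert()
        raise
    finally:
        lock.release(token)
```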
Filesystem and Descriptor Proofs
Production Python guards bind privileged helpers and parser dependencies to fixed installed paths.
The managed controller enables alternate procfs and filesystem roots only when it runs from a source checkout and the explicit NEMOCLAW_MANAGED_CONTROL_ALLOW_NONROOT_TEST=1 test flag is also set.
The guards open trusted directories and files with no-follow and close-on-exec flags, compare descriptor metadata, and reject unsafe owners, modes, link counts, mounts, and inode replacement.
They bound file size, entry count, logical bytes, journal size, and process-table traversal before reading attacker-influenced state.
Atomic replacement uses fresh files in the already opened parent directory and verifies the installed inode before the transaction commits.
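The descriptor rules above can be sketched with `os.open` flags, `dir_fd`-relative operations, and `fstat` checks. The helpers do the following:

- Open the parent once.
- Reject symlinks, hard links, and unsafe owners or modes.
- Bound the read.
- Replace through a fresh file in the same parent.
- Confirm that the installed inode is the one written.

The helper names and the size bound are hypothetical. O_NOFOLLOW protects only the final path component, so a fuller guard would walk each component from a trusted root. Mount checks are not shown.

```python
import os
import secrets
import stat

MAX_FILE_BYTES = 64 * 1024  # hypothetical bound


class UnsafeObject(Exception):
    pass


def _check_owner_mode(st: os.stat_result, owner_uid: int) -> None:
    if st.st_uid != owner_uid:
        raise UnsafeObject("unexpected owner")
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        raise UnsafeObject("group- or world-writable")


def open_trusted_dir(path: str, owner_uid: int) -> int:
    # O_NOFOLLOW covers only the last component of path.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW | os.O_CLOEXEC)
    try:
        _check_owner_mode(os.fstat(fd), owner_uid)
    except Exception:
        os.close(fd)
        raise
    return fd


def read_bounded(dir_fd: int, name: str, owner_uid: int) -> bytes:
    # O_NONBLOCK keeps a planted FIFO from blocking the open; it has no effect on regular files.
    fd = os.open(name, os.O_RDONLY | os.O_NOFOLLOW | os.O_CLOEXEC | os.O_NONBLOCK, dir_fd=dir_fd)
    try:
        st = os.fstat(fd)
        if not stat.S_ISREG(st.st_mode) or st.st_nlink != 1:
            raise UnsafeObject("not a singly linked regular file")
        _check_owner_mode(st, owner_uid)
        chunks, total = [], 0
        while chunk := os.read(fd, MAX_FILE_BYTES + 1 - total):
            chunks.append(chunk)
            total += len(chunk)
            if total > MAX_FILE_BYTES:
                raise UnsafeObject("file exceeds size bound")
        return b"".join(chunks)
    finally:
        os.close(fd)


def atomic_replace(dir_fd: int, name: str, data: bytes) -> None:
    tmp = f".{name}.{secrets.token_hex(8)}.tmp"
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW | os.O_CLOEXEC
    fd = os.open(tmp, flags, 0o640, dir_fd=dir_fd)
    try:
        view = memoryview(data)
        while view:
            view = view[os.write(fd, view):]
        os.fsync(fd)
        written = os.fstat(fd)
    except BaseException:
        os.unlink(tmp, dir_fd=dir_fd)
        raise
    finally:
        os.close(fd)
    os.replace(tmp, name, src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
    os.fsync(dir_fd)
    installed = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
    if (installed.st_dev, installed.st_ino) != (written.st_dev, written.st_ino):
        raise UnsafeObject("installed inode differs from the written file")
```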
OpenClaw writes from NemoClaw must be strict JSON.
Existing OpenClaw runtime config can contain JSON5 comments or trailing commas, so restart validation retains the packaged JSON5 parser for read compatibility.
The guard verifies /usr/local/bin/node and /opt/nemoclaw/node_modules/json5 as root-owned, non-writable installed objects before invoking the parser with a fixed environment, closed file descriptors, bounded input, and a timeout.
The image build repeats the ownership and writability checks after lockfile-based npm ci installation.
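The parser invocation can be sketched with `subprocess.run`: a fixed argv, a fixed environment, `close_fds=True`, a size check before launch, and a timeout. The installed objects are checked before the fixed argv is used. The inline Node script, the bounds, and the helper names are assumptions. `verify_installed` checks only the json5 package directory, not every file inside it.

```python
import json
import os
import stat
import subprocess
from typing import Sequence

NODE = "/usr/local/bin/node"
JSON5_DIR = "/opt/nemoclaw/node_modules/json5"
MAX_CONFIG_BYTES = 1024 * 1024  # hypothetical bound
PARSER_TIMEOUT_S = 5.0  # hypothetical timeout
# Hypothetical script: read JSON5 on stdin, write strict JSON on stdout.
PARSE_SCRIPT = (
    f"const JSON5 = require({JSON5_DIR!r});"
    "let s = ''; process.stdin.on('data', d => s += d)"
    ".on('end', () => process.stdout.write(JSON.stringify(JSON5.parse(s))));"
)


def verify_installed(path: str, owner_uid: int = 0) -> None:
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode) or st.st_uid != owner_uid or st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        raise PermissionError(f"untrusted installed object: {path}")


def run_parser(argv: Sequence[str], text: bytes, timeout: float = PARSER_TIMEOUT_S):
    if len(text) > MAX_CONFIG_BYTES:
        raise ValueError("config exceeds size bound")
    result = subprocess.run(
        list(argv),
        input=text,
        capture_output=True,
        env={"PATH": "/usr/bin:/bin"},  # fixed environment; nothing is inherited
        close_fds=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(result.stdout)


def parse_json5(text: bytes):
    verify_installed(NODE)
    verify_installed(JSON5_DIR)
    return run_parser([NODE, "-e", PARSE_SCRIPT], text)
```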
Process and Listener Proofs
The direct supervisor records the gateway process ID and kernel start identity and verifies the parent relationship before signalling or reaping the process.
The Docker health fallback reads the same two-field record, verifies the process state and start identity before and after reading its command line, and accepts only the exact installed gateway arguments for the configured port or a known rewritten gateway process title.
The direct record omits a PID namespace inode because that inode would not distinguish PID reuse inside the same container, where every process shares one namespace; PID 1 exit also terminates the container and removes the /tmp record.
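The direct fallback's before-and-after check can be sketched against Linux procfs. It reads the state and start time (field 22 of /proc/PID/stat), reads the command line, then re-reads the stat line. The argv is accepted only if the same live process instance was present throughout and the arguments match exactly. The "PID STARTTIME" record format, the cmdline bound, and the helper names are assumptions. The parent-chain check is not shown.

```python
MAX_CMDLINE_BYTES = 64 * 1024  # hypothetical bound


class ProofFailed(Exception):
    pass


def parse_record(text: str) -> tuple[int, int]:
    """Parse a hypothetical 'PID STARTTIME' record written by the supervisor."""
    parts = text.split()
    if len(parts) != 2 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ProofFailed("malformed gateway record")
    return int(parts[0]), int(parts[1])


def read_stat(pid: int) -> tuple[str, int]:
    """Return (state, starttime) from /proc/<pid>/stat."""
    try:
        with open(f"/proc/{pid}/stat", "rb") as f:
            raw = f.read(4096)
    except FileNotFoundError:
        raise ProofFailed("process is gone") from None
    # comm (field 2) may contain spaces or ')', so split after the last ')'.
    fields = raw.rpartition(b")")[2].split()
    # fields[0] is stat field 3 (state); fields[19] is field 22 (starttime).
    return fields[0].decode(), int(fields[19])


def read_bound_cmdline(pid: int, start: int) -> list[str]:
    """Read argv only if the recorded process instance is live before and after."""
    def check() -> None:
        state, starttime = read_stat(pid)
        if starttime != start or state in ("Z", "X"):
            raise ProofFailed("PID no longer names the recorded process")

    check()
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read(MAX_CMDLINE_BYTES)
    except FileNotFoundError:
        raise ProofFailed("process is gone") from None
    check()
    return [arg.decode(errors="replace") for arg in raw.rstrip(b"\0").split(b"\0")]


def is_expected_gateway(argv: list[str], expected_argv: list[str], known_titles: set[str]) -> bool:
    # Exact matches only: no prefix or substring acceptance.
    return argv == expected_argv or (len(argv) == 1 and argv[0] in known_titles)
```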
The managed controller uses a stronger process key because OpenShell remains PID 1 while nonroot children can be replaced. Its key includes PID, start identity, parent PID, user IDs, namespace PID and inode, command line, process state, and procfs device and inode metadata. It verifies listener ownership in the same network namespace and uses a pidfd for signalling where the platform supports it.
For managed Hermes replacement, the controller binds the exact-exit authorization to the gateway process ID and start identity plus its own process ID and start identity. The nonroot supervisor accepts that authorization only while the same root controller with the fixed installed command shape remains live. The authorization records authenticated host intent for the exact exit rather than proving that the host signal was its sole cause; a mismatched or orphaned authorization counts as an unexpected exit.
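The supervisor-side exit decision can be sketched as plain data comparison. An exit counts as authorized only when the following all hold:

- The exited gateway matches the authorization's PID and start identity.
- The named controller is still live with the recorded start identity.
- The controller runs as root with the fixed command shape.

The `ProcessKey` and `ExitAuthorization` fields, the `lookup` callback, and the return strings are hypothetical. This sketch does not show procfs collection. Pidfd signalling would use `os.pidfd_open` and `signal.pidfd_send_signal`, which are available on Linux from Python 3.9.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ProcessKey:
    """Identity fields listed for the managed controller; equality compares all of them."""
    pid: int
    starttime: int
    ppid: int
    uids: tuple[int, ...]  # real, effective, saved, filesystem
    ns_pid: int
    pidns_inode: int
    cmdline: tuple[str, ...]
    state: str
    proc_dev: int
    proc_inode: int


@dataclass(frozen=True)
class ExitAuthorization:
    gateway_pid: int
    gateway_start: int
    controller_pid: int
    controller_start: int


def classify_gateway_exit(
    auth: Optional[ExitAuthorization],
    exited_pid: int,
    exited_start: int,
    lookup: Callable[[int], Optional[ProcessKey]],
    controller_cmdline: tuple[str, ...],
) -> str:
    # A missing or mismatched authorization counts as an unexpected exit.
    if auth is None or (auth.gateway_pid, auth.gateway_start) != (exited_pid, exited_start):
        return "unexpected"
    controller = lookup(auth.controller_pid)
    # An orphaned authorization (controller gone or replaced) is also unexpected.
    if (
        controller is None
        or controller.starttime != auth.controller_start
        or controller.uids[:2] != (0, 0)
        or controller.cmdline != controller_cmdline
    ):
        return "unexpected"
    # Records authenticated host intent for this exact exit, not proof of sole cause.
    return "authorized"
```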
Topology Limits
The direct topology separates the root supervisor, gateway UID, and sandbox UID.
The root process belongs to the sandbox supplementary group because capability-dropped PID 1 still needs descriptor-safe access to the mutable tree before it can validate or transition the tree.
This membership is an intentional base-image invariant rather than a stale-base compatibility exception.
The OpenShell-managed topology runs the supervisor, gateway, and agent under the same sandbox UID.
The managed controller prevents accidental cross-process signalling and PID reuse, but it cannot prove provenance against a malicious same-UID agent or create UID isolation.
Remove this compatibility path only after the minimum supported OpenShell provides a root-owned lifecycle supervisor or a gateway UID distinct from the agent.
The macOS VM compatibility path accepts a nonroot NemoClaw PID 1 when the VM ownership remap prevents a root-owned readiness lease. The guard first proves either the live NemoClaw PID 1 shape, a readiness lease bound to the current PID 1, or the complete OpenShell-managed supervisor shape. Remove this path when the minimum supported OpenShell macOS VM preserves root ownership for the readiness lease.
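The macOS VM guard's disjunction can be sketched as follows. A nonroot PID 1 is accepted only when at least one of the three proofs holds. The sketch assumes the readiness lease is bound to PID 1 by its start identity, so a lease left by an earlier PID 1 does not count. `Pid1Evidence` and its fields are hypothetical, and this sketch does not show how each shape is proven.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pid1Evidence:
    """Hypothetical summary of what the guard observed about PID 1."""
    start: int
    nemoclaw_shape_proven: bool
    lease_pid: Optional[int]
    lease_start: Optional[int]
    managed_supervisor_shape_proven: bool


def accept_compat_pid1(ev: Pid1Evidence) -> bool:
    # Assumption: a lease binds to the current PID 1 through its start identity.
    lease_bound = ev.lease_pid == 1 and ev.lease_start is not None and ev.lease_start == ev.start
    return ev.nemoclaw_shape_proven or lease_bound or ev.managed_supervisor_shape_proven
```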
Review and Removal Conditions
Reviewers must re-check this page when a lifecycle helper, shields coordinator, config guard, installed mode, process proof, or test seam changes. The following conditions govern current compatibility code and architecture work.
- Remove the final OpenClaw stale-base group repair when the minimum supported OpenClaw sandbox base tag is v0.0.71 or newer, the source provisioning contract guarantees both memberships, and both final-image tests pass.
- Keep root membership in the base image while capability-dropped root PID 1 owns lifecycle validation for sandbox-group mutable state.
- Remove the managed shared-UID controller when the supported OpenShell topology provides a root-owned supervisor or a distinct gateway UID.
- Remove JSON5 restart validation only when every supported OpenClaw config is guaranteed to be strict JSON; NemoClaw write transactions already reject JSON5-only syntax.
- Decompose the host shields coordinator only with behavior-preserving changes that keep policy, config, timer, rollback, state, and audit ordering under one typed transaction contract.
- Keep the managed controller’s source-path and fake-root overrides disabled unless the explicit source test flag is present, and keep installed helpers bound to fixed production paths.
Final-image validation must cover both built-in images.
It checks helper owners and modes, root and gateway supplementary groups, root execution of the read-only probe path, and refusal before helper entry when the sandbox user attempts execution.
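The owner, mode, and group-membership parts of final-image validation can be sketched with `os.lstat`, `pwd`, `grp`, and `os.getgrouplist`. The expected modes and the user and group names for each image are not given here, so callers supply them. The execution and refusal checks are out of scope for this sketch, and the helper names are hypothetical.

```python
import grp
import os
import pwd
import stat


def helper_problems(path: str, owner_uid: int, mode: int) -> list[str]:
    """List mismatches between an installed helper and its expected owner and mode."""
    st = os.lstat(path)
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
    if st.st_uid != owner_uid:
        problems.append(f"owner {st.st_uid} != {owner_uid}")
    if stat.S_IMODE(st.st_mode) != mode:
        problems.append(f"mode {oct(stat.S_IMODE(st.st_mode))} != {oct(mode)}")
    return problems


def supplementary_groups(user: str) -> set[str]:
    """Return the names of every group the user belongs to, including the primary group."""
    entry = pwd.getpwnam(user)
    return {grp.getgrgid(gid).gr_name for gid in os.getgrouplist(user, entry.pw_gid)}
```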