Appendix E. Failure Modes and Acceptance Tests

The runbook covers node reboot, pod restart, pod reschedule, GPU reset, firmware drift, runtime-class mismatch, guest measurement mismatch, runtime-policy mismatch, attestation collateral expiry or revocation, KBS or KMS/HSM outage, model artifact rotation, certificate rotation, and emergency disablement of key release.

Table 14: Failure Modes

| Failure mode | Component expected to raise it | Operator-facing signal |
| --- | --- | --- |
| Pod missing RuntimeClass, GPU request, node selector, or approved image setting | Kubernetes admission, scheduler, runtime, policy controller | Admission, scheduling, or sandbox launch denial naming missing or unsupported setting |
| CPU, GPU, guest, image, runtime-policy, or policy evidence mismatch | Guest Attestation Agent, Attestation Service, Trustee | Attestation denial with measurement, collateral, or policy reason code |
| GPU not in required confidential mode | GPU Operator, host driver, guest driver, GPU management tooling, attestation verifier | Node, pod, device-health, or attestation denial naming GPU CC state |
| Reference value, collateral, or revocation data missing or expired | Attestation Service, Trustee, reference-value service | Verification failure naming missing collateral, expiry, or unsupported evidence |
| KBS, KMS/HSM, or key-release policy denies request | Trustee KBS, KMS/HSM integration | Key-release denial or dependency error with key ID, policy version, request identity |
| exec, debug, privileged pod, or node-level attach path blocked | Kubernetes policy, runtime policy, guest hardening, break-glass workflow | Explicit denial identifying the administrative path and policy that blocked it |
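
The operator-facing signals in Table 14 share a common shape: a denial that names a reason and carries the key ID and policy version. The following is a minimal sketch of a fail-closed release decision in that shape. It is not the Trustee or KBS API; every name here (`Reason`, `Decision`, `decide_key_release`, and the `evidence` and `policy` dictionary keys) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    """Hypothetical reason codes modelled on the signals in Table 14."""
    OK = "ok"
    MEASUREMENT_MISMATCH = "measurement_mismatch"
    COLLATERAL_EXPIRED = "collateral_expired"
    GPU_CC_MODE_OFF = "gpu_cc_mode_off"
    KEY_DISABLED = "key_disabled"
    MALFORMED_EVIDENCE = "malformed_evidence"


@dataclass(frozen=True)
class Decision:
    released: bool
    reason: Reason
    key_id: str
    policy_version: str


def decide_key_release(evidence: dict, policy: dict, key_id: str, now: float) -> Decision:
    """Return a release decision; missing or malformed input always denies."""
    version = str(policy.get("version", "unknown"))

    def deny(reason: Reason) -> Decision:
        return Decision(False, reason, key_id, version)

    try:
        # Assumed policy keys: disabled_keys, approved_measurements.
        if key_id in policy["disabled_keys"]:
            return deny(Reason.KEY_DISABLED)
        # Assumed evidence keys: collateral_not_after (epoch seconds),
        # gpu_cc_mode ("on" or "off"), measurement (opaque string).
        if evidence["collateral_not_after"] <= now:
            return deny(Reason.COLLATERAL_EXPIRED)
        if evidence["gpu_cc_mode"] != "on":
            return deny(Reason.GPU_CC_MODE_OFF)
        if evidence["measurement"] not in policy["approved_measurements"]:
            return deny(Reason.MEASUREMENT_MISMATCH)
    except (KeyError, TypeError):
        # Fail closed: absent or wrongly typed fields never release a key.
        return deny(Reason.MALFORMED_EVIDENCE)
    return Decision(True, Reason.OK, key_id, version)
```

The design choice illustrated is that release is the last statement reached, so every early exit and every exception path is a denial that still carries the key ID and policy version for the audit record.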

Table 15: Acceptance Tests

| Test | Expected result | Evidence to retain |
| --- | --- | --- |
| Approved confidential pod attests and receives a sample key | Pod reaches healthy state; attestation succeeds; KBS releases the non-sensitive test key only after attestation succeeds | Pod events, runtime logs, verifier decision, KBS key-release audit, service health |
| Unapproved workload image, guest image, or runtime policy is denied | Pod does not receive the key | Attestation denial with image, guest, runtime-policy, measurement, or reference-value ID |
| Tampered pod spec or launch parameters | Pod does not receive the key when RuntimeClass, image digest, GPU assignment, guest, or runtime policy differs from the approved build | Admission, runtime, or attestation denial with changed measurement or build ID |
| GPU is not in the required confidential-computing mode | Pod launch, attestation, or key release fails closed | GPU/node condition or attestation reason |
| Expired or missing attestation collateral | Attestation fails before key release | Verifier error and collateral identifier |
| Trustee/KBS/KMS outage or policy denial | Workload fails closed with an actionable error | Trustee/KMS error class, key ID, request ID, policy version |
| KBS secret, app, or model key disabled by model provider | Model decryption fails during boot or startup; service does not run with stale access | KBS/KMS denial, app/key ID, policy version, guest startup error |
| exec, debug, privileged pod, node attach, or memory dump attempted | Administrative bypass fails or is controlled as break-glass without model/key exposure | Kubernetes, node, guest, runtime, firewall, or policy denial; forensics record without model data or sensitive payloads |
| Artifact or key rotation | New artifact/key approved; retired key unavailable per policy | New measurements/digests and audit record |
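
The "Evidence to retain" column in Table 15 can be checked mechanically after each test run. The sketch below assumes a test harness collects evidence into a dictionary keyed by artifact name. The test identifiers, artifact names, and the `missing_evidence` helper are all hypothetical, and only three of the table's rows are encoded.

```python
# Hypothetical mapping from acceptance test to the evidence Table 15 asks
# operators to retain. Identifiers are illustrative, not a defined schema.
REQUIRED_EVIDENCE = {
    "approved_pod_receives_sample_key": {
        "pod_events",
        "runtime_logs",
        "verifier_decision",
        "kbs_key_release_audit",
        "service_health",
    },
    "expired_or_missing_collateral": {
        "verifier_error",
        "collateral_id",
    },
    "trustee_kbs_kms_outage_or_denial": {
        "error_class",
        "key_id",
        "request_id",
        "policy_version",
    },
}


def missing_evidence(test_name: str, collected: dict) -> list:
    """Return the sorted names of required artifacts that are absent or empty."""
    required = REQUIRED_EVIDENCE[test_name]
    return sorted(name for name in required if not collected.get(name))


# Usage: a run that captured the verifier error but not the collateral ID.
gaps = missing_evidence(
    "expired_or_missing_collateral",
    {"verifier_error": "collateral expired"},
)
print(gaps)  # ['collateral_id']
```

Treating an empty value as missing is a deliberate choice in this sketch: a log field that exists but contains nothing would not serve as evidence in a later audit.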