Appendix E. Failure Modes and Acceptance Tests

The runbook covers host reboot, CVM restart, GPU reset, firmware drift, guest measurement mismatch, attestation collateral expiry or revocation, key broker or KMS/HSM outage, model artifact rotation, certificate rotation, and emergency disablement of key release.

Table 16: Failure Modes

| Failure mode | Component expected to raise it | Operator-facing signal |
| --- | --- | --- |
| CVM launch configuration missing TEE, firmware, device assignment, or approved image setting | CVM launch stack and orchestration | Launch denial or startup event naming missing launch parameter, image identity, or device assignment |
| CPU, GPU, firmware, guest, or policy evidence mismatch | Guest attestation client and verifier | Attestation denial with measurement, collateral, or policy reason code |
| GPU not in required confidential mode | Host driver, guest driver, GPU management tooling, attestation verifier | Host/guest error, device-health signal, or attestation denial naming GPU CC state |
| Reference value, collateral, or revocation data missing or expired | Verifier and reference-value service | Verification failure naming missing collateral, expiry, or unsupported evidence |
| KMS/HSM unavailable or key-release policy denies request | Key broker and KMS/HSM integration | Key-release denial or dependency error with key ID, policy version, request identity |
| SSH, debug, or privileged guest access blocked | Guest hardening, launch policy, break-glass workflow | Explicit denial identifying the administrative path and policy that blocked it |
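
The operator-facing signals in Table 16 are easier to act on when every component emits them in one structured shape. The following is a minimal sketch of such a record, not a prescribed format: the `Reason` codes, the `DenialSignal` class, and all field names are hypothetical and chosen only to mirror the table columns.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Reason(str, Enum):
    """Hypothetical reason codes, one per failure mode in Table 16."""
    LAUNCH_CONFIG_MISSING = "launch_config_missing"
    EVIDENCE_MISMATCH = "evidence_mismatch"
    GPU_CC_MODE_OFF = "gpu_cc_mode_off"
    COLLATERAL_MISSING_OR_EXPIRED = "collateral_missing_or_expired"
    KMS_UNAVAILABLE = "kms_unavailable"
    KEY_RELEASE_POLICY_DENIED = "key_release_policy_denied"
    ADMIN_PATH_BLOCKED = "admin_path_blocked"


@dataclass(frozen=True)
class DenialSignal:
    """One operator-facing denial record (assumed shape, for illustration)."""
    reason: Reason
    component: str            # which component raised the signal
    detail: str               # human-readable description of what failed
    key_id: Optional[str] = None
    policy_version: Optional[str] = None
    request_id: Optional[str] = None

    def to_log_line(self) -> str:
        # Drop unset optional fields so the line carries only known facts.
        record = {k: v for k, v in asdict(self).items() if v is not None}
        record["reason"] = self.reason.value
        return json.dumps(record, sort_keys=True)


signal = DenialSignal(
    reason=Reason.KEY_RELEASE_POLICY_DENIED,
    component="key-broker",
    detail="key-release policy denies request",
    key_id="example-key",
    policy_version="v3",
    request_id="req-0001",
)
print(signal.to_log_line())
```

Keeping the reason code as a closed enumeration means alerting and dashboards can match on a fixed vocabulary, while the free-text `detail` field carries the specifics (measurement, collateral identifier, or GPU CC state) that the table asks each signal to name.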

Table 17: Acceptance Tests

| Test | Expected result | Evidence to retain |
| --- | --- | --- |
| Approved CVM attests and receives a sample key | CVM reaches healthy state; the key-release service releases the non-sensitive test key only after attestation succeeds | Guest logs, verifier decision, key-release audit, service health |
| Unapproved guest image or launch measurement is denied | CVM does not receive the key | Attestation denial with image, measurement, or reference-value ID |
| Tampered launch parameters | CVM does not receive the key when OVMF, vCPU count/type, UKI, disk image, or launch policy differs from the approved build | Attestation denial with changed measurement or build ID |
| GPU is not in the required confidential-computing mode | CVM launch, attestation, or key release fails closed | GPU/host condition or attestation reason |
| Expired or missing attestation collateral | Attestation fails before key release | Verifier error and collateral identifier |
| KMS/HSM outage or policy denial | Workload fails closed with an actionable error | KMS/HSM error class, key ID, request ID, policy version |
| App or model key disabled by model provider | Model decryption fails during boot or startup; service does not run with stale access | Key-release denial, app/key ID, policy version, guest startup error |
| SSH, debug, console, host attach, or QEMU dump path attempted | Administrative bypass fails or is governed as break-glass without model/key exposure | Host, guest, firewall, or policy denial; forensics record without model data or sensitive payloads |
| Artifact or key rotation | New artifact/key approved; retired key unavailable per policy | New measurements/digests and audit record |
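
Several rows of Table 17 share one property: the key is released only when every check passes, and any failure denies release with a named reason. The sketch below exercises that property against a toy decision function. It is an illustration under stated assumptions, not a real verifier or key broker: `release_key`, `Evidence`, `Decision`, the reason strings, and the measurement value are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reference value and test key, for illustration only.
APPROVED_MEASUREMENTS = frozenset({"sha384:approved-build"})
TEST_KEY = b"non-sensitive-test-key"


@dataclass(frozen=True)
class Evidence:
    measurement: str        # launch measurement reported by the guest
    gpu_cc_mode: bool       # GPU is in the required confidential mode
    collateral_valid: bool  # attestation collateral present and unexpired


@dataclass(frozen=True)
class Decision:
    released: bool
    reason: str
    key: Optional[bytes] = None


def release_key(evidence: Evidence, kms_available: bool = True,
                key_enabled: bool = True) -> Decision:
    """Release the test key only if every check passes; otherwise deny."""
    if evidence.measurement not in APPROVED_MEASUREMENTS:
        return Decision(False, "measurement_mismatch")
    if not evidence.gpu_cc_mode:
        return Decision(False, "gpu_cc_mode_off")
    if not evidence.collateral_valid:
        return Decision(False, "collateral_missing_or_expired")
    if not kms_available:
        return Decision(False, "kms_unavailable")
    if not key_enabled:
        return Decision(False, "key_disabled")
    return Decision(True, "ok", TEST_KEY)


GOOD = Evidence("sha384:approved-build", True, True)

# (test name, evidence, extra conditions, expected reason)
CASES = [
    ("approved CVM receives key", GOOD, {}, "ok"),
    ("unapproved measurement", Evidence("sha384:other", True, True), {},
     "measurement_mismatch"),
    ("GPU not in CC mode", Evidence("sha384:approved-build", False, True), {},
     "gpu_cc_mode_off"),
    ("expired collateral", Evidence("sha384:approved-build", True, False), {},
     "collateral_missing_or_expired"),
    ("KMS/HSM outage", GOOD, {"kms_available": False}, "kms_unavailable"),
    ("key disabled by provider", GOOD, {"key_enabled": False}, "key_disabled"),
]


def run_acceptance_tests() -> dict:
    """Return a pass/fail flag per case; a case passes only if it fails closed."""
    results = {}
    for name, evidence, conditions, expected in CASES:
        decision = release_key(evidence, **conditions)
        # Fail-closed invariant: key material is present iff release succeeded.
        fail_closed = decision.released == (decision.key is not None)
        results[name] = decision.reason == expected and fail_closed
    return results


if __name__ == "__main__":
    for name, passed in run_acceptance_tests().items():
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

In a real deployment each case would drive the actual launch, attestation, and key-release path, and the retained evidence would be the logs and audit records listed in the table rather than an in-memory result.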