Architecture Summary

The inference service is packaged as an encrypted confidential VM (CVM) image. The model provider ships a hardened guest image and encrypted model artifacts. The platform operator launches that image on approved confidential-computing hardware.

The model stays encrypted until the hardware and software stack has been cryptographically attested. If attestation passes, the key-release service releases the model key into the CVM. The host can run the service, but it cannot read the model or the inference payloads.
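The attestation gate described above can be sketched as a small policy check. This is a minimal illustration, not the key-release service's actual logic: the `Evidence` fields, the `APPROVED_MEASUREMENTS` allowlist, and `release_model_key` are hypothetical names, and the sketch assumes the hardware-rooted signature on the evidence has already been verified by a vendor-specific verifier.

```python
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist of approved guest launch measurements (hex digests).
# The value below is a placeholder, not a real measurement.
APPROVED_MEASUREMENTS = {"a3" * 48}


@dataclass(frozen=True)
class Evidence:
    """Simplified stand-in for attestation evidence (assumed shape)."""
    measurement: str        # launch measurement reported for the guest image
    signature_valid: bool   # outcome of verifying the hardware-rooted signature
    nonce: str              # freshness value echoed back by the CVM


def release_model_key(evidence: Evidence, expected_nonce: str,
                      model_key: bytes) -> Optional[bytes]:
    """Return the model key only if every check passes; otherwise None."""
    if not evidence.signature_valid:
        return None  # evidence is not rooted in approved hardware
    if not hmac.compare_digest(evidence.nonce, expected_nonce):
        return None  # stale or replayed evidence
    if evidence.measurement not in APPROVED_MEASUREMENTS:
        return None  # guest image is not an approved workload
    return model_key
```

The function fails closed: any check that does not pass yields `None`, so the key never leaves the service for an unapproved or unverifiable workload.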

Figure 2 shows the untrusted, trusted, and attestation boundaries.

Figure 2: Confidential VM Trust Boundaries (_images/cvm-trust-boundaries.png)

The inference runtime, such as NIM, Triton, vLLM, TensorRT-LLM, or a custom model server, runs inside the measured CVM. Encrypted model artifacts are mounted or fetched as ciphertext and are decrypted only after attestation-gated key release. Any component that can access plaintext model weights, prompts, responses, KV cache, intermediate tensors, or request metadata must be inside the attested trust boundary.
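The rule that every plaintext-touching component must sit inside the attested boundary lends itself to a simple deployment audit. The following is a minimal sketch under stated assumptions: the `Component` record, its field names, and `boundary_violations` are hypothetical and exist only to illustrate the rule. The sensitive data classes are the ones listed in the paragraph above.

```python
from dataclasses import dataclass

# Data classes that the text requires to stay inside the attested boundary.
SENSITIVE = frozenset({
    "model_weights", "prompts", "responses",
    "kv_cache", "intermediate_tensors", "request_metadata",
})


@dataclass(frozen=True)
class Component:
    """Hypothetical inventory entry for one deployed component."""
    name: str
    inside_attested_boundary: bool
    plaintext_access: frozenset = frozenset()


def boundary_violations(components):
    """Return names of components that can read sensitive plaintext
    while running outside the attested trust boundary."""
    return [
        c.name
        for c in components
        if not c.inside_attested_boundary and (c.plaintext_access & SENSITIVE)
    ]
```

For example, an inventory containing a model server inside the CVM and a host-side log shipper with access to `request_metadata` would report only the log shipper as a violation.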

Figure 3 shows the data flow, isolation layers, and policy boundaries.

Figure 3: Confidential VM Runtime Architecture (_images/cvm-runtime-architecture.png)

Table 3: Confidential VM Runtime Architecture Flow

Figure step	Layer	What it does	Confidentiality role
1-2	Application and ingress	Authenticates callers and forwards approved HTTPS traffic.	Keeps prompts, responses, model data, and keys out of logs and traces.
3-4	Firmware, host, and CVM launch stack	Prepares firmware, KVM/QEMU, OVMF, GPU assignment, the VM image, and measured CVM launch.	The platform operator controls launch and availability, but is not trusted with unencrypted model weights, model keys, or inference payloads.
5	Confidential VM and GPU	Runs the model server in the measured guest and uses an attested GPU channel for execution.	Model weights are readable only inside protected CPU/GPU memory and are never exposed to the host as plaintext.
6-8	Attestation and key release	Presents attestation evidence, evaluates release policy, and releases the model key only to the approved workload.	Withholds model secrets unless the attested workload state matches the release policy.
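The confidentiality role in steps 1-2 of Table 3 (keeping prompts, responses, model data, and keys out of logs and traces) could be approached by redacting sensitive fields before an event reaches any logger. This is a sketch only: the field names in `SENSITIVE_FIELDS` and the flat dictionary event shape are assumptions, not part of the described ingress layer.

```python
# Hypothetical names of fields that must never appear in logs or traces.
SENSITIVE_FIELDS = frozenset({"prompt", "response", "model_data", "model_key"})

REDACTED = "[REDACTED]"


def redact_event(event: dict) -> dict:
    """Return a copy of a flat log event with sensitive values masked.

    Non-sensitive fields pass through unchanged, and the input dictionary
    is not modified. Nested structures are not inspected in this sketch.
    """
    return {
        key: (REDACTED if key in SENSITIVE_FIELDS else value)
        for key, value in event.items()
    }
```

Redacting at the point where the event is built, rather than in a downstream log processor, keeps plaintext from ever crossing into components that sit outside the attested boundary.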