Architecture Summary
The inference service is packaged as an encrypted CVM image. The model provider ships a hardened guest image and encrypted model artifacts. The platform operator launches that image on approved confidential-computing hardware.
The model stays encrypted until the hardware and software stack has been cryptographically attested. If attestation passes, the key-release service releases the model key into the CVM. The host can run the service, but it cannot read the model or the inference payloads.
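
The release decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the key-release service's actual interface: the `Evidence` and `ReleasePolicy` types, their field names, and the `release_model_key` function are all hypothetical, and the sketch assumes the evidence signature has already been verified against the hardware vendor's certificate chain.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """Hypothetical attestation claims, assumed to be signature-verified already."""
    guest_measurement: bytes  # digest of the launched guest image
    gpu_attested: bool        # whether the GPU passed its own attestation
    nonce: bytes              # freshness value echoed back by the CVM


@dataclass(frozen=True)
class ReleasePolicy:
    """Hypothetical policy naming the only workload state allowed to get the key."""
    expected_guest_measurement: bytes
    require_gpu_attestation: bool = True


def release_model_key(
    evidence: Evidence,
    policy: ReleasePolicy,
    issued_nonce: bytes,
    model_key: bytes,
) -> Optional[bytes]:
    """Return the model key only if every policy check passes, else None."""
    # Constant-time comparisons avoid leaking how many bytes matched.
    fresh = hmac.compare_digest(evidence.nonce, issued_nonce)
    measured = hmac.compare_digest(
        evidence.guest_measurement, policy.expected_guest_measurement
    )
    gpu_ok = evidence.gpu_attested or not policy.require_gpu_attestation
    if fresh and measured and gpu_ok:
        return model_key
    return None
```

The function fails closed: a stale nonce, an unexpected guest measurement, or a missing GPU attestation each result in no key being released, so the model stays encrypted.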
Figure 2 shows the untrusted, trusted, and attestation boundaries.
Figure 2: Confidential VM Trust Boundaries
The inference runtime, such as NIM, Triton, vLLM, TensorRT-LLM, or a custom model server, runs inside the measured CVM. Encrypted model artifacts are mounted or fetched as ciphertext and are decrypted only after attestation-gated key release. Any component that can access plaintext model weights, prompts, responses, KV cache, intermediate tensors, or request metadata must be inside the attested trust boundary.
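
The boundary rule in the last sentence lends itself to a simple design-time check. The sketch below is illustrative only: the `Component` type, the asset labels, and the `boundary_violations` helper are hypothetical names chosen for this example, not part of any product API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Plaintext assets named in the text above; the labels are illustrative.
SENSITIVE_ASSETS: FrozenSet[str] = frozenset({
    "model_weights",
    "prompts",
    "responses",
    "kv_cache",
    "intermediate_tensors",
    "request_metadata",
})


@dataclass(frozen=True)
class Component:
    """Hypothetical inventory entry for one deployed component."""
    name: str
    plaintext_access: FrozenSet[str]
    inside_attested_boundary: bool


def boundary_violations(components: Iterable[Component]) -> List[str]:
    """Names of components that see sensitive plaintext outside the boundary."""
    return [
        c.name
        for c in components
        if (c.plaintext_access & SENSITIVE_ASSETS)
        and not c.inside_attested_boundary
    ]
```

Running a check like this against a deployment inventory would flag, for example, a host-side logging agent that receives prompts, while a model server inside the measured CVM would pass.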
Figure 3 shows the data flow, isolation layers, and policy boundaries.
Figure 3: Confidential VM Runtime Architecture
Table 3: Confidential VM Runtime Architecture Flow
| Figure step | Layer | What it does | Confidentiality role |
|---|---|---|---|
| 1-2 | Application and ingress | Authenticates callers and forwards approved HTTPS traffic. | Keeps prompts, responses, model data, and keys out of logs and traces. |
| 3-4 | Firmware, host, and CVM launch stack | Prepares firmware, KVM/QEMU, OVMF, GPU assignment, the VM image, and measured CVM launch. | The platform operator controls launch and availability, but is not trusted with unencrypted model weights, model keys, or inference payloads. |
| 5 | Confidential VM and GPU | Runs the model server in the measured guest and uses an attested GPU channel for execution. | Model weights are readable only inside protected CPU/GPU memory and are never exposed to the host as plaintext. |
| 6-8 | Attestation and key release | Presents attestation evidence, evaluates release policy, and releases the model key only to the approved workload. | Releases model secrets only to an approved workload state. |
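
As one illustration of the confidentiality role listed for steps 1-2 in Table 3, the ingress layer could redact sensitive fields before a request record reaches any logger or tracer. This is a sketch under assumptions: the field names in `SENSITIVE_FIELDS` and the `redact_for_logging` helper are hypothetical and would need to match the actual request schema.

```python
from typing import Any, Dict

# Hypothetical field names; adjust to the real request and response schema.
SENSITIVE_FIELDS = frozenset({"prompt", "response", "model_key", "model_data"})
REDACTED = "[REDACTED]"


def redact_for_logging(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record that is safe to log or attach to a trace."""
    safe: Dict[str, Any] = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            # Replace the value outright; never log a prefix or hash of it.
            safe[key] = REDACTED
        elif isinstance(value, dict):
            # Recurse so nested payloads are covered as well.
            safe[key] = redact_for_logging(value)
        else:
            safe[key] = value
    return safe
```

An allow-list of loggable fields is the stricter alternative to this deny-list approach, since it also covers sensitive fields added to the schema later.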