Architecture Summary

The inference service is packaged as an encrypted Confidential VM (CVM) image. The model provider ships a hardened guest image together with encrypted model artifacts, and the platform operator launches that image on approved confidential-computing hardware.

The model stays encrypted until the hardware and software infrastructure has been cryptographically attested. If attestation succeeds, the key-release service releases the model key into the CVM. The host can run the service, but it cannot read the plaintext model weights or the inference payloads.
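The attestation-gated release described above can be sketched as a small policy check. This is a minimal sketch, not the key-release service this architecture actually uses: `Evidence`, `KeyReleaseService`, and the specific claims checked (launch measurement, GPU confidential-computing mode, guest debug bit, freshness nonce) are hypothetical, and the sketch assumes the signed hardware reports have already been verified against the vendor certificate chain. It also returns the key directly, whereas a production service typically wraps the key to a public key bound into the attestation evidence.

```python
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """Hypothetical, already-verified attestation claims from the CVM.

    Assumption: signature and certificate-chain checks on the underlying
    CPU and GPU attestation reports were done before this object was built.
    """
    cvm_measurement: str   # hex digest of the measured guest launch
    gpu_cc_mode: bool      # GPU reports confidential-computing mode enabled
    debug_enabled: bool    # guest debug policy bit
    nonce: str             # freshness nonce echoed back by the guest


@dataclass
class KeyReleaseService:
    """Hypothetical key-release service: a key leaves only if every check passes."""
    approved_measurements: set[str]
    model_keys: dict[str, bytes]
    _issued_nonces: set[str] = field(default_factory=set, init=False)

    def challenge(self) -> str:
        # One-time nonce so previously captured evidence cannot be replayed.
        nonce = secrets.token_hex(16)
        self._issued_nonces.add(nonce)
        return nonce

    def release(self, model_id: str, evidence: Evidence) -> bytes:
        if evidence.nonce not in self._issued_nonces:
            raise PermissionError("unknown or reused nonce")
        self._issued_nonces.discard(evidence.nonce)  # consume the nonce on first use
        # Constant-time comparison against the allowlist of approved guest builds.
        if not any(hmac.compare_digest(evidence.cvm_measurement, m)
                   for m in self.approved_measurements):
            raise PermissionError("CVM measurement not in approved set")
        if evidence.debug_enabled or not evidence.gpu_cc_mode:
            raise PermissionError("workload state violates release policy")
        return self.model_keys[model_id]
```

In use, the guest calls `challenge()`, binds the nonce into its attestation evidence, and receives the model key only if every check passes. A guest built from a modified image produces a different measurement and receives nothing, and because each nonce is consumed on first use, replayed evidence is rejected.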

Figure 2 shows the untrusted, trusted, and attestation boundaries.

Figure 2: Confidential VM trust boundaries

The inference runtime, such as NIM, Triton, vLLM, TensorRT-LLM, or a custom model server, runs inside the measured CVM. Encrypted model artifacts are mounted or fetched as ciphertext and are decrypted only after attestation-gated key release. Any component that can access plaintext model weights, prompts, responses, KV cache, intermediate tensors, or request metadata must be inside the attested trust boundary.
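The placement rule in the paragraph above can be checked mechanically against a deployment inventory. The sketch below assumes a hypothetical inventory in which each component declares whether it runs inside the attested CVM and which data classes it can read in plaintext; `Component`, `SENSITIVE`, and `boundary_violations` are illustrative names, not part of NIM, Triton, vLLM, or TensorRT-LLM.

```python
from dataclasses import dataclass

# Data classes that must only be readable in plaintext inside the attested CVM,
# taken from the list in the paragraph above.
SENSITIVE = frozenset({
    "model_weights", "prompts", "responses", "kv_cache",
    "intermediate_tensors", "request_metadata",
})


@dataclass(frozen=True)
class Component:
    """Hypothetical inventory entry for one deployed component."""
    name: str
    inside_attested_cvm: bool
    plaintext_access: frozenset[str]  # data classes it can read unencrypted


def boundary_violations(components: list[Component]) -> list[str]:
    """Return one message per component outside the CVM that can read sensitive plaintext."""
    problems = []
    for c in components:
        leaked = sorted(c.plaintext_access & SENSITIVE)
        if leaked and not c.inside_attested_cvm:
            problems.append(f"{c.name} sees {', '.join(leaked)} outside the attested CVM")
    return problems
```

For example, a model server inside the CVM that reads weights, prompts, and KV cache passes; an object store outside the CVM that only holds ciphertext passes; a host-side log shipper that captures raw prompts is reported as a violation.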

Figure 3 shows the data flow, isolation layers, and policy boundaries.

Figure 3: Confidential VM runtime architecture

Table 3: Confidential VM Runtime Architecture Flow

| Step (Figure 3) | Layer | What it does | Confidentiality role |
|---|---|---|---|
| 1-2 | Application and ingress | Authenticates callers and forwards approved HTTPS traffic. | Keeps prompts, responses, model data, and keys out of logs and traces. |
| 3-4 | Firmware, host, and CVM launch stack | Prepares the firmware, KVM/QEMU, OVMF, GPU assignment, and VM image, and performs the measured CVM launch. | The platform operator controls launch and availability but is not trusted with plaintext model weights, model keys, or inference payloads. |
| 5 | Confidential VM and GPU | Runs the model server in the measured guest and uses an attested GPU channel for execution. | Plaintext model weights exist only in protected CPU/GPU memory and are never exposed to the host. |
| 6-8 | Attestation and key release | Presents attestation evidence, evaluates the release policy, and releases the model key only to the approved workload. | Binds model secrets to an approved, measured workload state, so an unapproved guest never receives them. |
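The ingress row (steps 1-2) requires keeping prompts, responses, model data, and keys out of logs and traces. In a Python ingress service, one way to enforce part of that is a `logging.Filter` attached to every handler. This is a sketch under assumptions: the field names in `REDACTED_FIELDS` and the helper `make_ingress_logger` are hypothetical, the filter only masks structured `extra=` fields (it does not scan text already interpolated into the message), and traces would need equivalent handling in the tracing exporter.

```python
import logging

# Hypothetical structured-log field names that may carry inference payloads or secrets.
REDACTED_FIELDS = ("prompt", "response", "model_key", "weights")


class RedactSensitiveFields(logging.Filter):
    """Overwrite sensitive `extra=` attributes before a handler formats the record."""

    def filter(self, record: logging.LogRecord) -> bool:
        for name in REDACTED_FIELDS:
            if hasattr(record, name):
                setattr(record, name, "[REDACTED]")
        return True  # keep the record; only the payload fields are masked


def make_ingress_logger(name: str, handler: logging.Handler) -> logging.Logger:
    """Attach the filter to the handler so it also covers records from child loggers."""
    handler.addFilter(RedactSensitiveFields())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

The filter is attached to the handler rather than the logger because logger-level filters do not run for records propagated up from child loggers, while handler-level filters see every record the handler emits.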