Scope

This Reference Architecture covers one deployment pattern: GPU-accelerated inference inside a confidential virtual machine. The goal is to let a model provider deliver encrypted model assets into a customer-controlled environment without exposing unencrypted weights to the enterprise. The model provider also gets no access to customer data, outputs, or telemetry unless the enterprise data owner permits it.

The technical scope covers the Confidential Virtual Machine (CVM) with CPU and GPU confidential computing enabled, remote attestation, policy-controlled key release, the model image lifecycle, network controls, operational signals, and the implementations described in Reference Implementations.

The design is constrained by these requirements:

Protect model-provider secrets from infrastructure administrators.
Keep enterprise data inside the customer-controlled environment.
Include the GPU in the attested trust boundary.
Preserve platform-operator responsibility for availability and operations.
Keep the architecture portable across equivalent implementations.

The following are out of scope: Kubernetes-native confidential containers, model training and fine-tuning, VM fleet orchestration, and model-server features such as built-in authorization, guardrails, and application-level multi-tenancy.

Production key release requires successful attestation against approved CPU, GPU, guest, and policy state.
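The key-release rule can be illustrated with a minimal sketch. It assumes that the CPU and GPU attestation reports have already been cryptographically verified upstream (signature chains, freshness, and nonces are not shown) and that their claims have been reduced to simple values. Every name here (`VerifiedEvidence`, `ApprovedState`, `check`, `KeyBroker`) is hypothetical and does not correspond to any specific attestation service or key management API. The sketch only shows the shape of the decision: a key is released only when CPU, GPU, guest, and policy state all match approved values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedEvidence:
    """Hypothetical claims taken from CPU and GPU attestation reports.

    Assumption: signature and freshness verification already happened
    upstream; this sketch only evaluates the resulting claims.
    """
    cpu_cc_enabled: bool    # CPU confidential-computing mode reported as on
    gpu_cc_enabled: bool    # GPU confidential-computing mode reported as on
    cpu_measurement: str    # hypothetical: CVM launch measurement
    gpu_measurement: str    # hypothetical: GPU firmware/driver measurement
    guest_measurement: str  # hypothetical: measured guest image
    policy_digest: str      # hypothetical: digest of the active key-release policy


@dataclass(frozen=True)
class ApprovedState:
    """Hypothetical allow-lists maintained by the key-release policy owner."""
    cpu: frozenset
    gpu: frozenset
    guest: frozenset
    policy: frozenset


def check(evidence: VerifiedEvidence, approved: ApprovedState) -> list:
    """Return every reason to deny release; an empty list means all checks passed."""
    reasons = []
    if not evidence.cpu_cc_enabled:
        reasons.append("CPU confidential computing is not enabled")
    if not evidence.gpu_cc_enabled:
        reasons.append("GPU confidential computing is not enabled")
    if evidence.cpu_measurement not in approved.cpu:
        reasons.append("CPU measurement not approved")
    if evidence.gpu_measurement not in approved.gpu:
        reasons.append("GPU measurement not approved")
    if evidence.guest_measurement not in approved.guest:
        reasons.append("guest measurement not approved")
    if evidence.policy_digest not in approved.policy:
        reasons.append("policy digest not approved")
    return reasons


class KeyBroker:
    """Hypothetical broker holding model decryption keys keyed by model ID."""

    def __init__(self, approved: ApprovedState, keys: dict):
        self._approved = approved
        self._keys = keys  # model_id -> key bytes

    def release(self, model_id: str, evidence: VerifiedEvidence) -> bytes:
        reasons = check(evidence, self._approved)
        if reasons:
            # Fail closed: any single mismatch denies release.
            raise PermissionError("key release denied: " + "; ".join(reasons))
        return self._keys[model_id]
```

The broker fails closed and reports every failed check at once, which keeps denials auditable through operational signals. Because the GPU checks sit alongside the CPU checks, a CVM whose GPU is not in confidential-computing mode or runs unapproved firmware cannot obtain model keys, which is how the GPU stays inside the attested trust boundary.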