Scope

This paper covers one deployment pattern: GPU-accelerated inference run as a confidential container workload on Kubernetes. The goal is to let a model provider deliver encrypted model assets into a customer-controlled Kubernetes environment so that the enterprise never obtains unencrypted weights, and the model provider cannot access customer data, outputs, or telemetry unless the enterprise data owner allows it.

The technical scope includes the confidential runtime class, the Kubernetes pod sandbox, the lightweight confidential VM, GPU confidential computing, remote attestation, policy-controlled key release, the model image lifecycle, network controls, operational signals, and the implementations described in the Reference Implementations section.

The design is constrained by these requirements:

  • Protect model-provider secrets from cluster and node administrators.

  • Keep enterprise data inside the customer-controlled environment.

  • Include the GPU in the attested trust boundary.

  • Preserve platform-operator responsibility for scheduling, availability, and operations.

  • Keep the architecture portable across equivalent implementations.

  • Verify all participating components through attestation before any secret is released to them.

Standalone confidential VMs, training, fine-tuning, model-server features such as built-in authorization and guardrails, and business application workflows are out of scope. The companion CVM paper covers standalone VM deployments.

Production key release requires successful attestation against the approved state of the CPU, GPU, guest, image, runtime policy, and key-release policy.
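The release condition can be pictured as a fail-closed gate: the key is withheld unless every listed component reports a measurement that appears in its approved set. The following is a minimal sketch, assuming the attestation evidence has already been cryptographically verified and reduced to one digest per component. The component names, `AttestationResult`, `release_allowed`, `release_key`, and the digest strings are hypothetical and do not come from any particular attestation service or key broker.

```python
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical component names mirroring the state that must be approved before key release.
REQUIRED_COMPONENTS = ("cpu", "gpu", "guest", "image", "runtime_policy", "key_release_policy")


@dataclass(frozen=True)
class AttestationResult:
    """Claims from evidence whose signatures were already checked upstream (assumption)."""
    measurements: Mapping[str, str]  # component name -> measured digest


def release_allowed(result: AttestationResult, approved: Mapping[str, frozenset]) -> bool:
    """Return True only if every required component has an approved measurement."""
    for component in REQUIRED_COMPONENTS:
        measured = result.measurements.get(component)
        allowed = approved.get(component, frozenset())
        if measured is None or measured not in allowed:
            # Fail closed: a missing or unapproved component (e.g. no GPU evidence) blocks release.
            return False
    return True


def release_key(result: AttestationResult, approved: Mapping[str, frozenset], key: bytes) -> bytes:
    """Hand out the key only when the attested state matches the approved state."""
    if not release_allowed(result, approved):
        raise PermissionError("attested state is not approved; key withheld")
    # Simplification: a real broker would wrap the key to a TEE-bound public key, not return it in the clear.
    return key


# Usage with made-up digests: one approved value per component.
approved = {c: frozenset({f"sha256:{c}-v1"}) for c in REQUIRED_COMPONENTS}
good = AttestationResult({c: f"sha256:{c}-v1" for c in REQUIRED_COMPONENTS})
bad = AttestationResult({**good.measurements, "gpu": "sha256:gpu-unapproved"})
print(release_allowed(good, approved))  # True
print(release_allowed(bad, approved))   # False
```

Requiring every component, rather than any subset, is what keeps the GPU inside the attested trust boundary: a pod whose GPU evidence is absent or unapproved never receives the model key.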