Introduction#
This Reference Architecture gives frontier and proprietary model vendors a proven design for securely deploying their latest models in enterprise-controlled environments, where customers can protect sensitive data, reduce dependency on metered API usage, and improve cost predictability for high-volume AI inference workloads.
Most inference today is a compromise between competing security interests. Frontier AI labs protect model IP by serving through cloud APIs. Enterprise customers, however, want to process data inside environments they control: cloud accounts, on-premises AI factories, or compute resources behind a VPC they configure.
This Reference Architecture proposes a better path: using NVIDIA Confidential Computing, the model provider retains exclusive control of model weights and source code when the model is deployed on-premises. The enterprise data owner controls which inputs enter, where outputs are delivered, and what operational data may be logged or retained. Both parties can verify that inference runs in the approved confidential environment free from tampering.
Figure 1 Secure model deployment workflow#
The basic concept of this Reference Architecture is that models are delivered to the compute environment fully encrypted. An attestation process verifies that the hardware and software stack have not been tampered with, so that no unauthorized process can read data or change inputs or outputs. The key-release service verifies the attestation evidence and only then releases the decryption key, so the confidential workload can read the model weights only inside its protected memory. Inference then happens inside the customer-controlled environment, where the model provider does not see the inputs or outputs.
The reference implementations are built on Canonical and vendor-partner (Fortanix) components and are detailed in Reference Implementations.
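The attestation-gated key release described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of any NVIDIA, Canonical, or Fortanix component: the names `Evidence`, `sign_evidence`, and `KeyReleaseService` are hypothetical, and a shared HMAC secret stands in for the hardware root of trust, whereas real attestation reports are signed with asymmetric device keys and checked by a dedicated verifier.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

# Hypothetical stand-in for the hardware root of trust (assumption):
# real attestation evidence is signed with asymmetric device keys.
HARDWARE_SIGNING_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class Evidence:
    measurement: str   # hash of the launched hardware and software stack
    nonce: bytes       # freshness challenge issued by the key-release service
    signature: bytes   # signature over measurement and nonce


def _sign(measurement: str, nonce: bytes) -> bytes:
    message = measurement.encode() + nonce
    return hmac.new(HARDWARE_SIGNING_KEY, message, hashlib.sha256).digest()


def sign_evidence(measurement: str, nonce: bytes) -> Evidence:
    """Simulate the confidential environment producing signed evidence."""
    return Evidence(measurement, nonce, _sign(measurement, nonce))


class KeyReleaseService:
    """Toy key-release service acting on behalf of the model provider."""

    def __init__(self, approved_measurements: set, model_key: bytes):
        self._approved = set(approved_measurements)
        self._model_key = model_key
        self._pending_nonces = set()

    def challenge(self) -> bytes:
        """Issue a single-use nonce that the evidence must include."""
        nonce = secrets.token_bytes(16)
        self._pending_nonces.add(nonce)
        return nonce

    def release_key(self, evidence: Evidence) -> bytes:
        # Each nonce is consumed on first use, so replayed evidence is rejected.
        if evidence.nonce not in self._pending_nonces:
            raise PermissionError("unknown or reused nonce")
        self._pending_nonces.discard(evidence.nonce)

        expected = _sign(evidence.measurement, evidence.nonce)
        if not hmac.compare_digest(expected, evidence.signature):
            raise PermissionError("evidence signature is invalid")

        if evidence.measurement not in self._approved:
            raise PermissionError("stack measurement is not approved")

        # Only verified, approved environments ever receive the model key.
        return self._model_key


if __name__ == "__main__":
    approved = hashlib.sha256(b"approved-stack-image").hexdigest()
    service = KeyReleaseService({approved}, model_key=secrets.token_bytes(32))

    good = sign_evidence(approved, service.challenge())
    key = service.release_key(good)
    print(f"key released ({len(key)} bytes)")

    tampered = hashlib.sha256(b"modified-stack-image").hexdigest()
    bad = sign_evidence(tampered, service.challenge())
    try:
        service.release_key(bad)
    except PermissionError as err:
        print(f"key withheld: {err}")
```

The sketch keeps the decision with the key-release service: the workload never holds the model key until its evidence is fresh, correctly signed, and matches an approved measurement, which mirrors the ordering of steps in the workflow above.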
Comparison of Integration Methods#
Table 1: Comparison of Integration Methods

| Deployment method | SaaS API | Open weights in a customer-controlled environment | Confidential Computing* |
|---|---|---|---|
| Protects | Model provider IP | Enterprise data privacy | Both |

*NVIDIA Confidential Computing gives both parties the protection they want most: sensitive IP can process sensitive data under a zero-trust operating model.