Sizing and Performance

Performance and sizing numbers are determined only after validation on the target reference stack: hardware, infrastructure software, and the target workload.

The benchmark plan should cover the key performance indicators (KPIs) that influence the measurements. These include startup-path metrics (pod-schedule-to-ready time, sandbox launch time, attestation latency, key-release latency, and model decrypt and load time), serving metrics (first-token latency, steady-state throughput, p50/p95/p99 request latency, and GPU and memory utilization), and operational metrics (restart and reschedule time, node drain behavior, and system behavior after firmware, driver, GPU Operator, runtime, or key-release updates). The plan should also account for considerations specific to the confidential-container deployment environment.
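
As an illustration of how the request-latency and throughput KPIs might be reduced from raw samples, the following is a minimal Python sketch. The names `percentile`, `LatencySummary`, and `summarize` are hypothetical and not part of any product API, and the nearest-rank percentile method is an assumption chosen for simplicity; a real benchmark harness may use a different estimator.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method, not a mandated one).

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


@dataclass
class LatencySummary:
    """Hypothetical container for the request-level KPIs."""
    p50_ms: float
    p95_ms: float
    p99_ms: float
    throughput_rps: float


def summarize(latencies_ms: Sequence[float], wall_clock_s: float) -> LatencySummary:
    """Summarize one steady-state measurement window.

    latencies_ms: per-request latencies observed in the window.
    wall_clock_s: duration of the window in seconds.
    """
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    return LatencySummary(
        p50_ms=percentile(latencies_ms, 50),
        p95_ms=percentile(latencies_ms, 95),
        p99_ms=percentile(latencies_ms, 99),
        throughput_rps=len(latencies_ms) / wall_clock_s,
    )


if __name__ == "__main__":
    # Synthetic data for demonstration only: 100 requests over 20 seconds.
    demo = summarize([float(ms) for ms in range(1, 101)], wall_clock_s=20.0)
    print(demo)
```

Running the same summary before and after a firmware, driver, GPU Operator, runtime, or key-release update gives a like-for-like comparison, provided the measurement window and workload are held constant.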

Multi-node and disaggregated inference are out of scope on platforms that do not support multi-node operation with Confidential Computing.