Kubernetes As a Service (KaaS) Requirements
Kubernetes Conformance, Versioning, & Compliance
| Req ID | Test Details (Legend) | Requirement Area | Description |
|---|---|---|---|
| K8S01 | TBD | Certified Versions | Certified Upstream Versions: Official CNCF-certified versions only; no proprietary forks. |
| K8S02 | INFO | Version Updates | Support the three most recent minor releases (in the maintenance window); new minor versions must be available within 4-6 weeks of the upstream release; automated control plane security patching. |
| K8S03 | INFO | EOL Policy | Defined notification periods for version deprecation. |
| K8S04 | INFO | Kubernetes Security Response | Must participate in the Kubernetes Security Response Committee (SRC) process. Must be able to: (1) responsibly disclose any discovered vulnerabilities to the Kubernetes SRC; (2) receive and respond to embargo notifications from the SRC; (3) patch disclosed vulnerabilities in the managed service during the embargo, prior to public disclosure and in compliance with direction from the Kubernetes SRC, ensuring that the patching process does not violate the embargo or SRC guidance. |
Kubernetes Operational Excellence
| Req ID | Test Details (Legend) | Requirement Area | Description |
|---|---|---|---|
| K8S05 | #2, #5, #6 | Lifecycle Management - Control Plane | API/CLI for CRUD provisioning; <30 min control plane bring-up. Strong preference for a Terraform provider. |
| K8S06 | add | Lifecycle Management - Node Pool | API/CLI for CRUD provisioning (e.g., create, update, and delete a node pool, and scale a node pool to a target count). Strong preference for a Terraform provider. Must be able to specify the node type (a specific CPU or GPU instance type). Must be able to specify default node labels and node taints for a node pool, applied when a node joins the cluster. |
| K8S07 | | API Server Metrics | Share API server metrics in a Prometheus-scrapable format to allow NVIDIA to measure the API server SLO. |
| K8S08 | INFO | Versioning | Provider-managed control plane upgrade processes. |
| K8S09 | INFO | Zero-Downtime Upgrades | Minor version control plane updates without app downtime or maintenance windows. |
| K8S10 | TBD | Node Upgrades | User-initiated rolling updates respecting disruption budgets. |
| K8S11 | INFO | HA Control Plane | Redundant architecture with etcd separation. |
| K8S12 | INFO | Backup & Disaster Recovery | Supported recovery within defined RPO/RTO; needs to be auditable & testable |
| K8S13 | INFO | Kubernetes Security Response | Participate in Kubernetes security response & disclosure process; provide CVE patches prior to public disclosure (https://github.com/kubernetes/committee-security-response) |
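
To illustrate how the API server metrics in K8S07 could feed an SLO measurement, the sketch below parses Prometheus text exposition output and computes the share of requests that did not return a 5xx status. It assumes the upstream `apiserver_request_total` counter with its `code` label; the availability definition and the function name are assumptions, not a definition agreed in this document. A production measurement would take the rate over a time window (the difference between two scrapes) rather than the cumulative counters of a single scrape.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def api_server_availability(exposition_text):
    """Return the fraction of requests without a 5xx code, or None if no samples."""
    total = 0.0
    errors = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != "apiserver_request_total":
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        count = float(match.group("value"))
        total += count
        if labels.get("code", "").startswith("5"):
            errors += count
    if total == 0:
        return None
    return 1.0 - errors / total


# Hypothetical scrape output for illustration.
sample = """\
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 980
apiserver_request_total{code="404",verb="GET"} 15
apiserver_request_total{code="500",verb="POST"} 5
"""
print(api_server_availability(sample))
```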
Robust K8s Security
| Req ID | Test Details (Legend) | Requirement Area | Description |
|---|---|---|---|
| K8S14 | TBD | Control Plane Isolation | Per-tenant k8s control plane nodes must be separate from worker nodes and outside of the tenant cluster/VPC. |
| K8S15 | TBD | Access Controls | Cluster endpoint must provide network access controls. |
| K8S16 | TBD | IAM Integration | Kubernetes Service Accounts shall integrate with the platform IAM system to enable workloads to assume platform-managed identities and roles with appropriate scopes. |
| K8S17 | TBD | Service Accounts | Kubernetes shall support standard Service Accounts and projected tokens as the workload identity mechanism, including an OIDC issuer for federation. |
| K8S18 | | Public OIDC Endpoint | The cluster must support OIDC-based workload identity via a cluster-specific OIDC issuer endpoint. |
| K8S19 | INFO | Encryption | At-rest encryption for etcd and secrets |
| K8S20 | add | Logging | Ability to view or export Kubernetes control plane logs (kube-apiserver, kube-controller-manager). |
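
The OIDC issuer requirements (K8S17, K8S18) can be spot-checked against the discovery document served at the issuer's `/.well-known/openid-configuration` path. The sketch below validates an already-fetched document against the required fields of OpenID Connect Discovery; fetching is left out, and the function name and the example issuer URL are hypothetical.

```python
from urllib.parse import urlparse


def validate_oidc_discovery(doc, expected_issuer):
    """Return a list of problems found in an OIDC discovery document (dict)."""
    problems = []
    if urlparse(expected_issuer).scheme != "https":
        problems.append("issuer URL must use https")
    if doc.get("issuer") != expected_issuer:
        # Relying parties compare the issuer value exactly.
        problems.append("issuer does not match the cluster issuer URL")
    if urlparse(doc.get("jwks_uri", "")).scheme != "https":
        problems.append("jwks_uri is missing or not https")
    if not doc.get("id_token_signing_alg_values_supported"):
        problems.append("no signing algorithms advertised")
    return problems


# Hypothetical cluster-specific issuer for illustration.
issuer = "https://oidc.example.com/clusters/abc123"
good_doc = {
    "issuer": issuer,
    "jwks_uri": issuer + "/openid/v1/jwks",
    "response_types_supported": ["id_token"],
    "subject_types_supported": ["public"],
    "id_token_signing_alg_values_supported": ["RS256"],
}
print(validate_oidc_discovery(good_doc, issuer))
```

An empty list means the document passed these checks; it does not prove that the IAM system on the other side actually trusts the issuer.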
Kubernetes Component and Extension Requirements
| Req ID | Test Details (Legend) | Requirement Area | Description |
|---|---|---|---|
| K8S21 | add | API Extensions | Mandatory support for CRDs and Validating/Mutating Admission Controllers. |
| K8S22 | add | CNI | Standard compliance; supports Network Policies; IPv4/IPv6 dual-stack desired. Preference for Calico. |
| K8S23 | add | CSI | NCP provides a CSI driver installable by NVIDIA (Helm or Kustomize) for block, shared FS, and NFS storage. Support for static and dynamic provisioning, snapshots, and resizing via PVs and PVCs. CSI credentials are scoped to the tenant cluster (no cross-cluster access). APIs to query storage usage against the overall cluster quota, with per-PVC/volume usage, to manage utilization across PVCs. Vendor-provided storage kernel modules and tools are provided via (1) installation by the CSI driver, (2) pre-installation in the NCP-provided machine image, or (3) installable packages. |
| K8S24 | TBD | DRA | Enable Dynamic Resource Allocation (DRA) regardless of upstream feature status (Beta/GA). Some DRA features require enabling feature gates on the control plane, which customers may need in order to run AI workloads with new DRA features. |
| K8S25 | add | Operator Support | Support standard operator-based management of hardware accelerators and associated drivers. Provider-default accelerator operators and drivers shall be replaceable or overridable to allow installation of tenant-required operator and driver versions (e.g., GPU Operator, Network Operator). |
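
As an illustration of the storage usage query in K8S23, the sketch below combines per-PVC usage with the overall cluster quota. The input shape, field names, and the 80% warning threshold are all hypothetical; the actual usage API is provider-specific and is not defined in this document.

```python
def storage_report(quota_bytes, pvcs, warn_ratio=0.8):
    """Summarize storage utilization.

    quota_bytes: overall cluster storage quota
    pvcs: list of dicts with 'name', 'capacity_bytes', 'used_bytes'
    warn_ratio: per-PVC fill level at which a PVC is flagged (assumption)
    """
    provisioned = sum(p["capacity_bytes"] for p in pvcs)
    used = sum(p["used_bytes"] for p in pvcs)
    nearly_full = [
        p["name"]
        for p in pvcs
        if p["capacity_bytes"] > 0
        and p["used_bytes"] / p["capacity_bytes"] >= warn_ratio
    ]
    return {
        "provisioned_bytes": provisioned,
        "used_bytes": used,
        # Share of the cluster quota already claimed by PVC capacity.
        "quota_utilization": provisioned / quota_bytes,
        "nearly_full_pvcs": nearly_full,
    }


GIB = 1024 ** 3
# Hypothetical PVC data for illustration.
report = storage_report(
    quota_bytes=1000 * GIB,
    pvcs=[
        {"name": "train-data", "capacity_bytes": 100 * GIB, "used_bytes": 90 * GIB},
        {"name": "checkpoints", "capacity_bytes": 200 * GIB, "used_bytes": 50 * GIB},
    ],
)
print(report)
```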
Kubernetes Functionality
| Req ID | Test Details (Legend) | Requirement Area | Description |
|---|---|---|---|
| K8S26 | add | Clusters | Support multiple clusters in the same tenancy; support multiple clusters in the same VPC. |
| K8S27 | add | Managed CP (CP Pinning) | Ability to pin control plane instances to handle a particular load limit. |
| K8S28 | | Performance | Meet the standard Kubernetes performance tests, certified up to 5000 nodes (or the maximum cluster size, whichever is smaller). Managed Kubernetes control plane SLOs and performance must meet or exceed the standard Kubernetes results. |
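
The K8S28 scale target and pass criteria can be expressed as a small check. The sketch below assumes the p99 latency thresholds published in the upstream SIG Scalability SLOs (1 s for mutating and single-resource read calls, 30 s for namespace- or cluster-scope reads, 5 s for pod startup); these values and all names are assumptions and should be confirmed against the upstream SLO definitions before use.

```python
UPSTREAM_NODE_LIMIT = 5000  # K8S28: certified up to 5000 nodes

# p99 thresholds in seconds (assumption: taken from upstream SIG Scalability SLOs).
SLO_P99_SECONDS = {
    "mutating_api_call": 1.0,
    "read_single_resource": 1.0,
    "read_namespace_or_cluster_scope": 30.0,
    "pod_startup": 5.0,
}


def target_node_count(max_cluster_size):
    """Scale to test at: 5000 nodes or the maximum cluster size, whichever is smaller."""
    return min(UPSTREAM_NODE_LIMIT, max_cluster_size)


def slo_violations(measured_p99):
    """Compare measured p99 latencies (seconds) to the thresholds above."""
    violations = []
    for name, limit in SLO_P99_SECONDS.items():
        value = measured_p99.get(name)
        if value is None:
            violations.append(f"{name}: no measurement")
        elif value > limit:
            violations.append(f"{name}: p99 {value}s exceeds {limit}s")
    return violations


# Hypothetical measurements for illustration.
measured = {
    "mutating_api_call": 1.4,
    "read_single_resource": 0.2,
    "read_namespace_or_cluster_scope": 12.0,
}
print(target_node_count(2000), slo_violations(measured))
```

A missing measurement is reported as a violation so that an incomplete test run cannot pass silently.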