Kubernetes As a Service (KaaS) Requirements

Kubernetes Conformance, Versioning, & Compliance

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| K8S01 | TBD | Certified Versions | Certified Upstream Versions: Official CNCF-certified versions only; no proprietary forks. |
| K8S02 | INFO | Version Updates | Support the three most recent minor releases (in the maintenance window); new minor versions must be available within 4-6 weeks of the upstream release; automated control plane security patching. |
| K8S03 | INFO | EOL Policy | Defined notification periods for version deprecation. |
| K8S04 | INFO | Kubernetes Security Response | Must participate in the Kubernetes Security Response Committee (SRC) process. Must be able to: (1) responsibly disclose any discovered vulnerabilities to the Kubernetes SRC; (2) receive and respond to embargo notifications from the SRC; (3) patch disclosed vulnerabilities in the managed service during embargo, prior to public disclosure and in compliance with direction provided by the Kubernetes SRC, ensuring that the patching process does not violate the embargo or SRC guidance. |
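
The version window in K8S02 can be checked mechanically. The following is a minimal sketch, assuming the provider exposes a list of offered version strings and that the latest upstream release is known; the function names, the sample versions, and the choice of six weeks as the hard deadline (the upper end of the 4-6 week range) are illustrative assumptions, not part of the requirement text.

```python
from datetime import date, timedelta


def parse_minor(version: str) -> tuple[int, int]:
    """Parse 'v1.31.2' or '1.31' into a (major, minor) pair."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def missing_supported_minors(upstream_latest: str, offered: list[str], window: int = 3) -> list[str]:
    """Return the minor releases inside the support window that are not offered."""
    major, minor = parse_minor(upstream_latest)
    required = [(major, minor - i) for i in range(window)]
    offered_minors = {parse_minor(v) for v in offered}
    return [f"{ma}.{mi}" for ma, mi in required if (ma, mi) not in offered_minors]


def availability_deadline(upstream_release: date, max_weeks: int = 6) -> date:
    """Latest date by which a new minor version should be available (assumed: 6 weeks)."""
    return upstream_release + timedelta(weeks=max_weeks)


if __name__ == "__main__":
    # Hypothetical data: upstream is at 1.31, the provider offers 1.31, 1.30, and 1.28.
    print(missing_supported_minors("v1.31.0", ["1.31.1", "1.30.4", "1.28.9"]))
    print(availability_deadline(date(2025, 1, 1)))
```

An empty list from `missing_supported_minors` means the three most recent minor releases are all offered.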

Kubernetes Operational Excellence

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| K8S05 | #2, #5, #6 | Lifecycle Management - Control Plane | API/CLI for CRUD provisioning; control plane bring-up in under 30 minutes. Strong preference for a Terraform provider. |
| K8S06 | add | Lifecycle Management - Node Pool | API/CLI for CRUD provisioning (e.g., create node pool, update node pool, delete node pool, scale a node pool to a target count). Strong preference for a Terraform provider. Must be able to specify the node type (a specific CPU or GPU instance type). Must be able to specify default node labels and node taints for a node pool, applied when a node joins the cluster. |
| K8S07 | | API Server Metrics | Share API server metrics in a Prometheus-scrapable format to allow NVIDIA to measure the API server SLO. |
| K8S8 | INFO | Versioning | Provider-managed control plane upgrade processes. |
| K8S9 | INFO | Zero-Downtime Upgrades | Minor version control plane updates without app downtime or maintenance windows. |
| K8S10 | TBD | Node Upgrades | User-initiated rolling updates respecting disruption budgets. |
| K8S11 | INFO | HA Control Plane | Redundant architecture with etcd separation. |
| K8S12 | INFO | Backup & Disaster Recovery | Supported recovery within defined RPO/RTO; must be auditable and testable. |
| K8S13 | INFO | Kubernetes Security Response | Participate in the Kubernetes security response and disclosure process; provide CVE patches prior to public disclosure (https://github.com/kubernetes/committee-security-response). |
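
To illustrate how the metrics shared under K8S07 might feed an SLO measurement, here is a minimal sketch that computes the API server error ratio from Prometheus text-format output. It assumes the scrape includes the upstream `apiserver_request_total` counter with a `code` label; the parsing is deliberately simplified (it does not handle `}` inside label values), and treating 5xx responses as errors is one possible SLO definition, not one stated in the requirements.

```python
import re

# Matches samples such as: apiserver_request_total{code="200",verb="GET"} 990
SAMPLE_RE = re.compile(r'^apiserver_request_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def error_ratio(metrics_text: str) -> float:
    """Fraction of API server requests that returned a 5xx status code."""
    total = 0.0
    errors = 0.0
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # skip comments and unrelated metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        value = float(match.group("value"))
        total += value
        if labels.get("code", "").startswith("5"):
            errors += value
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical scrape output for illustration only.
    sample = "\n".join([
        "# TYPE apiserver_request_total counter",
        'apiserver_request_total{code="200",verb="GET"} 990',
        'apiserver_request_total{code="500",verb="GET"} 10',
    ])
    print(f"error ratio: {error_ratio(sample):.4f}")
```

In practice the ratio would be computed over a rate window in Prometheus itself; the sketch only shows which labels the measurement depends on.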

Robust K8s Security

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| K8S14 | TBD | Control Plane Isolation | Per-tenant Kubernetes control plane nodes must be separate from worker nodes and outside of the tenant cluster/VPC. |
| K8S15 | TBD | Access Controls | Cluster endpoint must provide network access controls. |
| K8S16 | TBD | IAM Integration | Kubernetes Service Accounts shall integrate with the platform IAM system to enable workloads to assume platform-managed identities and roles with appropriate scopes. |
| K8S17 | TBD | Service Accounts | Kubernetes shall support standard Service Accounts and projected tokens as the workload identity mechanism, including an OIDC issuer for federation. |
| K8S18 | | Public OIDC Endpoint | The cluster must support OIDC-based workload identity via a cluster-specific OIDC issuer endpoint. |
| K8S19 | INFO | Encryption | At-rest encryption for etcd and secrets. |
| K8S20 | add | Logging | Ability to view or export Kubernetes control plane logs (kube-apiserver, kube-controller-manager). |
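
A basic acceptance check for the cluster-specific OIDC issuer in K8S17 and K8S18 could inspect the discovery document served at `/.well-known/openid-configuration`. The sketch below assumes the document has already been fetched and decoded into a dictionary; the function name and the example URLs are hypothetical, and the checks shown (issuer match, HTTPS, presence of `jwks_uri`) are a minimal subset rather than a full conformance test.

```python
from urllib.parse import urlparse


def check_discovery(doc: dict, expected_issuer: str) -> list[str]:
    """Return a list of problems found in an OIDC discovery document."""
    problems = []
    if doc.get("issuer") != expected_issuer:
        problems.append("issuer does not match the expected cluster issuer")
    if urlparse(expected_issuer).scheme != "https":
        problems.append("issuer must use https")
    jwks_uri = doc.get("jwks_uri", "")
    if urlparse(jwks_uri).scheme != "https":
        problems.append("jwks_uri is missing or not https")
    return problems


if __name__ == "__main__":
    # Hypothetical issuer URL and discovery document.
    issuer = "https://oidc.example.com/cluster-a"
    doc = {
        "issuer": issuer,
        "jwks_uri": issuer + "/openid/v1/jwks",
    }
    print(check_discovery(doc, issuer) or "discovery document looks usable")
```

An external IAM system federating with the cluster would perform similar checks before trusting tokens signed by the keys published at `jwks_uri`.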

Kubernetes Component and Extension Requirements

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| K8S21 | add | API Extensions | Mandatory support for CRDs and Validating/Mutating Admission Controllers. |
| K8S22 | add | CNI | Standard compliance; supports Network Policies; IPv4/IPv6 dual-stack desired. Preference for Calico. |
| K8S23 | add | CSI | NCP provides a CSI driver, installable by NVIDIA (Helm or Kustomize), for block, shared FS, and NFS storage. Support for static and dynamic provisioning, snapshots, and resizing via PVs and PVCs. CSI credentials are scoped to the tenant cluster (no cross-cluster access). APIs to query storage usage against the overall cluster quota, with per-PVC/volume usage, to manage utilization across PVCs. Vendor-provided storage kernel modules and tools are delivered via (1) installation by the CSI driver, (2) pre-installation in the NCP-provided machine image, or (3) installable packages. |
| K8S24 | TBD | DRA | Dynamic Resource Allocation (DRA) must be enabled regardless of upstream feature status (Beta/GA). Some DRA features require feature gates to be enabled on the control plane so that customers can run AI workloads that use new DRA features. |
| K8S25 | add | Operator Support | Support standard operator-based management of hardware accelerators and associated drivers. Provider-default accelerator operators and drivers shall be replaceable or overridable to allow installation of tenant-required operator and driver versions (e.g., GPU Operator, Network Operator). |
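
As an illustration of the admission support required by K8S21, the sketch below shows the core decision logic of a validating admission webhook: it receives an `AdmissionReview` request (API version `admission.k8s.io/v1`) and returns a response that allows or denies the object. The policy itself (requiring a `team` label) and the function name are hypothetical examples; a real webhook would also need a TLS-serving HTTP endpoint and a `ValidatingWebhookConfiguration`, which are omitted here.

```python
def review_object(admission_review: dict, required_label: str = "team") -> dict:
    """Build an AdmissionReview response that denies objects lacking a label."""
    request = admission_review["request"]
    metadata = request["object"].get("metadata", {})
    labels = metadata.get("labels") or {}
    allowed = required_label in labels

    # The response must echo the request uid so the API server can correlate it.
    response = {"uid": request["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {"message": f"missing required label '{required_label}'"}

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    # Hypothetical request payload, trimmed to the fields the sketch reads.
    incoming = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {"uid": "abc-123", "object": {"metadata": {"labels": {}}}},
    }
    print(review_object(incoming))
```

A managed control plane satisfies the requirement if webhooks of this kind can be registered and are called by the API server on matching requests.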

Kubernetes Functionality

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| K8S26 | add | Clusters | Support multiple clusters in the same tenancy; support multiple clusters in the same VPC. |
| K8S27 | add | Managed CP (CP Pinning) | Pin control plane instances to handle a particular load limit. |
| K8S28 | | Performance | Pass the standard Kubernetes performance tests, certified up to 5,000 nodes (or the maximum cluster size, whichever is smaller). Managed Kubernetes control plane SLOs and performance must meet or exceed the standard Kubernetes results. |
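
The comparison in K8S28 comes down to checking measured percentiles against a threshold. The following is a minimal sketch using a nearest-rank 99th percentile over latency samples in seconds; the 1.0 second default threshold is an assumption chosen for illustration and should be replaced with the figures from the Kubernetes scalability SLO definitions that the test run targets.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(samples: list[float], threshold_s: float = 1.0, p: int = 99) -> bool:
    """True if the p-th percentile latency is at or below the threshold (assumed: 1.0 s)."""
    return percentile(samples, p) <= threshold_s


if __name__ == "__main__":
    # Hypothetical latency samples in seconds: 197 fast calls and 3 slow ones.
    latencies = [0.1] * 197 + [2.0] * 3
    print("p99:", percentile(latencies, 99))
    print("meets SLO:", meets_latency_slo(latencies))
```

With 200 samples, three slow calls are enough to push the 99th percentile above the threshold, which is why the sample size used in a performance run matters when interpreting the result.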