Architecture Overview#
We are designing this system to enable Kubernetes as a Service with concurrent multi-tenancy. NVIDIA AI Enterprise software components such as NVIDIA NIM and NVIDIA NeMo microservices run on Kubernetes worker nodes hosted on worker hosts in the tenants' environments. We have included an OpenStack-based example for orchestration and provisioning tooling; however, NVIDIA Cloud Partners may choose to work with any third-party vendor to build out tooling for provisioning, orchestration, and tenant management.

Figure 1 Reference Architecture Diagram#
We have made the following assumptions when developing this reference architecture:
- Infrastructure (HW) configuration follows the recommendations of the NVIDIA Hardware Reference Architecture for NCPs.
- The following IaaS / PaaS services are available in the local environment:
  - A datacenter orchestration solution (e.g., OpenStack or similar)
  - Storage as a service (either multi-tenant capable, or deployable as multiple single-tenant instances, one per tenant)
  - IAM, logging, telemetry, and other non-functional services
- NVIDIA AI Enterprise Subscription
- NGC Registry Access
This architecture utilizes hosts with a hypervisor such as KVM to provide virtual machine nodes, which indirectly provide services to the NCP's tenants. Each tenant is assigned several control plane virtual machines ("control plane nodes") and zero or more worker virtual machines ("worker nodes"). Each tenant has their own private Kubernetes cluster with control plane nodes and worker nodes. The NCP may choose to share physical hosts between tenants to maximize hardware utilization, or they may choose to assign worker hosts and/or control plane hosts exclusively to a single tenant for maximum security isolation. The NCP's datacenter orchestration solution manages the virtual machine lifecycle and assigns the virtual machines dynamically to the tenants' Kubernetes clusters, as sketched below.
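To make that orchestration step concrete, the following is a minimal sketch, assuming an OpenStack-based environment and the OpenStack SDK, of how a worker-node virtual machine might be provisioned for a tenant's cluster. The cloud, image, flavor, and network names are hypothetical placeholders; an NCP's actual provisioning tooling may look quite different.

```python
# Minimal sketch of worker-node provisioning with the OpenStack SDK.
# Cloud, image, flavor, and network names below are hypothetical placeholders.
import openstack

# Assumes a matching entry in clouds.yaml for the NCP datacenter.
conn = openstack.connect(cloud="ncp-datacenter")

# Launch a GPU worker-node VM on behalf of a tenant's Kubernetes cluster.
server = conn.create_server(
    name="tenant-a-worker-01",
    image="ubuntu-22.04-gpu-worker",   # hypothetical hardened worker image
    flavor="gpu.a100.1x",              # hypothetical GPU flavor
    network="tenant-a-overlay",        # tenant-scoped network
    wait=True,
    auto_ip=False,                     # no public IP; tenants reach nodes only via Kubernetes
)
print(f"Provisioned worker node {server.name} ({server.id})")
```

After the VM boots, the orchestration tooling would join it to the tenant's Kubernetes cluster as a worker node; that join step is cluster-tooling specific and is not shown here.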
In this design, a tenant never has direct access to the physical worker hosts, nor to the physical infrastructure control plane. While tenants have access to their instance of the Kubernetes control plane, we recommend that tenants not be granted direct or indirect access to the control plane nodes. Tenants may launch privileged workloads that give them access to the worker nodes; if this is undesirable, it is the responsibility of the Operator to enforce policies that prevent such workload specifications. We further recommend hardening the OS, hypervisor, and Kubernetes configurations of the control plane hosts and worker hosts to industry best practices, and applying similar hardening to the images and configurations of the control plane nodes and worker nodes that are exposed to the tenant.
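One way an Operator could enforce such a policy is Kubernetes Pod Security Admission, which rejects privileged pod specifications at admission time. The sketch below uses the Python Kubernetes client to apply the restricted Pod Security Standard to a tenant namespace; the namespace name is a hypothetical example, while the pod-security.kubernetes.io labels are the standard upstream keys.

```python
# Sketch: enforce the "restricted" Pod Security Standard on a tenant namespace
# so privileged workload specifications are rejected at admission time.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

patch = {
    "metadata": {
        "labels": {
            "pod-security.kubernetes.io/enforce": "restricted",
            "pod-security.kubernetes.io/enforce-version": "latest",
        }
    }
}
# "tenant-a-workloads" is a hypothetical tenant namespace.
v1.patch_namespace(name="tenant-a-workloads", body=patch)
```

Operators that need finer-grained rules than the built-in Pod Security Standards can use an admission controller or policy engine instead; the namespace-label approach is simply the lowest-effort baseline.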
The container runtime should not introduce nested virtualization that could impede performance. Typically, a runtime that relies on nested virtualization also provides weaker isolation between pods than hardware-virtualization-based runtimes. It is the responsibility of the tenants to ensure that workloads requiring a higher level of confidentiality, and therefore isolation, are not co-deployed onto the same node (virtual machine). The Operator may further choose to offer a deployment option in which workloads requiring the strictest isolation run on dedicated nodes on dedicated physical hosts.
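For that dedicated-node option, standard Kubernetes scheduling constraints (node labels, taints, and tolerations) can keep strict-isolation workloads off shared nodes. The following minimal sketch assumes the Operator has already labeled a dedicated node with dedicated=tenant-a and tainted it dedicated=tenant-a:NoSchedule; the namespace, node label, and container image are hypothetical.

```python
# Sketch: pinning a strict-isolation workload to a dedicated node.
# Assumes the dedicated node carries the label dedicated=tenant-a and the
# taint dedicated=tenant-a:NoSchedule; all names here are hypothetical.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="confidential-job", namespace="tenant-a-workloads"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(
            name="main",
            image="nvcr.io/nvidia/example-image:latest",  # hypothetical image
        )],
        node_selector={"dedicated": "tenant-a"},  # only the dedicated node matches
        tolerations=[client.V1Toleration(
            key="dedicated", operator="Equal", value="tenant-a", effect="NoSchedule",
        )],
    ),
)
v1.create_namespaced_pod(namespace="tenant-a-workloads", body=pod)
```

GPU resource requests and other workload details are omitted here; the point is only the combination of node selector and toleration that confines the workload to the dedicated node.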
The tenant's Kubernetes cluster is configured to retrieve code and data from the NVIDIA NGC Container Registry content that the respective tenant has access to. For this, the tenant provides their NVIDIA AI Enterprise access credentials to the Operator's Kubernetes cluster provisioning process.
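As an illustration, the provisioning process could turn those credentials into an image pull secret for nvcr.io, as in the sketch below. The namespace name and the way the NGC API key is passed in are assumptions; the nvcr.io registry hostname and the literal $oauthtoken username follow NGC's documented credential format.

```python
# Sketch: injecting a tenant's NGC credentials as an image pull secret so the
# cluster can pull NVIDIA AI Enterprise containers from nvcr.io.
import base64
import json
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Supplied by the tenant during cluster provisioning (placeholder value).
ngc_api_key = "<tenant-provided NGC API key>"

dockerconfig = {
    "auths": {
        "nvcr.io": {
            "username": "$oauthtoken",  # NGC uses this literal username with an API key
            "password": ngc_api_key,
            "auth": base64.b64encode(f"$oauthtoken:{ngc_api_key}".encode()).decode(),
        }
    }
}

secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name="ngc-registry-secret", namespace="tenant-a-workloads"),
    type="kubernetes.io/dockerconfigjson",
    data={".dockerconfigjson": base64.b64encode(json.dumps(dockerconfig).encode()).decode()},
)
v1.create_namespaced_secret(namespace="tenant-a-workloads", body=secret)
```

Workloads in the tenant namespace would then reference this secret via imagePullSecrets, either per pod or on the namespace's default service account.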