Introduction#

This document outlines a software reference guide intended to help NVIDIA Cloud Partners (NCPs), Cloud Service Providers (CSPs), and Independent Software Vendors (ISVs) build AI cloud services on NCP hardware platforms. It presents an infrastructure-native Northstar reference that supports multi-tenancy and elastic resource allocation.

As inference workloads grow in importance, organizations increasingly require fungible compute resources that can dynamically satisfy multi-tenant training and inference workloads. This shift has driven the need for a cloud-native approach to building and operating AI service stacks.

This document describes how to build such a solution using Kubernetes and modern AI Platform-as-a-Service (PaaS) offerings. The document also provides guidance on optimizing the performance of AI inference and training within a virtualized environment, where workloads can run on either shared or dedicated physical hosts.

Built on top of the NCP Hardware Reference Design, the architecture stack follows a layered approach:

  • Infrastructure-as-a-Service (IaaS) for bare metal and virtual machine provisioning

  • Container-as-a-Service (CaaS) with managed Kubernetes

  • AI Platform-as-a-Service (PaaS) for tenant-facing AI workloads

The layered design enables NCPs to deliver dynamic, multi-tenant AI services competitive with hyperscale cloud service providers. The architecture is infrastructure-native; compute, storage, and networking resources are allocated in an on-demand model rather than statically provisioned.
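The on-demand allocation model described above can be illustrated with a minimal sketch. This is a toy model only; the class and names here are hypothetical and do not correspond to any NVIDIA or Kubernetes API. It shows the core idea: GPUs are drawn from a shared pool as tenant workloads arrive, rather than being statically carved up per tenant.

```python
from dataclasses import dataclass, field

# Toy model of on-demand GPU allocation across tenants.
# All names are illustrative, not a real API.

@dataclass
class GpuPool:
    total: int
    allocations: dict = field(default_factory=dict)  # tenant -> GPUs held

    def available(self) -> int:
        return self.total - sum(self.allocations.values())

    def allocate(self, tenant: str, count: int) -> bool:
        """Grant GPUs on demand if the shared pool can satisfy the request."""
        if count > self.available():
            return False
        self.allocations[tenant] = self.allocations.get(tenant, 0) + count
        return True

    def release(self, tenant: str, count: int) -> None:
        """Return GPUs to the pool when a workload finishes."""
        held = self.allocations.get(tenant, 0)
        self.allocations[tenant] = max(0, held - count)

pool = GpuPool(total=8)
assert pool.allocate("tenant-a", 4)      # training job
assert pool.allocate("tenant-b", 4)      # inference service on the same pool
assert not pool.allocate("tenant-c", 1)  # pool exhausted until a release
pool.release("tenant-a", 4)
assert pool.allocate("tenant-c", 1)      # freed capacity is immediately reusable
```

The same fungibility applies at each layer of the stack: capacity released by one tenant's training job can immediately serve another tenant's inference workload.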

NCPs can work with an ecosystem of independent software vendors (ISVs) or implement open-source tools to deliver security and workload isolation. This enables the operation of a concurrent multi-tenant private cloud that leverages the performance-optimized stack outlined in this reference architecture. NVIDIA certifies the performance of third-party solutions so that NCPs can confidently select the partner of their choice.

Document Structure#

The document is organized into two sections:

  • NCP Software Reference Guide: Describes a layered multi-tenant AI cloud architecture, covering the capabilities desired at each layer: IaaS, CaaS, and PaaS. It also addresses considerations for network management, workload isolation, observability, and break-fix operations.

  • Key Software Components: Provides detailed information on NVIDIA-provided software that addresses specific capabilities from the NCP Software Reference Guide section. Each entry explains the software component and maps it back to the architectural capabilities. The use of NVIDIA-provided software is optional and depends on architectural decisions.

Audience and Scope#

This document is a reference guide, not an implementation manual. It describes capabilities required at each layer and identifies where software provided by NVIDIA may be integrated.

  • NCPs: Use this document as a checklist of capabilities needed to operate a multi-tenant AI cloud service. Each section describes the key capabilities desired in each layer. Use this reference to assess your current software stack, identify gaps, and make informed build-versus-buy decisions.

  • ISVs: Map your software offerings to this layered architecture. The NVIDIA software identified in each section represents integration points and complementary components for your solutions.

User Personas#

User personas listed in the table below are the roles that interact with, administer, or use the system implemented using the NCP Software Reference Guide.

  • Vendor: NVIDIA or a third-party provider of software, hardware, or services integrated into the NCP offering.

  • Operator: The entity that owns and operates the physical and virtual infrastructure in an NCP deployment. The operator is responsible for IaaS/CaaS/PaaS platform services, tenant isolation, security, and break-fix operations.

  • Tenant: The entity that consumes resources within a cluster; a customer of the NCP. Tenants purchase services from the NCP and either consume NCP-provided services directly or build services on top of them. Tenants include two types of users: Tenant Administrators and Tenant End Users.

  • Tenant Administrator: Responsible for managing the tenant's services on top of the NCP's platform.

  • Tenant End User: A customer of the tenant. An end user can also be a user within the tenant organization (for example, an employee). This group includes AI practitioners (ML engineers, data scientists) who develop and deploy models, and application users who consume AI-powered services.
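The persona model above could be encoded as roles in a tenant-facing service, for example to drive access control. The sketch below is purely illustrative; the role names, actions, and permission mapping are hypothetical and not part of any NVIDIA product.

```python
from enum import Enum, auto

# Hypothetical role model based on the persona table above.
class Persona(Enum):
    VENDOR = auto()
    OPERATOR = auto()
    TENANT_ADMIN = auto()
    TENANT_END_USER = auto()

# Which personas may perform which (illustrative) actions.
PERMISSIONS = {
    "manage_infrastructure": {Persona.OPERATOR},
    "manage_tenant_services": {Persona.OPERATOR, Persona.TENANT_ADMIN},
    "run_workload": {Persona.TENANT_ADMIN, Persona.TENANT_END_USER},
}

def is_allowed(persona: Persona, action: str) -> bool:
    """Check whether a persona is permitted to perform an action."""
    return persona in PERMISSIONS.get(action, set())

assert is_allowed(Persona.TENANT_ADMIN, "manage_tenant_services")
assert not is_allowed(Persona.TENANT_END_USER, "manage_infrastructure")
```

In practice this separation is enforced by the platform's identity and access-management layer, not by application code; the point is only that each persona maps to a distinct, bounded set of responsibilities.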

Terms and Definitions#

Terms used in this document are defined in the table below.

  • NVIDIA Cloud Partner (NCP): An NVIDIA partner that leverages the NVIDIA platform to build data centers and offer AI cloud services.

  • Kubernetes as a Service (KaaS): An automated mechanism to quickly stand up Kubernetes (K8s) clusters for end customers.

  • Concurrent multitenancy: A scenario in which more than one tenant has access to, or is active on, a physical host at any point in time.

  • Worker host: A bare-metal server that forms the foundation for the control plane as well as for worker nodes.

  • Worker node: A virtualized compute resource on a physical host, commonly called a "guest virtual machine." The worker node is the resource that tenants use to execute their workloads.

  • Shared host: A worker host that is shared between two or more tenants (concurrent multitenancy on the host).

  • Dedicated control plane: A K8s (or similar) control plane that operates the workloads of one and only one tenant within the cluster.
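The shared-host versus dedicated-host distinction defined above can be sketched in a few lines. This is an illustrative model only; the class and field names are hypothetical, and real placement is handled by the scheduler and virtualization layer.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy placement model for the shared/dedicated host terms defined above.
@dataclass
class WorkerHost:
    name: str
    dedicated_to: Optional[str] = None   # tenant name, or None if shareable
    tenants: set = field(default_factory=set)

    def admit(self, tenant: str) -> bool:
        """Place a tenant's worker node on this host, honoring dedication."""
        if self.dedicated_to is not None and self.dedicated_to != tenant:
            return False                 # dedicated hosts reject other tenants
        self.tenants.add(tenant)
        return True

    def is_shared(self) -> bool:
        """Concurrent multitenancy: more than one tenant active on the host."""
        return len(self.tenants) > 1

shared = WorkerHost("host-01")
assert shared.admit("tenant-a") and shared.admit("tenant-b")
assert shared.is_shared()                # concurrent multitenancy on host-01

dedicated = WorkerHost("host-02", dedicated_to="tenant-a")
assert dedicated.admit("tenant-a")
assert not dedicated.admit("tenant-b")   # dedication excludes other tenants
assert not dedicated.is_shared()
```

Whether a deployment uses shared hosts, dedicated hosts, or a mix is an architectural decision; later sections of this guide discuss the isolation mechanisms each choice requires.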