# Introduction

This document is a software reference guide intended to help NVIDIA Cloud Partners (NCPs), Cloud Service Providers (CSPs), and Independent Software Vendors (ISVs) build AI cloud services on NCP hardware platforms. It presents an infrastructure-native Northstar reference that supports multi-tenancy and elastic resource allocation.

As inference workloads grow in importance, organizations increasingly require fungible compute resources that can dynamically satisfy multi-tenant training and inference workloads. This shift has driven the need for a cloud-native approach to building and operating AI service stacks.

This document describes how to build such a solution using Kubernetes and modern AI Platform-as-a-Service (PaaS) offerings. The document also provides guidance on optimizing the performance of AI inference and training within a virtualized environment, where workloads can run on either shared or dedicated physical hosts.

Built on top of the [NCP Hardware Reference
Design](https://www.nvidia.com/en-us/data-center/gpu-cloud-computing/partners/),
the architecture stack follows a layered approach:

* **Infrastructure-as-a-Service (IaaS)** for bare metal and virtual
  machine provisioning
* **Container-as-a-Service (CaaS)** with managed Kubernetes
* **AI Platform-as-a-Service (PaaS)** for tenant-facing AI workloads

The layered design enables NCPs to deliver dynamic, multi-tenant AI
services competitive with hyperscale cloud service providers. The
architecture is infrastructure-native; compute, storage, and networking
resources are allocated in an on-demand model rather than statically
provisioned.

NCPs can work with an ecosystem of independent software vendors (ISVs) or implement open-source tools to deliver security and workload isolation. Either path enables the operation of a concurrent multitenant private cloud that uses the performance-optimized stack outlined in this reference architecture. NVIDIA certifies the performance of third-party solutions so that NCPs can choose a partner with confidence.

## Document Structure

The document is organized into two parts:

* [Part 1: Software Reference Guide](/dsx/ncp/part-1-software-reference-guide/ncp-software-reference-guide): Describes a layered
  multi-tenant AI cloud architecture and the capabilities that are
  desired at each layer: IaaS, CaaS, and PaaS. It also covers
  considerations for network management, workload isolation,
  observability, and break-fix.
* [Part 2: Software Components](/dsx/ncp/part-2-software-components/nvidia-software-components): Provides detailed information on
  NVIDIA-provided software that addresses specific capabilities described
  in Part 1: Software Reference Guide. Each entry explains
  the software component and maps it back to the architectural
  capabilities. The use of software provided by NVIDIA is optional and
  depends on architectural decisions.

## Audience and Scope

This document is a reference guide, not an implementation manual. It
describes capabilities required at each layer and identifies where
software provided by NVIDIA may be integrated.

* **NCPs**: Use this as a checklist of capabilities needed to operate
  a multi-tenant AI cloud service. Each section describes the key
  capabilities desired at each layer. Use this reference to
  assess your current software stack, identify gaps, and make
  informed build-or-buy decisions.
* **ISVs**: Map your software offerings to this layered architecture
  framework. The NVIDIA software identified in each section represents
  integration points and complementary components for your
  solutions.

## User Personas

The table below lists the user personas: the roles that interact
with, administer, or use a system built by following the NCP Software
Reference Guide.

**User Personas**

| Persona        | Definition                                                                                                                                                                                                                                                                                                  |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Vendor         | This is NVIDIA or a third-party provider of software, hardware, or services that are integrated into the NCP offering.                                                                                                                                                                                      |
| Operator       | The entity that owns and operates (physical and virtual) infrastructure in an NCP deployment. This entity is responsible for IaaS/CaaS/PaaS platform services, tenant isolation, security, and break-fix operations.                                                                                        |
| Tenant         | The entity that consumes resources within a cluster and represents a customer of the NCP. Tenants purchase services from the NCP and either directly consume NCP-provided services or build services on top of them. Tenants include two types of users: Tenant Administrators and Tenant End Users.        |
| Tenant Administrator | Tenant Administrators are responsible for managing the tenant's services on top of the NCP's platform. |
| Tenant End User | Tenant End Users represent the customers of the tenant. An end user can also be a user within the tenant organization (for example, an employee). This group includes AI practitioners (ML Engineers, Data Scientists) who develop and deploy models, and application users who consume AI-powered services. |

## Terms and Definitions

Terms used in this document are defined in the table below.

**Terms and Definitions**

| Term                           | Definition                                                                                                                                                                      |
| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| NVIDIA Cloud Partner (NCP)     | An NVIDIA partner who leverages the NVIDIA platform to build data centers and offer AI cloud services.                                                                          |
| Kubernetes as a Service (KaaS) | An automated mechanism to quickly stand up K8s clusters for end customers.                                                                                                      |
| Concurrent multitenancy        | Scenarios where more than one tenant has access to (directly or indirectly) or is active on a physical host at any point in time.                                               |
| Worker host                    | A bare metal server that forms the foundation for the control plane as well as worker nodes.                                                                                    |
| Worker node                    | A virtualized compute resource on a physical host, commonly called "Guest Virtual Machine". The worker node is the resource that is used by tenants to execute their workloads. |
| Shared host                    | A worker host that is shared between two or more tenants (concurrent multitenancy on the host).                                                                                 |
| Dedicated control plane        | A K8s control plane (or similar) that operates workloads of one tenant and one tenant only within the cluster.                                                                  |
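
As an illustration of how the host-related terms above relate to each other, the following Python sketch models worker hosts, worker nodes, and shared hosts. It is a minimal sketch, not part of any NVIDIA software: the class names, field names, and the `is_shared` helper are hypothetical, and it assumes that each worker node (guest virtual machine) is used by exactly one tenant.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class WorkerNode:
    """A guest virtual machine on a physical host (hypothetical model)."""
    name: str
    tenant: str  # assumption: one tenant per worker node


@dataclass
class WorkerHost:
    """A bare metal server that hosts zero or more worker nodes."""
    name: str
    nodes: List[WorkerNode] = field(default_factory=list)

    def tenants(self) -> Set[str]:
        # Distinct tenants that currently have a worker node on this host.
        return {node.tenant for node in self.nodes}

    def is_shared(self) -> bool:
        # Concurrent multitenancy: more than one tenant is active on the host.
        return len(self.tenants()) > 1


# Illustrative data only; host and tenant names are made up.
host_a = WorkerHost("host-a", [WorkerNode("vm-1", "tenant-1"),
                               WorkerNode("vm-2", "tenant-2")])
host_b = WorkerHost("host-b", [WorkerNode("vm-3", "tenant-1"),
                               WorkerNode("vm-4", "tenant-1")])
```

In this model, `host_a` is a shared host because two tenants are active on it at the same time, while `host_b` is effectively dedicated to a single tenant even though it runs two worker nodes.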