
# NCP Software Reference Guide

The NCP Software Reference Guide introduces a layered approach
for implementing AI services over the NCP Hardware Reference
Architecture (RA). The abstracted layered architecture can be broken
into two views: the Tenant Compute view and the Operator view.

## Tenant Compute View

The tenant-consumed compute resources can be broken into the following
abstracted layers, as shown in the Tenant Compute View diagram.

![Tenant View of the Software Reference Architecture](https://files.buildwithfern.com/nvidia-dsx.docs.buildwithfern.com/dsx/89395a298ab80c6c3e5fde1637937103c46bd3ac156de5240a188763f5bae9e4/_dot_dot_/docs/guides/ncp-software-reference-guide/assets/images/ncp-srg-tenant-view.png)

* **Infrastructure-as-a-Service (IaaS)** — This layer provisions both
  Bare Metal (BM) servers and Virtual Machines (VMs) as consumable
  infrastructure. To support dynamic resource allocation, this
  service responds to UI or API calls that create isolated, sanitized
  infrastructure for a tenant.
* **Container-as-a-Service (CaaS)** — This is the managed K8s layer
  built on top of the IaaS layer. It provides the end user all the
  advantages of K8s (such as extensibility, modularity, an API-driven
  model, auto-scaling, and simplified scheduling) together with the
  operational abstraction and automation of a managed service. This CaaS
  layer can be disaggregated and provided independently, or as part of
  an integrated platform solution by the NCP.
* **AI Platform-as-a-Service (PaaS)** — This is the primary application
  layer that enables GPU-based AI workloads. Slurm is widely used for
  training and HPC use cases today, but there is an increasing migration
  to other cloud-native AI platforms suited to model development,
  inference, and training (for example, Run.AI, and any number of
  industry platforms).
* **Slurm** — Slurm, while not a cloud-native AI PaaS, is a well-known
  single-tenant AI platform especially useful for HPC and training jobs.
  When running Slurm, the NCP can use the open-source version or NVIDIA®
  BCM Slurm, which is tailored to work well with NVIDIA GPUs.
* **Ancillary compute/Native workloads** — These are general-purpose
  compute servers available in the core/ancillary services in the data
  center. These workloads (such as business logic, load balancers, and
  database services) must be serviced as first-class citizens by the NCP
  software stack.
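The stacking relationship among the layers above can be sketched in code. The following is a minimal, hypothetical model of the tenant-compute flow; all class and function names are illustrative assumptions for this sketch, not part of any NVIDIA or NCP API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a tenant request flows bottom-up through the
# abstracted layers (IaaS -> CaaS -> PaaS). Names are illustrative only.

@dataclass
class TenantRequest:
    tenant_id: str
    gpu_nodes: int

@dataclass
class ProvisionedStack:
    tenant_id: str
    layers: list = field(default_factory=list)

def provision_iaas(req: TenantRequest) -> ProvisionedStack:
    # IaaS: allocate isolated, sanitized BM/VM infrastructure per tenant.
    stack = ProvisionedStack(tenant_id=req.tenant_id)
    stack.layers.append(f"IaaS: {req.gpu_nodes} isolated GPU nodes")
    return stack

def provision_caas(stack: ProvisionedStack) -> ProvisionedStack:
    # CaaS: managed K8s built on top of the IaaS layer.
    stack.layers.append("CaaS: managed K8s cluster")
    return stack

def provision_paas(stack: ProvisionedStack, platform: str) -> ProvisionedStack:
    # PaaS: AI platform (a cloud-native scheduler, or Slurm) on top of CaaS.
    stack.layers.append(f"PaaS: {platform}")
    return stack

if __name__ == "__main__":
    req = TenantRequest(tenant_id="tenant-a", gpu_nodes=4)
    stack = provision_paas(provision_caas(provision_iaas(req)), "Slurm")
    print(stack.layers)
```

The point of the sketch is the ordering: each layer consumes only the abstraction beneath it, so the CaaS layer can be swapped out or provided independently without changing how the PaaS layer is consumed.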

## Operator View

A generalized view of the services and features required to run
the **AI workload execution** and **Control & Management** stacks is
pictured below:

![Operator View of the Software Reference Architecture](https://files.buildwithfern.com/nvidia-dsx.docs.buildwithfern.com/dsx/36cfd18e2d4da35bedf48e411b03e7c66d11978cab032afef2d77cd547db6799/_dot_dot_/docs/guides/ncp-software-reference-guide/assets/images/ncp-srg-operator-view.png)

Several of these services, such as the Software-Defined Networking (SDN)
controller and the AI platform control planes, are key technologies in
the NCP Software Reference Guide and are discussed later in this
document.

This operator view presents the core capabilities that each layer in the software reference architecture provides.