# Storage Requirements

NCP must provide shared storage solutions (where applicable) that are manageable via standard APIs and UI, including auditing rights for NVIDIA access.

### Home Directory Storage

* **Quota Feature:** Configurable filesystem-wide limit, default uid/gid quota settings, and per uid/gid overrides.
* **Accounting:** Usage accounting for uid/gids must be available when the feature is enabled.

| Req ID    | Test Details [(Legend)](/dsx/guides/nvidia-requirements-for-ai-clouds/appendix#test-legend) | Requirement Area                   | Description                                                                                                                                                          |
| :-------- | :------------------------------------------------------------------------------------------ | :--------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **DIR01** | INFO                                                                                        | File Service uid/gid Quota feature | Configurable filesystem-wide limit, default uid/gid quota settings, and per uid/gid overrides available. Usage accounting for uid/gids when the feature is enabled. |
| **DIR02** | INFO                                                                                        | Must be NFS storage                | NVIDIA requires shared storage that supports the NFSv4 protocol. Access control based on DLs requires POSIX. |
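
The quota behavior in DIR01 (a filesystem-wide limit, a default uid/gid quota, and per uid/gid overrides) can be illustrated with a minimal sketch. The `QuotaPolicy` class and its field names are hypothetical and do not correspond to any vendor API. The sketch also assumes that an override can never exceed the filesystem-wide limit, which the requirement does not state.

```python
from dataclasses import dataclass, field
from typing import Dict

GIB = 1024 ** 3


@dataclass
class QuotaPolicy:
    """Hypothetical quota model; names are illustrative only."""
    filesystem_limit: int   # filesystem-wide cap, in bytes
    default_limit: int      # default quota for any uid/gid, in bytes
    overrides: Dict[int, int] = field(default_factory=dict)  # uid/gid -> bytes

    def effective_limit(self, id_: int) -> int:
        """Return the limit that applies to a single uid or gid."""
        limit = self.overrides.get(id_, self.default_limit)
        # Assumption: no single id may exceed the filesystem-wide limit.
        return min(limit, self.filesystem_limit)

    def is_over_quota(self, id_: int, used_bytes: int) -> bool:
        """Compare accounted usage against the effective limit."""
        return used_bytes > self.effective_limit(id_)


policy = QuotaPolicy(
    filesystem_limit=1024 * GIB,
    default_limit=100 * GIB,
    overrides={1001: 500 * GIB, 1002: 4096 * GIB},
)
```

Here uid 1001 gets its 500 GiB override, any uid without an override falls back to the 100 GiB default, and uid 1002 is capped at the filesystem-wide limit.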

### High-Speed Storage Service Requirements

| Req ID    | Test Details [(Legend)](/dsx/guides/nvidia-requirements-for-ai-clouds/appendix#test-legend) | Requirement Area       | Description                                                                                                                                                      |
| :-------- | :------------------------------------------------------------------------------------------ | :--------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **HSS01** | add                                                                                         | Provisioning APIs      | Storage provisioning may be via vendor portal/API or NCP portal/API.                                                                                             |
| **HSS02** | INFO                                                                                        | Performance            | Must provision the throughput needed to meet the requested minimum bandwidth and IOPS. |
| **HSS03** | INFO                                                                                        | Integration            | K8s: CSI support. Breakfix API required to report storage issues. |
| **HSS04** | INFO                                                                                        | Quota Support          | Ability for quota limits to be set on specific user workloads / volumes                                                                                          |
| **HSS05** | INFO                                                                                        | Upgrade, Maintenance   | Provider / NCP initiates desired maintenance. NVIDIA can schedule actual maintenance and can defer maintenance up to 2 weeks. Upgrades should be non-disruptive. |
| **HSS06** | INFO                                                                                        | RDMA Memory Protection | Storage systems using RDMA must enforce memory protection via authorization keys for both local and remote access                                                |
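
The deferral rule in HSS05 can be expressed as a small date check. This is only a sketch: the function names are hypothetical, and it assumes the two-week window is measured from the start time the provider originally proposed and that maintenance cannot be moved earlier than that proposal. Neither assumption is stated in the requirement.

```python
from datetime import datetime, timedelta

# HSS05: NVIDIA can defer maintenance by up to 2 weeks.
MAX_DEFERRAL = timedelta(weeks=2)


def latest_allowed_start(proposed_start: datetime) -> datetime:
    """Latest start time still inside the deferral window (assumed to
    begin at the provider's proposed start)."""
    return proposed_start + MAX_DEFERRAL


def can_defer(proposed_start: datetime, requested_start: datetime) -> bool:
    """True if the requested start is not earlier than the proposal and
    not later than the end of the deferral window."""
    return proposed_start <= requested_start <= latest_allowed_start(proposed_start)


proposed = datetime(2025, 1, 6, 2, 0)
print(can_defer(proposed, datetime(2025, 1, 15, 2, 0)))
```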

### High-Speed Storage Filesystem Requirements

| Req ID    | Test Details [(Legend)](/dsx/guides/nvidia-requirements-for-ai-clouds/appendix#test-legend) | Requirement Area                               | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| :-------- | :------------------------------------------------------------------------------------------ | :--------------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **HSS07** | INFO                                                                                        | Parallel High Speed Filesystem                 | Parallel or multi-path high-speed filesystem that supports scaling to thousands of simultaneous clients while sustaining requested performance.                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| **HSS08** | INFO                                                                                        | Single File System Size                        | It must be possible to allocate a file system of at least 1 PiB even if the initial request is smaller, and to grow it beyond 10 PiB as cluster size increases. This hard requirement may be higher for a specific site and, if so, will be communicated via the ancillary services document. |
| **HSS09** | INFO                                                                                        | Multiple Filesystems (Fungible Total Capacity) | Must allow more than one filesystem within our total capacity. The smallest allocatable file system must be \<= 50 TiB. |
| **HSS10** | INFO                                                                                        | Filesystem Expansion                           | Live file system expansion is supported, in terms of capacity, inodes, IO performance, and metadata operations performance. Performance should scale linearly with capacity.                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| **HSS11** | INFO                                                                                        | Client                                         | Ability to describe your client: In-Kernel, userspace, or bare-metal client installation requirements. Support integration with client kernels / OS used by NVIDIA, as needed.  DKMS-enabled packages available for Ubuntu 20.04, 22.04, and 24.04-based operating systems. ARM64 versions compatible with GB200-ready kernels are mandatory, e.g. Linux 6.8.x.  Managed Storage Service Provider will provide client configuration best practices and configuration guidelines for filesystem options and kernel module configuration to reliably achieve optimal performance on ARM and x86\_64-based clients. |
| **HSS12** | INFO                                                                                        | Quota (User, Project & Group)                  | Must support soft and hard quotas - uid / gid / project(directory)-id quotas with enforcement.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| **HSS13** | INFO                                                                                        | Root-squash                                    | NVIDIA needs to be able to enable, disable, and manage root-squash at any time. |
| **HSS14** | INFO                                                                                        | flock                                          | It must be possible to mount the file system with flock.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| **HSS15** | INFO                                                                                        | Ability to Audit Changes                       | Give NVIDIA access to changelog data for filesystem auditing and detailed tracking of user operations. Tracking by uid/gid must cover creating, renaming, and deleting both files and directories. |
| **HSS16** | INFO                                                                                        | HA                                             | All services are required to tolerate any critical component failure in the backend and provide continued client access to all storage services in such cases.                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| **HSS17** | INFO                                                                                        | Multi-Node Coherency                           | One second or less for client attribute and dentry cache updates/invalidates                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| **HSS18** | INFO                                                                                        | Client Multipathing                            | Clients must have multipathing to all storage servers                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
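
One way to spot-check HSS14 from a client is to take an advisory lock on a file in the mounted filesystem. The sketch below uses Python's standard `fcntl.flock`. The helper names and the probe path are hypothetical. It only shows that `flock` calls succeed and conflict on a single node, so it does not demonstrate that locks are coherent across clients.

```python
import fcntl
import os
from typing import Optional


def try_exclusive_lock(path: str) -> Optional[int]:
    """Try to take a non-blocking exclusive flock on `path`.

    Returns the open file descriptor if the lock was acquired, or None
    if another open file description already holds it. Any other OSError
    (for example, a mount that does not support flock) is re-raised.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    except OSError:
        os.close(fd)
        raise
    return fd


def release_lock(fd: int) -> None:
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A typical check would call `try_exclusive_lock` on a probe file under the mount point (for example a hypothetical `/mnt/hss/.flock-probe`), confirm that a second attempt returns `None` while the first lock is held, and then call `release_lock`.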

### Data Movement Systems Requirements

The Data Movement system is used to copy data from an external data source (NVIDIA, other Cloud, etc) to the NCP data center.

| Req ID    | Test Details [(Legend)](/dsx/guides/nvidia-requirements-for-ai-clouds/appendix#test-legend) | Requirement Area           | Description                                                                                                                                     |
| :-------- | :------------------------------------------------------------------------------------------ | :------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------- |
| **DMS01** | add                                                                                         | Dedicated K8s Cluster      | Provider-managed K8s cluster (or the ability to stand up our own) for the Data Mover stack, available ahead of GPU cluster bring-up to pre-stage data |
| **DMS02** | INFO                                                                                        | Data Mover Nodes (CPU)     | Dedicated CPU nodes for running the Data Mover; these need high-performance networking (exact quantity will be communicated via the ancillary services doc) |
| **DMS03** | INFO                                                                                        | Access to Same GPU Storage | The filesystem mounted on the GPU nodes is also mounted on the Data Mover nodes (or can be mounted there via CSI) |
| **DMS04** | INFO                                                                                        | Access to NVIDIA Corp Net  | Dedicated link (as described in network transport) to NVIDIA corp net, preferably with VPN, but otherwise with a stable IP for allowlisting |
| **DMS05** | add                                                                                         | Stable Egress IP           | Stable IP so that access to NVIDIA services can be IP-allowlisted (e.g., similar to a NAT gateway) |
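
DMS04 and DMS05 both depend on the egress address staying stable, because the receiving side matches the source IP against an allowlist. A minimal sketch of that match, using Python's standard `ipaddress` module, is shown here. The function name is hypothetical and the addresses come from the reserved documentation ranges, not from any real deployment.

```python
import ipaddress
from typing import Iterable


def is_allowlisted(source_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if source_ip falls inside any allowlisted CIDR block."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in allowlist)


# Example allowlist: one single-address entry and one small range.
ALLOWLIST = ["203.0.113.10/32", "198.51.100.0/28"]

print(is_allowlisted("203.0.113.10", ALLOWLIST))
print(is_allowlisted("203.0.113.11", ALLOWLIST))
```

If the egress IP changed after a maintenance event, the first check would start failing until the allowlist was updated, which is the situation a stable egress IP avoids.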

### DGXC-Managed Storage System Deployment

For scenarios where the storage system software will be deployed and managed by DGXC rather than the NCP, the following requirements apply. These requirements enable DGXC to operate storage systems (such as high-speed parallel filesystems, capacity object storage, or block storage) using NCP-provided infrastructure while maintaining operational control.

#### Host Provisioning and Lifecycle

| Req ID    | Test Details [(Legend)](/dsx/guides/nvidia-requirements-for-ai-clouds/appendix#test-legend) | Requirement Area              | Description                                                                                                                                                                                                                                                                                                                                                                                              |
| :-------- | :------------------------------------------------------------------------------------------ | :---------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **STG01** | INFO                                                                                        | Operating System Support      | NCP must support a workflow that allows DGXC storage operators to integrate vendor-provided or storage-specific operating system images via bare-metal or VM provisioning for storage servers. The workflow must: (a) Allow DGXC to deploy custom OS images (e.g., vendor-enhanced kernels for Lustre, Rocky Linux, Ubuntu 20.04/22.04/24.04).                                                           |
| **STG02** | add                                                                                         | Drive Sanitization Policy     | Cryptographically erase data drive contents between storage system tenants with full attestation of host firmware. Must support an optional flag to skip drive sanitization during break/fix flows (e.g., power supply replacement) where tenancy does not change. Critical hardware component replacements may require sanitization without override. This requirement includes GPU / CPU node local storage. |
| **STG03** | INFO                                                                                        | Stable IP Assignment          | Storage nodes must support static IP addressing that remains stable during host lifecycle operations and does not reset between maintenance events.                                                                                                                                                                                                                                                      |
| **STG04** | INFO                                                                                        | Out-of-Band Failure Detection | NCP must provide the ability to detect system failures out-of-band, including device, network, memory, and drive failures, enabling DGXC to proactively respond to hardware issues.                                                                                                                                                                                                                      |
| **STG05** | INFO                                                                                        | Topology Observability        | NCP must provide visibility into failure domains to enable DGXC to provision storage nodes with physical diversity. Storage systems must be able to provision nodes that purposefully span failure domains for resilience.                                                                                                                                                                               |
| **STG06** | INFO                                                                                        | BlueField/DPU Support         | For storage systems utilizing BlueField-based architectures, the host provisioning system must support lifecycle management and specific configuration requirements for BlueField "JBOF" systems that export NVMe-oF to hosts.                                                                                                                                                                           |
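
STG05 asks for enough failure-domain visibility that storage nodes can be placed deliberately across domains. The sketch below shows one simple placement strategy, round-robin across domains, under the assumption that the NCP exposes a mapping from node name to failure domain. The function name, node names, and rack labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def pick_spread_nodes(node_domains: Dict[str, str], count: int) -> List[str]:
    """Pick `count` nodes, cycling through failure domains so that the
    selection spans as many domains as possible."""
    by_domain: Dict[str, List[str]] = defaultdict(list)
    for name, domain in sorted(node_domains.items()):
        by_domain[domain].append(name)

    # One queue of candidate nodes per domain, in a deterministic order.
    queues = [by_domain[d] for d in sorted(by_domain)]
    picked: List[str] = []
    while len(picked) < count and any(queues):
        for queue in queues:
            if queue and len(picked) < count:
                picked.append(queue.pop(0))

    if len(picked) < count:
        raise ValueError("not enough nodes available")
    return picked


nodes = {"s1": "rack-a", "s2": "rack-a", "s3": "rack-b", "s4": "rack-c"}
print(pick_spread_nodes(nodes, 3))
```

With three nodes requested from three racks, each rack contributes one node. A second node from the same rack is used only after every rack has been visited once.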