Storage Requirements

NCP must provide shared storage solutions (where applicable) that are manageable via standard APIs and UIs, and must include auditing rights for NVIDIA access.

Home Directory Storage

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| DIR01 | INFO | File Service uid/gid Quota feature | Configurable filesystem-wide limit, default user/gid quota settings, and per-uid/gid overrides available. Usage accounting for uid/gids when the feature is enabled (see the sketch following this table). |
| DIR02 | INFO | Must be NFS storage | NVIDIA requires the NFSv4 protocol for shared storage. Access control based on DLs requires POSIX. |
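
To make the DIR01 quota model concrete, here is a minimal Python sketch of the expected configuration surface: a filesystem-wide limit, default user/group quotas, and per-uid/gid overrides with accounting enabled. The field names and structure are illustrative assumptions, not any particular vendor's API.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of the DIR01 quota model: a filesystem-wide
# limit, default per-user/group quotas, and per-uid/gid overrides.
# Field names are assumptions, not a specific vendor's API.

@dataclass
class QuotaPolicy:
    filesystem_limit_gib: int          # configurable filesystem-wide cap
    default_user_quota_gib: int        # applied to any uid without an override
    default_group_quota_gib: int       # applied to any gid without an override
    uid_overrides: dict[int, int] = field(default_factory=dict)
    gid_overrides: dict[int, int] = field(default_factory=dict)
    accounting_enabled: bool = True    # per-uid/gid usage accounting (DIR01)

    def effective_user_quota(self, uid: int) -> int:
        """Resolve the limit that enforcement would apply to one uid."""
        return self.uid_overrides.get(uid, self.default_user_quota_gib)

policy = QuotaPolicy(
    filesystem_limit_gib=500_000,
    default_user_quota_gib=1_000,
    default_group_quota_gib=10_000,
    uid_overrides={1234: 5_000},       # one uid granted extra space
)
assert policy.effective_user_quota(1234) == 5_000
assert policy.effective_user_quota(9999) == 1_000
```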

High-Speed Storage Service Requirements

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| HSS01 | add | Provisioning APIs | Storage provisioning may be via vendor portal/API or NCP portal/API (see the sketch following this table). |
| HSS02 | INFO | Performance | Must provision the throughput requested, in terms of both minimum bandwidth and IOPS. |
| HSS03 | INFO | Integration | K8s: CSI support. Break/fix API required to report storage issues. |
| HSS04 | INFO | Quota Support | Ability to set quota limits on specific user workloads / volumes. |
| HSS05 | INFO | Upgrade, Maintenance | Provider / NCP initiates desired maintenance. NVIDIA can schedule the actual maintenance window and can defer maintenance by up to 2 weeks. Upgrades should be non-disruptive. |
| HSS06 | INFO | RDMA Memory Protection | Storage systems using RDMA must enforce memory protection via authorization keys for both local and remote access. |
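
As a rough illustration of HSS01 and HSS02, the following Python sketch builds a provisioning request that carries explicit minimum bandwidth and IOPS targets. The payload schema and every field name are hypothetical assumptions; an actual vendor or NCP portal/API defines its own shape.

```python
import json

# A minimal sketch of an HSS01-style provisioning request. The schema and
# field names are hypothetical; it illustrates only that a request must
# carry the minimum bandwidth and IOPS targets called out in HSS02.

def build_provisioning_request(capacity_tib: int,
                               min_read_gbps: float,
                               min_write_gbps: float,
                               min_iops: int) -> str:
    """Serialize a storage provisioning request as JSON."""
    payload = {
        "filesystem": {
            "capacity_tib": capacity_tib,
            "expandable": True,            # HSS10: live expansion
        },
        "performance": {                   # HSS02: provision what is requested
            "min_read_bandwidth_gbps": min_read_gbps,
            "min_write_bandwidth_gbps": min_write_gbps,
            "min_iops": min_iops,
        },
        "integration": {"kubernetes_csi": True},  # HSS03
    }
    return json.dumps(payload, indent=2)

print(build_provisioning_request(1024, 500.0, 300.0, 1_000_000))
```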

High-Speed Storage Filesystem Requirements

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| HSS07 | INFO | Parallel High-Speed Filesystem | Parallel or multi-path high-speed filesystem that supports scaling to thousands of simultaneous clients while sustaining requested performance. |
| HSS08 | INFO | Single File System Size | It must be possible to allocate a file system of at least 1 PiB even if the initial request is smaller, growing to > 10 PiB as cluster size increases. This hard requirement may be higher for a specific site; if so, it will be communicated via the ancillary services document. |
| HSS09 | INFO | Multiple Filesystems (Fungible Total Capacity) | More than one filesystem may exist within the total capacity. Minimum file system size <= 50 TiB. |
| HSS10 | INFO | Filesystem Expansion | Live file system expansion is supported, in terms of capacity, inodes, IO performance, and metadata operations performance. Performance should scale linearly with capacity. |
| HSS11 | INFO | Client | Ability to describe the client: in-kernel, userspace, or bare-metal client installation requirements. Support integration with client kernels / OSes used by NVIDIA, as needed. DKMS-enabled packages available for Ubuntu 20.04-, 22.04-, and 24.04-based operating systems. ARM64 versions compatible with GB200-ready kernels (e.g., Linux 6.8.x) are mandatory. The Managed Storage Service Provider will provide client configuration best practices and configuration guidelines for filesystem options and kernel module configuration to reliably achieve optimal performance on ARM and x86_64-based clients. |
| HSS12 | INFO | Quota (User, Project & Group) | Must support soft and hard quotas: uid / gid / project (directory)-id quotas with enforcement. |
| HSS13 | INFO | Root-squash | NVIDIA must be able to enable, disable, and manage root-squash at any time. |
| HSS14 | INFO | flock | It must be possible to mount the file system with flock (see the probe sketch following this table). |
| HSS15 | INFO | Ability to Audit Changes | Enable NVIDIA to access changelog data for filesystem auditing and detailed user operations tracking. Tracking by uid/gid: create files, create dirs, rename files, rename dirs, delete files, delete dirs. |
| HSS16 | INFO | HA | All services are required to tolerate any critical component failure in the backend and provide continued client access to all storage services in such cases. |
| HSS17 | INFO | Multi-Node Coherency | One second or less for client attribute and dentry cache updates/invalidations. |
| HSS18 | INFO | Client Multipathing | Clients must have multipathing to all storage servers. |
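
A simple acceptance probe for HSS14 can be written with Python's standard fcntl module: if the filesystem is mounted with flock support, acquiring and releasing an exclusive advisory lock on a scratch file should succeed. The mount point below is a placeholder for the filesystem under test.

```python
import fcntl
import sys

# HSS14 probe: verify that flock works on the mounted high-speed filesystem.
# The mount point path is an assumption; substitute the filesystem under test.

MOUNT_POINT = "/mnt/hss"  # hypothetical mount point

def flock_works(path: str) -> bool:
    """Return True if an exclusive flock can be acquired on a scratch file."""
    try:
        with open(f"{path}/.flock_probe", "w") as f:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking acquire
            fcntl.flock(f, fcntl.LOCK_UN)                  # release
        return True
    except OSError as err:
        print(f"flock probe failed: {err}", file=sys.stderr)
        return False

if __name__ == "__main__":
    sys.exit(0 if flock_works(MOUNT_POINT) else 1)
```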

Data Movement Systems Requirements

The Data Movement system is used to copy data from an external data source (NVIDIA, another cloud, etc.) to the NCP data center.
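
As a minimal sketch of what such a transfer might look like on the Data Mover nodes described in the table below, the following Python program copies a staged dataset onto the shared filesystem in parallel. The paths and worker count are assumptions; a production mover would add checksumming, retries, and restartability.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Illustrative only: a trivially parallel data mover that copies a staged
# external dataset onto the shared filesystem that is also mounted on the
# GPU nodes (DMS02/DMS03). Paths and worker count are placeholders.

SOURCE = Path("/staging/external_dataset")   # hypothetical ingress location
DEST = Path("/mnt/hss/datasets/external")    # same filesystem as GPU nodes

def copy_one(src: Path) -> Path:
    target = DEST / src.relative_to(SOURCE)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, target)   # preserves mtimes for idempotent re-runs
    return target

files = [p for p in SOURCE.rglob("*") if p.is_file()]
with ThreadPoolExecutor(max_workers=16) as pool:
    for done in pool.map(copy_one, files):
        print(f"copied {done}")
```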

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| DMS01 | add | Dedicated K8s Cluster | Provider-managed k8s cluster (or the ability to stand up our own) for the Data Mover stack, available ahead of the GPU cluster bring-up to pre-stage data. |
| DMS02 | INFO | Data Mover Nodes (CPU) | Dedicated CPU nodes for running the data mover; needs high-performance networking (exact quantity will be communicated via the ancillary services doc). |
| DMS03 | INFO | Access to Same GPU Storage | Same filesystem as mounted on the GPU nodes mounted on the Data Mover nodes (or the ability to mount the same filesystem via CSI). |
| DMS04 | INFO | Access to NVIDIA Corp Net | Dedicated link (as described in network transport) to the NVIDIA corp net, preferably with VPN, but otherwise with a stable IP for allowlisting. |
| DMS05 | add | Stable Egress IP | Stable IP for IP-allowlist access to NVIDIA services (e.g., similar to a NAT gateway; see the sketch following this table). |
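
The sketch below is one way to spot-check DMS05 from a data mover node: it asks the kernel which local source IP it would use to reach a remote service. The expected address and probe endpoint are placeholders, and behind a NAT gateway the translated public address must instead be verified on the provider side.

```python
import socket

# DMS05 sanity check sketch: confirm that traffic toward an allowlisted
# service would leave via the expected source IP. The expected address and
# probe endpoint are placeholders, not real NVIDIA endpoints. Behind a NAT
# gateway this shows only the local interface IP, not the translated one.

EXPECTED_EGRESS_IP = "203.0.113.10"          # documentation-range placeholder
PROBE_HOST, PROBE_PORT = "example.com", 443  # stand-in for an allowlisted service

def local_source_ip(host: str, port: int) -> str:
    """Return the local source IP the kernel picks for a route to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((host, port))  # UDP connect selects the route; nothing is sent
        return s.getsockname()[0]

if __name__ == "__main__":
    ip = local_source_ip(PROBE_HOST, PROBE_PORT)
    status = "OK" if ip == EXPECTED_EGRESS_IP else "MISMATCH"
    print(f"{status}: egress source IP is {ip}")
```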

DGXC-Managed Storage System Deployment

For scenarios where the storage system software will be deployed and managed by DGXC rather than the NCP, the following requirements apply. These requirements enable DGXC to operate storage systems (such as high-speed parallel filesystems, capacity object storage, or block storage) using NCP-provided infrastructure while maintaining operational control.

Host Provisioning and Lifecycle

| Req ID | Test Details (Legend) | Requirement Area | Description |
| --- | --- | --- | --- |
| STG01 | INFO | Operating System Support | NCP must support a workflow that allows DGXC storage operators to integrate vendor-provided or storage-specific operating system images via bare-metal or VM provisioning for storage servers. The workflow must: (a) allow DGXC to deploy custom OS images (e.g., vendor-enhanced kernels for Lustre, Rocky Linux, Ubuntu 20.04/22.04/24.04). |
| STG02 | add | Drive Sanitization Policy | Cryptographically erase data drive contents between storage system tenants, with full attestation of host firmware. Must support an optional flag to skip drive sanitization during break/fix flows (e.g., power supply replacement) where tenancy does not change. Critical hardware component replacements may require sanitization without override; this is inclusive of GPU / CPU node local storage (see the sketch following this table). |
| STG03 | INFO | Stable IP Assignment | Storage nodes must support static IP addressing that remains stable during host lifecycle operations and does not reset between maintenance events. |
| STG04 | INFO | Out-of-Band Failure Detection | NCP must provide the ability to detect system failures out-of-band, including device, network, memory, and drive failures, enabling DGXC to respond proactively to hardware issues. |
| STG05 | INFO | Topology Observability | NCP must provide visibility into failure domains to enable DGXC to provision storage nodes with physical diversity. Storage systems must be able to provision nodes that purposefully span failure domains for resilience. |
| STG06 | INFO | BlueField/DPU Support | For storage systems utilizing BlueField-based architectures, the host provisioning system must support lifecycle management and specific configuration requirements for BlueField "JBOF" systems that export NVMe-oF to hosts. |
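
To illustrate the STG02 flow, here is a hedged Python sketch that drives nvme-cli's `nvme format` with Secure Erase Setting 2 (cryptographic erase) and honors the optional skip flag only when tenancy is unchanged. The device list and decision inputs are assumptions for illustration; attestation of host firmware is out of scope here.

```python
import subprocess

# STG02 sketch: cryptographic erase of NVMe data drives between tenants,
# with an optional skip for break/fix flows where tenancy does not change.
# Uses nvme-cli's `nvme format` with Secure Erase Setting 2 (crypto erase).
# The device list and the decision inputs are placeholders.

DATA_DRIVES = ["/dev/nvme0n1", "/dev/nvme1n1"]  # hypothetical inventory

def sanitize_drives(tenancy_changed: bool, skip_requested: bool) -> None:
    # The skip flag is honored only when the tenant is unchanged (e.g. a
    # power supply swap); a tenancy change always forces sanitization.
    if skip_requested and not tenancy_changed:
        print("break/fix flow, tenant unchanged: skipping sanitization")
        return
    for dev in DATA_DRIVES:
        # --ses=2 selects cryptographic erase (discards the media key);
        # --force skips nvme-cli's interactive confirmation.
        subprocess.run(["nvme", "format", dev, "--ses=2", "--force"],
                       check=True)
        print(f"crypto-erased {dev}")

sanitize_drives(tenancy_changed=True, skip_requested=False)
```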