
Copyright © 2026, NVIDIA Corporation.

NVIDIA Requirements for AI Clouds

Service Delivery SLAs


To be considered for offtake, NCPs should be able to demonstrate that they can meet the SLAs below, by category, as well as the operational requirements.

Services Delivery Timelines

The NCP must demonstrate API readiness and transport establishment at least 12 weeks ahead of GPU delivery, and the ability to provide Dev capacity (CPU nodes only) with the API integrated 6 weeks prior to GPU and cluster delivery.

One key request is early access to ancillary compute nodes that act as the Data Mover function. This lets NVIDIA pre-position data in the data center so that it is ready for use when the GPUs become available. Access to Data Mover compute (and target storage) should be available approximately 2 weeks ahead of GPU cluster delivery.
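The lead times above can be turned into calendar deadlines once a GPU delivery date is known. The following is a minimal sketch, not part of the requirements: the milestone names and the example delivery date are hypothetical, and it assumes "GPU delivery" and "GPU cluster delivery" refer to the same date.

```python
from datetime import date, timedelta

# Lead times in weeks before GPU cluster delivery, taken from the text above.
# The key names are illustrative labels, not official milestone names.
LEAD_WEEKS = {
    "api_and_transport_ready": 12,
    "dev_capacity_cpu_only": 6,
    "data_mover_access": 2,  # the text says roughly 2 weeks, so treat as approximate
}


def milestone_dates(gpu_delivery: date) -> dict[str, date]:
    """Return the latest date by which each milestone should be in place."""
    return {
        name: gpu_delivery - timedelta(weeks=weeks)
        for name, weeks in LEAD_WEEKS.items()
    }


if __name__ == "__main__":
    # Hypothetical delivery date, for illustration only.
    for name, deadline in milestone_dates(date(2026, 9, 1)).items():
        print(f"{name}: {deadline.isoformat()}")
```

Counting back from a single delivery date keeps the three milestones consistent with each other if the delivery date moves.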

SLA and SLO

Managed K8s

  • Control Plane SLA target: Financially-backed 99.95%+ uptime for production.

Storage

  • Performance (QoS): Must provision the throughput needed to meet the requested minimum bandwidth and IOPS.
  • Home Directory Storage:
    • Availability: Over 99% availability, measured against unplanned incidents and exclusive of scheduled maintenance.
    • Durability: Over 99.99% for any filesystem (FS) smaller than 1 PB
  • High Speed Storage Service Requirements:
    • Availability (SLO): Must meet 99.99% availability over a rolling 30-day window, exclusive of maintenance
  • High-Speed Storage Filesystem Requirements:
    • End to End Availability: Over 99.5% uptime per PB
    • Durability: Over 99.999% durability per PB annually
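To make the availability targets above concrete, a percentage can be converted into a downtime budget. The sketch below is illustrative only: the 30-day window is stated in the text only for the High Speed Storage Service SLO, so applying it to the other targets is an assumption, and the function name is hypothetical. Because the targets are lower bounds ("over 99%"), the printed values are upper bounds on allowed downtime.

```python
def downtime_budget_minutes(availability_pct: float, window_days: float) -> float:
    """Minutes of downtime permitted in a window at a given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - availability_pct / 100)


# Availability targets from the list above; the 30-day window is an
# assumption for every entry except the high-speed storage service SLO.
targets = [
    ("Managed K8s control plane", 99.95),
    ("Home directory storage", 99.0),
    ("High-speed storage service", 99.99),
    ("High-speed storage filesystem", 99.5),
]

for name, pct in targets:
    budget = downtime_budget_minutes(pct, 30)
    print(f"{name}: {budget:.1f} min per 30 days")
```

Under these assumptions, 99.99% over 30 days leaves roughly 4.3 minutes of downtime, while 99.95% leaves about 21.6 minutes.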

Operational Requirements

  • Dedicated Technical specialist/engineer available to NVIDIA
  • Slack channel monitored by technical specialist / engineer
  • 24x7 support available per partner standard incident severity procedures
  • Service-impacting incidents and both planned and unplanned maintenance events are communicated to NVIDIA.
  • For planned maintenance, NVIDIA can schedule maintenance windows via APIs or console tools, which avoids unexpected outages and gives NVIDIA the ability to provide feedback.
  • NCP to remediate critical vulnerabilities in a timely manner while providing transparent disclosures of any issues

Telemetry Delivery Method

NCP shall deliver all required telemetry, including metrics and logs, in a manner that allows for ingestion into DGX Cloud systems. The preferred methodology is native delivery via the OpenTelemetry Protocol with a latency of no more than 120 seconds.
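One way to reason about the 120-second bound is to compare the timestamp carried by a telemetry record with the time it is received. The sketch below uses only the Python standard library and does not call any OpenTelemetry API. It assumes latency is measured from event time to receipt time, which the text does not specify, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_LATENCY = timedelta(seconds=120)  # bound taken from the requirement above


def delivery_latency(event_time: datetime, received_time: datetime) -> timedelta:
    """Elapsed time between when a record was produced and when it arrived.

    Both arguments should be timezone-aware so the subtraction is unambiguous.
    """
    return received_time - event_time


def within_latency_budget(
    event_time: datetime,
    received_time: datetime,
    budget: timedelta = MAX_LATENCY,
) -> bool:
    """True if the record arrived within the allowed latency."""
    return delivery_latency(event_time, received_time) <= budget


if __name__ == "__main__":
    # Illustrative timestamps only.
    produced = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    arrived = produced + timedelta(seconds=95)
    print(within_latency_budget(produced, arrived))
```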

Exemplar Cloud Workload Performance

NVIDIA Exemplar Cloud seeks to improve performance per TCO with hardware and software recipes, references, tools, and capabilities. Run the latest publicly available release from https://github.com/NVIDIA/dgxc-benchmarking (always pick the latest release version from the GitHub repo); the run must complete successfully on one uniform hardware cluster type. Please run all the workloads for a given release and share the results in the template below.

Test ID | Feature | Min Size | Description
BM01 | Benchmarking for exemplar cloud | 512 GPU cluster | Achieve within 5% of an NVIDIA-provided target performance number
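The BM01 criterion can be expressed as a simple tolerance check. This is a sketch under stated assumptions, not the official pass/fail logic: the text does not say whether the metric is one where higher is better (such as throughput) or lower is better (such as step time), so the hypothetical `higher_is_better` flag covers both readings, and results better than the target are treated as passing.

```python
def meets_target(
    measured: float,
    target: float,
    tolerance: float = 0.05,  # "within 5%" from the table above
    higher_is_better: bool = True,
) -> bool:
    """Check whether a measured result is within the tolerance of a target."""
    if target <= 0:
        raise ValueError("target must be positive")
    if higher_is_better:
        # For throughput-style metrics: at most 5% below the target.
        return measured >= target * (1 - tolerance)
    # For time-style metrics: at most 5% above the target.
    return measured <= target * (1 + tolerance)


if __name__ == "__main__":
    # Illustrative numbers only; real targets are provided by NVIDIA.
    print(meets_target(measured=96.0, target=100.0))
    print(meets_target(measured=106.0, target=100.0, higher_is_better=False))
```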