Copyright © 2026, NVIDIA Corporation.

NVIDIA Requirements for AI Clouds

Telemetry Requirements

The telemetry requirements comprise two core components, each of which requires alignment between DGX Cloud and the NCP:

  1. Delivery Method: how the NCP delivers telemetry to DGX Cloud for ingestion
  2. Telemetry Scope: which telemetry the NCP delivers to DGX Cloud

Delivery Method
The NCP shall deliver all required telemetry, including metrics and logs, in a form that DGX Cloud systems can ingest. The preferred method is native delivery via the OpenTelemetry Protocol (OTLP), with a delivery latency of no more than 120 seconds.
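The following minimal sketch shows what native OTLP delivery might carry. Using only the Python standard library, it builds an OTLP/HTTP JSON metrics body (the payload an exporter would POST to a collector's `/v1/metrics` endpoint) and checks a data point against the 120-second latency budget. The metric name `ncp.gpu.temperature`, the `service.name` and scope values, the host name, and the helper functions are all hypothetical. A real deployment would more likely use an OpenTelemetry SDK or Collector exporter than hand-built JSON.

```python
import json
import time

# Maximum delivery latency allowed by the requirement (seconds).
MAX_LATENCY_S = 120


def gauge_payload(metric_name, value, attributes, time_unix_nano):
    """Build an OTLP/HTTP JSON body holding a single gauge data point.

    Follows the OTLP JSON encoding of an ExportMetricsServiceRequest.
    The service, scope, and metric names used here are illustrative only.
    """
    return {
        "resourceMetrics": [
            {
                "resource": {
                    "attributes": [
                        # Hypothetical service name for the NCP's telemetry pipeline.
                        {"key": "service.name", "value": {"stringValue": "ncp-telemetry"}}
                    ]
                },
                "scopeMetrics": [
                    {
                        "scope": {"name": "ncp.example"},
                        "metrics": [
                            {
                                "name": metric_name,
                                "gauge": {
                                    "dataPoints": [
                                        {
                                            "asDouble": value,
                                            # 64-bit integers are encoded as strings in OTLP JSON.
                                            "timeUnixNano": str(time_unix_nano),
                                            "attributes": [
                                                {"key": k, "value": {"stringValue": v}}
                                                for k, v in attributes.items()
                                            ],
                                        }
                                    ]
                                },
                            }
                        ],
                    }
                ],
            }
        ]
    }


def within_latency_budget(time_unix_nano, received_unix_nano):
    """Return True if a data point arrived no more than MAX_LATENCY_S after it was observed."""
    lag_s = (received_unix_nano - time_unix_nano) / 1e9
    return lag_s <= MAX_LATENCY_S


now_ns = time.time_ns()
body = json.dumps(gauge_payload("ncp.gpu.temperature", 61.5, {"host.name": "node-001"}, now_ns))
print(within_latency_budget(now_ns, now_ns + 90 * 10**9))   # 90 s lag: True
print(within_latency_budget(now_ns, now_ns + 150 * 10**9))  # 150 s lag: False
```

This latency check assumes that the sender's and receiver's clocks are synchronized (for example, via NTP or PTP). Otherwise clock skew would add directly to the measured lag.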

Telemetry Scope
DGX Cloud will provide the NCP with a detailed specification document listing the required metrics and logs. Upon receipt, the NCP shall provide a formal written response that includes:

  • Confirmation of its ability to deliver the specified metrics and logs.
  • Projected timelines for delivery.
  • Specific technical details, including metric names, label names, and label values.
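One way to keep the written response machine-checkable is to record each requested metric as a structured entry. Each entry would cover the three items above: delivery confirmation, projected timeline, and the metric name with its label names and label values. This sketch is hypothetical. The `MetricResponse` type, its fields, and the sample metrics and dates are illustrative assumptions, not a format defined by DGX Cloud.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class MetricResponse:
    """Hypothetical record of the NCP's answer for one requested metric."""
    metric_name: str
    can_deliver: bool
    # Projected delivery date; None when no date has been committed.
    projected_delivery: Optional[date]
    # Label name -> enumerated or example label values.
    labels: Dict[str, List[str]] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """Return the gaps that would make this entry incomplete."""
        issues = []
        if not self.metric_name:
            issues.append("missing metric name")
        if self.can_deliver and self.projected_delivery is None:
            issues.append(f"{self.metric_name}: no projected delivery date")
        for label, values in self.labels.items():
            if not values:
                issues.append(f"{self.metric_name}: label '{label}' has no values")
        return issues


# Illustrative entries only; metric names, labels, and dates are made up.
response = [
    MetricResponse("ncp.gpu.temperature", True, date(2026, 3, 31),
                   {"host.name": ["node-001", "node-002"], "gpu.index": ["0", "1"]}),
    MetricResponse("ncp.nic.link_flaps", True, None, {"interface": []}),
]

for entry in response:
    for issue in entry.problems():
        print(issue)
```

With these sample entries, the loop reports the second metric twice: once for its missing delivery date and once for its empty `interface` label.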

Network Telemetry
The NCP shall provide network telemetry across the following domains:

  • North-South (Front-End) Network (client-facing and external interconnects)
  • East-West (Back-End) Network (GPU-to-GPU interconnects)
  • Management Network (control plane and orchestration traffic)
  • NVSwitch Fabric (intra-node GPU switching; applicable only to GB200 and later clusters)
  • Host Network (NIC-level and server connectivity)
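To keep the domain split visible downstream, each network metric could carry a label naming the domain it belongs to. In that scheme, the NVSwitch fabric domain is required only on GB200 and later clusters. The sketch below assumes a hypothetical label name `network.domain` with hypothetical values, plus a hypothetical `has_nvswitch_fabric` flag. The sample metric and switch names are illustrative as well.

```python
from enum import Enum
from typing import Dict, Set


class NetworkDomain(Enum):
    """Network telemetry domains from the requirement; values are hypothetical label values."""
    NORTH_SOUTH = "north-south"
    EAST_WEST = "east-west"
    MANAGEMENT = "management"
    NVSWITCH = "nvswitch"
    HOST = "host"


def required_domains(has_nvswitch_fabric: bool) -> Set[NetworkDomain]:
    """Domains the NCP must cover; NVSwitch applies only to GB200 and later clusters."""
    domains = set(NetworkDomain)
    if not has_nvswitch_fabric:
        domains.discard(NetworkDomain.NVSWITCH)
    return domains


def label_metric(name: str, domain: NetworkDomain, labels: Dict[str, str]) -> Dict[str, object]:
    """Attach the hypothetical 'network.domain' label to a metric sample."""
    return {"name": name, "labels": {**labels, "network.domain": domain.value}}


def missing_domains(reported: Set[NetworkDomain], has_nvswitch_fabric: bool) -> Set[NetworkDomain]:
    """Domains that still lack telemetry."""
    return required_domains(has_nvswitch_fabric) - reported


print(label_metric("port.rx_errors", NetworkDomain.EAST_WEST, {"switch": "leaf-01"}))
print(sorted(d.value for d in missing_domains(
    {NetworkDomain.HOST, NetworkDomain.EAST_WEST}, has_nvswitch_fabric=False)))
```

Because the set of required domains is derived from a single cluster attribute, the same coverage check works for clusters with and without an NVSwitch fabric.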

Logs
DGX Cloud will require the NCP to provide logs from its network and infrastructure components, including but not limited to:

  1. Fabric Manager logs for the NVLink domain (where applicable)
  2. Subnet Manager logs for the NVLink domain (where applicable)
  3. VPC Flow logs (all ingress/egress traffic)
  4. UFM Event logs
  5. General switch logs
  6. Switch syslogs
  7. Switch kernel logs
  8. BMC SEL logs
  9. Syslogs
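A coverage check against the list above can flag which named log sources an NCP is not yet shipping. The sketch treats the two "where applicable" sources as conditional on the cluster having an NVLink domain. The source identifiers and the `has_nvlink_domain` flag are hypothetical. Because the list is "not limited to" these items, the check covers only the named sources.

```python
from typing import Set

# Hypothetical identifiers for the log sources named in the requirement.
NVLINK_LOGS = {"fabric_manager", "nvlink_subnet_manager"}
ALWAYS_REQUIRED_LOGS = {
    "vpc_flow",
    "ufm_event",
    "switch_general",
    "switch_syslog",
    "switch_kernel",
    "bmc_sel",
    "syslog",
}


def required_log_sources(has_nvlink_domain: bool) -> Set[str]:
    """Return the named log sources that apply to a given cluster."""
    required = set(ALWAYS_REQUIRED_LOGS)
    if has_nvlink_domain:
        required |= NVLINK_LOGS
    return required


def missing_log_sources(shipped: Set[str], has_nvlink_domain: bool) -> Set[str]:
    """Named sources not yet delivered; extra sources in `shipped` are ignored."""
    return required_log_sources(has_nvlink_domain) - shipped


shipped = {"vpc_flow", "syslog", "bmc_sel", "switch_syslog", "fabric_manager"}
print(sorted(missing_log_sources(shipped, has_nvlink_domain=True)))
# ['nvlink_subnet_manager', 'switch_general', 'switch_kernel', 'ufm_event']
```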