Redfish API

Overview

Redfish is a standard RESTful API specification developed by the Distributed Management Task Force (DMTF) for managing and monitoring server hardware, storage, networking, and converged infrastructure. It provides a standardized way to interact with Baseboard Management Controllers (BMCs) and other hardware management interfaces.

DPS uses Redfish as its primary interface for communicating with NVIDIA DGX compute nodes. Through Redfish, DPS can:

  • Set and monitor power limits for nodes, GPUs, CPUs, and memory
  • Collect power consumption metrics and performance telemetry
  • Manage Workload Power Profile Settings (WPPS) for supported platforms

Supported Platforms

DPS supports the following NVIDIA DGX systems via the Redfish API:

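DGX H100

  • Power Management: Node Manager (domain-based power capping)
  • WPPS Support: No
  • Minimum BMC Version: 24.0.0

DGX B200

  • Power Management: Node Manager (domain-based power capping)
  • WPPS Support: Yes
  • Minimum BMC Version: 25.0.0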

DGX GB200 NVL

  • Power Management: EnvironmentMetrics (per-device power limits)
  • WPPS Support: Yes
  • Minimum BMC Version: 25.2.0

DGX B300

  • Power Management: Node Manager (domain-based power capping)
  • WPPS Support: Yes
  • Minimum BMC Version: 46.02.07

DGX GB300 NVL

  • Power Management: EnvironmentMetrics (per-device power limits)
  • WPPS Support: Yes
  • Minimum BMC Version: 25.2.0

Session Management

DPS creates an authenticated session with the BMC and sends the resulting session token with every subsequent API call. Sessions are long-lived and are reused across requests rather than re-created for each call.

For detailed session management specifications, see DMTF Redfish Specification DSP0266 v1.23.0 - Section 9.2.4 Session Management.

Create Session

  • Endpoint: POST /redfish/v1/SessionService/Sessions
  • Arguments (request payload):
    • UserName (string) - BMC username from Kubernetes secret
    • Password (string) - BMC password from Kubernetes secret
  • Attributes (response headers):
    • X-Auth-Token - Session token for subsequent API requests
    • Location - Session URL for deletion

Example:

POST /redfish/v1/SessionService/Sessions HTTP/1.1
Content-Type: application/json

{
  "UserName": "admin",
  "Password": "password123"
}
HTTP/1.1 201 Created
X-Auth-Token: abc123-session-token-xyz789
Location: /redfish/v1/SessionService/Sessions/1

Example Error Response (Invalid Credentials):

HTTP/1.1 401 Unauthorized
Content-Type: application/json

{
  "error": {
    "code": "Base.1.13.0.SecurityAccessDenied",
    "message": "While attempting to establish a connection to /redfish/v1/SessionService/Sessions, the service was denied access.",
    "@Message.ExtendedInfo": [
      {
        "@odata.type": "#Message.v1_1_1.Message",
        "MessageId": "Base.1.13.0.SecurityAccessDenied",
        "Message": "While attempting to establish a connection to /redfish/v1/SessionService/Sessions, the service was denied access.",
        "Severity": "Critical",
        "Resolution": "Attempt to ensure that the URI is correct and that the service has the appropriate credentials."
      }
    ]
  }
}

Delete Session

  • Endpoint: DELETE /redfish/v1/SessionService/Sessions/{session_id}
  • Path Parameters:
    • {session_id} - Session ID, taken from the final path segment of the Location header returned by Create Session
  • Request Headers:
    • X-Auth-Token - Session token from create response

Example:

DELETE /redfish/v1/SessionService/Sessions/1 HTTP/1.1
X-Auth-Token: abc123-session-token-xyz789
HTTP/1.1 204 No Content
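
The session lifecycle above can be sketched in Python using only the standard library. This is an illustrative sketch, not the DPS implementation: the function names and the base URL are hypothetical, and the requests are only constructed here, not sent.

```python
import json
from urllib.parse import urljoin, urlparse
from urllib.request import Request

SESSIONS_PATH = "/redfish/v1/SessionService/Sessions"


def build_create_session_request(base_url, username, password):
    """Build (but do not send) the POST that creates a Redfish session."""
    body = json.dumps({"UserName": username, "Password": password}).encode("utf-8")
    return Request(
        urljoin(base_url, SESSIONS_PATH),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_session_headers(headers):
    """Return (token, session_path) from the 201 Created response headers."""
    token = headers["X-Auth-Token"]
    # Location may be a full URL or a bare path; keep only the path part.
    session_path = urlparse(headers["Location"]).path
    return token, session_path


def build_delete_session_request(base_url, token, session_path):
    """Build (but do not send) the DELETE that closes the session."""
    return Request(
        urljoin(base_url, session_path),
        headers={"X-Auth-Token": token},
        method="DELETE",
    )
```

A caller would pass each Request to urllib.request.urlopen (with a TLS context suited to the BMC certificate) and keep the token for later calls.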

DGX H100 Systems

Reference: NVIDIA DGX H100/H200 Redfish API Guide

DGX H100 systems use the Node Manager API for power management through domains and policies. DPS creates a custom domain named dps-managed-domain to manage power allocation.

Node Manager Domains

Domains represent power management scopes. DPS manages power through a dedicated domain.

List Domains

  • Endpoint: GET /redfish/v1/Managers/BMC/NodeManager/Domains
  • Attributes (response):
    • Members[].@odata.id - URLs to individual domain resources

Example:

GET /redfish/v1/Managers/BMC/NodeManager/Domains HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/Domains",
  "Members": [
    { "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/Domains/0" },
    { "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/Domains/1" }
  ],
  "Members@odata.count": 2
}

Get Domain

  • Endpoint: GET /redfish/v1/Managers/BMC/NodeManager/Domains/{id}
  • Attributes (response):
    • Id - Domain identifier for update/delete operations
    • Name - Domain name; "dps-managed-domain" indicates managed domain
    • Capabilities.Max - Maximum power capability in Watts
    • Capabilities.Min - Minimum power capability in Watts
    • Policies.Members[] - Nested policy objects

Example:

GET /redfish/v1/Managers/BMC/NodeManager/Domains/1 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/Domains/1",
  "Id": "1",
  "Name": "dps-managed-domain",
  "Status": { "State": "Enabled" },
  "Capabilities": {
    "Max": 10200,
    "Min": 2500
  },
  "Policies": {
    "Members": [
      {
        "ComponentId": "COMP_GPU",
        "Limit": 5600.0,
        "PercentageOfDomainBudget": 75.0,
        "Status": { "State": "Selected" }
      },
      {
        "ComponentId": "COMP_CPU",
        "Limit": 700.0,
        "PercentageOfDomainBudget": 10.0,
        "Status": { "State": "Selected" }
      }
    ]
  }
}

Create Domain

  • Endpoint: POST /redfish/v1/Managers/BMC/NodeManager/Domains
  • Arguments (request payload):
    • Id - Domain identifier (default: "0")
    • Name - Always "dps-managed-domain" (identifies managed domains)
    • Status.State - Always "Enabled" (required for active state)
    • Capabilities.Max - Node maximum power limit in Watts
    • Capabilities.Min - Node minimum power limit in Watts
    • Policies.Members[] - Array of policy objects (see Policy Object Fields)

Example:

POST /redfish/v1/Managers/BMC/NodeManager/Domains HTTP/1.1
Content-Type: application/json

{
  "Id": "1",
  "Name": "dps-managed-domain",
  "Status": { "State": "Enabled" },
  "Capabilities": {
    "Max": 10200,
    "Min": 2500
  },
  "Policies": {
    "Members": [
      {
        "ComponentId": "COMP_GPU",
        "Limit": 5600.0,
        "PercentageOfDomainBudget": 75.0,
        "Status": { "State": "Selected" }
      },
      {
        "ComponentId": "COMP_CPU",
        "Limit": 700.0,
        "PercentageOfDomainBudget": 10.0,
        "Status": { "State": "Selected" }
      },
      {
        "ComponentId": "COMP_MEMORY",
        "Limit": 200.0,
        "PercentageOfDomainBudget": 5.0,
        "Status": { "State": "Selected" }
      }
    ]
  }
}
HTTP/1.1 204 No Content

Update Domain

  • Endpoint: PATCH /redfish/v1/Managers/BMC/NodeManager/Domains/{id}
  • Arguments (request payload):
    • Capabilities.Max - Maximum power limit in Watts
    • Capabilities.Min - Minimum power limit in Watts
    • Policies.Members[] - Array of policy objects (see Policy Object Fields)

Example:

PATCH /redfish/v1/Managers/BMC/NodeManager/Domains/1 HTTP/1.1
Content-Type: application/json

{
  "Capabilities": {
    "Max": 9500,
    "Min": 2000
  },
  "Policies": {
    "Members": [
      {
        "ComponentId": "COMP_GPU",
        "Limit": 5000.0,
        "PercentageOfDomainBudget": 70.0,
        "Status": { "State": "Selected" }
      },
      {
        "ComponentId": "COMP_CPU",
        "Limit": 600.0,
        "PercentageOfDomainBudget": 10.0,
        "Status": { "State": "Selected" }
      },
      {
        "ComponentId": "COMP_MEMORY",
        "Limit": 180.0,
        "PercentageOfDomainBudget": 5.0,
        "Status": { "State": "Selected" }
      }
    ]
  }
}
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@Message.ExtendedInfo": [
    {
      "@odata.type": "#Message.v1_1_1.Message",
      "Message": "The request completed successfully.",
      "MessageId": "Base.1.18.1.Success",
      "Severity": "OK",
      "Resolution": "None."
    }
  ]
}

Delete Domain

  • Endpoint: DELETE /redfish/v1/Managers/BMC/NodeManager/Domains/{id}
  • Path Parameters:
    • {id} - Domain identifier

Example:

DELETE /redfish/v1/Managers/BMC/NodeManager/Domains/1 HTTP/1.1
HTTP/1.1 204 No Content
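
The domain operations above suggest a create-or-update flow: look for a domain named dps-managed-domain, PATCH it if it exists, and POST a new one otherwise. The sketch below shows one way to make that decision from already-fetched domain resources. The function names are hypothetical, and the exact logic DPS uses is an assumption.

```python
MANAGED_DOMAIN_NAME = "dps-managed-domain"
DOMAINS_PATH = "/redfish/v1/Managers/BMC/NodeManager/Domains"


def find_managed_domain(domains):
    """Return the first domain whose Name marks it as DPS-managed, else None."""
    for domain in domains:
        if domain.get("Name") == MANAGED_DOMAIN_NAME:
            return domain
    return None


def plan_domain_request(domains, max_watts, min_watts, policies, new_id="0"):
    """Return (method, path, payload) for creating or updating the managed domain."""
    body = {
        "Capabilities": {"Max": max_watts, "Min": min_watts},
        "Policies": {"Members": policies},
    }
    existing = find_managed_domain(domains)
    if existing is not None:
        # Update in place: PATCH carries only capabilities and policies.
        return "PATCH", f"{DOMAINS_PATH}/{existing['Id']}", body
    # Create: POST also carries the identifying fields.
    create_body = {
        "Id": new_id,
        "Name": MANAGED_DOMAIN_NAME,
        "Status": {"State": "Enabled"},
        **body,
    }
    return "POST", DOMAINS_PATH, create_body
```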

Domain Policies

Policies define power limits for specific components within a domain. Each policy object in Policies.Members[] contains the following fields:

Policy Object Fields (arguments when creating/updating)

  • ComponentId (string) - Component type identifier (exact string required):
    • "COMP_CPU" - CPU component
    • "COMP_MEMORY" - Memory/DRAM component
    • "COMP_GPU" - GPU component
  • Limit (float) - Power limit in Watts
  • PercentageOfDomainBudget (float) - Percentage of total domain budget (0.0-100.0)
  • Status.State (string) - Always "Selected" (required for active policies)
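
As an illustration of the field rules above, the following sketch builds one policy object and rejects values outside the documented constraints. The helper name make_policy is hypothetical. The validation reflects only the constraints listed here, not any additional checks the BMC may perform.

```python
VALID_COMPONENTS = ("COMP_CPU", "COMP_MEMORY", "COMP_GPU")


def make_policy(component_id, limit_watts, percent_of_budget):
    """Build one entry for Policies.Members[] from the documented fields."""
    if component_id not in VALID_COMPONENTS:
        # ComponentId must match one of the exact strings.
        raise ValueError(f"unknown ComponentId: {component_id!r}")
    if not 0.0 <= percent_of_budget <= 100.0:
        raise ValueError("PercentageOfDomainBudget must be within 0.0-100.0")
    return {
        "ComponentId": component_id,
        "Limit": float(limit_watts),
        "PercentageOfDomainBudget": float(percent_of_budget),
        "Status": {"State": "Selected"},
    }
```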

Get Policy

  • Endpoint: GET /redfish/v1/Managers/BMC/NodeManager/Domains/{domain_id}/Policies/{policy_id}
  • Path Parameters:
    • {domain_id} - Domain identifier
    • {policy_id} - Policy identifier
  • Attributes (response):
    • ComponentId - Component type ("COMP_CPU", "COMP_MEMORY", "COMP_GPU")
    • Limit - Power limit in Watts
    • PercentageOfDomainBudget - Budget percentage allocation

Example:

GET /redfish/v1/Managers/BMC/NodeManager/Domains/1/Policies/0 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/Domains/1/Policies/0",
  "Id": "0",
  "ComponentId": "COMP_GPU",
  "Limit": 5600.0,
  "PercentageOfDomainBudget": 75.0,
  "Status": { "State": "Selected" }
}

PSU Policies

PSU policies define power limits based on the PSU redundancy configuration. DPS validates that a requested power cap does not exceed the LimitMax of the active PSU policy.

List PSU Policies

  • Endpoint: GET /redfish/v1/Managers/BMC/NodeManager/PSUPolicies

Example:

GET /redfish/v1/Managers/BMC/NodeManager/PSUPolicies HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/PSUPolicies",
  "Members": [
    { "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/PSUPolicies/0" },
    { "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/PSUPolicies/1" },
    { "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/PSUPolicies/2" }
  ],
  "Members@odata.count": 3
}

Get PSU Policy

  • Endpoint: GET /redfish/v1/Managers/BMC/NodeManager/PSUPolicies/{id}
  • Path Parameter Values:
    • "0" - Limp mode (minimal PSU configuration)
    • "1" - No Redundancy
    • "2" - Full Redundancy
  • Attributes (response):
    • LimitMax - Maximum power limit in Watts; power caps must not exceed this value

Example (Full Redundancy):

GET /redfish/v1/Managers/BMC/NodeManager/PSUPolicies/2 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Managers/BMC/NodeManager/PSUPolicies/2",
  "Id": "2",
  "Name": "Full Redundancy",
  "LimitMax": 12000,
  "MaxPSU": 6,
  "MinPSU": 4,
  "Status": { "State": "Selected" }
}
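
The cap validation described in this section can be sketched as follows. The sketch assumes that the active PSU policy is the one whose Status.State is "Selected", as in the example response. That interpretation and the function names are assumptions, not confirmed DPS behavior.

```python
def active_psu_limit_max(psu_policies):
    """Return LimitMax of the policy assumed active (Status.State == "Selected")."""
    for policy in psu_policies:
        if policy.get("Status", {}).get("State") == "Selected":
            return policy["LimitMax"]
    return None


def cap_is_allowed(requested_watts, psu_policies):
    """True when the requested cap does not exceed the active policy's LimitMax."""
    limit = active_psu_limit_max(psu_policies)
    if limit is None:
        raise LookupError("no active PSU policy found")
    return requested_watts <= limit
```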

Telemetry Service

DGX H100 systems use the TelemetryService for real-time power and performance metrics.

Get Metric Reports

  • Endpoint: GET /redfish/v1/TelemetryService/MetricReports/NvidiaNMMetrics_0
  • Attributes (response):
    • MetricValues[].MetricId - Metric identifier (case-sensitive)
    • MetricValues[].MetricValue - Metric value as string (e.g., "8500.0" not 8500.0)

Example (partial response):

GET /redfish/v1/TelemetryService/MetricReports/NvidiaNMMetrics_0 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/TelemetryService/MetricReports/NvidiaNMMetrics_0",
  "MetricValues": [
    { "MetricId": "dcPlatformPower_avg", "MetricValue": "8500.0" },
    { "MetricId": "AvblNoGPU", "MetricValue": "8" },
    { "MetricId": "AvblNoCPU", "MetricValue": "2" },
    { "MetricId": "gpuPower_avg_0", "MetricValue": "63.0" },
    { "MetricId": "gpuPowerLimit_0", "MetricValue": "700.0" },
    { "MetricId": "gpuPowerCapabilitiesMin_0", "MetricValue": "200.0" },
    { "MetricId": "gpuPowerCapabilitiesMax_0", "MetricValue": "700.0" },
    { "MetricId": "cpuPackagePower_avg_0", "MetricValue": "182.0" },
    { "MetricId": "dramPower_avg_0", "MetricValue": "45.0" }
  ]
}

Node-Level Metrics

  • dcPlatformPower_avg - Total DC platform power in Watts
  • AvblNoGPU - Available GPU count
  • AvblNoCPU - Available CPU count

Per-GPU Metrics (for each {id} from 0 to 7)

  • gpuPower_avg_{id} - GPU average power in Watts
  • gpuPowerLimit_{id} - GPU power limit in Watts
  • gpuPowerCapabilitiesMin_{id} - GPU minimum power limit in Watts
  • gpuPowerCapabilitiesMax_{id} - GPU maximum power limit in Watts

Per-CPU Metrics (for each {id} from 0 to 1)

  • cpuPackagePower_avg_{id} - CPU average power in Watts
  • cpuPackagePowerLimit1_{id} - CPU power limit 1 in Watts
  • cpuPackagePowerCapabilitiesMin_{id} - CPU minimum power in Watts
  • cpuPackagePowerCapabilitiesMax_{id} - CPU maximum power in Watts
  • cpuEnergy_{id} - CPU energy in kWh

Per-Memory Metrics (for each {id} from 0 to 1)

  • dramPower_avg_{id} - DRAM average power in Watts
  • dramPowerLimit_{id} - DRAM power limit in Watts
  • dramPackagePowerCapabilitiesMin_{id} - DRAM minimum power in Watts
  • dramPackagePowerCapabilitiesMax_{id} - DRAM maximum power in Watts
  • dramEnergy_{id} - DRAM energy in kWh
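
Because MetricValue is returned as a string and per-device metrics carry a trailing _{id}, a client typically needs to convert and group the values. The sketch below shows one way to do that. The function name and the returned structure are illustrative choices, and the parsing assumes that only per-device metric IDs end in an underscore followed by digits.

```python
import re

# Per-device metric IDs end in "_<digits>", e.g. "gpuPower_avg_0".
_INDEXED = re.compile(r"^(?P<name>.+)_(?P<index>\d+)$")


def parse_metric_report(report):
    """Split MetricValues into node-level metrics and per-device metrics by index."""
    node = {}
    per_device = {}
    for entry in report.get("MetricValues", []):
        value = float(entry["MetricValue"])  # values arrive as strings
        match = _INDEXED.match(entry["MetricId"])
        if match:
            per_device.setdefault(match["name"], {})[int(match["index"])] = value
        else:
            node[entry["MetricId"]] = value
    return node, per_device
```

With the partial response shown earlier, node would hold dcPlatformPower_avg, AvblNoGPU, and AvblNoCPU, and per_device["gpuPower_avg"][0] would hold the average power of GPU 0.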

Firmware Validation

  • Endpoint: GET /redfish/v1/UpdateService/FirmwareInventory/HGX_FW_BMC_0
  • Attributes (response):
    • Version - BMC firmware version; must be >= 24.0.0

Example:

GET /redfish/v1/UpdateService/FirmwareInventory/HGX_FW_BMC_0 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/HGX_FW_BMC_0",
  "Id": "HGX_FW_BMC_0",
  "Name": "HGX BMC",
  "Version": "24.08.25.01"
}
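
A minimum-version check such as the one above can be sketched by comparing dot-separated numeric components. This assumes that version strings contain only digits and dots, as in the examples in this guide. The function names are hypothetical, and how DPS actually compares versions is not specified here.

```python
def parse_version(version):
    """Turn "24.08.25.01" into (24, 8, 25, 1)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version, minimum):
    """Compare component-wise, padding the shorter version with zeros."""
    have, need = parse_version(version), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "9.0.0" as newer than "24.0.0".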

DGX B200 Systems

Reference: NVIDIA DGX B200 Redfish API Guide

DGX B200 systems use the Node Manager API for power management, like DGX H100 systems.

For power management endpoints and request fields, see DGX H100 Node Manager Domains and Domain Policies. DGX B200 power policy application creates or updates the same dps-managed-domain domain and does not set per-device power limits through processor EnvironmentMetrics.

Processors Collection

  • Endpoint: GET /redfish/v1/Systems/HGX_Baseboard_0/Processors
  • Attributes (response):
    • Members[].@odata.id - URLs to individual processors:
      • GPU_SXM_1 through GPU_SXM_8 - GPUs
      • CPU_0, CPU_1 - CPUs

Example (partial response; GPU_SXM_3 through GPU_SXM_7 omitted):

GET /redfish/v1/Systems/HGX_Baseboard_0/Processors HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors",
  "Members": [
    { "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1" },
    { "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_2" },
    { "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_8" },
    { "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors/CPU_0" },
    { "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors/CPU_1" }
  ],
  "Members@odata.count": 10
}

Get Processor

  • Endpoint: GET /redfish/v1/Systems/HGX_Baseboard_0/Processors/{processor_id}
  • Path Parameters:
    • {processor_id} - Processor ID from collection (GPU_SXM_1-GPU_SXM_8 or CPU_0-CPU_1)
  • Attributes (response):
    • Id - Processor identifier
    • Oem.Nvidia.WorkloadPowerProfile.@odata.id - Presence indicates WPPS support; provides URL to WPPS endpoint

Example (GPU with WPPS support):

GET /redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1",
  "Id": "GPU_SXM_1",
  "Name": "GPU SXM 1",
  "Oem": {
    "Nvidia": {
      "WorkloadPowerProfile": {
        "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1/Oem/Nvidia/WorkloadPowerProfile"
      }
    }
  }
}

HGX Platform EnvironmentMetrics

DGX B200 metrics are read from the consolidated HGX platform report. This report contains GPU, CPU, memory, and chassis power telemetry in one response.

  • Endpoint: GET /redfish/v1/TelemetryService/MetricReports/HGX_PlatformEnvironmentMetrics_0
  • Attributes (response):
    • MetricValues[].MetricProperty - Sensor URL for a GPU, CPU, memory, or chassis sensor
    • MetricValues[].MetricValue - Sensor value as a string

Example (partial response):

GET /redfish/v1/TelemetryService/MetricReports/HGX_PlatformEnvironmentMetrics_0 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/TelemetryService/MetricReports/HGX_PlatformEnvironmentMetrics_0",
  "MetricValues": [
    {
      "MetricProperty": "/redfish/v1/Chassis/HGX_GPU_SXM_1/Sensors/HGX_GPU_SXM_1_Power_0",
      "MetricValue": "64.342"
    },
    {
      "MetricProperty": "/redfish/v1/Chassis/HGX_Chassis_0/Sensors/HGX_Chassis_0_Power_0",
      "MetricValue": "8500.0"
    }
  ]
}
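
Each entry in the consolidated report identifies its sensor by URL, so a client has to parse MetricProperty to find the readings it needs. The sketch below extracts power readings keyed by sensor name. It assumes that sensor URLs follow the /redfish/v1/Chassis/{chassis}/Sensors/{sensor} shape shown in the example and that power sensors contain "_Power_" in their names. Both are inferences from the sample response, not documented guarantees.

```python
def power_readings(report):
    """Map sensor name -> power reading in Watts for power sensors in the report."""
    readings = {}
    for entry in report.get("MetricValues", []):
        parts = entry["MetricProperty"].strip("/").split("/")
        # Expected shape: redfish/v1/Chassis/{chassis}/Sensors/{sensor}
        if len(parts) != 6 or parts[2] != "Chassis" or parts[4] != "Sensors":
            continue
        sensor = parts[5]
        if "_Power_" in sensor:
            readings[sensor] = float(entry["MetricValue"])  # values are strings
    return readings
```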

Workload Power Profiles (WPPS)

WPPS allows fine-grained control over GPU power behavior for specific workload types. Each profile is identified by an index (0-255), and a set of profiles is represented as a hexadecimal bitmask in which bit n corresponds to profile n.

Get WPPS Configuration

  • Endpoint: GET /redfish/v1/Systems/HGX_Baseboard_0/Processors/{gpu_id}/Oem/Nvidia/WorkloadPowerProfile
  • Path Parameters:
    • {gpu_id} - GPU ID (GPU_SXM_1-GPU_SXM_8)
  • Attributes (response):
    • SupportedProfileMask - Hex mask of available profiles on this GPU
    • RequestedProfileMask - Hex mask of requested profiles (may differ from enforced during transition)
    • EnforcedProfileMask - Hex mask of currently active profiles

Example:

GET /redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1/Oem/Nvidia/WorkloadPowerProfile HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1/Oem/Nvidia/WorkloadPowerProfile",
  "SupportedProfileMask": "0x10f98",
  "RequestedProfileMask": "0x0",
  "EnforcedProfileMask": "0x0"
}

Enable Workload Profiles

  • Endpoint: POST /redfish/v1/Systems/HGX_Baseboard_0/Processors/{gpu_id}/Oem/Nvidia/WorkloadPowerProfile/Actions/NvidiaWorkloadPower.EnableProfiles
  • Path Parameters:
    • {gpu_id} - GPU ID (GPU_SXM_1-GPU_SXM_8)
  • Arguments (request payload):
    • ProfileMask - Hex mask of profiles to enable; must use 0x prefix (e.g., "0x7" for profiles 0, 1, 2)

Example:

POST /redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1/Oem/Nvidia/WorkloadPowerProfile/Actions/NvidiaWorkloadPower.EnableProfiles HTTP/1.1
Content-Type: application/json

{
  "ProfileMask": "0x7"
}
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@Message.ExtendedInfo": [
    {
      "@odata.type": "#Message.v1_1_1.Message",
      "Message": "The request completed successfully.",
      "MessageId": "Base.1.18.1.Success",
      "Severity": "OK",
      "Resolution": "None."
    }
  ]
}

Disable Workload Profiles

  • Endpoint: POST /redfish/v1/Systems/HGX_Baseboard_0/Processors/{gpu_id}/Oem/Nvidia/WorkloadPowerProfile/Actions/NvidiaWorkloadPower.DisableProfiles
  • Path Parameters:
    • {gpu_id} - GPU ID (GPU_SXM_1-GPU_SXM_8)
  • Arguments (request payload):
    • ProfileMask - Hex mask of profiles to disable; must use 0x prefix (e.g., "0x7" to disable profiles 0, 1, 2)

Example:

POST /redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1/Oem/Nvidia/WorkloadPowerProfile/Actions/NvidiaWorkloadPower.DisableProfiles HTTP/1.1
Content-Type: application/json

{
  "ProfileMask": "0x7"
}
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@Message.ExtendedInfo": [
    {
      "@odata.type": "#Message.v1_1_1.Message",
      "Message": "The request completed successfully.",
      "MessageId": "Base.1.18.1.Success",
      "Severity": "OK",
      "Resolution": "None."
    }
  ]
}

Profile Mask Format

The ProfileMask is a hex string with 0x prefix where each bit position represents a profile ID:

  • "0x0" - No profiles enabled
  • "0x1" - Profile 0 only (bit 0 set)
  • "0x7" - Profiles 0, 1, and 2 (bits 0, 1, 2 set)

Firmware Validation

  • Endpoint: GET /redfish/v1/UpdateService/FirmwareInventory/HostBMC_0
  • Attributes (response):
    • Version - BMC firmware version; must be >= 25.0.0

Example:

GET /redfish/v1/UpdateService/FirmwareInventory/HostBMC_0 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/HostBMC_0",
  "Id": "HostBMC_0",
  "Name": "Host BMC",
  "Version": "25.04.17.00"
}

DGX B300 Systems

DGX B300 systems use the Node Manager API for power management and the HGX platform EnvironmentMetrics report for telemetry, like DGX B200 systems.

For power management endpoints and request fields, see DGX H100 Node Manager Domains and Domain Policies. DGX B300 power policy application creates or updates dps-managed-domain and does not set per-device power limits through processor EnvironmentMetrics.

HGX Platform EnvironmentMetrics

DGX B300 uses the same report as DGX B200, but its processor paths are zero-based:

  • GPU path labels: /redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_{id}
    • {id} - GPU ID (0-7)
  • CPU path labels: /redfish/v1/Systems/HGX_Baseboard_0/Processors/CPU_{id}
    • {id} - CPU ID (0-1)
  • Report endpoint: GET /redfish/v1/TelemetryService/MetricReports/HGX_PlatformEnvironmentMetrics_0

Workload Power Profiles (WPPS)

DGX B300 supports the same WPPS operations as DGX B200, using GPU IDs GPU_0 through GPU_7. See the Workload Power Profiles (WPPS) section under DGX B200 Systems.

Firmware Validation

  • Endpoint: GET /redfish/v1/UpdateService/FirmwareInventory/HostBMC_0
  • Attributes (response):
    • Version - BMC firmware version; must be >= 46.02.07

DGX GB200 and GB300 NVL Systems

DGX GB200 and GB300 NVL systems use the EnvironmentMetrics API for power management. DPS sets node, GPU, and CPU power limits through device-specific EnvironmentMetrics paths. It reads telemetry from the consolidated HGX platform EnvironmentMetrics report.

Processor EnvironmentMetrics Power Limits

Power limit operations use processor EnvironmentMetrics paths:

  • GPU Endpoint: GET/PATCH /redfish/v1/Systems/HGX_Baseboard_0/Processors/{gpu_id}/EnvironmentMetrics
    • {gpu_id} - GPU ID (GPU_0-GPU_3, four GPUs total)
  • CPU Endpoint: GET/PATCH /redfish/v1/Systems/HGX_Baseboard_0/Processors/{cpu_id}/EnvironmentMetrics
    • {cpu_id} - CPU ID (CPU_0-CPU_1, two CPUs total)
  • Arguments (PATCH request payload):
    • PowerLimitWatts.SetPoint - Power limit in Watts

Example:

PATCH /redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_0/EnvironmentMetrics HTTP/1.1
Content-Type: application/json

{
  "PowerLimitWatts": {
    "SetPoint": 700
  }
}
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@Message.ExtendedInfo": [
    {
      "@odata.type": "#Message.v1_1_1.Message",
      "Message": "The request completed successfully.",
      "MessageId": "Base.1.18.1.Success",
      "Severity": "OK",
      "Resolution": "None."
    }
  ]
}

HGX Platform EnvironmentMetrics

Telemetry uses the same consolidated report as DGX B200 and DGX B300:

  • Endpoint: GET /redfish/v1/TelemetryService/MetricReports/HGX_PlatformEnvironmentMetrics_0
  • Attributes (response):
    • MetricValues[].MetricProperty - Sensor URL for a GPU, CPU, memory, or chassis sensor
    • MetricValues[].MetricValue - Sensor value as a string

Processor Module EnvironmentMetrics

Node power is managed at the module level through the processor module EnvironmentMetrics endpoints:

Get Module Metrics

  • Endpoint: GET /redfish/v1/Chassis/HGX_ProcessorModule_{index}/EnvironmentMetrics
  • Path Parameters:
    • {index} - Processor module index (0 or 1)
  • Attributes (response): Same as processor EnvironmentMetrics - PowerWatts.Reading, PowerLimitWatts.* fields

Example:

GET /redfish/v1/Chassis/HGX_ProcessorModule_0/EnvironmentMetrics HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/Chassis/HGX_ProcessorModule_0/EnvironmentMetrics",
  "PowerWatts": { "Reading": 2500.0 },
  "PowerLimitWatts": {
    "Reading": 2820.0,
    "SetPoint": 2820,
    "AllowableMin": 1500,
    "AllowableMax": 3000,
    "DefaultSetPoint": 2820
  }
}

Set Module Power Limit

  • Endpoint: PATCH /redfish/v1/Chassis/HGX_ProcessorModule_{index}/EnvironmentMetrics
  • Path Parameters:
    • {index} - Processor module index (0 or 1)
  • Arguments (request payload):
    • PowerLimitWatts.SetPoint - Module power limit in Watts

Example:

PATCH /redfish/v1/Chassis/HGX_ProcessorModule_0/EnvironmentMetrics HTTP/1.1
Content-Type: application/json

{
  "PowerLimitWatts": {
    "SetPoint": 2820
  }
}
HTTP/1.1 200 OK
Content-Type: application/json
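
Before sending a PATCH, a client can check the requested value against the AllowableMin and AllowableMax fields returned by the GET above. The sketch below is one possible approach. Rejecting out-of-range values on the client side is an assumption for illustration, and the function name is hypothetical.

```python
def build_set_point_patch(requested_watts, environment_metrics):
    """Return the PATCH payload, or raise if the value is outside the allowed range."""
    limits = environment_metrics["PowerLimitWatts"]
    low, high = limits["AllowableMin"], limits["AllowableMax"]
    if not low <= requested_watts <= high:
        raise ValueError(
            f"SetPoint {requested_watts} W is outside the allowable range {low}-{high} W"
        )
    return {"PowerLimitWatts": {"SetPoint": requested_watts}}
```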

Workload Power Profiles (WPPS)

DGX GB200 and GB300 NVL systems support the same WPPS operations as DGX B200, using GPU IDs GPU_0 through GPU_3. See the Workload Power Profiles (WPPS) section under DGX B200 Systems.
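
Because the WPPS action paths differ between platforms only in the GPU ID, they can be generated from a per-platform table. The sketch below collects the GPU ID ranges stated in this guide. The platform keys ("B200", "B300", "GB200", "GB300") and function names are hypothetical labels chosen for illustration.

```python
# GPU IDs per platform, as listed in this guide.
GPU_IDS = {
    "B200": [f"GPU_SXM_{i}" for i in range(1, 9)],  # GPU_SXM_1..GPU_SXM_8
    "B300": [f"GPU_{i}" for i in range(8)],         # GPU_0..GPU_7
    "GB200": [f"GPU_{i}" for i in range(4)],        # GPU_0..GPU_3
    "GB300": [f"GPU_{i}" for i in range(4)],        # GPU_0..GPU_3
}


def wpps_action_path(gpu_id, enable=True):
    """Return the EnableProfiles or DisableProfiles action path for one GPU."""
    action = "EnableProfiles" if enable else "DisableProfiles"
    return (
        f"/redfish/v1/Systems/HGX_Baseboard_0/Processors/{gpu_id}"
        f"/Oem/Nvidia/WorkloadPowerProfile/Actions/NvidiaWorkloadPower.{action}"
    )


def wpps_action_paths(platform, enable=True):
    """Return the action path for every GPU on the given platform."""
    return [wpps_action_path(gpu_id, enable) for gpu_id in GPU_IDS[platform]]
```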

Firmware Validation

  • Endpoint: GET /redfish/v1/UpdateService/FirmwareInventory/FW_BMC_0
  • Attributes (response):
    • Oem.Nvidia.ActiveFirmwareSlot.Version - BMC firmware version (NVIDIA OEM format); must be >= 25.2.0

Example (NVIDIA OEM format):

GET /redfish/v1/UpdateService/FirmwareInventory/FW_BMC_0 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
  "@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/FW_BMC_0",
  "Id": "FW_BMC_0",
  "Name": "BMC firmware",
  "Oem": {
    "Nvidia": {
      "ActiveFirmwareSlot": {
        "Version": "25.25.2",
        "FirmwareState": "Activated"
      }
    }
  }
}

Complete Endpoint Reference

Session Management (All Platforms)

  • POST /redfish/v1/SessionService/Sessions - Create session
  • DELETE /redfish/v1/SessionService/Sessions/{id} - Delete session

Firmware Inventory (All Platforms)

  • GET /redfish/v1/UpdateService/FirmwareInventory/HGX_FW_BMC_0 - DGX H100
  • GET /redfish/v1/UpdateService/FirmwareInventory/HostBMC_0 - DGX B200, B300
  • GET /redfish/v1/UpdateService/FirmwareInventory/FW_BMC_0 - DGX GB200 NVL, GB300 NVL

Node Manager (DGX H100, B200, and B300)

  • GET /redfish/v1/Managers/BMC/NodeManager/Domains - List domains
  • GET /redfish/v1/Managers/BMC/NodeManager/Domains/{id} - Get domain
  • POST /redfish/v1/Managers/BMC/NodeManager/Domains - Create domain
  • PATCH /redfish/v1/Managers/BMC/NodeManager/Domains/{id} - Update domain
  • DELETE /redfish/v1/Managers/BMC/NodeManager/Domains/{id} - Delete domain
  • GET /redfish/v1/Managers/BMC/NodeManager/Domains/{domain_id}/Policies/{policy_id} - Get policy
  • GET /redfish/v1/Managers/BMC/NodeManager/PSUPolicies - List PSU policies
  • GET /redfish/v1/Managers/BMC/NodeManager/PSUPolicies/{id} - Get PSU policy

NVIDIA Node Manager Telemetry (DGX H100)

  • GET /redfish/v1/TelemetryService/MetricReports - List metric reports
  • GET /redfish/v1/TelemetryService/MetricReports/NvidiaNMMetrics_0 - Get NVIDIA Node Manager metrics

HGX Platform EnvironmentMetrics Telemetry (DGX B200, B300, GB200, and GB300)

  • GET /redfish/v1/TelemetryService/MetricReports/HGX_PlatformEnvironmentMetrics_0 - Get consolidated GPU, CPU, memory, and chassis telemetry

Processors (DGX B200, B300, GB200, and GB300)

  • GET /redfish/v1/Systems/HGX_Baseboard_0/Processors - List processors
  • GET /redfish/v1/Systems/HGX_Baseboard_0/Processors/{id} - Get processor

EnvironmentMetrics Power Limits (DGX GB200 and GB300)

  • GET /redfish/v1/Systems/HGX_Baseboard_0/Processors/{processor_id}/EnvironmentMetrics - Get processor power limit state
  • PATCH /redfish/v1/Systems/HGX_Baseboard_0/Processors/{processor_id}/EnvironmentMetrics - Set power limit
  • GET /redfish/v1/Chassis/HGX_ProcessorModule_{id}/EnvironmentMetrics - Get processor module power limit state
  • PATCH /redfish/v1/Chassis/HGX_ProcessorModule_{id}/EnvironmentMetrics - Set module power limit

Workload Power Profiles (DGX B200, B300, GB200, and GB300)

  • GET /redfish/v1/Systems/HGX_Baseboard_0/Processors/{gpu_id}/Oem/Nvidia/WorkloadPowerProfile - Get WPPS configuration
  • POST /redfish/v1/Systems/HGX_Baseboard_0/Processors/{gpu_id}/Oem/Nvidia/WorkloadPowerProfile/Actions/NvidiaWorkloadPower.EnableProfiles - Enable profiles
  • POST /redfish/v1/Systems/HGX_Baseboard_0/Processors/{gpu_id}/Oem/Nvidia/WorkloadPowerProfile/Actions/NvidiaWorkloadPower.DisableProfiles - Disable profiles

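References

NVIDIA Platform Documentation

  • NVIDIA DGX H100/H200 Redfish API Guide
  • NVIDIA DGX B200 Redfish API Guide

DMTF Redfish Specification

  • DMTF Redfish Specification DSP0266 v1.23.0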