metrics

dpsctl check metrics Usage Guide

Retrieve node power usage and telemetry metrics.

Usage

dpsctl check metrics

Flags

Global dpsctl options also apply.

   --nodes value  nodes to retrieve metrics for
   --help, -h     show help

Examples

$ dpsctl check metrics --nodes node001
{
  "nodes": [
    {
      "name": "node001",
      "num_gpus": 8,
      "num_cpus": 2,
      "num_memory_units": 32,
      "power_usage": 2074.0,
      "gpus": [
        {
          "id": "0",
          "power_usage": 64.0,
          "energy_usage": 195.0,
          "energy_usage_unit": "Joules",
          "set_limit": 700.0,
          "max_limit": 700.0,
          "min_limit": 200.0,
          "temperature_celsius": 45.0
        },
        {
          "id": "1",
          "power_usage": 66.0,
          "energy_usage": 195.0,
          "energy_usage_unit": "Joules",
          "set_limit": 700.0,
          "max_limit": 700.0,
          "min_limit": 200.0,
          "temperature_celsius": 47.0
        }
      ],
      "cpus": [
        {
          "id": "0",
          "power_usage": 195.0,
          "energy_usage": 195.0,
          "energy_usage_unit": "Joules",
          "set_limit": 400.0,
          "max_limit": 350.0,
          "min_limit": 209.0,
          "temperature_celsius": 65.0
        },
        {
          "id": "1",
          "power_usage": 195.0,
          "energy_usage": 195.0,
          "energy_usage_unit": "Joules",
          "set_limit": 400.0,
          "max_limit": 350.0,
          "min_limit": 209.0,
          "temperature_celsius": 63.0
        }
      ],
      "memory": [
        {
          "id": "0",
          "power_usage": 28.0,
          "energy_usage": 28.0,
          "energy_usage_unit": "Joules",
          "set_limit": 300.0,
          "max_limit": 122.0,
          "min_limit": 0.0,
          "temperature_celsius": 35.0
        },
        {
          "id": "1",
          "power_usage": 37.0,
          "energy_usage": 37.0,
          "energy_usage_unit": "Joules",
          "set_limit": 300.0,
          "max_limit": 122.0,
          "min_limit": 0.0,
          "temperature_celsius": 36.0
        }
      ]
    }
  ]
}
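
The JSON above can be post-processed with a short script. The following is a minimal sketch, not part of dpsctl: it assumes the command prints only JSON on standard output (so it can be captured with, for example, `dpsctl check metrics --nodes node001 > metrics.json`) and that the field names match the example. The function name summarize_power and the SAMPLE string are hypothetical and exist only for illustration. It sums power_usage over the components actually listed for each node.

```python
import json


def summarize_power(report_text):
    """Return {node_name: {"gpus": watts, "cpus": watts, "memory": watts}}.

    Hypothetical helper: sums power_usage over the components that are
    listed in the report, grouped by component type.
    """
    report = json.loads(report_text)
    summary = {}
    for node in report.get("nodes", []):
        per_type = {}
        for kind in ("gpus", "cpus", "memory"):
            components = node.get(kind, [])
            per_type[kind] = sum(c.get("power_usage", 0.0) for c in components)
        summary[node["name"]] = per_type
    return summary


# Abbreviated sample shaped like the example output (only the fields used here).
SAMPLE = """
{
  "nodes": [
    {
      "name": "node001",
      "power_usage": 2074.0,
      "gpus": [
        {"id": "0", "power_usage": 64.0},
        {"id": "1", "power_usage": 66.0}
      ],
      "cpus": [
        {"id": "0", "power_usage": 195.0},
        {"id": "1", "power_usage": 195.0}
      ],
      "memory": [
        {"id": "0", "power_usage": 28.0},
        {"id": "1", "power_usage": 37.0}
      ]
    }
  ]
}
"""

if __name__ == "__main__":
    for name, totals in summarize_power(SAMPLE).items():
        print(name, totals)
```

To run it against real output, replace SAMPLE with the contents of the captured file, for instance by reading metrics.json with open(...).read().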

Notes

  • Metrics are returned in the same standardized format for all node types.
  • power_usage is the current power consumption, in watts.
  • energy_usage is the cumulative energy consumed since the last reset, in the unit given by energy_usage_unit (Joules in the example).
  • set_limit, max_limit, and min_limit report the component's power limit configuration: the currently set limit and the maximum and minimum limits.
  • temperature_celsius is the component temperature, in degrees Celsius.
  • GPU, CPU, and memory IDs are typically 0-based within each node.
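
As an illustration of how the limit fields might be used, the sketch below lists components whose set_limit falls outside the range from min_limit to max_limit. This is an assumption for illustration only: the guide does not state that set_limit must lie within that range, and the helper name limits_out_of_range is hypothetical, not a dpsctl feature. The input is one entry of the "nodes" array, already parsed from JSON.

```python
def limits_out_of_range(node):
    """List (component_type, id) pairs whose set_limit is outside
    [min_limit, max_limit].

    Assumption (not confirmed by the guide): set_limit is expected to
    lie between min_limit and max_limit, inclusive.
    """
    flagged = []
    for kind in ("gpus", "cpus", "memory"):
        for comp in node.get(kind, []):
            in_range = comp["min_limit"] <= comp["set_limit"] <= comp["max_limit"]
            if not in_range:
                flagged.append((kind, comp["id"]))
    return flagged


# One node with values taken from the example output (limit fields only).
node = {
    "name": "node001",
    "gpus": [
        {"id": "0", "set_limit": 700.0, "max_limit": 700.0, "min_limit": 200.0},
    ],
    "cpus": [
        {"id": "0", "set_limit": 400.0, "max_limit": 350.0, "min_limit": 209.0},
    ],
    "memory": [],
}

if __name__ == "__main__":
    print(limits_out_of_range(node))
```

With the example values, the CPU entry is reported because its set_limit (400.0) is above its max_limit (350.0). Whether that reflects a real configuration or just sample data is not something this guide settles.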