Metrics#
The OAM Metrics API is used internally by cuPHY-CP components to report metrics (counters, gauges, and histograms). The metrics are exposed via a Prometheus Aerial exporter.
Host Metrics#
Host metrics are provided via the Prometheus node exporter. The node exporter provides many thousands of metrics about the host hardware and OS, such as but not limited to:
CPU statistics
Disk statistics
Filesystem statistics
Memory statistics
Network statistics
See prometheus/node_exporter and https://prometheus.io/docs/guides/node-exporter/ for detailed documentation on the node exporter.
GPU Metrics#
GPU hardware metrics are provided through the GPU Operator via the Prometheus DCGM-Exporter. The DCGM-Exporter provides many thousands of metrics about the GPU and PCIe bus connection, such as but not limited to:
GPU hardware clock rates
GPU hardware temperatures
GPU hardware power consumption
GPU memory utilization
GPU hardware errors including ECC
PCIe throughput
See NVIDIA/gpu-operator for details on the GPU operator.
See NVIDIA/gpu-monitoring-tools for detailed documentation on the DCGM-Exporter.
An example Grafana dashboard is available at https://grafana.com/grafana/dashboards/12239.
Aerial Metric Naming Conventions#
In addition to metrics available through the node exporter and DCGM-Exporter, Aerial exposes several application metrics.
Metric names follow https://prometheus.io/docs/practices/naming/ and use the format aerial_<component>_<sub-component>_<metricdescription>_<units>.
Metric types are per https://prometheus.io/docs/concepts/metric_types/.
The component and sub-component definitions are in the table below. For each metric, the description, metric type, and metric tags are provided. Tags are a way of providing granularity to metrics without creating new metrics.
| Component | Sub-Component | Description |
|---|---|---|
| cuphycp | | cuPHY Control Plane application |
| | fapi | L2/L1 interface metrics |
| | cplane | Fronthaul C-plane metrics |
| | uplane | Fronthaul U-plane metrics |
| | net | Generic network interface metrics |
| cuphy | | cuPHY L1 library |
| | pbch | Physical Broadcast Channel metrics |
| | pdsch | Physical Downlink Shared Channel metrics |
| | pdcch | Physical Downlink Control Channel metrics |
| | pusch | Physical Uplink Shared Channel metrics |
| | pucch | Physical Uplink Control Channel metrics |
| | prach | Physical Random Access Channel metrics |
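As a rough illustration of the naming format, the sketch below splits a metric name into its component, optional sub-component, and remainder. The function name `split_metric_name` and the lookup table are hypothetical helpers written for this example, not part of Aerial. Names whose second token is not a listed sub-component of that component (for example aerial_cuphycp_slots_total) are returned with no sub-component.

```python
# Component and sub-component names as listed in the naming-convention table.
SUB_COMPONENTS = {
    "cuphycp": {"fapi", "cplane", "uplane", "net"},
    "cuphy": {"pbch", "pdsch", "pdcch", "pusch", "pucch", "prach"},
}


def split_metric_name(name):
    """Split an Aerial metric name into (component, sub_component, remainder).

    sub_component is None when the token after the component is not one of
    the sub-components listed for that component.
    """
    parts = name.split("_")
    if len(parts) < 3 or parts[0] != "aerial" or parts[1] not in SUB_COMPONENTS:
        raise ValueError(f"not an Aerial metric name: {name!r}")
    component = parts[1]
    rest = parts[2:]
    sub_component = None
    if len(rest) > 1 and rest[0] in SUB_COMPONENTS[component]:
        sub_component = rest[0]
        rest = rest[1:]
    # The remainder holds the metric description and units, still joined.
    return component, sub_component, "_".join(rest)
```

For instance, `split_metric_name("aerial_cuphycp_cplane_tx_packets_total")` yields the component `cuphycp`, the sub-component `cplane`, and the remainder `tx_packets_total`.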
Metrics Exporter Port#
Aerial metrics are exported on port 8081. The address is configurable in the cuphycontroller YAML file via ‘aerial_metrics_backend_address’.
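A minimal sketch of reading the exporter from Python follows. It assumes the exporter is reachable on localhost and serves the standard Prometheus text format at the conventional /metrics path; neither the host nor the path is stated in this document. The helper names `fetch_metrics` and `parse_samples` are hypothetical, and the parser is deliberately simple (it skips comment lines and does not unescape label values).

```python
import re
import urllib.request

# One sample line: name, optional {labels}, value, optional timestamp.
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair inside the braces: key="value" (escapes left as-is).
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:8081/metrics"):
    """Fetch the raw exposition text. Host and path are assumptions."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input only; the values are made up for this example.
EXAMPLE_TEXT = """\
# TYPE aerial_cuphycp_slots_total counter
aerial_cuphycp_slots_total{type="UL",cell="0"} 1200
aerial_cuphycp_slots_total{type="DL",cell="0"} 4800
"""
```

In practice a Prometheus server would scrape this endpoint directly; a script like this is mainly useful for quick checks that the exporter is up and reporting.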
L2/L1 Interface Metrics#
aerial_cuphycp_slots_total#
Counts the total number of processed slots.
Metric type: counter
Metric tags:
type: “UL” or “DL”
cell: “cell number”
aerial_cuphycp_fapi_rx_packets#
Counts the total number of messages L1 receives from L2.
Metric type: counter
Metric tags:
msg_type: “type of PDU”
cell: “cell number”
aerial_cuphycp_fapi_tx_packets#
Counts the total number of messages L1 transmits to L2.
Metric type: counter
Metric tags:
msg_type: “type of PDU”
cell: “cell number”
Fronthaul Interface Metrics#
aerial_cuphycp_cplane_tx_packets_total#
Counts the total number of C-plane packets transmitted by L1 over the O-RAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_cplane_tx_bytes_total#
Counts the total number of C-plane bytes transmitted by L1 over the O-RAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_rx_packets_total#
Counts the total number of U-plane packets received by L1 over the O-RAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_rx_bytes_total#
Counts the total number of U-plane bytes received by L1 over the O-RAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_tx_packets_total#
Counts the total number of U-plane packets transmitted by L1 over the O-RAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_tx_bytes_total#
Counts the total number of U-plane bytes transmitted by L1 over the O-RAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_lost_prbs_total#
Counts the total number of PRBs expected but not received by L1 over the O-RAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
channel: One of “prach” or “pusch”
NIC Metrics#
aerial_cuphycp_net_rx_failed_packets_total#
Counts the total number of erroneous packets received.
Metric type: counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_rx_nombuf_packets_total#
Counts the total number of receive packets dropped due to the lack of free mbufs.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_rx_dropped_packets_total#
Counts the total number of receive packets dropped by the NIC hardware.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_failed_packets_total#
Counts the total number of times a packet failed to transmit.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_missed_interrupt_errors_total#
Counts the total number of instances accurate send scheduling missed an interrupt.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_rearm_queue_errors_total#
Counts the total number of accurate send scheduling rearm queue errors.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_clock_queue_errors_total#
Counts the total number of accurate send scheduling clock queue errors.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_timestamp_past_errors_total#
Counts the total number of accurate send scheduling timestamp-in-the-past errors.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_timestamp_future_errors_total#
Counts the total number of accurate send scheduling timestamp-in-the-future errors.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_clock_queue_jitter_ns#
Current measurement of accurate send scheduling clock queue jitter, in units of nanoseconds.
Metric type: Gauge
Metric tags:
nic: “nic port BDF address”
Details:
This gauge shows the TX scheduling timestamp jitter, that is, how far each individual Clock Queue (CQ) completion is from UTC time.
tx_pp_jitter is the time difference between two consecutive CQ completions.
aerial_cuphycp_net_tx_accu_sched_clock_queue_wander_ns#
Current measurement of the divergence of Clock Queue (CQ) completions from UTC time over a longer time period (~8s).
Metric type: Gauge
Metric tags:
nic: “nic port BDF address”
Application Performance Metrics#
aerial_cuphycp_slot_processing_duration_us#
Counts the total number of slots with GPU processing duration in each 250us-wide histogram bin.
Metric type: Histogram
Metric tags:
cell: “cell number”
channel: one of “pbch”, “pdcch”, “pdsch”, “prach”, or “pusch”
le: histogram bin upper bound (less-than-or-equal-to) in 250us steps: 250, 500, …, 2000, +inf.
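Prometheus histogram buckets are cumulative: the count for a given le value includes every observation at or below that bound. The sketch below, using hypothetical helper names and made-up counts, shows one way to turn cumulative buckets into per-bin counts and to compute the fraction of slots that finished within a given bound. It assumes the bounds step by 250us from 250 to 2000, followed by +inf.

```python
import math


def per_bin_counts(cumulative):
    """Convert cumulative le buckets into per-bin counts.

    cumulative maps each upper bound (math.inf for +inf) to the
    cumulative number of observations at or below that bound.
    """
    counts = {}
    previous = 0
    for bound in sorted(cumulative):
        counts[bound] = cumulative[bound] - previous
        previous = cumulative[bound]
    return counts


def fraction_within(cumulative, bound_us):
    """Fraction of all observations at or below bound_us."""
    total = cumulative[math.inf]
    return cumulative[bound_us] / total if total else 0.0


# Made-up cumulative counts for one cell and channel, for illustration.
bounds = list(range(250, 2001, 250)) + [math.inf]
example = dict(zip(bounds, [10, 70, 95, 99, 100, 100, 100, 100, 100]))

bins = per_bin_counts(example)
share_under_500us = fraction_within(example, 500)
```

With these sample numbers, 60 slots fall in the 250 to 500us bin and 70% of slots complete within 500us.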
aerial_cuphycp_slot_pusch_processing_duration_us#
Counts the total number of PUSCH slots with GPU processing duration in each 250us-wide histogram bin.
Metric type: Histogram
Metric tags:
cell: “cell number”
le: histogram bin upper bound (less-than-or-equal-to) in 250us steps, range 0 to 2000us.
aerial_cuphycp_pusch_rx_tb_bytes_total#
Counts the total number of transport block bytes received in the PUSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphycp_pusch_rx_tb_total#
Counts the total number of transport blocks received in the PUSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphycp_pusch_rx_tb_crc_error_total#
Counts the total number of transport blocks received with CRC errors in the PUSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
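The two transport block counters can be combined into a CRC error ratio over a scrape interval. The following is a minimal sketch with hypothetical helper names and made-up sample values; it treats a counter that decreases between scrapes as having restarted from zero, which is the usual Prometheus convention for counter resets.

```python
def counter_delta(previous, current):
    """Increase of a counter between two scrapes.

    If the value went down, assume the process restarted and the
    counter began again from zero.
    """
    return current - previous if current >= previous else current


def crc_error_ratio(prev, curr):
    """CRC error ratio over the interval, or None if no TBs were received.

    prev and curr are dicts with the keys 'tb_total' and
    'crc_error_total', holding the values of
    aerial_cuphycp_pusch_rx_tb_total and
    aerial_cuphycp_pusch_rx_tb_crc_error_total for one cell.
    """
    received = counter_delta(prev["tb_total"], curr["tb_total"])
    errors = counter_delta(prev["crc_error_total"], curr["crc_error_total"])
    return errors / received if received > 0 else None


# Two made-up scrapes of the same cell.
earlier = {"tb_total": 1000, "crc_error_total": 10}
later = {"tb_total": 3000, "crc_error_total": 30}
ratio = crc_error_ratio(earlier, later)
```

Here 2000 transport blocks arrived between the scrapes and 20 of them failed CRC, giving a ratio of 1%.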
aerial_cuphycp_pusch_nrofuesperslot#
Counts the total number of UEs processed in each slot, per histogram bin, in the PUSCH channel.
Metric type: Histogram
Metric tags:
cell: “cell number”
le: histogram bin upper bound (less-than-or-equal-to): 2, 4, …, 24, +inf.
PRACH Metrics#
aerial_cuphy_prach_rx_preambles_total#
Counts the total number of detected preambles in the PRACH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
PDSCH Metrics#
aerial_cuphycp_slot_pdsch_processing_duration_us#
Counts the total number of PDSCH slots with GPU processing duration in each 250us-wide histogram bin.
Metric type: Histogram
Metric tags:
cell: “cell number”
le: histogram bin upper bound (less-than-or-equal-to) in 250us steps, range 0 to 2000us.
aerial_cuphy_pdsch_tx_tb_bytes_total#
Counts the total number of transport block bytes transmitted in the PDSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphy_pdsch_tx_tb_total#
Counts the total number of transport blocks transmitted in the PDSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphycp_pdsch_nrofuesperslot#
Counts the total number of UEs processed in each slot, per histogram bin, in the PDSCH channel.
Metric type: Histogram
Metric tags:
cell: “cell number”
le: histogram bin upper bound (less-than-or-equal-to): 2, 4, …, 24, +inf.