NVIDIA UFM Enterprise REST API Guide v6.18.0

Telemetry REST API

  • Description – returns UFM configuration information, including whether the telemetry feature is enabled or disabled

  • Request URL – GET /ufmRest/app/ufm_config

  • Request Content Type – Application/json

  • Response


    { "ls_auditing": "Disabled", "monitoring_mode": "Disabled", "syslog": "Disabled", "license_state": "valid", "license_state_info": "N\/A", "telemetry": "<telemetry_status>" }

    where <telemetry_status> is either "Enabled" or "Disabled".
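The following is a minimal Python sketch of calling this endpoint and reading the `telemetry` field. It makes three assumptions. The UFM server is reachable at a hypothetical `https://ufm.example.com`. The server accepts HTTP basic authentication. The helper names `get_ufm_config` and `telemetry_enabled` are hypothetical, and `telemetry_enabled` inspects only the documented `telemetry` key. Adjust the base URL and authentication to match your deployment.

```python
import base64
import json
import urllib.request

# Hypothetical base URL; replace with your UFM server address.
UFM_BASE_URL = "https://ufm.example.com"


def get_ufm_config(base_url: str, user: str, password: str) -> dict:
    """GET /ufmRest/app/ufm_config and return the decoded JSON body.

    Assumes the server accepts HTTP basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/ufmRest/app/ufm_config",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def telemetry_enabled(config: dict) -> bool:
    """Return True only when the "telemetry" field is "Enabled"."""
    return config.get("telemetry") == "Enabled"
```

Usage (network call, not run here): `telemetry_enabled(get_ufm_config(UFM_BASE_URL, "admin", "<password>"))`. A missing `telemetry` key is treated as disabled.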

  • Description – returns information on the Top X telemetry session

  • Request URL – GET /ufmRest/telemetry?type=topX&membersType=Ports&PickBy=PortTXPackets&limit=15&attributes=[additional_attributes]

  • Request Content Type – Application/json

  • Response


    [ { "name": "r-dmz-ufm131 mlx5_0", "guid": "0c42a103008b3bd0_1", "PortRcvPktsExtended_Rate": 1993291398.4024506, "phy_received_bits_Rate": 1993291398.4024506, "PortRcvDataExtended_Rate": 7973165593.609802 }, { "name": "r-dmz-ufm131 mlx5_1", "guid": "0c42a103008b3bd1_2", "PortRcvPktsExtended_Rate": 1993289961.4256535, "phy_received_bits_Rate": 1993289961.4256535, "PortRcvDataExtended_Rate": 7973159845.702614 } ]
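The following is a minimal Python sketch for building the Top X request path and ranking the returned entries by one of the response fields. Two assumptions apply:

- Multiple attributes are comma-separated inside the brackets. The guide shows only the bracketed placeholder, not a multi-attribute example.
- The helper names `build_topx_path` and `rank_members` are hypothetical.

```python
from urllib.parse import urlencode


def build_topx_path(pick_by: str, limit: int, attributes: list[str],
                    members_type: str = "Ports") -> str:
    """Build a Top X telemetry request path in the documented form.

    The attribute list is rendered as attributes=[a,b] (comma separation
    inside the brackets is an assumption).
    """
    params = {
        "type": "topX",
        "membersType": members_type,
        "PickBy": pick_by,
        "limit": limit,
        "attributes": "[" + ",".join(attributes) + "]",
    }
    # Keep brackets and commas unescaped so the query matches the documented URL.
    return "/ufmRest/telemetry?" + urlencode(params, safe="[],")


def rank_members(entries: list[dict], attribute: str) -> list[tuple[str, float]]:
    """Return (name, value) pairs sorted by the given field, highest first.

    Entries that do not contain the field are skipped.
    """
    rows = [(entry["name"], entry[attribute]) for entry in entries if attribute in entry]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

For example, applying `rank_members(response, "PortRcvDataExtended_Rate")` to the response above lists "r-dmz-ufm131 mlx5_0" first, because its `PortRcvDataExtended_Rate` value is slightly higher.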

  • Description – returns information on the history telemetry session
  • Request URL – GET /ufmRest/telemetry?type=history&membersType=Ports&attributes=[attributes_list]&members=[members_list_guids]&start_time=-1h&end_time=-0min

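    Note

    The following example request, sent to the ufmRestV2 endpoint, produces a response like the one shown below:

    http://localhost:4300/ufmRestV2/telemetry?type=history&membersType=Device&attributes=[Infiniband_PckInRate]&result_format=Port&members=[ec0d9a03007d7f0a]&start_time=-5min&end_time=-0min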

  • Request Content Type – Application/json
  • Response


    { "data": { "2021-12-01 19:12:36": { "Port": { "ec0d9a03007d7f0a_1": { "statistics": {"Infiniband_PckInRate": 1.0}, "guid": "ec0d9a03007d7f0a_1", "name": "ufm-host87 mlx5_0" } } } }, "members": [{ "description": "Computer IB Port", "number": 1, "external_number": 1, "physical_state": "Link Up", "path": "default \/ Computer: ufm-host87 \/ HCA-1\/1", "tier": 1, "high_ber_severity": "N\/A", "lid": 1, "mirror": "disable", "logical_state": "Active", "capabilities": ["healthy_operations", "reset", "disable"], "mtu": 4096, "peer_port_dname": "11", "severity": "Info", "active_speed": "EDR", "enabled_speed": ["SDR", "DDR", "QDR", "FDR", "EDR"], "supported_speed": ["SDR", "DDR", "QDR", "FDR", "EDR"], "active_width": "4x", "enabled_width": ["1x", "4x"], "supported_width": ["1x", "4x"], "dname": "HCA-1\/1", "peer_node_name": "switchib", "peer": "ec0d9a030029dba0_11", "peer_node_guid": "ec0d9a030029dba0", "systemID": "ec0d9a03007d7f0a", "node_description": "ufm-host87 mlx5_0", "name": "ec0d9a03007d7f0a_1", "module": "N\/A", "peer_lid": 5, "peer_guid": "ec0d9a030029dba0", "peer_node_description": "switchib:11", "guid": "ec0d9a03007d7f0a", "system_name": "ufm-host87", "system_ip": "0.0.0.0", "peer_ip": "0.0.0.0", "system_capabilities": ["fw_inband_upgrade", "mark_device_unhealthy"], "system_mirroring_template": false }] }
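The `data` object in a history response is nested as timestamp → member type → member → `statistics`. The following minimal Python sketch flattens that nesting into one row per sample. It assumes the body has already been decoded with `json.loads`. The function name `flatten_history` is hypothetical.

```python
def flatten_history(response: dict) -> list[tuple[str, str, str, str, float]]:
    """Flatten a history telemetry response into sortable rows.

    Each row is (timestamp, member_type, member_id, attribute, value), read from
    the nested "data" -> timestamp -> member type -> member -> "statistics" layout.
    """
    rows = []
    for timestamp, by_type in response.get("data", {}).items():
        for member_type, members in by_type.items():
            for member_id, member in members.items():
                for attribute, value in member.get("statistics", {}).items():
                    rows.append((timestamp, member_type, member_id, attribute, value))
    # Timestamps are zero-padded "YYYY-MM-DD HH:MM:SS" strings, so string order is time order.
    rows.sort(key=lambda row: row[0])
    return rows
```

Applied to the example response above, this yields a single row for port `ec0d9a03007d7f0a_1` with `Infiniband_PckInRate` equal to 1.0. The `members` list is left untouched, so port metadata can be joined separately by `name`.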

    Possible Attribute Values

    The following are all the available values of the monitoring attributes.

    • Monitor Class – the selected object type for monitoring
    • Monitor Attributes – the selected attributes (counters) for monitoring the monitored objects

    Attribute

    Value

    Description

    Monitoring class

    "Device"

    General device in the fabric (can be switch/ host/bridge, etc.)

    "Port"

    Represents a physical port in the fabric

    Monitor attributes

    "Infiniband_MBOut"

    "Infiniband_MBOutRate"*

    Total number of data octets, divided by 4, transmitted on all VLs from the port. This includes all octets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors.

    All link packets are excluded. Results are reported as a multiple of four octets.

    "Infiniband_MBIn"

    "Infiniband_MBInRate"*

    Total number of data octets, divided by 4, received on all VLs at the port.

    This includes all octets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors.

    All link packets are excluded. When the received packet length exceeds the maximum allowed packet length specified in C7-45, the counter may include all data octets exceeding this limit. Results are reported as a multiple of four octets.

    "Infiniband_PckOut"

    "Infiniband_PckOutRate"*

    Total number of packets transmitted on all VLs from the port, including packets with errors, and excluding link packets

    "Infiniband_PckIn"

    "Infiniband_PckInRate"*

    Total number of packets, including packets containing errors and excluding link packets, received from all VLs on the port

    "Infiniband_RcvErrors"

    Total number of packets containing errors that were received on the port including:

    • Local physical errors (ICRC, VCRC, LPCRC, and all physical errors that cause entry into the BAD PACKET or BAD PACKET DISCARD states of the packet receiver state machine).

    • Malformed data packet errors (LVer, length, VL).

    • Malformed link packet errors (operand, length, VL).

    • Packets discarded due to buffer overrun (overflow).

    "Infiniband_XmtDiscards"

    Total number of outbound packets discarded by the port because the port is down or congested, for the following reasons:

    • Output port is not in the active state

    • Packet length has exceeded NeighborMTU

    • Switch Lifetime Limit exceeded

    • Switch HOQ Lifetime Limit exceeded, including packets discarded while in VLStalled State

    "Infiniband_SymbolErrors"

    Total number of minor link errors detected on one or more physical lanes

    "Infiniband_LinkRecovers"

    Total number of times the Port Training state machine has successfully completed the link error recovery process

    "Infiniband_LinkDowned"

    Total number of times the Port Training state machine has failed the link error recovery process and downed the link

    "Infiniband_LinkIntegrityErrors"

    The number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors

    "Infiniband_RcvRemotePhysErrors"

    Total number of packets marked with the EBP delimiter received on the port

    "Infiniband_XmtConstraintErrors"

    Total number of packets not transmitted from the switch physical port for the following reasons:

    • FilterRawOutbound is true and packet is raw.

    • PartitionEnforcementOutbound is true and packet fails partition key check or IP version check

    "Infiniband_RcvConstraintErrors"

    Total number of packets received on the switch physical port that are discarded for the following reasons:

    • FilterRawInbound is true and packet is raw

    • PartitionEnforcementInbound is true and packet fails partition key check or IP version check

    "Infiniband_ExcBufOverrunErrors"

    The number of times that OverrunErrors consecutive flow control update periods occurred, each having at least one overrun error

    "Infiniband_RcvSwRelayErrors"

    Total number of packets received on the port that were discarded when they could not be forwarded by the switch relay for the following reasons:

    • DLID mapping

    • VL mapping

    • Looping (output port = input port)

    "Infiniband_VL15Dropped"

    Number of incoming VL15 packets dropped because of resource limitations (e.g., lack of buffers) in the port

    "Infiniband_XmitWait"

    The number of ticks during which the port selected by PortSelect had data to transmit but no data was sent during the entire tick because of insufficient credits or lack of arbitration

    "Infiniband_CBW"

    Congestion bandwidth rate; measures the rate of congestion as reflected by the XmitWait counter

    "Infiniband_Normalized_MBOut"

    Effective port bandwidth utilization, in %, calculated as:

    XmitData increment / Link capacity

    "Infiniband_Normalized_CBW"

    Amount of bandwidth suppressed due to congestion, calculated as (XmitWait increment / Time) * Link capacity

    Separate counters are used for Tier 4 ports and for the rest of the ports

    "Infiniband_NormalizedXW"

    Congestion relative to the packets transmitted over the link, calculated as XmitWait increment / XmitPackets increment.

    This event is calculated only for the port directly connected to receiving hosts.

    Separate counters are used for Tier 4 ports and for the rest of the ports
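The data counters and the normalized counters in the table above reduce to simple arithmetic on counter increments. The following Python sketch writes that arithmetic out under several assumptions:

- All increments are taken over the same sampling interval.
- Link capacity for Infiniband_Normalized_MBOut is given in bits per second.
- For Infiniband_Normalized_CBW, the interval is expressed in the same tick units as XmitWait.

These units are assumptions, and UFM's internal calculation may differ. All function names are hypothetical.

```python
OCTETS_PER_DATA_UNIT = 4  # data counters are reported as multiples of four octets


def data_counter_to_bits(delta_units: int) -> int:
    """Convert a data-counter increment (4-octet units) to bits."""
    return delta_units * OCTETS_PER_DATA_UNIT * 8


def normalized_mbout(xmit_data_delta: int, interval_s: float, link_capacity_bps: float) -> float:
    """Utilization in %: bits transmitted / bits the link could carry in the interval."""
    if interval_s <= 0 or link_capacity_bps <= 0:
        raise ValueError("interval and link capacity must be positive")
    return 100.0 * data_counter_to_bits(xmit_data_delta) / (link_capacity_bps * interval_s)


def normalized_cbw(xmit_wait_delta: int, interval_ticks: float, link_capacity: float) -> float:
    """(XmitWait increment / time) * link capacity, as stated in the table.

    Assumes interval_ticks uses the same tick units as XmitWait, so the ratio is
    the fraction of the interval during which the port was stalled.
    """
    if interval_ticks <= 0:
        raise ValueError("interval must be positive")
    return (xmit_wait_delta / interval_ticks) * link_capacity


def normalized_xw(xmit_wait_delta: int, xmit_packets_delta: int) -> float:
    """XmitWait increment / XmitPackets increment; returns 0.0 when nothing was sent."""
    if xmit_packets_delta == 0:
        return 0.0
    return xmit_wait_delta / xmit_packets_delta
```

Returning 0.0 from `normalized_xw` when no packets were transmitted is a design choice made for this sketch. It avoids a division by zero on idle ports.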

    Note

    *Rate counter – a value calculated as the difference between two sequential samples of a counter divided by the time elapsed between them.
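The rate-counter definition above can be sketched in Python as a delta over elapsed time. The `Sample` type and the `rate` function are hypothetical. The sketch assumes timestamps in seconds and does not handle counter resets or wraparound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp_s: float  # sample time, in seconds
    value: int          # raw cumulative counter value


def rate(previous: Sample, current: Sample) -> float:
    """Rate counter: (current value - previous value) / elapsed time between samples."""
    elapsed = current.timestamp_s - previous.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in strictly increasing time order")
    return (current.value - previous.value) / elapsed
```

For example, a counter that moves from 100 to 300 over 2 seconds has a rate of 100 per second.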

© Copyright 2024, NVIDIA. Last updated on Aug 29, 2024.