NMX Manager (NMX-M) Documentation v85.1.2000

Key Performance Indicators (KPI)

The KPI REST endpoint provides fast, actionable insight into the status of the network cluster. It returns key performance indicators (KPIs) that give a clear overview of the cluster's health, with the option to drill down into specific events. With this endpoint, NOC operators can quickly identify network issues, assess their severity, and act immediately, for example by opening tickets for further investigation. The KPIs support network performance monitoring and prompt response times.

The KPIs are aggregated across all domains. For troubleshooting, access per-domain data using the compute-node or switch-node REST API calls.

curl --request GET \
  --url https://xxxx.xxxx.xxxx.xxxx/nmx/v1/kpis \
  --user "<username>:<password>" --insecure

wget --quiet \
  --method GET \
  --user '<username>' \
  --password '<password>' \
  --output-document - \
  https://xxxx.xxxx.xxxx.xxxx/nmx/v1/kpis --no-check-certificate

Note

Replace <username> with either rw-user (read-write access) or ro-user (read-only access).

Replace <password> with the actual password associated with the user.
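
The same request can be issued from a script. The following is a minimal Python sketch, not an official client: it assumes the endpoint accepts HTTP Basic authentication (as the curl `--user` option and the Grafana setup on this page suggest) and it disables certificate verification in the same way as `--insecure`. The function names `build_kpi_request`, `insecure_context`, and `fetch_kpis` are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def build_kpi_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for /nmx/v1/kpis with an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"https://{host}/nmx/v1/kpis",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks, comparable to curl --insecure."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_kpis(host: str, username: str, password: str) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    request = build_kpi_request(host, username, password)
    with urllib.request.urlopen(request, context=insecure_context(), timeout=30) as resp:
        return json.load(resp)
```

For example, `fetch_kpis("xxxx.xxxx.xxxx.xxxx", "ro-user", "<password>")` would return the decoded response as a dictionary. Skipping certificate verification is reasonable only for test setups; with a trusted certificate installed, pass a default context instead.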

The REST API supports filtering through the filter query parameter. The allowed values are:

  • Filtering based on Health:

    • HEALTH

    • SWITCH_HEALTH

    • GPU_HEALTH

    • DOMAIN_HEALTH

    • COMPUTE_HEALTH

  • Filtering based on Inventory:

    • INVENTORY

    • COMPUTE_ALLOCATION

    • CONNECTION_COUNT

    • CABLE_TYPE

    • CABLE_PN

    • CABLE_FW_VERSION

    • PORT_COUNT

    • LINK_UP_COUNT

    • LINKDOWN_FREQUENCY

    • LINKDOWN_RATE

    • CHIP_TEMPERATURE

    • EFF_BER

    • SYMBOL_BER

    • RAW_BER

To filter the response, append the filter parameter to the endpoint, for example: kpis?filter=HEALTH
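
When the URL is built in a script, checking the filter value against the list above catches typos before the request is sent. This is a minimal sketch; the helper `kpi_url` and the constant `ALLOWED_FILTERS` are hypothetical names, and it assumes the values must be passed exactly as listed (upper case), which this page does not state explicitly.

```python
from typing import Optional
from urllib.parse import urlencode

# Filter values copied from the list in this section.
ALLOWED_FILTERS = frozenset({
    "HEALTH", "SWITCH_HEALTH", "GPU_HEALTH", "DOMAIN_HEALTH", "COMPUTE_HEALTH",
    "INVENTORY", "COMPUTE_ALLOCATION", "CONNECTION_COUNT", "CABLE_TYPE",
    "CABLE_PN", "CABLE_FW_VERSION", "PORT_COUNT", "LINK_UP_COUNT",
    "LINKDOWN_FREQUENCY", "LINKDOWN_RATE", "CHIP_TEMPERATURE",
    "EFF_BER", "SYMBOL_BER", "RAW_BER",
})


def kpi_url(host: str, kpi_filter: Optional[str] = None) -> str:
    """Return the KPI endpoint URL, optionally with a validated filter parameter."""
    base = f"https://{host}/nmx/v1/kpis"
    if kpi_filter is None:
        return base
    if kpi_filter not in ALLOWED_FILTERS:
        raise ValueError(f"unsupported filter value: {kpi_filter!r}")
    return f"{base}?{urlencode({'filter': kpi_filter})}"
```

Calling `kpi_url("xxxx.xxxx.xxxx.xxxx", "SWITCH_HEALTH")` produces the same URL as the filtered curl example in the Sample Output section.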

Sample Output

  • With filtering:

    curl --request GET --url "https://xxxx.xxxx.xxxx.xxxx/nmx/v1/kpis?filter=SWITCH_HEALTH" --user "<username>:<password>" --insecure

    {
        "Data": [
            { "HEALTHY": 18 },
            { "MISSING_NVLINK": 0 },
            { "UNHEALTHY": 0 },
            { "UNKNOWN": 0 }
        ],
        "Description": "Number of switch instances per health state",
        "Title": "Switch Health Count",
        "Type": "histogram"
    }

  • Without Filtering:

    curl --request GET --url https://xxxx.xxxx.xxxx.xxxx/nmx/v1/kpis --user "<username>:<password>" --insecure

    {
        "Health": {
            "compute-health": {
                "Data": [
                    { "UNKNOWN": 0 },
                    { "HEALTHY": 9 },
                    { "DEGRADED": 0 },
                    { "UNHEALTHY": 0 }
                ],
                "Description": "Number of compute nodes per health state",
                "Title": "Compute Health Count",
                "Type": "histogram"
            },
            "domain-health": {
                "Data": [
                    { "UNKNOWN": 0 },
                    { "HEALTHY": 1 },
                    { "DEGRADED": 0 },
                    { "UNHEALTHY": 0 }
                ],
                "Description": "Number of domains per health state",
                "Title": "Domain Health Count",
                "Type": "histogram"
            },
            "gpu-health": {
                "Data": [
                    { "DEGRADED": 0 },
                    { "NONVLINK": 0 },
                    { "UNKNOWN": 0 },
                    { "DEGRADED_BW": 0 },
                    { "HEALTHY": 36 }
                ],
                "Description": "Number of GPU instances per health state",
                "Title": "GPU Health Count",
                "Type": "histogram"
            },
            "switch-health": {
                "Data": [
                    { "MISSING_NVLINK": 0 },
                    { "UNHEALTHY": 0 },
                    { "UNKNOWN": 0 },
                    { "HEALTHY": 18 }
                ],
                "Description": "Number of switch instances per health state",
                "Title": "Switch Health Count",
                "Type": "histogram"
            }
        },
        "Inventory": {
            "cable-fw-version": {
                "Data": [
                    { "N/A": 1296 }
                ],
                "Description": "Count the number of entries for each FWVersion.",
                "Title": "Cable FWVersion Count",
                "Type": "histogram"
            },
            "cable-pn": {
                "Data": [
                    { "N/A": 1296 }
                ],
                "Description": "Count the number of entries for each PN.",
                "Title": "Cable PN Count",
                "Type": "histogram"
            },
            "cable-type": {
                "Data": [
                    { "850 nm VCSEL": 558 }
                ],
                "Description": "Count the number of entries for each cable type.",
                "Title": "Cable Type Count",
                "Type": "histogram"
            },
            "chip-temperature": {
                "Data": [
                    { "30-40": 50 }
                ],
                "Description": "Number of instances per temperature",
                "Title": "Chip Temperature Count",
                "Type": "histogram"
            },
            "compute-allocation": {
                "Data": [
                    { "FULL": 9 },
                    { "PARTIAL": 0 },
                    { "ALL": 9 },
                    { "FREE": 0 }
                ],
                "Description": "Number of compute nodes with its GPU instances in allocation state",
                "Title": "Compute GPU Allocation Count",
                "Type": "histogram"
            },
            "connection-count": {
                "Data": [
                    { "DISCOVERED": 0 },
                    { "EXPECTED": 0 },
                    { "EXPECTED_ACTIVE": 0 },
                    { "EXPECTED_INACTIVE": 0 },
                    { "UNEXPECTED": 0 }
                ],
                "Description": "Number of connection count per connection attribute",
                "Title": "Topology connection Count",
                "Type": "histogram"
            },
            "effective-ber": {
                "Data": [
                    { "effective_ber": 0, "node_guid": "0xb0cf0e0300db1be0", "port_guid": "0x00000002251f681b" },
                    { "effective_ber": 0, "node_guid": "0xb0cf0e0300db19a0", "port_guid": "0x00000002251f681b" },
                    { "effective_ber": 0, "node_guid": "0xb0cf0e0300db1be0", "port_guid": "0x00000002251f681b" }
                ],
                "Description": "List of top 100 port that has the highest EFFECTIVE BER readings",
                "Title": "Top 100 EFFECTIVE BER Ports",
                "Type": "histogram"
            },
            "link-down-frequency": {
                "Data": [
                    { "SWITCH": 0.00020169558757286254 }
                ],
                "Description": "Average time between link down events",
                "Title": "Link Down Frequency",
                "Type": "counter"
            },
            "link-down-rate": {
                "Data": [
                    { "link_down_rate": 16.616666666666685, "node_guid": "0xb0cf0e0300dafa00", "port_guid": "0x00000002251f681b" },
                    { "link_down_rate": 16.58333333333335, "node_guid": "0xdf5abd57894e3a50", "port_guid": "0x00000002251f681b" },
                    { "link_down_rate": 16.53333333333335, "node_guid": "0xdf5abd57894e3a50", "port_guid": "0x00000002251f681b" }
                ],
                "Description": "List of top 100 port that has the highest link down events",
                "Title": "Top 100 Link Down Ports",
                "Type": "histogram"
            },
            "link-up-count": {
                "Data": [
                    { "current": 1296 },
                    { "min": 0 },
                    { "max": 1296 }
                ],
                "Description": "Number of links in UP state out of total link number.",
                "Title": "Link UP Count",
                "Type": "histogram"
            },
            "port-count": {
                "Data": [
                    { "GPU": 648 },
                    { "SWITCH_ACCESS": 648 },
                    { "SWITCH_TRUNK": 0 },
                    { "UNDEFINED": 0 }
                ],
                "Description": "Number of ports per device type.",
                "Title": "Port Type Count",
                "Type": "histogram"
            },
            "raw-ber": {
                "Data": [
                    { "node_guid": "0xb0cf0e0300dafb60", "port_guid": "0x00000002251f681b", "raw_ber": 0 },
                    { "node_guid": "0xb0cf0e0300db1be0", "port_guid": "0x00000002251f681b", "raw_ber": 0 },
                    { "node_guid": "0xfdece0e67e59176f", "port_guid": "0x00000002251f681b", "raw_ber": 0 }
                ],
                "Description": "List of top 100 port that has the highest RAW BER readings",
                "Title": "Top 100 RAW BER Ports",
                "Type": "histogram"
            },
            "symbol-ber": {
                "Data": [
                    { "node_guid": "0x7e4c3b753098777c", "port_guid": "0x00000002251f681b", "symbol_ber": 0 },
                    { "node_guid": "0xb0cf0e0300db1940", "port_guid": "0x00000002251f681b", "symbol_ber": 0 },
                    { "node_guid": "0xb0cf0e0300dafa40", "port_guid": "0x00000002251f681b", "symbol_ber": 0 }
                ],
                "Description": "List of top 100 port that has the highest SYMBOL BER readings",
                "Title": "Top 100 SYMBOL BER Ports",
                "Type": "histogram"
            }
        }
    }

Note: This is a sample response.
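
The sample responses use two "Data" shapes: lists of single-key count buckets (for example { "HEALTHY": 18 }) and lists of per-port records (for example the link-down-rate entries). The sketch below shows one way a script might summarize both, using a trimmed copy of the sample values. The function names are hypothetical, and treating every state other than HEALTHY (including UNKNOWN) as worth flagging is an assumption, not documented NMX-M behavior.

```python
import json

# Trimmed copy of the unfiltered sample response; only three KPIs are kept.
SAMPLE = json.loads("""
{
  "Health": {
    "switch-health": {
      "Data": [{"MISSING_NVLINK": 0}, {"UNHEALTHY": 0}, {"UNKNOWN": 0}, {"HEALTHY": 18}]
    },
    "gpu-health": {
      "Data": [{"DEGRADED": 0}, {"NONVLINK": 0}, {"UNKNOWN": 0},
               {"DEGRADED_BW": 0}, {"HEALTHY": 36}]
    }
  },
  "Inventory": {
    "link-down-rate": {
      "Data": [
        {"link_down_rate": 16.58333333333335, "node_guid": "0xdf5abd57894e3a50",
         "port_guid": "0x00000002251f681b"},
        {"link_down_rate": 16.616666666666685, "node_guid": "0xb0cf0e0300dafa00",
         "port_guid": "0x00000002251f681b"}
      ]
    }
  }
}
""")


def flatten_counts(kpi: dict) -> dict:
    """Merge single-key buckets such as {"HEALTHY": 18} into one dictionary."""
    counts = {}
    for bucket in kpi["Data"]:
        counts.update(bucket)
    return counts


def health_alerts(response: dict) -> dict:
    """Return {kpi_name: {state: count}} for non-HEALTHY states with a count above zero."""
    alerts = {}
    for name, kpi in response.get("Health", {}).items():
        bad = {state: n for state, n in flatten_counts(kpi).items()
               if state != "HEALTHY" and n > 0}
        if bad:
            alerts[name] = bad
    return alerts


def worst_ports(response: dict, kpi_name: str, metric: str, limit: int = 3) -> list:
    """Return up to `limit` per-port records, sorted by `metric` in descending order."""
    rows = response["Inventory"][kpi_name]["Data"]
    return sorted(rows, key=lambda row: row[metric], reverse=True)[:limit]
```

With the sample values, `health_alerts(SAMPLE)` is empty because every non-HEALTHY count is zero, and `worst_ports(SAMPLE, "link-down-rate", "link_down_rate", limit=1)` returns the record for node 0xb0cf0e0300dafa00.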

  • Download Grafana

    https://grafana.com/grafana/download

  • Configure JSON API Data Source

    1. Click Connections, then select JSON API from the available options. If it is not installed, install it first.

    2. Click Add a new data source.

    3. Provide the details below:

      • Name: Start the name with the prefix kpi (case sensitive).

        Note: If you do not prefix the name with kpi, update the attached JSON file before importing it into the dashboard.

      • URL: For example, https://10.xxx.xx.xxx/nmx/v1/kpis

      • Authentication: Select Basic Authentication

        Example:

        - User: rw-user

        - Password: Nmx12345

      • Skip TLS Certificate Validation: Enable this option.

    4. Click Save and Test.

  • Configuring Dashboard

    1. Click Dashboards on the home screen.

    2. Click New → Import.

    3. Paste the contents of the kpi-dashboard.json file into the "Import via dashboard JSON model" section.

      kpi-dashboard.json

    4. Click Load, then Import.

  • Sample Dashboard

    [Image: sample KPI dashboard in Grafana]

© Copyright 2025, NVIDIA. Last updated on May 29, 2025.