Key Performance Indicators (KPI)
The KPI REST endpoint provides fast, actionable insight into the status of the network cluster. It returns key performance indicators (KPIs) that give a clear overview of the cluster's health, with the option to dive deeper into specific events. NOC operators can use this endpoint to identify network issues quickly, assess their severity, and take immediate action, such as opening tickets for further investigation.
The KPIs are aggregated across all domains. For troubleshooting, access per-domain data using the compute-node or switch-node REST API calls.
curl --request GET \
--url https://xxxx.xxxx.xxxx.xxxx/nmx/v1/kpis \
--user "<username>:<password>" \
--insecure
wget --quiet \
--method GET \
--user '<username>' \
--password '<password>' \
--no-check-certificate \
--output-document - \
https://xxxx.xxxx.xxxx.xxxx/nmx/v1/kpis
Replace <username> with either rw-user (read-write access) or ro-user (read-only access).
Replace <password> with the actual password associated with the user.
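The same request can also be issued from a script. The following is a minimal Python sketch using only the standard library. The function names build_kpi_request and fetch_kpis are illustrative and not part of the product. It assumes HTTP Basic authentication (what curl --user sends by default) and disables certificate verification to mirror --insecure, which is only reasonable for lab systems with self-signed certificates.

```python
import base64
import json
import ssl
import urllib.request


def build_kpi_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for /nmx/v1/kpis with an HTTP Basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}/nmx/v1/kpis",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_kpis(host: str, username: str, password: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body."""
    # Assumption: self-signed certificate, so verification is skipped
    # in the same way as curl --insecure / wget --no-check-certificate.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_kpi_request(host, username, password)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.load(response)
```

A call such as fetch_kpis("xxxx.xxxx.xxxx.xxxx", "ro-user", "<password>") would then return the parsed response as a dictionary. Read-only access is sufficient here because the sketch only issues a GET.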
The REST API supports filtering through the filter query parameter. The allowed values are:
Filtering based on Health:
HEALTH
SWITCH_HEALTH
GPU_HEALTH
DOMAIN_HEALTH
COMPUTE_HEALTH
Filtering based on Inventory:
INVENTORY
COMPUTE_ALLOCATION
CONNECTION_COUNT
CABLE_TYPE
CABLE_PN
CABLE_FW_VERSION
PORT_COUNT
LINK_UP_COUNT
LINKDOWN_FREQUENCY
LINKDOWN_RATE
CHIP_TEMPERATURE
EFF_BER
SYMBOL_BER
RAW_BER
To apply a filter, append the filter parameter to the endpoint, for example: kpis?filter=HEALTH
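When filter values come from user input or configuration, it can help to check them before sending a request. The sketch below builds the endpoint URL and rejects values that are not in the list above. The helper name kpi_url is hypothetical, and the strict upper-case match is an assumption; this page does not state how the server treats other spellings.

```python
from typing import Optional
from urllib.parse import urlencode

# Filter values copied from the list above.
ALLOWED_FILTERS = frozenset({
    "HEALTH", "SWITCH_HEALTH", "GPU_HEALTH", "DOMAIN_HEALTH", "COMPUTE_HEALTH",
    "INVENTORY", "COMPUTE_ALLOCATION", "CONNECTION_COUNT", "CABLE_TYPE",
    "CABLE_PN", "CABLE_FW_VERSION", "PORT_COUNT", "LINK_UP_COUNT",
    "LINKDOWN_FREQUENCY", "LINKDOWN_RATE", "CHIP_TEMPERATURE",
    "EFF_BER", "SYMBOL_BER", "RAW_BER",
})


def kpi_url(host: str, kpi_filter: Optional[str] = None) -> str:
    """Return the KPI endpoint URL, optionally with ?filter=<value> appended."""
    base = f"https://{host}/nmx/v1/kpis"
    if kpi_filter is None:
        return base  # unfiltered: the full Health and Inventory response
    if kpi_filter not in ALLOWED_FILTERS:
        # Assumption: only the exact upper-case names listed above are valid.
        raise ValueError(f"unsupported filter: {kpi_filter!r}")
    return f"{base}?{urlencode({'filter': kpi_filter})}"
```

For example, kpi_url("xxxx.xxxx.xxxx.xxxx", "SWITCH_HEALTH") yields the URL used in the filtered sample request.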
Sample Output
With filtering:
curl --request GET --url "https://xxxx.xxxx.xxxx.xxxx/nmx/v1/kpis?filter=SWITCH_HEALTH" --user "<username>:<password>" --insecure
{
  "Data": [
    {"HEALTHY": 18},
    {"MISSING_NVLINK": 0},
    {"UNHEALTHY": 0},
    {"UNKNOWN": 0}
  ],
  "Description": "Number of switch instances per health state",
  "Title": "Switch Health Count",
  "Type": "histogram"
}
Without filtering:
curl --request GET --url https://xxxx.xxxx.xxxx.xxxx/nmx/v1/kpis --user "<username>:<password>" --insecure
{
  "Health": {
    "compute-health": {
      "Data": [
        {"UNKNOWN": 0},
        {"HEALTHY": 9},
        {"DEGRADED": 0},
        {"UNHEALTHY": 0}
      ],
      "Description": "Number of compute nodes per health state",
      "Title": "Compute Health Count",
      "Type": "histogram"
    },
    "domain-health": {
      "Data": [
        {"UNKNOWN": 0},
        {"HEALTHY": 1},
        {"DEGRADED": 0},
        {"UNHEALTHY": 0}
      ],
      "Description": "Number of domains per health state",
      "Title": "Domain Health Count",
      "Type": "histogram"
    },
    "gpu-health": {
      "Data": [
        {"DEGRADED": 0},
        {"NONVLINK": 0},
        {"UNKNOWN": 0},
        {"DEGRADED_BW": 0},
        {"HEALTHY": 36}
      ],
      "Description": "Number of GPU instances per health state",
      "Title": "GPU Health Count",
      "Type": "histogram"
    },
    "switch-health": {
      "Data": [
        {"MISSING_NVLINK": 0},
        {"UNHEALTHY": 0},
        {"UNKNOWN": 0},
        {"HEALTHY": 18}
      ],
      "Description": "Number of switch instances per health state",
      "Title": "Switch Health Count",
      "Type": "histogram"
    }
  },
  "Inventory": {
    "cable-fw-version": {
      "Data": [
        {"N/A": 1296}
      ],
      "Description": "Count the number of entries for each FWVersion.",
      "Title": "Cable FWVersion Count",
      "Type": "histogram"
    },
    "cable-pn": {
      "Data": [
        {"N/A": 1296}
      ],
      "Description": "Count the number of entries for each PN.",
      "Title": "Cable PN Count",
      "Type": "histogram"
    },
    "cable-type": {
      "Data": [
        {"850 nm VCSEL": 558}
      ],
      "Description": "Count the number of entries for each cable type.",
      "Title": "Cable Type Count",
      "Type": "histogram"
    },
    "chip-temperature": {
      "Data": [
        {"30-40": 50}
      ],
      "Description": "Number of instances per temperature",
      "Title": "Chip Temperature Count",
      "Type": "histogram"
    },
    "compute-allocation": {
      "Data": [
        {"FULL": 9},
        {"PARTIAL": 0},
        {"ALL": 9},
        {"FREE": 0}
      ],
      "Description": "Number of compute nodes with its GPU instances in allocation state",
      "Title": "Compute GPU Allocation Count",
      "Type": "histogram"
    },
    "connection-count": {
      "Data": [
        {"DISCOVERED": 0},
        {"EXPECTED": 0},
        {"EXPECTED_ACTIVE": 0},
        {"EXPECTED_INACTIVE": 0},
        {"UNEXPECTED": 0}
      ],
      "Description": "Number of connection count per connection attribute",
      "Title": "Topology connection Count",
      "Type": "histogram"
    },
    "effective-ber": {
      "Data": [
        {"effective_ber": 0, "node_guid": "0xb0cf0e0300db1be0", "port_guid": "0x00000002251f681b"},
        {"effective_ber": 0, "node_guid": "0xb0cf0e0300db19a0", "port_guid": "0x00000002251f681b"},
        {"effective_ber": 0, "node_guid": "0xb0cf0e0300db1be0", "port_guid": "0x00000002251f681b"}
      ],
      "Description": "List of top 100 port that has the highest EFFECTIVE BER readings",
      "Title": "Top 100 EFFECTIVE BER Ports",
      "Type": "histogram"
    },
    "link-down-frequency": {
      "Data": [
        {"SWITCH": 0.00020169558757286254}
      ],
      "Description": "Average time between link down events",
      "Title": "Link Down Frequency",
      "Type": "counter"
    },
    "link-down-rate": {
      "Data": [
        {"link_down_rate": 16.616666666666685, "node_guid": "0xb0cf0e0300dafa00", "port_guid": "0x00000002251f681b"},
        {"link_down_rate": 16.58333333333335, "node_guid": "0xdf5abd57894e3a50", "port_guid": "0x00000002251f681b"},
        {"link_down_rate": 16.53333333333335, "node_guid": "0xdf5abd57894e3a50", "port_guid": "0x00000002251f681b"}
      ],
      "Description": "List of top 100 port that has the highest link down events",
      "Title": "Top 100 Link Down Ports",
      "Type": "histogram"
    },
    "link-up-count": {
      "Data": [
        {"current": 1296},
        {"min": 0},
        {"max": 1296}
      ],
      "Description": "Number of links in UP state out of total link number.",
      "Title": "Link UP Count",
      "Type": "histogram"
    },
    "port-count": {
      "Data": [
        {"GPU": 648},
        {"SWITCH_ACCESS": 648},
        {"SWITCH_TRUNK": 0},
        {"UNDEFINED": 0}
      ],
      "Description": "Number of ports per device type.",
      "Title": "Port Type Count",
      "Type": "histogram"
    },
    "raw-ber": {
      "Data": [
        {"node_guid": "0xb0cf0e0300dafb60", "port_guid": "0x00000002251f681b", "raw_ber": 0},
        {"node_guid": "0xb0cf0e0300db1be0", "port_guid": "0x00000002251f681b", "raw_ber": 0},
        {"node_guid": "0xfdece0e67e59176f", "port_guid": "0x00000002251f681b", "raw_ber": 0}
      ],
      "Description": "List of top 100 port that has the highest RAW BER readings",
      "Title": "Top 100 RAW BER Ports",
      "Type": "histogram"
    },
    "symbol-ber": {
      "Data": [
        {"node_guid": "0x7e4c3b753098777c", "port_guid": "0x00000002251f681b", "symbol_ber": 0},
        {"node_guid": "0xb0cf0e0300db1940", "port_guid": "0x00000002251f681b", "symbol_ber": 0},
        {"node_guid": "0xb0cf0e0300dafa40", "port_guid": "0x00000002251f681b", "symbol_ber": 0}
      ],
      "Description": "List of top 100 port that has the highest SYMBOL BER readings",
      "Title": "Top 100 SYMBOL BER Ports",
      "Type": "histogram"
    }
  }
}
Note: This is a sample response.
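In the sample responses, each count-style KPI carries its values as a "Data" list of single-key objects. A script that consumes the response may find it easier to merge that list into one mapping and then look for non-zero problem states. The following Python sketch shows one way to do this. The function names are illustrative, and treating every state other than HEALTHY (including UNKNOWN) as a problem is an assumption of this example, not documented behavior. The per-port KPIs (for example effective-ber or link-down-rate) use multi-key entries and are not meant to be passed to these helpers.

```python
from typing import Any, Dict


def histogram_counts(kpi: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a KPI's "Data" list of single-key objects into one mapping."""
    counts: Dict[str, Any] = {}
    for entry in kpi["Data"]:
        counts.update(entry)
    return counts


def nonzero_problem_states(kpi: Dict[str, Any], healthy_states=("HEALTHY",)) -> Dict[str, Any]:
    """Return the states outside healthy_states whose count is greater than zero."""
    counts = histogram_counts(kpi)
    return {
        state: count
        for state, count in counts.items()
        if state not in healthy_states and count > 0
    }


def health_problems(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """For an unfiltered response, map each health KPI to its non-zero problem states."""
    problems: Dict[str, Dict[str, Any]] = {}
    for name, kpi in response.get("Health", {}).items():
        bad = nonzero_problem_states(kpi)
        if bad:
            problems[name] = bad
    return problems
```

Applied to the unfiltered sample above, health_problems would return an empty dictionary, because every non-HEALTHY count in the Health section is 0. A non-empty result could be used as the trigger for opening a ticket.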
Download Grafana
https://grafana.com/grafana/download
Configure JSON API Data Source
Click Connections, then select JSON API from the available options. If it is not installed, install it first.
Click Add a new data source.
Provide the details below:
Name: Enter a name that starts with the prefix kpi (case sensitive).
Note: If the name does not start with kpi, update the attached JSON file before importing it into the dashboard.
URL: Enter the KPI endpoint, for example https://10.xxx.xx.xxx/nmx/v1/kpis
Authentication: Select Basic Authentication
Example:
- User: rw-user
- Password: Nmx12345
Skip TLS Certificate Validation
Save and Test.
Configuring Dashboard
Click Dashboards on the home screen.
Click New → Import.
Paste the content of the kpi-dashboard.json file under the section "Import via dashboard JSON model".
Click Load, followed by Import.
Sample Dashboard