NMX Manager (NMX-M) Documentation v85.1.1000

Key Performance Indicators (KPI)

The KPI REST endpoint provides fast, actionable insight into the status of the network cluster. It returns key performance indicators (KPIs) that summarize cluster health, with the option to drill down into specific events. NOC operators can use it to identify network issues quickly, assess their severity, and act immediately, for example by opening tickets for further investigation. These KPIs support monitoring of network performance and help ensure prompt response times.

curl --request GET \
  --url https://10.xxxxx.xxx/nmx/v1/kpis \
  --user "rw-user:Nmx12345" \
  --insecure

wget --quiet \
  --method GET \
  --user 'rw-user' \
  --password 'Nmx12345' \
  --output-document - \
  https://10.xxx.xx.xxx/nmx/v1/kpis \
  --no-check-certificate

Note

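In these examples, "rw-user" is a user with read-write access, and its password is "Nmx12345". The --insecure (curl) and --no-check-certificate (wget) options skip TLS certificate verification.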
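The same request can also be issued from a script. The following is a minimal sketch using only the Python standard library, assuming the endpoint accepts HTTP Basic authentication exactly as in the curl and wget examples. The function names (basic_auth_header, build_kpi_request, insecure_context, fetch_kpis) are hypothetical and not part of NMX-M. Like --insecure, the sketch disables certificate verification, so it should only be pointed at a host you trust.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value for user:password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_kpi_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for https://<host>/nmx/v1/kpis with Basic auth."""
    return urllib.request.Request(
        f"https://{host}/nmx/v1/kpis",
        method="GET",
        headers={"Authorization": basic_auth_header(user, password)},
    )


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks, like curl --insecure."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be turned off before CERT_NONE is allowed
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_kpis(host: str, user: str, password: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_kpi_request(host, user, password)
    with urllib.request.urlopen(request, context=insecure_context(), timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_kpis("<nmx-m-host>", "rw-user", "Nmx12345") returns the parsed response as a dictionary, where <nmx-m-host> stands for the elided address used above.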

The REST API supports filtering through the filter query parameter. The allowed values are:

  • Filtering based on Health:

    • HEALTH

    • SWITCH_HEALTH

    • GPU_HEALTH

    • DOMAIN_HEALTH

    • COMPUTE_HEALTH

  • Filtering based on Inventory:

    • INVENTORY

    • COMPUTE_ALLOCATION

To filter the results, append the filter parameter to the endpoint URL, for example: kpis?filter=HEALTH
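Building a filtered URL can be sketched in Python as follows. The helper name kpi_url and the client-side validation are assumptions for illustration, not part of NMX-M. The check accepts only the seven values listed above, spelled exactly as shown; whether the server accepts other spellings is not covered here.

```python
import urllib.parse
from typing import Optional

# Filter values listed in this section, grouped as in the documentation.
HEALTH_FILTERS = {"HEALTH", "SWITCH_HEALTH", "GPU_HEALTH", "DOMAIN_HEALTH", "COMPUTE_HEALTH"}
INVENTORY_FILTERS = {"INVENTORY", "COMPUTE_ALLOCATION"}
ALLOWED_FILTERS = HEALTH_FILTERS | INVENTORY_FILTERS


def kpi_url(host: str, kpi_filter: Optional[str] = None) -> str:
    """Return the KPI endpoint URL, optionally with a validated ?filter= query."""
    base = f"https://{host}/nmx/v1/kpis"
    if kpi_filter is None:
        return base  # unfiltered: all KPI groups
    if kpi_filter not in ALLOWED_FILTERS:
        raise ValueError(
            f"unsupported filter {kpi_filter!r}; expected one of {sorted(ALLOWED_FILTERS)}"
        )
    # urlencode escapes the value; the listed values need no escaping.
    return f"{base}?{urllib.parse.urlencode({'filter': kpi_filter})}"
```

Rejecting unknown values locally surfaces a typo before any request is sent.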

Sample Output

  • With filtering - kpis?filter=SWITCH_HEALTH

    {
        "Data": [
            {
                "HEALTHY": 18
            },
            {
                "MISSING_NVLINK": 0
            },
            {
                "UNHEALTHY": 0
            },
            {
                "UNKNOWN": 0
            }
        ],
        "Description": "Number of switch instances per health state",
        "Title": "Switch Health Count",
        "Type": "histogram"
    }

  • Without filtering - kpis

    {
        "Health": {
            "compute-health": {
                "Data": [
                    {
                        "HEALTHY": 9
                    },
                    {
                        "DEGRADED": 0
                    },
                    {
                        "UNHEALTHY": 0
                    },
                    {
                        "UNKNOWN": 0
                    }
                ],
                "Description": "Number of compute nodes per health state",
                "Title": "Compute Health Count",
                "Type": "histogram"
            },
            "domain-health": {
                "Data": [
                    {
                        "HEALTHY": 1
                    },
                    {
                        "DEGRADED": 0
                    },
                    {
                        "UNHEALTHY": 0
                    },
                    {
                        "UNKNOWN": 0
                    }
                ],
                "Description": "Number of domains per health state",
                "Title": "Domain Health Count",
                "Type": "histogram"
            },
            "gpu-health": {
                "Data": [
                    {
                        "HEALTHY": 36
                    },
                    {
                        "NONVLINK": 0
                    },
                    {
                        "DEGRADED": 0
                    },
                    {
                        "UNKNOWN": 0
                    },
                    {
                        "DEGRADED_BW": 0
                    }
                ],
                "Description": "Number of GPU instances per health state",
                "Title": "GPU Health Count",
                "Type": "histogram"
            },
            "switch-health": {
                "Data": [
                    {
                        "HEALTHY": 18
                    },
                    {
                        "MISSING_NVLINK": 0
                    },
                    {
                        "UNHEALTHY": 0
                    },
                    {
                        "UNKNOWN": 0
                    }
                ],
                "Description": "Number of switch instances per health state",
                "Title": "Switch Health Count",
                "Type": "histogram"
            }
        },
        "Inventory": {
            "compute-allocation": {
                "Data": [
                    {
                        "ALL": 9
                    },
                    {
                        "FREE": 0
                    },
                    {
                        "FULL": 9
                    },
                    {
                        "PARTIAL": 0
                    }
                ],
                "Description": "Number of compute nodes with its GPU instances in allocation state",
                "Title": "Compute GPU Allocation Count",
                "Type": "histogram"
            }
        }
    }

  • Configure JSON API Data Source

    1. In Grafana, click Connections, then select JSON API from the available options. If the JSON API plugin is not installed, install it first.

    2. Click Add a new data source.

    3. Provide the details below:

      • Name: Start the name with the prefix kpi (case-sensitive).

        Note: If the name does not start with kpi, update the attached JSON file to match before importing the dashboard.

        [Image: JSON API data source settings]

      • URL: The KPI endpoint, for example https://10.xxx.xx.xxx/nmx/v1/kpis

      • Authentication: Select Basic Authentication and enter the user credentials, for example:

        - User: rw-user

        - Password: Nmx12345

      • Skip TLS Certificate Validation: Enable this option.

    4. Click Save and Test.

  • Configuring Dashboard

    1. On the home screen, click Dashboards.

    2. Click New → Import.

    3. Paste the contents of the kpi-dashboard.json file into the "Import via dashboard JSON model" section.

      kpi-dashboard.json

    4. Click Load, then Import.

  • Sample Dashboard

    [Image: sample KPI dashboard]
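Each histogram in the sample outputs above stores its counts in a Data list of single-key objects. The following minimal Python sketch flattens that list and, as one possible triage rule, reports every state other than HEALTHY that has a non-zero count. The helper names and the triage rule are assumptions for illustration. The sketch also assumes a response has one of the two shapes shown in the sample output: a single histogram (filtered) or groups of named histograms (unfiltered).

```python
from typing import Dict, Iterator, Tuple


def histogram_counts(kpi: dict) -> Dict[str, int]:
    """Merge a histogram's "Data" list of single-key objects into one dict."""
    counts: Dict[str, int] = {}
    for entry in kpi["Data"]:
        counts.update(entry)
    return counts


def iter_histograms(response: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (name, histogram) pairs from a filtered or unfiltered response."""
    if "Data" in response:
        # Filtered response: the body is one histogram; name it by its Title.
        yield response.get("Title", "kpi"), response
        return
    # Unfiltered response: {"Health": {"switch-health": {...}, ...}, "Inventory": {...}}
    for group in response.values():
        for name, kpi in group.items():
            yield name, kpi


def attention_needed(response: dict) -> Dict[str, Dict[str, int]]:
    """For health histograms (those with a HEALTHY bucket), list other non-zero states."""
    findings: Dict[str, Dict[str, int]] = {}
    for name, kpi in iter_histograms(response):
        counts = histogram_counts(kpi)
        if "HEALTHY" not in counts:
            continue  # e.g. compute-allocation, which reports ALL/FREE/FULL/PARTIAL
        flagged = {state: n for state, n in counts.items() if state != "HEALTHY" and n > 0}
        if flagged:
            findings[name] = flagged
    return findings
```

For the kpis?filter=SWITCH_HEALTH sample, histogram_counts returns {"HEALTHY": 18, "MISSING_NVLINK": 0, "UNHEALTHY": 0, "UNKNOWN": 0}, and attention_needed returns an empty dict because every non-HEALTHY bucket is 0. The same holds for the unfiltered sample.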

© Copyright 2025, NVIDIA. Last updated on Mar 24, 2025.