NVIDIA BlueField BMC Software v26.01

Storage Health Monitoring

The BMC continuously monitors system resources, including CPU utilization, memory usage, and storage space. When a metric crosses its configured threshold, a warning or critical event log entry is generated automatically to alert administrators.

Event logs can be viewed via the Redfish API at /redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries.

The following table defines the thresholds for CPU utilization alerts:

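| Metric     | Alert Type | Warning | Critical | Action   | Description                         |
|------------|------------|---------|----------|----------|-------------------------------------|
| CPU        | Upper      | >80%    | >95%     | Log only | Total BMC CPU utilization           |
| CPU user   | Upper      | >80%    | >95%     | Log only | CPU time in user-space applications |
| CPU kernel | Upper      | >80%    | >95%     | Log only | CPU time in kernel operations       |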
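
The upper-threshold rule in the table can be sketched in Python as follows. This only illustrates how a reading maps to an alert level; the function name `classify_upper` and the constant names are hypothetical and are not part of the BMC software.

```python
# Threshold values are taken from the CPU table above.
# The names below are hypothetical and used for illustration only.
CPU_WARNING_PERCENT = 80.0
CPU_CRITICAL_PERCENT = 95.0


def classify_upper(reading, warning=CPU_WARNING_PERCENT, critical=CPU_CRITICAL_PERCENT):
    """Return 'Critical', 'Warning', or 'OK' for an upper-threshold metric."""
    if reading > critical:
        return "Critical"
    if reading > warning:
        return "Warning"
    return "OK"


for value in (42.0, 85.5, 96.881238):
    print(f"CPU at {value:.1f}% -> {classify_upper(value)}")
```

The comparisons are strict, matching the ">" notation in the table, so a reading of exactly 80% does not raise a warning in this sketch.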

Metric Log Example

To check the current event log for CPU alerts:

curl -k -u <usr>:<password> -H 'content-type: application/json' -X GET https://<bmc_ip>/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries

Example output:

{
  "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/468",
  "@odata.type": "#LogEntry.v1_15_0.LogEntry",
  "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/468/attachment",
  "Created": "2026-01-19T15:57:22+00:00",
  "EntryType": "Event",
  "Id": "468",
  "Message": "CPU sensor crossed a critical high threshold going high. Reading=96.881238 Threshold=95.000000.",
  "MessageArgs": [
    "CPU",
    "96.881238",
    "95.000000"
  ],
  "MessageId": "OpenBMC.0.4.SensorThresholdCriticalHighGoingHigh",
  "Name": "System Event Log Entry",
  "Resolution": "None",
  "Resolved": false,
  "Severity": "Critical"
}
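
A log entry like the one above can be processed with a few lines of Python. The sketch below assumes that `MessageArgs` always holds the sensor name, the reading, and the threshold in that order, which is inferred from this single example rather than from a documented contract. The helper name `summarize_threshold_entry` is hypothetical.

```python
import json

# Entry trimmed from the example output above.
entry_json = """
{
  "Id": "468",
  "MessageArgs": ["CPU", "96.881238", "95.000000"],
  "MessageId": "OpenBMC.0.4.SensorThresholdCriticalHighGoingHigh",
  "Severity": "Critical"
}
"""


def summarize_threshold_entry(entry):
    """Return (sensor, reading, threshold) from a sensor-threshold log entry.

    Assumes MessageArgs is [sensor, reading, threshold], as in the example.
    """
    sensor, reading, threshold = entry["MessageArgs"]
    return sensor, float(reading), float(threshold)


entry = json.loads(entry_json)
sensor, reading, threshold = summarize_threshold_entry(entry)
print(f"{entry['Severity']}: {sensor} at {reading:.1f}% (threshold {threshold:.1f}%)")
```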


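The following table defines the thresholds for memory alerts:

| Metric           | Alert Type | Warning | Critical | Action   | Description                       |
|------------------|------------|---------|----------|----------|-----------------------------------|
| Memory available | Lower      | <30%    | <10%     | Log only | Memory available for applications |
| Memory shared    | Upper      | -       | >35%     | Log only | Shared memory usage               |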

The BMC system provides real-time monitoring of read-write (RW) flash usage. You can query free storage space, receive notifications when usage crosses defined thresholds, and rely on automatic cleanup when limits are exceeded.

The following table defines the thresholds for storage usage alerts. Note that the percentages refer to free space, so a lower threshold of <10% corresponds to usage above 90%:

| Metric          | Alert Type | Warning | Critical | Path              | Action       | Description |
|-----------------|------------|---------|----------|-------------------|--------------|-------------|
| Storage RW      | Lower      | <10%    | <5%      | /run/initramfs/rw | Auto cleanup | Root overlay filesystem; primary writable storage for BMC runtime data and configuration changes |
| Storage TMP     | Lower      | <20%    | <5%      | /tmp              | Log only     | Temporary files storage; used by services for transient data, cleared on reboot |
| Storage LOGGING | Lower      | <30%    | <20%     | /var/lib/logging  | Log only     | Event logs and dump storage; contains Redfish logs, SEL entries, and debug dumps |
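
As an illustration of the lower thresholds in the table, the following Python sketch computes the free-space percentage of a mounted path and classifies it. It is not the BMC's own implementation: how the BMC measures free space is not documented here, and the names `STORAGE_THRESHOLDS`, `free_percent`, and `classify_lower` are hypothetical.

```python
import os
import shutil

# (warning, critical) lower thresholds on free space in percent, from the table above.
STORAGE_THRESHOLDS = {
    "/run/initramfs/rw": (10.0, 5.0),
    "/tmp": (20.0, 5.0),
    "/var/lib/logging": (30.0, 20.0),
}


def free_percent(path):
    """Return the free space of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # some pseudo filesystems report a zero size
        return 0.0
    return 100.0 * usage.free / usage.total


def classify_lower(free_pct, warning, critical):
    """Return 'Critical', 'Warning', or 'OK' for a lower-threshold metric."""
    if free_pct < critical:
        return "Critical"
    if free_pct < warning:
        return "Warning"
    return "OK"


if __name__ == "__main__":
    for path, (warning, critical) in STORAGE_THRESHOLDS.items():
        if os.path.exists(path):  # skip paths that do not exist on this machine
            pct = free_percent(path)
            print(f"{path}: {pct:.1f}% free -> {classify_lower(pct, warning, critical)}")
```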

Retrieving Free Storage Space

To check the current free RW flash space:

curl -k -H "X-Auth-Token: $token" -X GET https://${bmc}/redfish/v1/Managers/Bluefield_BMC/ManagerDiagnosticData

Example output:

{
  "@odata.id": "/redfish/v1/Managers/Bluefield_BMC/ManagerDiagnosticData",
  "@odata.type": "#ManagerDiagnosticData.v1_2_0.ManagerDiagnosticData",
  "FreeStorageSpaceKiB": 1488,
  "Id": "ManagerDiagnosticData",
  "MemoryStatistics": {
    "AvailableBytes": 725983232,
    "BuffersAndCacheBytes": 170594304,
    "FreeBytes": 605347840,
    "SharedBytes": 60747776,
    "TotalBytes": 917188608
  },
  "Name": "Manager Diagnostic Data",
  "ProcessorStatistics": {
    "KernelPercent": 0.6058,
    "UserPercent": 0.5048
  },
  "ServiceRootUptimeSeconds": 1282378.351
}
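
The response above also carries the memory and CPU statistics that the alert thresholds apply to. The Python sketch below derives memory percentages from it, assuming that the percentages in the memory threshold table are measured relative to `TotalBytes`; that relationship is an assumption and is not stated in this document. The helper name `memory_percentages` is hypothetical.

```python
import json

# Fields copied from the example output above.
diag_json = """
{
  "FreeStorageSpaceKiB": 1488,
  "MemoryStatistics": {
    "AvailableBytes": 725983232,
    "SharedBytes": 60747776,
    "TotalBytes": 917188608
  },
  "ProcessorStatistics": {
    "KernelPercent": 0.6058,
    "UserPercent": 0.5048
  }
}
"""


def memory_percentages(diag):
    """Return available and shared memory as a percentage of TotalBytes (assumed basis)."""
    mem = diag["MemoryStatistics"]
    total = mem["TotalBytes"]
    return {
        "available": 100.0 * mem["AvailableBytes"] / total,
        "shared": 100.0 * mem["SharedBytes"] / total,
    }


diag = json.loads(diag_json)
pct = memory_percentages(diag)
print(f"Free RW storage: {diag['FreeStorageSpaceKiB']} KiB")
print(f"Memory available: {pct['available']:.1f}%, shared: {pct['shared']:.1f}%")
```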


Storage Cleanup Notifications

When RW flash usage exceeds 90%, a Redfish event log entry is generated to alert administrators that manual cleanup is required.

Example log:

{
  "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/7",
  "@odata.type": "#LogEntry.v1_15_0.LogEntry",
  "Created": "2025-09-15T13:30:43+00:00",
  "EntryType": "Event",
  "Id": "7",
  "Message": "Processes consuming HIGH Resource Storage_RW are 91%",
  "MessageArgs": [
    "Storage_RW",
    "91%"
  ],
  "MessageId": "OpenBMC.0.4.BMCSystemResourceInfo",
  "Name": "System Event Log Entry",
  "Resolution": "None.",
  "Resolved": false,
  "Severity": "OK"
}


Automatic Cleanup

When RW flash usage exceeds 95%, the BMC automatically frees space by deleting:

  • All dump files

  • All event logs

  • Files in home directories

  • Files in system log directories

Warning

Exceeding 99% RW flash usage can make BMC functionality unstable and impede automatic cleanup.
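
The usage levels described in this section (notification above 90%, automatic cleanup above 95%, possible instability above 99%) can be summarized in a small Python sketch. The function name `rw_flash_stage` and the stage labels are hypothetical and serve only to make the escalation order explicit.

```python
def rw_flash_stage(used_percent):
    """Map RW flash usage (percent used) to the stage described in this section."""
    if used_percent > 99.0:
        return "risk of instability"    # cleanup itself may be impeded
    if used_percent > 95.0:
        return "automatic cleanup"      # BMC deletes dumps, logs, and other files
    if used_percent > 90.0:
        return "cleanup notification"   # event log entry asks for manual cleanup
    return "normal"


for used in (50.0, 91.0, 96.5, 99.5):
    print(f"{used:.1f}% used -> {rw_flash_stage(used)}")
```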

After automatic cleanup, a Redfish event log entry is generated. For example:

{
  "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/3",
  "@odata.type": "#LogEntry.v1_15_0.LogEntry",
  "Created": "2025-09-24T10:40:34+00:00",
  "EntryType": "Event",
  "Id": "3",
  "Message": "RWFS cleanup completed.",
  "Modified": "2025-09-22T10:40:34+00:00",
  "Name": "System Event Log Entry",
  "Resolved": false,
  "Severity": "OK"
}
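
To find out whether an automatic cleanup has taken place, the event log entries can be filtered by message text. The Python sketch below assumes that the cleanup entry always uses the exact message shown in the example. It operates on the `Members` list of a Redfish log entry collection; the sample list here is hypothetical and is assembled from the two examples in this document. The helper name `cleanup_entries` is also hypothetical.

```python
# Message text taken from the example entry above.
CLEANUP_MESSAGE = "RWFS cleanup completed."


def cleanup_entries(members):
    """Return the log entries that report a completed RW flash cleanup."""
    return [entry for entry in members if entry.get("Message") == CLEANUP_MESSAGE]


# Hypothetical, trimmed "Members" list from an EventLog Entries response.
members = [
    {"Id": "3", "Created": "2025-09-24T10:40:34+00:00",
     "Message": "RWFS cleanup completed."},
    {"Id": "7", "Created": "2025-09-15T13:30:43+00:00",
     "Message": "Processes consuming HIGH Resource Storage_RW are 91%"},
]

for entry in cleanup_entries(members):
    print(f"Cleanup reported in entry {entry['Id']} at {entry['Created']}")
```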


© Copyright 2026, NVIDIA. Last updated on Feb 28, 2026