Recovery
Recovery restores device state (both control plane and data plane) after events such as controller restart, live update, or live migration.
The recovery process relies on JSON files stored in /opt/mellanox/mlnx_virtnet/recovery, where each device (either PF or VF) has a corresponding file named after its unique VUID.
The following entries are saved to the recovery file and restored when necessary:
| Entry | Type | Description |
|---|---|---|
| port_ib_dev | String | RDMA device name the virtio-net device is created on |
| pf_id | Number | ID of PF |
| | Number | ID of VF, valid for VF only |
| function_type | String | PF or VF |
| bdf_raw | Number | Virtio-net device bus:device:function as a uint16 value |
| device_type | String | Static or hotplug (only for PF) |
| mac | String | MAC address of device |
| pf_num | Number | PCIe function number |
| sf_num | Number | SF number used for this virtio-net device |
| mq | Number | Number of multi-queues created for this virtio-net device |
An example of a recovery file for a hotplug PF device:
{
"port_ib_dev": "mlx5_0",
"pf_id": 0,
"function_type": "pf",
"bdf_raw": 57611,
"device_type": "hotplug",
"mac": "0c:c4:7a:ff:22:93",
"pf_num": 0,
"sf_num": 2000,
"mq": 3
}
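The example above can be read programmatically without touching the file on disk. The following is a minimal Python sketch, not part of the controller: `check_types` and `decode_bdf` are hypothetical helper names, the expected types are taken from the Type column and the example values, and the `bdf_raw` decoding assumes the conventional PCI layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0), which this document does not confirm.

```python
import json

# The example recovery file shown above, embedded so the sketch is self-contained.
EXAMPLE = """
{
  "port_ib_dev": "mlx5_0",
  "pf_id": 0,
  "function_type": "pf",
  "bdf_raw": 57611,
  "device_type": "hotplug",
  "mac": "0c:c4:7a:ff:22:93",
  "pf_num": 0,
  "sf_num": 2000,
  "mq": 3
}
"""

# Expected JSON type per entry (String -> str, Number -> int).
EXPECTED_TYPES = {
    "port_ib_dev": str,
    "pf_id": int,
    "function_type": str,
    "bdf_raw": int,
    "device_type": str,
    "mac": str,
    "pf_num": int,
    "sf_num": int,
    "mq": int,
}


def check_types(record):
    """Return the names of present entries whose value has an unexpected type."""
    return [
        name
        for name, expected in EXPECTED_TYPES.items()
        if name in record and not isinstance(record[name], expected)
    ]


def decode_bdf(bdf_raw):
    """Split a uint16 BDF into bus:device.function.

    Assumption: conventional PCI encoding (8-bit bus, 5-bit device, 3-bit function).
    """
    bus = (bdf_raw >> 8) & 0xFF
    device = (bdf_raw >> 3) & 0x1F
    function = bdf_raw & 0x7
    return f"{bus:02x}:{device:02x}.{function}"


record = json.loads(EXAMPLE)
print(check_types(record))            # entries with unexpected types, if any
print(decode_bdf(record["bdf_raw"]))  # human-readable BDF under the stated assumption
```

Under that assumed layout, the `bdf_raw` value 57611 (0xE10B) corresponds to bus 0xe1, device 1, function 3.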
Whether recovery is performed depends on the action taken on the BlueField (DPU) or on the host. The following table lists the individual scenarios:
The action columns fall under two groups, DPU Actions and Host Actions.

| | Restart Controller | Live Update | Hot Unplug | Destroy VFs | Unload Driver | Power Cycle Host & DPU | Warm Reboot | Live Migration |
|---|---|---|---|---|---|---|---|---|
| | Recover | Recover | N/A | N/A | Recover | No recover | Recover | Recover |
| | Recover | Recover | No recover | N/A | Recover | No recover | Recover | Recover |
| | Recover | Recover | N/A | Recovery file deleted | No recover | No recover | No recover | Recover |
These recovery files are internal to the controller and should not be modified.
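Since the files must not be modified, any inspection should be strictly read-only. The following Python sketch shows one way to summarize the recovery directory under that constraint. `summarize` is a hypothetical helper, not a controller tool, and it makes no assumption about file extensions: it keys the result on the full file name and skips anything that does not parse as a JSON object.

```python
import json
from pathlib import Path

# Recovery directory named in this section.
RECOVERY_DIR = Path("/opt/mellanox/mlnx_virtnet/recovery")


def summarize(recovery_dir):
    """Return {file name: (function_type, mac)} for each readable recovery file.

    Files are opened in read mode only; nothing is written or deleted.
    """
    summary = {}
    if not recovery_dir.is_dir():
        return summary
    for path in sorted(recovery_dir.iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open("r", encoding="utf-8") as handle:
                record = json.load(handle)
        except (OSError, ValueError):
            # Unreadable or non-JSON file: skip it rather than guess its content.
            continue
        if isinstance(record, dict):
            summary[path.name] = (record.get("function_type"), record.get("mac"))
    return summary


for name, (function_type, mac) in summarize(RECOVERY_DIR).items():
    print(f"{name}: type={function_type} mac={mac}")
```

On a system where the directory does not exist or is empty, the loop prints nothing.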
Controller recovery is enabled by default and does not need user configuration or intervention. When the mlxconfig settings used by the controller take effect, the newly started controller service automatically deletes all recovery files.