Recovery
Recovery restores device state (both control plane and data plane) after events such as a controller restart, live update, or live migration.
Recovery depends on the JSON files stored in /opt/mellanox/mlnx_virtnet/recovery, which holds one file per device (either PF or VF). Each file is named after the unique VUID of its device.
The following entries are saved to the recovery file and restored when necessary:
| Entry | Type | Description |
|---|---|---|
| port_ib_dev | String | RDMA device name the virtio-net device is created on |
| pf_id | Number | ID of PF |
| vf_id | Number | ID of VF, valid for VF only |
| function_type | String | PF or VF |
| bdf_raw | Number | Virtio-net device bus:device:function in uint16 type |
| device_type | String | Static or hotplug (only for PF) |
| mac | String | MAC address of device |
| pf_num | Number | PCIe function number |
| sf_num | Number | SF number which was used for this virtio-net device |
| mq | Number | Number of multi-queues created for this virtio-net device |
An example of a recovery file for a hotplug PF device:
{
  "port_ib_dev": "mlx5_0",
  "pf_id": 0,
  "function_type": "pf",
  "bdf_raw": 57611,
  "device_type": "hotplug",
  "mac": "0c:c4:7a:ff:22:93",
  "pf_num": 0,
  "sf_num": 2000,
  "mq": 3
}
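The bdf_raw value is easier to read once it is split into its bus, device, and function parts. The following is a minimal sketch, assuming the uint16 uses the conventional PCI layout (8 bits of bus, 5 bits of device, 3 bits of function); that layout is an assumption, not something stated for the recovery file, and `decode_bdf` is a hypothetical helper name.

```python
import json


def decode_bdf(bdf_raw: int) -> str:
    """Format a 16-bit value as bus:device.function.

    Assumes the conventional PCI packing: bus in bits 15-8,
    device in bits 7-3, function in bits 2-0.
    """
    if not 0 <= bdf_raw <= 0xFFFF:
        raise ValueError(f"bdf_raw out of uint16 range: {bdf_raw}")
    bus = (bdf_raw >> 8) & 0xFF
    device = (bdf_raw >> 3) & 0x1F
    function = bdf_raw & 0x07
    return f"{bus:02x}:{device:02x}.{function:x}"


# The hotplug PF example from this section, parsed as JSON.
EXAMPLE = """
{
  "port_ib_dev": "mlx5_0",
  "pf_id": 0,
  "function_type": "pf",
  "bdf_raw": 57611,
  "device_type": "hotplug",
  "mac": "0c:c4:7a:ff:22:93",
  "pf_num": 0,
  "sf_num": 2000,
  "mq": 3
}
"""

record = json.loads(EXAMPLE)
print(decode_bdf(record["bdf_raw"]))  # prints e1:01.3
```

Under that assumption, 57611 (0xE10B) corresponds to bus e1, device 01, function 3.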
Whether recovery is performed depends on the action taken on the BlueField (DPU) or the host. The following table lists the outcome for each scenario:
The columns list DPU actions first, followed by host actions.

| Device | Restart Controller | Live Update | Hot Unplug | Destroy VFs | Unload Driver | Power Cycle Host & DPU | Warm Reboot | Live Migration |
|---|---|---|---|---|---|---|---|---|
| Static PF | Recover | Recover | N/A | N/A | Recover | No recover | Recover | Recover |
| Hotplug PF | Recover | Recover | No recover | N/A | Recover | No recover | Recover | Recover |
| VF | Recover | Recover | N/A | Recovery file deleted | No recover | No recover | No recover | Recover |
These recovery files are internal to the controller and should not be modified.
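Because the files must not be modified, any inspection should be strictly read-only. The following is a minimal sketch of such an inspection, assuming only what is stated above (one JSON file per device, named by VUID, in the recovery directory). It makes no assumption about a file extension, and `load_recovery_files` is a hypothetical helper, not a controller tool.

```python
import json
from pathlib import Path

RECOVERY_DIR = Path("/opt/mellanox/mlnx_virtnet/recovery")


def load_recovery_files(directory: Path = RECOVERY_DIR) -> dict:
    """Return {filename: parsed JSON object} for each readable recovery file.

    Files are opened read-only; nothing is written, renamed, or deleted.
    """
    records = {}
    if not directory.is_dir():
        return records
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open("r", encoding="utf-8") as handle:
                data = json.load(handle)
        except (OSError, ValueError):
            # Skip files that are unreadable or not valid JSON.
            continue
        if isinstance(data, dict):
            records[path.name] = data
    return records


if __name__ == "__main__":
    for vuid, record in load_recovery_files().items():
        print(vuid, record.get("function_type"), record.get("mac"))
```

On a system without the recovery directory, the function returns an empty dictionary and the loop prints nothing.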
Controller recovery is enabled by default and does not need user configuration or intervention. When the mlxconfig settings used by the controller take effect, the newly started controller service automatically deletes all recovery files.