NVIDIA BlueField Virtio-net v24.10

High Availability

High availability (HA) is essential in network infrastructure to ensure continuous performance with minimal downtime, even during failures.

To support HA, the virtio-net-controller process creates the auxiliary processes virtio-net-emu and virtio-net-ha. The virtio-net-emu process handles primary controller functions, while virtio-net-ha manages HA. virtio-net-ha saves and oversees critical resources from virtio-net-emu and restores it to a working state if a failure occurs. The two processes communicate through IPC messages.
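The supervision pattern described above can be illustrated with a small Python sketch. This is not NVIDIA's implementation: `WORKER_SOURCE`, `supervise`, `crash_plan`, and the device names are hypothetical, a short-lived child process stands in for virtio-net-emu, and a non-zero exit code stands in for a crash. The supervisor keeps the saved state itself and hands it to each relaunched worker, in the way virtio-net-ha is described as holding resources on behalf of virtio-net-emu.

```python
import subprocess
import sys

# Hypothetical worker standing in for virtio-net-emu. It "restores" the
# devices passed on its command line, then exits with a non-zero code when
# told to simulate a crash.
WORKER_SOURCE = """
import sys
mode = sys.argv[1]
devices = sys.argv[2:]
print("restored " + ",".join(devices), flush=True)
sys.exit(1 if mode == "crash" else 0)
"""


def supervise(saved_devices, crash_plan, max_restarts=3):
    """Launch the worker and relaunch it with the saved state after a crash.

    saved_devices: state kept by the supervisor (the HA role in this sketch).
    crash_plan: list of booleans; entry i says whether launch i should crash.
    Returns (number_of_launches, worker_output) once a launch exits cleanly.
    Raises RuntimeError if every allowed launch crashes.
    """
    for attempt in range(max_restarts + 1):
        crash = crash_plan[attempt] if attempt < len(crash_plan) else False
        mode = "crash" if crash else "ok"
        result = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, mode, *saved_devices],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            # A clean exit stands in for a healthy, recovered worker.
            return attempt + 1, result.stdout.strip()
    # Automatic recovery failed; an operator-level fallback is needed.
    raise RuntimeError("automatic recovery failed")
```

Calling `supervise(["dev0", "dev1"], [True, False])` launches the worker twice: the first launch crashes, and the second is started with the same saved device list. The `RuntimeError` path corresponds to the case where automatic recovery does not succeed and a manual fallback is required.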

[Figure: HA diagram]

Note

High availability is supported only on BlueField-3 and later.

The following table lists the expected behavior for each failure scenario:

| Scenario | Behavior | Downtime per Device (sec) | Fallback Action |
|---|---|---|---|
| virtio-net-emu process crashes (e.g., segfault) | The virtio-net-ha process tries to recover all devices automatically | < 1 | Run the virtnet restart command if recovery fails |
| Device/VQ/SF create/destroy failures | HA ensures that existing devices are not affected | N/A | Retry the operation or restart the service |
| DPA command timeout | No action from HA; the DPA is likely stuck | N/A | Run the virtnet restart command |
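The "retry, then restart" fallback for create/destroy failures can be expressed as a small helper. This is a minimal sketch under stated assumptions: `run_with_fallback`, `operation`, and `fallback` are hypothetical names, and the callables are stand-ins. In practice the fallback would be whatever restart procedure the operator uses, such as the virtnet restart command named in the table.

```python
def run_with_fallback(operation, fallback, retries=2):
    """Call operation up to retries + 1 times; call fallback if all fail.

    operation: callable that raises RuntimeError on failure (hypothetical
               stand-in for a device/VQ/SF create or destroy request).
    fallback:  callable invoked only after every attempt has failed
               (hypothetical stand-in for restarting the service).
    Returns the result of the first successful operation, or of fallback.
    """
    for _ in range(retries + 1):
        try:
            return operation()
        except RuntimeError:
            # The failed request is simply retried; nothing else is touched.
            continue
    return fallback()
```

The helper only sequences the two actions. It does not inspect device state, which matches the table's statement that existing devices are not affected by a failed create or destroy.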

© Copyright 2024, NVIDIA. Last updated on Nov 12, 2024.