Hot-unplug Devices with Heavy Self-traffic, Guest OS Gets Call Trace
When the guest OS is running heavy traffic (e.g., iperf/iperf3) on hotplug virtio-net devices, unplugging those devices from the BlueField side at the same time may result in the guest OS hanging.
The guest OS prints a call trace similar to the following. The messages "Unexpected TXQ (x) queue failure" and "sched: RT throttling activated" are usually observed:
kernel: pcieport 0000:e2:01.0: pciehp: Slot(1): Card not present
kernel: net ens1f0: Unexpected TXQ (0) queue failure: -5
kernel: sched: RT throttling activated
kernel: rcu: INFO: rcu_sched self-detected stall on CPU
kernel: rcu: 42-....: (1 GPs behind) idle=39f/1/0x4000000000000000 softirq=388460/388461 fqs=7491
kernel: (t=15000 jiffies g=2309357 q=7886)
kernel: Sending NMI from CPU 42 to CPUs 25:
kernel: NMI backtrace for cpu 25
kernel: CPU: 25 PID: 491 Comm: irq/71-pciehp Tainted: G OE 5.15.48 #10
kernel: Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.2.5 04/08/2021
kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x74/0x230
kernel: Code: 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 a9 00 01 ff ff 0f 85 0f 01 00 00 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e 41
kernel: RSP: 0018:ffffb94988f97988 EFLAGS: 00000202
kernel: RAX: 0000000000000101 RBX: ffff8c2bc56bba80 RCX: ffff8c2bd8192800
kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c2bc56bba80
kernel: RBP: ffffb94988f979b0 R08: ffff8c2bc7be3000 R09: ffffffff9fa89ef8
kernel: R10: ffffb94988f979b8 R11: 000000000000001f R12: 0000000000000019
kernel: R13: ffff8c2bc56bba80 R14: 0000000000000000 R15: ffff8c2bc7be3000
kernel: FS: 0000000000000000(0000) GS:ffff8c339f640000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007ff9c58a8a50 CR3: 0000000dc5010000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel: <TASK>
kernel: _raw_spin_lock+0x1f/0x30
kernel: dev_deactivate_many+0xf3/0x2e0
kernel: __dev_close_many+0x7d/0x120
kernel: dev_close_many+0x7f/0x120
kernel: ? kernfs_put.part.0+0xe2/0x1a0
kernel: unregister_netdevice_many+0x13a/0x790
kernel: ? idr_find+0xf/0x20
kernel: unregister_netdevice_queue+0x91/0xe0
kernel: unregister_netdev+0x1d/0x30
kernel: virtnet_remove+0x4d/0x80 [virtio_net]
kernel: virtio_dev_remove+0x4b/0xa0
kernel: __device_release_driver+0x1a8/0x290
kernel: device_release_driver+0x29/0x40
kernel: bus_remove_device+0xde/0x150
kernel: device_del+0x19c/0x3f0
kernel: ? __cond_resched+0x1a/0x50
kernel: device_unregister+0x18/0x60
kernel: unregister_virtio_device+0x18/0x30
kernel: virtio_pci_remove+0x41/0x80 [virtio_pci]
kernel: pci_device_remove+0x3e/0xb0
kernel: __device_release_driver+0x1a8/0x290
kernel: device_release_driver+0x29/0x40
kernel: pci_stop_bus_device+0x71/0xa0
kernel: pci_stop_and_remove_bus_device+0x13/0x30
kernel: pciehp_unconfigure_device+0x7e/0x130
kernel: pciehp_disable_slot+0x6c/0x100
kernel: pciehp_handle_presence_or_link_change+0xde/0x2f0
kernel: pciehp_ist+0x197/0x1a0
kernel: ? irq_forced_thread_fn+0x90/0x90
kernel: irq_thread_fn+0x28/0x60
kernel: irq_thread+0xde/0x1b0
kernel: ? irq_thread_fn+0x60/0x60
kernel: ? irq_thread_check_affinity+0xf0/0xf0
kernel: kthread+0x12a/0x150
kernel: ? set_kthread_struct+0x50/0x50
kernel: ret_from_fork+0x22/0x30
kernel: </TASK>
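A captured guest kernel log can be checked for the messages in the trace above with a short script. The following is a minimal Python sketch, not an official tool: the signature names, the function names, and the rule that a TXQ failure plus either RT throttling or an RCU stall indicates this issue are all assumptions made for illustration.

```python
import re

# Patterns copied from the messages in the trace above.
# The dictionary keys are illustrative names, not kernel identifiers.
SIGNATURES = {
    "card_not_present": re.compile(r"pciehp: Slot\((\d+)\): Card not present"),
    "txq_failure": re.compile(r"Unexpected TXQ \((\d+)\) queue failure: (-?\d+)"),
    "rt_throttling": re.compile(r"sched: RT throttling activated"),
    "rcu_stall": re.compile(r"rcu_sched self-detected stall"),
}


def find_signatures(log_text):
    """Return the set of signature names that appear in log_text."""
    found = set()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                found.add(name)
    return found


def looks_like_unplug_hang(log_text):
    """Heuristic (assumed): a TXQ failure plus RT throttling or an RCU stall."""
    found = find_signatures(log_text)
    return "txq_failure" in found and bool(found & {"rt_throttling", "rcu_stall"})
```

For example, the text of `dmesg` or `journalctl -k` output can be passed to `looks_like_unplug_hang()`; a match only suggests that the log resembles this issue and does not confirm the root cause.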
Starting with kernel 5.14, the following patch introduced a while loop in the virtio-net TX path that may spin indefinitely when the VQ is broken (e.g., the device is removed) under heavy traffic:
commit a7766ef18b33674fa164e2e2916cef16d4e17f43
Author: Michael S. Tsirkin <mst@redhat.com>
Date: Tue Apr 13 01:30:45 2021 -0400
virtio_net: disable cb aggressively
There are currently two cases where we poll TX vq not in response to a
callback: start xmit and rx napi. We currently do this with callbacks
enabled which can cause extra interrupts from the card. Used not to be
a big issue as we run with interrupts disabled but that is no longer the
case, and in some cases the rate of spurious interrupts is so high
linux detects this and actually kills the interrupt.
Fix up by disabling the callbacks before polling the tx vq.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
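The way such a retry loop can fail to terminate can be illustrated with a toy model. The Python sketch below is not the kernel code and does not reproduce the patch. `ToyVirtqueue`, `poll_tx`, and the iteration cap are hypothetical, and the model assumes that a broken queue never hands back its used buffers while re-enabling callbacks keeps failing as long as buffers remain pending.

```python
class ToyVirtqueue:
    """Hypothetical stand-in for a TX virtqueue; not a kernel API."""

    def __init__(self, pending_used, broken=False):
        self.pending_used = pending_used  # used buffers waiting to be reclaimed
        self.broken = broken              # True once the device is gone

    def get_buf(self):
        # Assumption: a broken queue returns nothing, so pending work never drains.
        if self.broken or self.pending_used == 0:
            return None
        self.pending_used -= 1
        return object()

    def enable_cb_delayed(self):
        # Assumption: re-enabling callbacks fails while buffers are still pending.
        return self.pending_used == 0


def poll_tx(vq, max_iterations=1000):
    """Retry until callbacks can be re-enabled.

    Returns the number of iterations used, or None if the cap was reached.
    The cap exists only so that this model terminates.
    """
    for iteration in range(1, max_iterations + 1):
        while vq.get_buf() is not None:
            pass  # reclaim every completed buffer
        if vq.enable_cb_delayed():
            return iteration
    return None
```

In this model, `poll_tx(ToyVirtqueue(3))` finishes on the first pass, while `poll_tx(ToyVirtqueue(3, broken=True))` only stops because of the artificial cap. A loop without such a cap would keep the CPU busy, which is consistent with the RCU stall and RT throttling messages shown earlier.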
Currently, there is no official fix from the kernel side. The following workarounds may be employed:
Bring down the network interface that corresponds to the hotplug device before unplugging it
Use a kernel that does not include the offending patch
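The two workarounds can be scripted on the guest. The following Python sketch rests on several assumptions: the helper names are hypothetical, the interface name `ens1f0` is taken from the trace above, the guest has the iproute2 `ip` command and root privileges, and the version check only compares the release string against 5.14, so it cannot tell whether a distribution kernel has backported or reverted the patch.

```python
import re
import subprocess


def kernel_may_be_affected(release):
    """Return True if the release string (e.g. from `uname -r`) is 5.14 or newer.

    This is a version-number comparison only (assumption: no backports or reverts).
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) >= (5, 14)


def link_down_command(ifname):
    """Build the iproute2 command that brings an interface down."""
    return ["ip", "link", "set", "dev", ifname, "down"]


def bring_interface_down(ifname, run=subprocess.run):
    """Bring ifname down; raises CalledProcessError if `ip` fails.

    `run` is injectable so the function can be exercised without root.
    """
    run(link_down_command(ifname), check=True)
```

A possible usage is to call `bring_interface_down("ens1f0")` on the guest and only then unplug the device from the BlueField side. `kernel_may_be_affected(platform.release())` can serve as a quick hint as to whether the guest kernel is in the affected range.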