Hot-unplug Devices with Heavy Self-traffic, Guest OS Gets Call Trace
When the guest OS is running heavy traffic (e.g., iperf/iperf3) on a hot-plugged virtio-net device, unplugging that device from the BlueField side at the same time may result in the guest OS hanging.
The guest OS prints a call trace similar to the following:
[ 203.886218] CPU: 35 PID: 3077 Comm: iperf3 Not tainted 6.6.0 #1
[ 203.886222] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.2.5 04/08/2021
[ 203.886224] RIP: 0010:free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[ 203.886247] Code: 41 f6 c4 01 75 75 66 90 44 89 fe 4c 89 e7 45 03 6c 24 70 e8 65 1a 0a f0 83 c3 01 49 8b 3e 48 8d 75 cc e8 26 21 d1 ef 49 89 c4 <48> 85 c0 75 d1 85 db 74 0e 4d 01 ae 80 02 00 00 49 01 9e 78 02 00
[ 203.886249] RSP: 0018:ffffac62cb837678 EFLAGS: 00000246
[ 203.886253] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9a35e7dbc000
[ 203.886255] RDX: 0000000000000000 RSI: ffffac62cb83767c RDI: ffff9a2e5e7d8900
[ 203.886257] RBP: ffffac62cb8376b0 R08: 0000000000000000 R09: 000000000003b2f0
[ 203.886259] R10: ffff9a2e4a570b00 R11: 000000000000000c R12: 0000000000000000
[ 203.886261] R13: 0000000000000000 R14: ffff9a2e62a48800 R15: 0000000000000000
[ 203.886263] FS: 00007f8444643400(0000) GS:ffff9a359f2c0000(0000) knlGS:0000000000000000
[ 203.886266] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 203.886268] CR2: 000056277998d028 CR3: 0000000127976000 CR4: 0000000000350ee0
[ 203.886270] Call Trace:
[ 203.886274] <NMI>
[ 203.886277] ? show_regs+0x6e/0x80
[ 203.886289] ? nmi_cpu_backtrace+0xb1/0x120
[ 203.886298] ? nmi_cpu_backtrace_handler+0x15/0x20
[ 203.886305] ? nmi_handle+0x6b/0x180
[ 203.886310] ? default_do_nmi+0x45/0x120
[ 203.886316] ? exc_nmi+0x142/0x1c0
[ 203.886319] ? end_repeat_nmi+0x16/0x67
[ 203.886328] ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[ 203.886334] ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[ 203.886341] ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[ 203.886347] </NMI>
[ 203.886348] <TASK>
[ 203.886349] ? free_old_xmit_skbs+0x8c/0xf0 [virtio_net]
[ 203.886356] start_xmit+0x149/0x500 [virtio_net]
[ 203.886364] dev_hard_start_xmit+0x95/0x1e0
[ 203.886370] ? validate_xmit_skb_list+0x51/0x80
[ 203.886374] sch_direct_xmit+0x10c/0x3a0
[ 203.886381] __dev_queue_xmit+0xa47/0xda0
[ 203.886387] ip_finish_output2+0x2ef/0x5a0
[ 203.886393] ? srso_return_thunk+0x5/0x10
[ 203.886400] ? nf_conntrack_in+0xeb/0x6c0 [nf_conntrack]
[ 203.886428] __ip_finish_output+0xb7/0x190
[ 203.886433] ip_finish_output+0x32/0x100
[ 203.886437] ip_output+0x63/0xf0
[ 203.886441] ? __pfx_ip_finish_output+0x10/0x10
[ 203.886446] ip_local_out+0x62/0x70
[ 203.886449] __ip_queue_xmit+0x18e/0x4b0
[ 203.886454] ip_queue_xmit+0x19/0x20
[ 203.886456] __tcp_transmit_skb+0xb2d/0xcd0
[ 203.886462] ? srso_return_thunk+0x5/0x10
[ 203.886469] tcp_write_xmit+0x565/0x1620
[ 203.886474] tcp_push_one+0x40/0x50
[ 203.886476] tcp_sendmsg_locked+0x350/0xee0
[ 203.886481] ? tcp_current_mss+0x75/0xd0
[ 203.886488] tcp_sendmsg+0x31/0x50
[ 203.886491] inet_sendmsg+0x47/0x80
[ 203.886498] sock_write_iter+0x163/0x190
[ 203.886507] vfs_write+0x342/0x3f0
[ 203.886517] ksys_write+0xb9/0xf0
[ 203.886520] __x64_sys_write+0x1d/0x30
[ 203.886522] do_syscall_64+0x60/0x90
[ 203.886528] ? srso_return_thunk+0x5/0x10
[ 203.886531] ? ksys_write+0xb9/0xf0
[ 203.886532] ? srso_return_thunk+0x5/0x10
[ 203.886535] ? exit_to_user_mode_prepare+0x35/0x180
[ 203.886542] ? srso_return_thunk+0x5/0x10
[ 203.886544] ? syscall_exit_to_user_mode+0x38/0x50
[ 203.886549] ? __x64_sys_write+0x1d/0x30
[ 203.886551] ? srso_return_thunk+0x5/0x10
[ 203.886553] ? do_syscall_64+0x6d/0x90
[ 203.886556] ? srso_return_thunk+0x5/0x10
[ 203.886558] ? syscall_exit_to_user_mode+0x38/0x50
[ 203.886561] ? srso_return_thunk+0x5/0x10
[ 203.886564] ? do_syscall_64+0x6d/0x90
[ 203.886566] ? __x64_sys_write+0x1d/0x30
[ 203.886568] ? srso_return_thunk+0x5/0x10
[ 203.886570] ? do_syscall_64+0x6d/0x90
[ 203.886572] ? srso_return_thunk+0x5/0x10
[ 203.886575] ? sysvec_apic_timer_interrupt+0x52/0x90
[ 203.886578] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Starting with kernel 5.14, the following patch introduced a while loop in the virtio-net TX path that may never terminate when the VQ is broken (e.g., the device is removed) under heavy traffic:
commit a7766ef18b33674fa164e2e2916cef16d4e17f43
Author: Michael S. Tsirkin <mst@redhat.com>
Date: Tue Apr 13 01:30:45 2021 -0400

virtio_net: disable cb aggressively

There are currently two cases where we poll TX vq not in response to a
callback: start xmit and rx napi. We currently do this with callbacks
enabled which can cause extra interrupts from the card. Used not to be
a big issue as we run with interrupts disabled but that is no longer the
case, and in some cases the rate of spurious interrupts is so high
linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently, there is no official fix from the kernel side. The following workarounds may be employed:
Use a kernel without the offending patch
Stop heavy traffic while performing the unplug
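As a rough first check for the first workaround, the sketch below tests whether a kernel release string is at or above 5.14, the version this section names as the first to carry the patch. The function name may_include_tx_poll_loop is hypothetical, and the check is only a heuristic: it compares upstream version numbers, so a distribution kernel that backports or reverts the patch would be misclassified.

```python
import platform
import re

# First upstream release named in this section as carrying the patch.
FIRST_AFFECTED = (5, 14)


def may_include_tx_poll_loop(release: str) -> bool:
    """Return True if the release string is at or above 5.14 (heuristic only)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major version first, then minor.
    return (major, minor) >= FIRST_AFFECTED


if __name__ == "__main__":
    # Run inside the guest OS to check the running kernel.
    release = platform.release()
    print(release, may_include_tx_poll_loop(release))
```

A result of True only means the kernel is new enough to possibly contain the patch. Confirming it requires checking the kernel source or the distribution changelog for the commit shown above.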