Hot-unplugging Devices Under Heavy Self-traffic Causes a Call Trace in the Guest OS

NVIDIA BlueField Virtio-net v1.9.0


When the guest OS is running heavy traffic (e.g., iperf/iperf3) on a hot-plugged virtio-net device, hot-unplugging that device from the BlueField side at the same time may result in the guest OS hanging.

The guest OS prints a call trace similar to the following:


[ 203.886218] CPU: 35 PID: 3077 Comm: iperf3 Not tainted 6.6.0 #1
[ 203.886222] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.2.5 04/08/2021
[ 203.886224] RIP: 0010:free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[ 203.886247] Code: 41 f6 c4 01 75 75 66 90 44 89 fe 4c 89 e7 45 03 6c 24 70 e8 65 1a 0a f0 83 c3 01 49 8b 3e 48 8d 75 cc e8 26 21 d1 ef 49 89 c4 <48> 85 c0 75 d1 85 db 74 0e 4d 01 ae 80 02 00 00 49 01 9e 78 02 00
[ 203.886249] RSP: 0018:ffffac62cb837678 EFLAGS: 00000246
[ 203.886253] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9a35e7dbc000
[ 203.886255] RDX: 0000000000000000 RSI: ffffac62cb83767c RDI: ffff9a2e5e7d8900
[ 203.886257] RBP: ffffac62cb8376b0 R08: 0000000000000000 R09: 000000000003b2f0
[ 203.886259] R10: ffff9a2e4a570b00 R11: 000000000000000c R12: 0000000000000000
[ 203.886261] R13: 0000000000000000 R14: ffff9a2e62a48800 R15: 0000000000000000
[ 203.886263] FS: 00007f8444643400(0000) GS:ffff9a359f2c0000(0000) knlGS:0000000000000000
[ 203.886266] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 203.886268] CR2: 000056277998d028 CR3: 0000000127976000 CR4: 0000000000350ee0
[ 203.886270] Call Trace:
[ 203.886274] <NMI>
[ 203.886277] ? show_regs+0x6e/0x80
[ 203.886289] ? nmi_cpu_backtrace+0xb1/0x120
[ 203.886298] ? nmi_cpu_backtrace_handler+0x15/0x20
[ 203.886305] ? nmi_handle+0x6b/0x180
[ 203.886310] ? default_do_nmi+0x45/0x120
[ 203.886316] ? exc_nmi+0x142/0x1c0
[ 203.886319] ? end_repeat_nmi+0x16/0x67
[ 203.886328] ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[ 203.886334] ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[ 203.886341] ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[ 203.886347] </NMI>
[ 203.886348] <TASK>
[ 203.886349] ? free_old_xmit_skbs+0x8c/0xf0 [virtio_net]
[ 203.886356] start_xmit+0x149/0x500 [virtio_net]
[ 203.886364] dev_hard_start_xmit+0x95/0x1e0
[ 203.886370] ? validate_xmit_skb_list+0x51/0x80
[ 203.886374] sch_direct_xmit+0x10c/0x3a0
[ 203.886381] __dev_queue_xmit+0xa47/0xda0
[ 203.886387] ip_finish_output2+0x2ef/0x5a0
[ 203.886393] ? srso_return_thunk+0x5/0x10
[ 203.886400] ? nf_conntrack_in+0xeb/0x6c0 [nf_conntrack]
[ 203.886428] __ip_finish_output+0xb7/0x190
[ 203.886433] ip_finish_output+0x32/0x100
[ 203.886437] ip_output+0x63/0xf0
[ 203.886441] ? __pfx_ip_finish_output+0x10/0x10
[ 203.886446] ip_local_out+0x62/0x70
[ 203.886449] __ip_queue_xmit+0x18e/0x4b0
[ 203.886454] ip_queue_xmit+0x19/0x20
[ 203.886456] __tcp_transmit_skb+0xb2d/0xcd0
[ 203.886462] ? srso_return_thunk+0x5/0x10
[ 203.886469] tcp_write_xmit+0x565/0x1620
[ 203.886474] tcp_push_one+0x40/0x50
[ 203.886476] tcp_sendmsg_locked+0x350/0xee0
[ 203.886481] ? tcp_current_mss+0x75/0xd0
[ 203.886488] tcp_sendmsg+0x31/0x50
[ 203.886491] inet_sendmsg+0x47/0x80
[ 203.886498] sock_write_iter+0x163/0x190
[ 203.886507] vfs_write+0x342/0x3f0
[ 203.886517] ksys_write+0xb9/0xf0
[ 203.886520] __x64_sys_write+0x1d/0x30
[ 203.886522] do_syscall_64+0x60/0x90
[ 203.886528] ? srso_return_thunk+0x5/0x10
[ 203.886531] ? ksys_write+0xb9/0xf0
[ 203.886532] ? srso_return_thunk+0x5/0x10
[ 203.886535] ? exit_to_user_mode_prepare+0x35/0x180
[ 203.886542] ? srso_return_thunk+0x5/0x10
[ 203.886544] ? syscall_exit_to_user_mode+0x38/0x50
[ 203.886549] ? __x64_sys_write+0x1d/0x30
[ 203.886551] ? srso_return_thunk+0x5/0x10
[ 203.886553] ? do_syscall_64+0x6d/0x90
[ 203.886556] ? srso_return_thunk+0x5/0x10
[ 203.886558] ? syscall_exit_to_user_mode+0x38/0x50
[ 203.886561] ? srso_return_thunk+0x5/0x10
[ 203.886564] ? do_syscall_64+0x6d/0x90
[ 203.886566] ? __x64_sys_write+0x1d/0x30
[ 203.886568] ? srso_return_thunk+0x5/0x10
[ 203.886570] ? do_syscall_64+0x6d/0x90
[ 203.886572] ? srso_return_thunk+0x5/0x10
[ 203.886575] ? sysvec_apic_timer_interrupt+0x52/0x90
[ 203.886578] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Starting with kernel 5.14, the following patch introduced a while loop in the virtio-net TX path that may loop indefinitely when the virtqueue (VQ) is broken (e.g., when the device is removed) under heavy traffic:


commit a7766ef18b33674fa164e2e2916cef16d4e17f43
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Tue Apr 13 01:30:45 2021 -0400

    virtio_net: disable cb aggressively

    There are currently two cases where we poll TX vq not in response to a callback: start xmit and rx napi. We currently do this with callbacks enabled which can cause extra interrupts from the card. Used not to be a big issue as we run with interrupts disabled but that is no longer the case, and in some cases the rate of spurious interrupts is so high linux detects this and actually kills the interrupt.

    Fix up by disabling the callbacks before polling the tx vq.

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Currently, there is no official fix on the kernel side. The following workarounds may be employed:

  • Use a guest kernel that does not include the offending patch

  • Stop heavy traffic before performing the hot-unplug

© Copyright 2024, NVIDIA. Last updated on Jun 18, 2024.