Hot-unplug Devices with Heavy Self-traffic, Guest OS Gets Call Trace

Problem

When the guest OS is running heavy traffic (e.g., iperf/iperf3) on a hotplugged virtio-net device, unplugging that device from the BlueField side at the same time may result in the guest OS hanging.

The guest OS prints a call trace similar to the following. The messages "Unexpected TXQ (x) queue failure" and "sched: RT throttling activated" are usually observed:

kernel: pcieport 0000:e2:01.0: pciehp: Slot(1): Card not present
kernel: net ens1f0: Unexpected TXQ (0) queue failure: -5
kernel: sched: RT throttling activated
kernel: rcu: INFO: rcu_sched self-detected stall on CPU
kernel: rcu:         42-....: (1 GPs behind) idle=39f/1/0x4000000000000000 softirq=388460/388461 fqs=7491
kernel:         (t=15000 jiffies g=2309357 q=7886)
kernel: Sending NMI from CPU 42 to CPUs 25:
kernel: NMI backtrace for cpu 25
kernel: CPU: 25 PID: 491 Comm: irq/71-pciehp Tainted: G           OE     5.15.48 #10
kernel: Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.2.5 04/08/2021
kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x74/0x230
kernel: Code: 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 a9 00 01 ff ff 0f 85 0f 01 00 00 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e 41
kernel: RSP: 0018:ffffb94988f97988 EFLAGS: 00000202
kernel: RAX: 0000000000000101 RBX: ffff8c2bc56bba80 RCX: ffff8c2bd8192800
kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c2bc56bba80
kernel: RBP: ffffb94988f979b0 R08: ffff8c2bc7be3000 R09: ffffffff9fa89ef8
kernel: R10: ffffb94988f979b8 R11: 000000000000001f R12: 0000000000000019
kernel: R13: ffff8c2bc56bba80 R14: 0000000000000000 R15: ffff8c2bc7be3000
kernel: FS:  0000000000000000(0000) GS:ffff8c339f640000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007ff9c58a8a50 CR3: 0000000dc5010000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  <TASK>
kernel:  _raw_spin_lock+0x1f/0x30
kernel:  dev_deactivate_many+0xf3/0x2e0
kernel:  __dev_close_many+0x7d/0x120
kernel:  dev_close_many+0x7f/0x120
kernel:  ? kernfs_put.part.0+0xe2/0x1a0
kernel:  unregister_netdevice_many+0x13a/0x790
kernel:  ? idr_find+0xf/0x20
kernel:  unregister_netdevice_queue+0x91/0xe0
kernel:  unregister_netdev+0x1d/0x30
kernel:  virtnet_remove+0x4d/0x80 [virtio_net]
kernel:  virtio_dev_remove+0x4b/0xa0
kernel:  __device_release_driver+0x1a8/0x290
kernel:  device_release_driver+0x29/0x40
kernel:  bus_remove_device+0xde/0x150
kernel:  device_del+0x19c/0x3f0
kernel:  ? __cond_resched+0x1a/0x50
kernel:  device_unregister+0x18/0x60
kernel:  unregister_virtio_device+0x18/0x30
kernel:  virtio_pci_remove+0x41/0x80 [virtio_pci]
kernel:  pci_device_remove+0x3e/0xb0
kernel:  __device_release_driver+0x1a8/0x290
kernel:  device_release_driver+0x29/0x40
kernel:  pci_stop_bus_device+0x71/0xa0
kernel:  pci_stop_and_remove_bus_device+0x13/0x30
kernel:  pciehp_unconfigure_device+0x7e/0x130
kernel:  pciehp_disable_slot+0x6c/0x100
kernel:  pciehp_handle_presence_or_link_change+0xde/0x2f0
kernel:  pciehp_ist+0x197/0x1a0
kernel:  ? irq_forced_thread_fn+0x90/0x90
kernel:  irq_thread_fn+0x28/0x60
kernel:  irq_thread+0xde/0x1b0
kernel:  ? irq_thread_fn+0x60/0x60
kernel:  ? irq_thread_check_affinity+0xf0/0xf0
kernel:  kthread+0x12a/0x150
kernel:  ? set_kthread_struct+0x50/0x50
kernel:  ret_from_fork+0x22/0x30
kernel:  </TASK>
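
As a rough aid for spotting the two messages named above in a saved guest kernel log, the following minimal sketch scans log lines for them. The function name `find_symptoms` is hypothetical and the patterns are taken only from the sample trace in this section; other kernel versions may word these messages differently.

```python
import re

# Patterns copied from the sample trace above; treat them as assumptions
# about the message wording rather than a stable kernel interface.
TXQ_FAILURE = re.compile(r"Unexpected TXQ \((\d+)\) queue failure: (-?\d+)")
RT_THROTTLING = "sched: RT throttling activated"


def find_symptoms(log_lines):
    """Return (txq_failures, rt_throttled) for an iterable of log lines.

    txq_failures is a list of (queue_index, error_code) tuples.
    rt_throttled is True if the RT throttling message appears at least once.
    """
    txq_failures = []
    rt_throttled = False
    for line in log_lines:
        match = TXQ_FAILURE.search(line)
        if match:
            txq_failures.append((int(match.group(1)), int(match.group(2))))
        if RT_THROTTLING in line:
            rt_throttled = True
    return txq_failures, rt_throttled
```

The lines can come from any source, such as a file holding saved `dmesg` or journal output. Finding both messages only suggests this issue; it does not confirm it.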

Root Cause

Starting with kernel 5.14, the following patch introduced a while loop in the virtio-net TX path that may spin indefinitely under heavy traffic when the VQ is broken (e.g., the device has been removed):

commit a7766ef18b33674fa164e2e2916cef16d4e17f43
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Tue Apr 13 01:30:45 2021 -0400
 
    virtio_net: disable cb aggressively
 
    There are currently two cases where we poll TX vq not in response to a
    callback: start xmit and rx napi.  We currently do this with callbacks
    enabled which can cause extra interrupts from the card.  Used not to be
    a big issue as we run with interrupts disabled but that is no longer the
    case, and in some cases the rate of spurious interrupts is so high
    linux detects this and actually kills the interrupt.
 
    Fix up by disabling the callbacks before polling the tx vq.
 
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
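
Since the behavior is tied to kernel 5.14 and later, a quick first check is to compare the guest's kernel release string against that version. The sketch below is a minimal example; the function name `kernel_at_least` is hypothetical. It assumes the release string starts with `major.minor`, and it cannot tell whether a distribution kernel has backported or reverted the patch, so the result is only a hint.

```python
import re


def kernel_at_least(release, major=5, minor=14):
    """Return True if a kernel release string is at or above major.minor.

    Only the leading "major.minor" part is compared; suffixes such as
    "-generic" or a patch level are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)
```

On the guest, the release string can be obtained with `platform.release()` from the Python standard library, or from `uname -r`.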

Solution

Currently, there is no official fix on the kernel side. The following workarounds may be employed:

Bring down the network interface that corresponds to the hotplug device before unplugging it
Use a kernel without the offending patch
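
The first workaround can be sketched as a small helper that runs in the guest before the unplug is triggered from the BlueField side. This is a minimal example under stated assumptions: the helper names are hypothetical, the guest has the iproute2 `ip` tool installed, the script runs with sufficient privileges, and the interface name (`ens1f0` in the trace above) is whatever the guest assigned to the hotplugged virtio-net device.

```python
import subprocess


def link_down_command(ifname):
    """Build the iproute2 command that sets an interface administratively down."""
    return ["ip", "link", "set", "dev", ifname, "down"]


def bring_interface_down(ifname):
    """Bring the interface down in the guest; raises CalledProcessError on failure.

    Call this before unplugging the device so that no TX traffic is in
    flight when the device disappears.
    """
    subprocess.run(link_down_command(ifname), check=True)
```

For example, `bring_interface_down("ens1f0")` is equivalent to running `ip link set dev ens1f0 down` in the guest. Unplug the device only after the call returns successfully.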
