Ubuntu Guest OS Hangs with Kernel 5.15.0-88/89-generic
When the virtio-pci and virtio-net kernel modules are probed in an Ubuntu 22.04 guest running kernel 5.15.0-88-generic or 5.15.0-89-generic with any virtio function (i.e., PF or VF), the guest OS hangs and prints call traces such as the following:
[ 2052.109566] CPU: 0 PID: 1183 Comm: systemd-udevd Tainted: P O L 5.15.0-88-generic #98-Ubuntu
[ 2052.109568] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
[ 2052.109570] RIP: 0010:virtqueue_is_broken+0x9/0x20
[ 2052.109579] RSP: 0018:ffffc206423a79c0 EFLAGS: 00000246
[ 2052.109581] RAX: 0000000000000000 RBX: ffff9e8980bfa980 RCX: 0000000000000a20
[ 2052.109582] RDX: 0000000000000000 RSI: ffffc206423a79cc RDI: ffff9e89847b9000
[ 2052.109583] RBP: ffffc206423a7a60 R08: 0000000000000000 R09: 0000000000000003
[ 2052.109584] R10: 0000000000000003 R11: 0000000000000002 R12: ffffc206423a79f0
[ 2052.109585] R13: 0000000000000002 R14: 0000000000000004 R15: ffff9e8984667400
[ 2052.109586] FS: 00007f3e295388c0(0000) GS:ffff9e89bbc00000(0000) knlGS:0000000000000000
[ 2052.109588] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2052.109590] CR2: 0000555613432be0 CR3: 0000000116af0002 CR4: 0000000000170ef0
[ 2052.109593] Call Trace:
[ 2052.109595] <IRQ>
[ 2052.109598] ? show_trace_log_lvl+0x1d6/0x2ea
[ 2052.109605] ? show_trace_log_lvl+0x1d6/0x2ea
[ 2052.109609] ? _virtnet_set_queues+0xbb/0x100 [virtio_net]
[ 2052.109615] ? show_regs.part.0+0x23/0x29
[ 2052.109618] ? show_regs.cold+0x8/0xd
[ 2052.109621] ? watchdog_timer_fn+0x1be/0x220
[ 2052.109625] ? lockup_detector_update_enable+0x60/0x60
[ 2052.109627] ? __hrtimer_run_queues+0x107/0x230
[ 2052.109631] ? kvm_clock_get_cycles+0x11/0x20
[ 2052.109637] ? hrtimer_interrupt+0x101/0x220
[ 2052.109640] ? __sysvec_apic_timer_interrupt+0x61/0xe0
[ 2052.109644] ? sysvec_apic_timer_interrupt+0x7b/0x90
[ 2052.109650] </IRQ>
[ 2052.109650] <TASK>
[ 2052.109651] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 2052.109655] ? virtqueue_is_broken+0x9/0x20
[ 2052.109656] ? virtnet_send_command+0x105/0x170 [virtio_net]
[ 2052.109660] _virtnet_set_queues+0xbb/0x100 [virtio_net]
[ 2052.109670] virtnet_probe+0x4ca/0xa10 [virtio_net]
[ 2052.109674] virtio_dev_probe+0x1ae/0x260
[ 2052.109676] really_probe+0x222/0x420
[ 2052.109679] __driver_probe_device+0xe8/0x140
[ 2052.109681] driver_probe_device+0x23/0xc0
[ 2052.109683] __driver_attach+0xf7/0x1f0
[ 2052.109685] ? __device_attach_driver+0x140/0x140
[ 2052.109687] bus_for_each_dev+0x7f/0xd0
[ 2052.109691] driver_attach+0x1e/0x30
[ 2052.109693] bus_add_driver+0x148/0x220
[ 2052.109695] driver_register+0x95/0x100
[ 2052.109697] register_virtio_driver+0x20/0x40
[ 2052.109698] virtio_net_driver_init+0x74/0x1000 [virtio_net]
[ 2052.109702] ? 0xffffffffc0d6f000
[ 2052.109704] do_one_initcall+0x49/0x1e0
[ 2052.109709] ? kmem_cache_alloc_trace+0x19e/0x2e0
[ 2052.109713] do_init_module+0x52/0x260
[ 2052.109716] load_module+0xb2b/0xbc0
[ 2052.109718] __do_sys_finit_module+0xbf/0x120
[ 2052.109721] __x64_sys_finit_module+0x18/0x20
[ 2052.109722] do_syscall_64+0x5c/0xc0
[ 2052.109725] ? do_syscall_64+0x69/0xc0
[ 2052.109726] ? syscall_exit_to_user_mode+0x35/0x50
[ 2052.109729] ? __x64_sys_newfstatat+0x1c/0x30
[ 2052.109733] ? do_syscall_64+0x69/0xc0
[ 2052.109735] entry_SYSCALL_64_after_hwframe+0x62/0xcc
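If a kernel log was saved from the hung guest (for example, from its console), it can be checked for the frames shown in the trace above. The following is a minimal sketch, not an official diagnostic tool. The function name check_virtnet_hang_signature is hypothetical, and matching these four frame names is only a heuristic for this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_virtnet_hang_signature LOGFILE
# Returns 0 if LOGFILE contains every frame listed below (taken from the
# call trace in this note), 1 otherwise. This is a heuristic, not proof.
check_virtnet_hang_signature() {
    local log="$1"
    local frame
    for frame in virtqueue_is_broken virtnet_send_command _virtnet_set_queues virtnet_probe; do
        # -q: no output, only the exit status; -F: match the name literally
        if ! grep -qF "$frame" "$log"; then
            return 1
        fi
    done
    return 0
}
```

For example, check_virtnet_hang_signature console.log && echo "matches the virtio-net probe hang" (the file name console.log is illustrative).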
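This is caused by a bug introduced upstream in v6.5-rc4 and fixed in v6.5-rc7. Canonical backported the problematic patch (commit 25266128fe16, "virtio-net: fix race between set queues and probe") to Ubuntu kernel 5.15.0-88/89-generic, which triggers this virtio-net deadlock. The upstream fix is the following commit: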
commit 51b813176f098ff61bd2833f627f5319ead098a5
Author: Jason Wang <jasowang@redhat.com>
Date: Wed Aug 9 23:12:56 2023 -0400
virtio-net: set queues after driver_ok
Commit 25266128fe16 ("virtio-net: fix race between set queues and
probe") tries to fix the race between set queues and probe by calling
_virtnet_set_queues() before DRIVER_OK is set. This violates virtio
spec. Fixing this by setting queues after virtio_device_ready().
Note that rtnl needs to be held for userspace requests to change the
number of queues. So we are serialized in this way.
Fixes: 25266128fe16 ("virtio-net: fix race between set queues and probe")
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a workaround, switch the default kernel back to an unaffected version (e.g., 5.15.0-79-generic).
The issue is fixed in the official Ubuntu kernel starting with 5.15.0-90-generic.
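To see which case a guest falls into, the running kernel release (as reported by uname -r) can be compared against the versions named above. The sketch below assumes the Ubuntu 5.15.0-<ABI>-generic naming scheme. The function classify_kernel and its output labels are hypothetical, and "earlier" only means the build predates the affected ones, not that it has been verified as safe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_kernel RELEASE
# Prints "affected" for 5.15.0-88/89-generic, "fixed" for 5.15.0-90-generic
# and later, "earlier" for older 5.15.0 generic builds (not verified safe),
# and "unknown" for any release string that does not match this scheme.
classify_kernel() {
    local release="$1"
    local abi
    if [[ "$release" =~ ^5\.15\.0-([0-9]+)-generic$ ]]; then
        abi="${BASH_REMATCH[1]}"
        # 10# forces base 10 so a leading zero is not read as octal
        if (( 10#$abi == 88 || 10#$abi == 89 )); then
            echo "affected"
        elif (( 10#$abi >= 90 )); then
            echo "fixed"
        else
            echo "earlier"
        fi
    else
        echo "unknown"
    fi
}
```

For example, classify_kernel "$(uname -r)" prints affected on a guest running 5.15.0-88-generic.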
There are multiple ways to switch the default kernel; the following procedure is one example.
Root privileges are required for these steps.
1. Open /etc/default/grub and change GRUB_DEFAULT as follows:
   GRUB_DEFAULT=saved
2. Save the file, then regenerate /boot/grub/grub.cfg so that the new setting takes effect:
   # update-grub
3. Run the following to get the number of the kernel you want:
   # grep "menuentry 'Ubuntu," /boot/grub/grub.cfg
   Info: Numbering starts from 0 (i.e., the first entry is 0).
4. Run the following to set the default kernel:
   # grub-set-default <num_from_last_step>
5. Reboot.
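The grep in the procedure can be extended to print each matching entry together with its 0-based number. This is a minimal sketch; list_ubuntu_entries is a hypothetical helper. Note that on a default Ubuntu GRUB configuration these "Ubuntu, with Linux ..." entries are typically nested inside the "Advanced options for Ubuntu" submenu. In that case grub-set-default expects a submenu path such as "1>N" (where 1 is the position of the submenu) or the full entry title, rather than the bare number, so check /boot/grub/grub.cfg before setting the default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list_ubuntu_entries [GRUB_CFG]
# Prints "<index><TAB><title>" for every line containing "menuentry 'Ubuntu,",
# numbered from 0 in file order. GRUB_CFG defaults to /boot/grub/grub.cfg.
list_ubuntu_entries() {
    local cfg="${1:-/boot/grub/grub.cfg}"
    grep "menuentry 'Ubuntu," "$cfg" |
        # Split on single quotes: field 2 is the entry title
        awk -F"'" '{ printf "%d\t%s\n", NR - 1, $2 }'
}
```

For example, running list_ubuntu_entries on a typical system prints lines such as "2<TAB>Ubuntu, with Linux 5.15.0-79-generic", where the number is the entry's position among the matched lines.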