Ubuntu Guest OS Hangs with Kernel 5.15.0-88/89-generic
When the virtio-pci and virtio-net kernel modules are probed on Ubuntu 22.04 running kernel 5.15.0-88-generic or 5.15.0-89-generic with any virtio function (PF or VF), the guest OS hangs and prints a call trace such as the following:
[ 2052.109566] CPU: 0 PID: 1183 Comm: systemd-udevd Tainted: P O L 5.15.0-88-generic #98-Ubuntu
[ 2052.109568] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
[ 2052.109570] RIP: 0010:virtqueue_is_broken+0x9/0x20
[ 2052.109579] RSP: 0018:ffffc206423a79c0 EFLAGS: 00000246
[ 2052.109581] RAX: 0000000000000000 RBX: ffff9e8980bfa980 RCX: 0000000000000a20
[ 2052.109582] RDX: 0000000000000000 RSI: ffffc206423a79cc RDI: ffff9e89847b9000
[ 2052.109583] RBP: ffffc206423a7a60 R08: 0000000000000000 R09: 0000000000000003
[ 2052.109584] R10: 0000000000000003 R11: 0000000000000002 R12: ffffc206423a79f0
[ 2052.109585] R13: 0000000000000002 R14: 0000000000000004 R15: ffff9e8984667400
[ 2052.109586] FS: 00007f3e295388c0(0000) GS:ffff9e89bbc00000(0000) knlGS:0000000000000000
[ 2052.109588] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2052.109590] CR2: 0000555613432be0 CR3: 0000000116af0002 CR4: 0000000000170ef0
[ 2052.109593] Call Trace:
[ 2052.109595] <IRQ>
[ 2052.109598] ? show_trace_log_lvl+0x1d6/0x2ea
[ 2052.109605] ? show_trace_log_lvl+0x1d6/0x2ea
[ 2052.109609] ? _virtnet_set_queues+0xbb/0x100 [virtio_net]
[ 2052.109615] ? show_regs.part.0+0x23/0x29
[ 2052.109618] ? show_regs.cold+0x8/0xd
[ 2052.109621] ? watchdog_timer_fn+0x1be/0x220
[ 2052.109625] ? lockup_detector_update_enable+0x60/0x60
[ 2052.109627] ? __hrtimer_run_queues+0x107/0x230
[ 2052.109631] ? kvm_clock_get_cycles+0x11/0x20
[ 2052.109637] ? hrtimer_interrupt+0x101/0x220
[ 2052.109640] ? __sysvec_apic_timer_interrupt+0x61/0xe0
[ 2052.109644] ? sysvec_apic_timer_interrupt+0x7b/0x90
[ 2052.109650] </IRQ>
[ 2052.109650] <TASK>
[ 2052.109651] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 2052.109655] ? virtqueue_is_broken+0x9/0x20
[ 2052.109656] ? virtnet_send_command+0x105/0x170 [virtio_net]
[ 2052.109660] _virtnet_set_queues+0xbb/0x100 [virtio_net]
[ 2052.109670] virtnet_probe+0x4ca/0xa10 [virtio_net]
[ 2052.109674] virtio_dev_probe+0x1ae/0x260
[ 2052.109676] really_probe+0x222/0x420
[ 2052.109679] __driver_probe_device+0xe8/0x140
[ 2052.109681] driver_probe_device+0x23/0xc0
[ 2052.109683] __driver_attach+0xf7/0x1f0
[ 2052.109685] ? __device_attach_driver+0x140/0x140
[ 2052.109687] bus_for_each_dev+0x7f/0xd0
[ 2052.109691] driver_attach+0x1e/0x30
[ 2052.109693] bus_add_driver+0x148/0x220
[ 2052.109695] driver_register+0x95/0x100
[ 2052.109697] register_virtio_driver+0x20/0x40
[ 2052.109698] virtio_net_driver_init+0x74/0x1000 [virtio_net]
[ 2052.109702] ? 0xffffffffc0d6f000
[ 2052.109704] do_one_initcall+0x49/0x1e0
[ 2052.109709] ? kmem_cache_alloc_trace+0x19e/0x2e0
[ 2052.109713] do_init_module+0x52/0x260
[ 2052.109716] load_module+0xb2b/0xbc0
[ 2052.109718] __do_sys_finit_module+0xbf/0x120
[ 2052.109721] __x64_sys_finit_module+0x18/0x20
[ 2052.109722] do_syscall_64+0x5c/0xc0
[ 2052.109725] ? do_syscall_64+0x69/0xc0
[ 2052.109726] ? syscall_exit_to_user_mode+0x35/0x50
[ 2052.109729] ? __x64_sys_newfstatat+0x1c/0x30
[ 2052.109733] ? do_syscall_64+0x69/0xc0
[ 2052.109735] entry_SYSCALL_64_after_hwframe+0x62/0xcc
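To check a saved console log or dmesg output for this trace, a minimal sketch is shown here. The function name has_virtnet_hang_signature is hypothetical, and matching on just the two symbols virtqueue_is_broken and _virtnet_set_queues is an assumption about what identifies this hang; other issues could in principle produce the same symbols.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when the kernel log read from stdin
# contains both symbols seen in the trace above, non-zero otherwise.
has_virtnet_hang_signature() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'virtqueue_is_broken' &&
        printf '%s\n' "$log" | grep -q '_virtnet_set_queues'
}

# Example usage (reading the kernel log typically requires root):
#   dmesg | has_virtnet_hang_signature && echo "virtio-net hang signature found"
```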
The bug was introduced upstream in v6.5-rc4 and fixed in v6.5-rc7. Canonical backported the problematic patch to Ubuntu 5.15.0-88-generic and 5.15.0-89-generic, which triggers this virtio-net deadlock. The upstream commit that addresses the issue is:
commit 51b813176f098ff61bd2833f627f5319ead098a5
Author: Jason Wang <jasowang@redhat.com>
Date: Wed Aug 9 23:12:56 2023 -0400
virtio-net: set queues after driver_ok
Commit 25266128fe16 ("virtio-net: fix race between set queues and
probe") tries to fix the race between set queues and probe by calling
_virtnet_set_queues() before DRIVER_OK is set. This violates virtio
spec. Fixing this by setting queues after virtio_device_ready().
Note that rtnl needs to be held for userspace requests to change the
number of queues. So we are serialized in this way.
Fixes: 25266128fe16 ("virtio-net: fix race between set queues and probe")
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To work around the issue, switch the default kernel to an unaffected version (e.g., 5.15.0-79-generic).
The issue is fixed in the official Ubuntu kernel starting with 5.15.0-90-generic.
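Whether a guest is running one of the affected builds can be checked with a sketch along these lines. The function name is_affected_kernel is hypothetical, and the match covers only the two release strings named in this article.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when the given kernel release string is
# one of the two affected builds named in this article, 1 otherwise.
is_affected_kernel() {
    case "$1" in
        5.15.0-88-generic|5.15.0-89-generic) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage: check the running kernel reported by uname -r.
if is_affected_kernel "$(uname -r)"; then
    echo "Kernel $(uname -r) is affected: switch kernels or move to 5.15.0-90-generic or later."
else
    echo "Kernel $(uname -r) is not one of the affected builds."
fi
```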
There are multiple ways to switch the default kernel. The following is only one example:
Users must have root permission before proceeding.
Open /etc/default/grub and change GRUB_DEFAULT as follows:
GRUB_DEFAULT=saved
Save the file.
Run the following to get the number of the kernel you want:
# grep "menuentry 'Ubuntu," /boot/grub/grub.cfg
Info: Numbering starts from 0 (i.e., the first entry is 0).
Run the following to set the default kernel:
# grub-set-default num_from_last_step
Reboot.
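The lookup steps above can be sketched as two small helpers. The function names grub_default_is_saved and list_kernel_entries are hypothetical. The second one only prints the grep output with the 0-based numbering described above, and it does not handle entries nested in GRUB submenus, so confirm the selected entry on your system before rebooting.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when the given file (for example
# /etc/default/grub) contains the line GRUB_DEFAULT=saved.
grub_default_is_saved() {
    grep -q '^GRUB_DEFAULT=saved$' "$1"
}

# Hypothetical helper: print each matching menu entry prefixed with its
# 0-based position in the grep output.
# $1: path to grub.cfg (this article uses /boot/grub/grub.cfg)
list_kernel_entries() {
    grep "menuentry 'Ubuntu," "$1" | awk '{ printf "%d: %s\n", NR - 1, $0 }'
}

# Example usage (run as root):
#   grub_default_is_saved /etc/default/grub || echo "GRUB_DEFAULT is not set to saved"
#   list_kernel_entries /boot/grub/grub.cfg
#   grub-set-default num_from_last_step
```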