Ubuntu Guest OS Hangs with Kernel 5.15.0-88/89-generic

NVIDIA BlueField Virtio-net v1.9.0


When the virtio-pci and virtio-net kernel modules are probed on Ubuntu 22.04 running kernel 5.15.0-88-generic or 5.15.0-89-generic with any virtio function (i.e., PF or VF), the guest OS hangs and prints call traces such as the following:


[ 2052.109566] CPU: 0 PID: 1183 Comm: systemd-udevd Tainted: P O L 5.15.0-88-generic #98-Ubuntu
[ 2052.109568] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
[ 2052.109570] RIP: 0010:virtqueue_is_broken+0x9/0x20
[ 2052.109579] RSP: 0018:ffffc206423a79c0 EFLAGS: 00000246
[ 2052.109581] RAX: 0000000000000000 RBX: ffff9e8980bfa980 RCX: 0000000000000a20
[ 2052.109582] RDX: 0000000000000000 RSI: ffffc206423a79cc RDI: ffff9e89847b9000
[ 2052.109583] RBP: ffffc206423a7a60 R08: 0000000000000000 R09: 0000000000000003
[ 2052.109584] R10: 0000000000000003 R11: 0000000000000002 R12: ffffc206423a79f0
[ 2052.109585] R13: 0000000000000002 R14: 0000000000000004 R15: ffff9e8984667400
[ 2052.109586] FS: 00007f3e295388c0(0000) GS:ffff9e89bbc00000(0000) knlGS:0000000000000000
[ 2052.109588] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2052.109590] CR2: 0000555613432be0 CR3: 0000000116af0002 CR4: 0000000000170ef0
[ 2052.109593] Call Trace:
[ 2052.109595] <IRQ>
[ 2052.109598] ? show_trace_log_lvl+0x1d6/0x2ea
[ 2052.109605] ? show_trace_log_lvl+0x1d6/0x2ea
[ 2052.109609] ? _virtnet_set_queues+0xbb/0x100 [virtio_net]
[ 2052.109615] ? show_regs.part.0+0x23/0x29
[ 2052.109618] ? show_regs.cold+0x8/0xd
[ 2052.109621] ? watchdog_timer_fn+0x1be/0x220
[ 2052.109625] ? lockup_detector_update_enable+0x60/0x60
[ 2052.109627] ? __hrtimer_run_queues+0x107/0x230
[ 2052.109631] ? kvm_clock_get_cycles+0x11/0x20
[ 2052.109637] ? hrtimer_interrupt+0x101/0x220
[ 2052.109640] ? __sysvec_apic_timer_interrupt+0x61/0xe0
[ 2052.109644] ? sysvec_apic_timer_interrupt+0x7b/0x90
[ 2052.109650] </IRQ>
[ 2052.109650] <TASK>
[ 2052.109651] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 2052.109655] ? virtqueue_is_broken+0x9/0x20
[ 2052.109656] ? virtnet_send_command+0x105/0x170 [virtio_net]
[ 2052.109660] _virtnet_set_queues+0xbb/0x100 [virtio_net]
[ 2052.109670] virtnet_probe+0x4ca/0xa10 [virtio_net]
[ 2052.109674] virtio_dev_probe+0x1ae/0x260
[ 2052.109676] really_probe+0x222/0x420
[ 2052.109679] __driver_probe_device+0xe8/0x140
[ 2052.109681] driver_probe_device+0x23/0xc0
[ 2052.109683] __driver_attach+0xf7/0x1f0
[ 2052.109685] ? __device_attach_driver+0x140/0x140
[ 2052.109687] bus_for_each_dev+0x7f/0xd0
[ 2052.109691] driver_attach+0x1e/0x30
[ 2052.109693] bus_add_driver+0x148/0x220
[ 2052.109695] driver_register+0x95/0x100
[ 2052.109697] register_virtio_driver+0x20/0x40
[ 2052.109698] virtio_net_driver_init+0x74/0x1000 [virtio_net]
[ 2052.109702] ? 0xffffffffc0d6f000
[ 2052.109704] do_one_initcall+0x49/0x1e0
[ 2052.109709] ? kmem_cache_alloc_trace+0x19e/0x2e0
[ 2052.109713] do_init_module+0x52/0x260
[ 2052.109716] load_module+0xb2b/0xbc0
[ 2052.109718] __do_sys_finit_module+0xbf/0x120
[ 2052.109721] __x64_sys_finit_module+0x18/0x20
[ 2052.109722] do_syscall_64+0x5c/0xc0
[ 2052.109725] ? do_syscall_64+0x69/0xc0
[ 2052.109726] ? syscall_exit_to_user_mode+0x35/0x50
[ 2052.109729] ? __x64_sys_newfstatat+0x1c/0x30
[ 2052.109733] ? do_syscall_64+0x69/0xc0
[ 2052.109735] entry_SYSCALL_64_after_hwframe+0x62/0xcc
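
To confirm that a saved kernel log shows this particular hang rather than an unrelated lockup, the two virtio_net frames from the trace can be searched for. The following is a minimal sketch, not an official diagnostic: the function name has_virtnet_probe_hang is hypothetical, and treating the presence of both frames as the signature is an assumption based only on the trace above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel log text on stdin contains
# both virtio_net frames seen in the call trace above.
has_virtnet_probe_hang() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'virtnet_send_command' &&
        printf '%s\n' "$log" | grep -q '_virtnet_set_queues'
}
```

For example, if the guest keeps a persistent journal, `journalctl -k -b -1 | has_virtnet_probe_hang && echo "signature found"` would check the kernel messages of the previous boot.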

A bug present in upstream version v6.5-rc4 was fixed in v6.5-rc7. Canonical backported the problematic patch to the Ubuntu 5.15.0-88-generic and 5.15.0-89-generic kernels, which triggers this virtio-net deadlock. The upstream fix is the following commit:


commit 51b813176f098ff61bd2833f627f5319ead098a5
Author: Jason Wang <jasowang@redhat.com>
Date:   Wed Aug 9 23:12:56 2023 -0400

    virtio-net: set queues after driver_ok

    Commit 25266128fe16 ("virtio-net: fix race between set queues and probe")
    tries to fix the race between set queues and probe by calling
    _virtnet_set_queues() before DRIVER_OK is set. This violates virtio
    spec. Fixing this by setting queues after virtio_device_ready().

    Note that rtnl needs to be held for userspace requests to change the
    number of queues. So we are serialized in this way.

    Fixes: 25266128fe16 ("virtio-net: fix race between set queues and probe")
    Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
    Acked-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

As a workaround, switch the default kernel to a version that is not affected (e.g., 5.15.0-79-generic).

Note

The issue is fixed in the official Ubuntu kernel starting with 5.15.0-90-generic.
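
To see quickly whether a guest is running one of the two affected builds, the running kernel release can be compared against the versions named in this article. This is a minimal sketch: the function name kernel_status is hypothetical, and it only matches the two release strings listed here, so other kernels are reported as "not listed as affected" rather than as known-good.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel release string against the two
# affected builds named in this article.
kernel_status() {
    case "$1" in
        5.15.0-88-generic|5.15.0-89-generic) echo "affected" ;;
        *) echo "not listed as affected" ;;
    esac
}

# Report the status of the kernel the guest is currently running.
echo "$(uname -r): $(kernel_status "$(uname -r)")"
```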

There are multiple ways to switch the default kernel. The following is only one example:

Note

Root privileges are required for the following steps.

  1. Open /etc/default/grub and change GRUB_DEFAULT as follows:


    GRUB_DEFAULT=saved

  2. Save the file.

  3. Run the following to list the kernel menu entries and find the number of the kernel you want:


    # grep "menuentry 'Ubuntu," /boot/grub/grub.cfg

    Info

    Numbering starts from 0 (i.e., the first entry is 0).

  4. Run the following to set the default kernel:


    # grub-set-default num_from_last_step

  5. Reboot.
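
Counting entries by eye in the grep output from step 3 is error-prone, so the sketch below prints each matching entry with its zero-based number. The function name list_kernel_entries is hypothetical. It uses the same grep pattern as step 3 and assumes the entry title is the first single-quoted string on each menuentry line.

```shell
#!/bin/sh
# Hypothetical helper: print "<index>  <title>" for every kernel menu entry,
# numbering from 0 as described in step 3.
# $1: path to grub.cfg (defaults to /boot/grub/grub.cfg)
list_kernel_entries() {
    grep "menuentry 'Ubuntu," "${1:-/boot/grub/grub.cfg}" |
        awk -F"'" '{ printf "%d  %s\n", NR - 1, $2 }'
}

# List the entries of the local GRUB configuration when it is readable.
if [ -r /boot/grub/grub.cfg ]; then
    list_kernel_entries
fi
```

On Ubuntu, changes to /etc/default/grub normally take effect only after grub.cfg is regenerated with update-grub. The entries matched here usually sit inside the "Advanced options for Ubuntu" submenu, so if the plain number does not select the expected kernel, the submenu form accepted by grub-set-default (for example `1>2`) may be needed.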

© Copyright 2024, NVIDIA. Last updated on Jun 18, 2024.