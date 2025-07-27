Ping failed with packet size greater than 1500/4000 after configuring jumbo MTU.

Jumbo MTU is supported starting from the following kernel version:

Release     Version
Upstream    VM kernel: 4.18.0-193.el8.x86_64. VM Linux version supports big MTU after 4.11.
Ubuntu      DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04
Virtnet     v1.7 or v1.6.26

The following steps configure jumbo MTU:

1. Change the MTU of the uplink representor (or bond) from the BlueField Arm OS:

   # echo 9216 > /sys/bus/pci/devices/0000:03:00.0/net/p0/mtu

2. Restart virtio-net-controller from the BlueField Arm OS:

   # systemctl restart virtio-net-controller

3. Change the corresponding device MTU on the BlueField Arm OS. For example, for the first VF on the first PF, run:

   # virtnet modify -p 0 -v 0 device -t 9216

4. Reload the virtio driver from the guest OS:

   # modprobe -rv virtio-net && modprobe -v virtio-net

5. Verify that the VQs' MTU configuration is correct on the BlueField Arm OS:

   # virtnet query -p 0 -v 0 --dbg_stats | grep jumbo_mtu
   "jumbo_mtu": 1
   "jumbo_mtu": 1

6. Change the MTU of the virtio-net interface from the guest OS:

   # echo 9216 > /sys/bus/pci/devices/0000:af:00.2/virtio0/net/enp175s0f2/mtu
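After the MTU is configured, one way to confirm that jumbo frames pass end to end is to ping with fragmentation prohibited. The sketch below only computes the largest ICMP payload that fits in a given MTU (the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header). The function name jumbo_ping_payload and the peer address in the usage comment are illustrative assumptions, not part of virtnet.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU - 20 (IPv4 header without options) - 8 (ICMP header).
jumbo_ping_payload() {
    echo $(( $1 - 28 ))
}

# Example (guest OS; 192.0.2.10 is a placeholder peer address):
#   ping -M do -c 3 -s "$(jumbo_ping_payload 9216)" 192.0.2.10
```

The peer must also be configured for a matching MTU, otherwise the ping fails regardless of the virtio-net settings.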

The problem can be verified using the following commands:

# virtnet list
ERR: Can't connect to virtnet controller: [Errno 111] Connection refused
Check 'systemctl status virtio-net-controller'
Or controller is not ready to accept commands

# systemctl status virtio-net-controller
virtio-net-controller.service - Nvidia VirtIO Net Controller Daemon
   Loaded: loaded (/etc/systemd/system/virtio-net-controller.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2023-10-27 17:46:59 CDT; 2min 26s ago
     Docs: file:/opt/mellanox/mlnx_virtnet/README.md
  Process: 29652 ExecStart=/usr/sbin/virtio_net_manager (code=exited, status=0/SUCCESS)
 Main PID: 29652 (code=exited, status=0/SUCCESS)
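For scripted checks, the state word on the "Active:" line can be extracted from the status output. This is a minimal sketch that assumes the output layout shown above; the function name controller_state is hypothetical.

```shell
#!/bin/sh
# Print the state word that follows "Active:" in `systemctl status` output
# read from stdin (for example "active", "inactive", or "failed").
controller_state() {
    awk '$1 == "Active:" { print $2; exit }'
}

# Example (BlueField Arm OS):
#   systemctl status virtio-net-controller | controller_state
```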





The problem may happen due to the following reasons.

Check whether mlxconfig has VIRTIO_NET_EMULATION_ENABLE enabled:

# mlxconfig -d 03:00.0 -e q | grep -i VIRTIO_NET_EMULATION_ENABLE
*        VIRTIO_NET_EMULATION_ENABLE        False(0)        True(1)        True(1)

The second and third value columns should both show True(1). If they do not, perform the following from the BlueField Arm side:

# mlxconfig -d 03:00.0 s VIRTIO_NET_EMULATION_ENABLE=1

Then perform a BlueField system-level reset as documented in the BlueField software documentation.
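The column check can be scripted. The sketch below assumes the three-value line layout shown in the example output (the last two fields being the second and third value columns); the function name emulation_enabled is hypothetical.

```shell
#!/bin/sh
# Read `mlxconfig -e q` output from stdin and succeed only if the last two
# fields of the VIRTIO_NET_EMULATION_ENABLE line are both True(1).
emulation_enabled() {
    awk '/VIRTIO_NET_EMULATION_ENABLE/ {
             ok = ($(NF-1) == "True(1)" && $NF == "True(1)")
         }
         END { exit (ok ? 0 : 1) }'
}

# Example (BlueField Arm OS):
#   mlxconfig -d 03:00.0 -e q | emulation_enabled || echo "emulation not enabled"
```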

This can happen when VIRTIO_NET_EMULATION_NUM_PF is larger than PF_TOTAL_SF, because each virtio-net PF/VF requires a corresponding SF to be created:

# mlxconfig -d 03:00.0 -e q | grep -iE 'PF_TOTAL_SF|VIRTIO_NET_EMULATION_NUM_PF'
*        VIRTIO_NET_EMULATION_NUM_PF        0        4        4
*        PF_TOTAL_SF                        0        8        8

Info: By default, BlueField creates an SF for each PF. Take this into consideration when reserving PF_TOTAL_SF.

Creating a virtio-net VF returns an error from the command line:

# echo 3 > /sys/bus/pci/drivers/virtio-pci/0000:41:00.2/sriov_numvfs
write error: Function not implemented

The host-side dmesg shows the following:

[  301.204661] virtio-pci 0000:41:00.2: Driver doesn't support SRIOV configuration via sysfs





Virtio SR-IOV is only supported starting from the following kernel version:

Release     Kernel version
Upstream    4.18 with commit cfecc2918d2b3
Ubuntu      Ubuntu-hwe-4.18.0-9.10_18.04.1
CentOS      3.10.0-957.el7 / 7.6.1810
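For upstream kernels, the running version can be compared against 4.18 with a version sort. This is only a rough sketch: the helper name version_ge is hypothetical, it relies on GNU sort -V, and it cannot detect distribution backports such as the CentOS 3.10.0-957.el7 kernel listed above.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2,
# using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (host OS):
#   version_ge "$(uname -r)" 4.18 || echo "kernel may lack virtio SR-IOV support"
```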

The following command from the hypervisor hangs:

# echo 100 > /sys/bus/pci/drivers/virtio-pci/0000:89:00.4/sriov_numvfs





This can happen when more virtio-net functions are reserved than SFs (VIRTIO_NET_EMULATION_NUM_PF + VIRTIO_NET_EMULATION_NUM_VF > PF_TOTAL_SF), because each virtio-net PF/VF requires a corresponding SF to be created. Example:

# mlxconfig -d 03:00.0 -e q | grep -iE 'PF_TOTAL_SF|VIRTIO_NET_EMULATION_NUM_PF|VIRTIO_NET_EMULATION_NUM_VF'
*        VIRTIO_NET_EMULATION_NUM_VF        0        126        126
*        VIRTIO_NET_EMULATION_NUM_PF        0        4          4
*        PF_TOTAL_SF                        0        508        508

Info: By default, BlueField creates an SF for each PF. Take this into consideration when reserving PF_TOTAL_SF.

Note: BlueField supports a limited number of SFs. SFs reserved on the BlueField Arm side and on the host side are not shared. Make sure to remove the SFs reserved on the host side when reserving a large number on the BlueField Arm side.
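The reservation rule (VIRTIO_NET_EMULATION_NUM_PF + VIRTIO_NET_EMULATION_NUM_VF must not exceed PF_TOTAL_SF) can be checked with simple arithmetic. The sketch below takes the three values as arguments; the function name sf_budget_ok is hypothetical, and it does not account for the SFs that BlueField creates by default for each PF.

```shell
#!/bin/sh
# Succeed if NUM_PF + NUM_VF fits within PF_TOTAL_SF.
# Arguments: <num_pf> <num_vf> <pf_total_sf>
sf_budget_ok() {
    [ $(( $1 + $2 )) -le "$3" ]
}

# Example using the values from the output above:
#   sf_budget_ok 4 126 508 && echo "SF budget is sufficient"
```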

After creating a hotplug device from the BlueField side, probing virtio drivers does not create the virtio-net device correctly.

The problem may happen due to the following reasons.

Possible failure on BAR 0. Check dmesg in the guest OS for the corresponding hotplug BDF:

[   10.874845] pci 0000:87:00.1: BAR 0: failed to assign [mem size 0x00100000]

Info: In this example, the hotplug PCIe BDF is 87:00.1. This value can be retrieved using "lspci | grep -i virtio" from the guest OS.

This can normally be resolved by adding "pci=realloc" to the Linux kernel command line (GRUB).
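After rebooting, the guest can confirm that the parameter is present on the running kernel's command line. A minimal sketch, assuming a Linux guest with /proc/cmdline available; the helper name cmdline_has is hypothetical, and the optional second argument exists only so the check can be exercised on a sample string.

```shell
#!/bin/sh
# Succeed if the exact parameter $1 appears on the kernel command line.
# $2 (optional) overrides the command line read from /proc/cmdline.
cmdline_has() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (guest OS):
#   cmdline_has pci=realloc || echo "pci=realloc is not set"
```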

Possible failure on other PCIe BAR. Check the dmesg from the guest OS for the corresponding hotplug BDF:

[ 2893.484281] pcieport 0000:10:01.0: bridge window [mem 0x00100000-0x000fffff] to [bus 12] add_size 200000 add_align 100000
[ 2893.484285] pcieport 0000:10:01.0: BAR 14: no space for [mem size 0x00200000]
[ 2893.484287] pcieport 0000:10:01.0: BAR 14: failed to assign [mem size 0x00200000]
[ 2893.484289] pcieport 0000:10:01.0: BAR 14: no space for [mem size 0x00200000]
[ 2893.484290] pcieport 0000:10:01.0: BAR 14: failed to assign [mem size 0x00200000]

Info: In this example, the hotplug PCIe BDF is 10:01.0. This value can be retrieved using "lspci | grep -i virtio" from the guest OS.
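To find affected devices quickly, the "failed to assign" lines can be filtered out of dmesg. The sketch below assumes the message format shown in the examples above; the function name failed_bars is hypothetical.

```shell
#!/bin/sh
# Read dmesg text from stdin and print one "<BDF> BAR <n>" line for each
# distinct device/BAR pair that reported "failed to assign".
failed_bars() {
    sed -n 's/.* \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): BAR \([0-9]*\): failed to assign.*/\1 BAR \2/p' |
        sort -u
}

# Example (guest OS):
#   dmesg | failed_bars
```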

This is mostly due to insufficient BAR resources. Try to reduce the PF BAR size by performing the following from the BlueField side:

# mlxconfig -d 03:00.0 s PF_LOG_BAR_SIZE=0

This can also be caused by the BIOS provider not reserving enough memory. Check the guest OS's dmesg for similar messages for the PCIe bus of the BlueField device:

[    3.979061] pci_bus 0000:a0: root bus resource [mem 0x41c0800000-0x41c10fffff window] (9M)
[    3.979062] pci_bus 0000:a0: root bus resource [bus a0-bf]
[    4.017770] pci 0000:a4:00.0: bridge window [mem 0x41c0800000-0x41c0ffffff 64bit pref] (8M)
[    4.018243] pci 0000:a4:00.0: BAR 15: no space for [mem size 0x05800000 64bit pref] (88M)
[    4.018245] pci 0000:a4:00.0: BAR 15: failed to assign [mem size 0x05800000 64bit pref]

On the host, the prefetchable memory limit of the root bus (a0) is only 9M. This means that all devices under this bus (including BlueField) can be allocated only 9M of prefetchable memory in total. BAR 15 is the total prefetchable memory limit on the bridge (a4) of the device. The PCI bridge window of the BlueField for prefetchable memory is 8M, but the bridge requires 88M for its child device (BlueField). After several attempts, the PCIe bridge did not find sufficient IO memory to allocate for the BlueField BARs. This can be solved by contacting the BIOS provider to provide enough memory to the PCI root.
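The sizes quoted in the explanation (9M, 8M, 88M) follow directly from the hexadecimal values in the log. A minimal sketch of the arithmetic; the function names size_mib and window_mib are hypothetical.

```shell
#!/bin/sh
# Convert a byte count (hex or decimal) to whole MiB.
size_mib() {
    echo $(( $1 / 1048576 ))
}

# Size in MiB of an inclusive address window [start, end].
window_mib() {
    echo $(( ($2 - $1 + 1) / 1048576 ))
}

# Values taken from the dmesg example:
#   window_mib 0x41c0800000 0x41c10fffff   # root bus window
#   window_mib 0x41c0800000 0x41c0ffffff   # bridge window
#   size_mib   0x05800000                  # size requested for BAR 15
```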



If the hotplug operation from the BlueField Arm side is performed before the guest OS is up, the virtio device may not be found by the command "lspci | grep -i virtio". In that case, try to rescan from the guest OS:

# echo 1 > /sys/bus/pci/rescan





The server BIOS may not support hotplug devices. This can be confirmed by looking at the guest OS dmesg:

[    8.209406] acpi PNP0A08:03: _OSC: platform does not support [PCIeHotplug PME]

Try to enable hotplug from the BIOS.

The guest OS may be running a kernel older than 4.19, in which case the virtio device is not found by "lspci | grep -i virtio". Add the entry pciehp.pciehp_force=1 to the GRUB kernel command line.

When the guest OS is running heavy traffic (e.g., iperf/iperf3) on a hotplug virtio-net device, unplugging the device from the BlueField side at the same time may result in the guest OS hanging.

The guest OS prints a call trace similar to the following:

[  203.886218] CPU: 35 PID: 3077 Comm: iperf3 Not tainted 6.6.0 #1
[  203.886222] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.2.5 04/08/2021
[  203.886224] RIP: 0010:free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[  203.886247] Code: 41 f6 c4 01 75 75 66 90 44 89 fe 4c 89 e7 45 03 6c 24 70 e8 65 1a 0a f0 83 c3 01 49 8b 3e 48 8d 75 cc e8 26 21 d1 ef 49 89 c4 <48> 85 c0 75 d1 85 db 74 0e 4d 01 ae 80 02 00 00 49 01 9e 78 02 00
[  203.886249] RSP: 0018:ffffac62cb837678 EFLAGS: 00000246
[  203.886253] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9a35e7dbc000
[  203.886255] RDX: 0000000000000000 RSI: ffffac62cb83767c RDI: ffff9a2e5e7d8900
[  203.886257] RBP: ffffac62cb8376b0 R08: 0000000000000000 R09: 000000000003b2f0
[  203.886259] R10: ffff9a2e4a570b00 R11: 000000000000000c R12: 0000000000000000
[  203.886261] R13: 0000000000000000 R14: ffff9a2e62a48800 R15: 0000000000000000
[  203.886263] FS:  00007f8444643400(0000) GS:ffff9a359f2c0000(0000) knlGS:0000000000000000
[  203.886266] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  203.886268] CR2: 000056277998d028 CR3: 0000000127976000 CR4: 0000000000350ee0
[  203.886270] Call Trace:
[  203.886274]  <NMI>
[  203.886277]  ? show_regs+0x6e/0x80
[  203.886289]  ? nmi_cpu_backtrace+0xb1/0x120
[  203.886298]  ? nmi_cpu_backtrace_handler+0x15/0x20
[  203.886305]  ? nmi_handle+0x6b/0x180
[  203.886310]  ? default_do_nmi+0x45/0x120
[  203.886316]  ? exc_nmi+0x142/0x1c0
[  203.886319]  ? end_repeat_nmi+0x16/0x67
[  203.886328]  ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[  203.886334]  ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[  203.886341]  ? free_old_xmit_skbs+0x5d/0xf0 [virtio_net]
[  203.886347]  </NMI>
[  203.886348]  <TASK>
[  203.886349]  ? free_old_xmit_skbs+0x8c/0xf0 [virtio_net]
[  203.886356]  start_xmit+0x149/0x500 [virtio_net]
[  203.886364]  dev_hard_start_xmit+0x95/0x1e0
[  203.886370]  ? validate_xmit_skb_list+0x51/0x80
[  203.886374]  sch_direct_xmit+0x10c/0x3a0
[  203.886381]  __dev_queue_xmit+0xa47/0xda0
[  203.886387]  ip_finish_output2+0x2ef/0x5a0
[  203.886393]  ? srso_return_thunk+0x5/0x10
[  203.886400]  ? nf_conntrack_in+0xeb/0x6c0 [nf_conntrack]
[  203.886428]  __ip_finish_output+0xb7/0x190
[  203.886433]  ip_finish_output+0x32/0x100
[  203.886437]  ip_output+0x63/0xf0
[  203.886441]  ? __pfx_ip_finish_output+0x10/0x10
[  203.886446]  ip_local_out+0x62/0x70
[  203.886449]  __ip_queue_xmit+0x18e/0x4b0
[  203.886454]  ip_queue_xmit+0x19/0x20
[  203.886456]  __tcp_transmit_skb+0xb2d/0xcd0
[  203.886462]  ? srso_return_thunk+0x5/0x10
[  203.886469]  tcp_write_xmit+0x565/0x1620
[  203.886474]  tcp_push_one+0x40/0x50
[  203.886476]  tcp_sendmsg_locked+0x350/0xee0
[  203.886481]  ? tcp_current_mss+0x75/0xd0
[  203.886488]  tcp_sendmsg+0x31/0x50
[  203.886491]  inet_sendmsg+0x47/0x80
[  203.886498]  sock_write_iter+0x163/0x190
[  203.886507]  vfs_write+0x342/0x3f0
[  203.886517]  ksys_write+0xb9/0xf0
[  203.886520]  __x64_sys_write+0x1d/0x30
[  203.886522]  do_syscall_64+0x60/0x90
[  203.886528]  ? srso_return_thunk+0x5/0x10
[  203.886531]  ? ksys_write+0xb9/0xf0
[  203.886532]  ? srso_return_thunk+0x5/0x10
[  203.886535]  ? exit_to_user_mode_prepare+0x35/0x180
[  203.886542]  ? srso_return_thunk+0x5/0x10
[  203.886544]  ? syscall_exit_to_user_mode+0x38/0x50
[  203.886549]  ? __x64_sys_write+0x1d/0x30
[  203.886551]  ? srso_return_thunk+0x5/0x10
[  203.886553]  ? do_syscall_64+0x6d/0x90
[  203.886556]  ? srso_return_thunk+0x5/0x10
[  203.886558]  ? syscall_exit_to_user_mode+0x38/0x50
[  203.886561]  ? srso_return_thunk+0x5/0x10
[  203.886564]  ? do_syscall_64+0x6d/0x90
[  203.886566]  ? __x64_sys_write+0x1d/0x30
[  203.886568]  ? srso_return_thunk+0x5/0x10
[  203.886570]  ? do_syscall_64+0x6d/0x90
[  203.886572]  ? srso_return_thunk+0x5/0x10
[  203.886575]  ? sysvec_apic_timer_interrupt+0x52/0x90
[  203.886578]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8





Starting with kernel 5.14, the following patch introduced a while loop in the virtio-net TX path that may spin indefinitely when the VQ is broken (e.g., the device is removed) under heavy traffic:

commit a7766ef18b33674fa164e2e2916cef16d4e17f43
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Tue Apr 13 01:30:45 2021 -0400

    virtio_net: disable cb aggressively

    There are currently two cases where we poll TX vq not in response to a
    callback: start xmit and rx napi. We currently do this with callbacks
    enabled which can cause extra interrupts from the card. Used not to be
    a big issue as we run with interrupts disabled but that is no longer the
    case, and in some cases the rate of spurious interrupts is so high
    linux detects this and actually kills the interrupt.

    Fix up by disabling the callbacks before polling the tx vq.

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>





Currently, there is no official fix from the kernel side. The following workarounds may be employed:

Use a kernel without the offending patch

Stop heavy traffic while performing unplug

When probing the virtio-pci and virtio-net kernel modules while running Ubuntu 22.04 with kernel 5.15.0-88/89-generic with any virtio function (i.e., PF or VF), the guest OS hangs and prints call traces as follows:

[ 2052.109566] CPU: 0 PID: 1183 Comm: systemd-udevd Tainted: P O L 5.15.0-88-generic #98-Ubuntu
[ 2052.109568] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
[ 2052.109570] RIP: 0010:virtqueue_is_broken+0x9/0x20
[ 2052.109579] RSP: 0018:ffffc206423a79c0 EFLAGS: 00000246
[ 2052.109581] RAX: 0000000000000000 RBX: ffff9e8980bfa980 RCX: 0000000000000a20
[ 2052.109582] RDX: 0000000000000000 RSI: ffffc206423a79cc RDI: ffff9e89847b9000
[ 2052.109583] RBP: ffffc206423a7a60 R08: 0000000000000000 R09: 0000000000000003
[ 2052.109584] R10: 0000000000000003 R11: 0000000000000002 R12: ffffc206423a79f0
[ 2052.109585] R13: 0000000000000002 R14: 0000000000000004 R15: ffff9e8984667400
[ 2052.109586] FS:  00007f3e295388c0(0000) GS:ffff9e89bbc00000(0000) knlGS:0000000000000000
[ 2052.109588] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2052.109590] CR2: 0000555613432be0 CR3: 0000000116af0002 CR4: 0000000000170ef0
[ 2052.109593] Call Trace:
[ 2052.109595]  <IRQ>
[ 2052.109598]  ? show_trace_log_lvl+0x1d6/0x2ea
[ 2052.109605]  ? show_trace_log_lvl+0x1d6/0x2ea
[ 2052.109609]  ? _virtnet_set_queues+0xbb/0x100 [virtio_net]
[ 2052.109615]  ? show_regs.part.0+0x23/0x29
[ 2052.109618]  ? show_regs.cold+0x8/0xd
[ 2052.109621]  ? watchdog_timer_fn+0x1be/0x220
[ 2052.109625]  ? lockup_detector_update_enable+0x60/0x60
[ 2052.109627]  ? __hrtimer_run_queues+0x107/0x230
[ 2052.109631]  ? kvm_clock_get_cycles+0x11/0x20
[ 2052.109637]  ? hrtimer_interrupt+0x101/0x220
[ 2052.109640]  ? __sysvec_apic_timer_interrupt+0x61/0xe0
[ 2052.109644]  ? sysvec_apic_timer_interrupt+0x7b/0x90
[ 2052.109650]  </IRQ>
[ 2052.109650]  <TASK>
[ 2052.109651]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 2052.109655]  ? virtqueue_is_broken+0x9/0x20
[ 2052.109656]  ? virtnet_send_command+0x105/0x170 [virtio_net]
[ 2052.109660]  _virtnet_set_queues+0xbb/0x100 [virtio_net]
[ 2052.109670]  virtnet_probe+0x4ca/0xa10 [virtio_net]
[ 2052.109674]  virtio_dev_probe+0x1ae/0x260
[ 2052.109676]  really_probe+0x222/0x420
[ 2052.109679]  __driver_probe_device+0xe8/0x140
[ 2052.109681]  driver_probe_device+0x23/0xc0
[ 2052.109683]  __driver_attach+0xf7/0x1f0
[ 2052.109685]  ? __device_attach_driver+0x140/0x140
[ 2052.109687]  bus_for_each_dev+0x7f/0xd0
[ 2052.109691]  driver_attach+0x1e/0x30
[ 2052.109693]  bus_add_driver+0x148/0x220
[ 2052.109695]  driver_register+0x95/0x100
[ 2052.109697]  register_virtio_driver+0x20/0x40
[ 2052.109698]  virtio_net_driver_init+0x74/0x1000 [virtio_net]
[ 2052.109702]  ? 0xffffffffc0d6f000
[ 2052.109704]  do_one_initcall+0x49/0x1e0
[ 2052.109709]  ? kmem_cache_alloc_trace+0x19e/0x2e0
[ 2052.109713]  do_init_module+0x52/0x260
[ 2052.109716]  load_module+0xb2b/0xbc0
[ 2052.109718]  __do_sys_finit_module+0xbf/0x120
[ 2052.109721]  __x64_sys_finit_module+0x18/0x20
[ 2052.109722]  do_syscall_64+0x5c/0xc0
[ 2052.109725]  ? do_syscall_64+0x69/0xc0
[ 2052.109726]  ? syscall_exit_to_user_mode+0x35/0x50
[ 2052.109729]  ? __x64_sys_newfstatat+0x1c/0x30
[ 2052.109733]  ? do_syscall_64+0x69/0xc0
[ 2052.109735]  entry_SYSCALL_64_after_hwframe+0x62/0xcc





There is a bug in upstream version v6.5-rc4, which is fixed in v6.5-rc7. Canonical backported the problematic patch to Ubuntu 5.15.0-88/89-generic, which triggers this virtio-net deadlock issue:

commit 51b813176f098ff61bd2833f627f5319ead098a5
Author: Jason Wang <jasowang@redhat.com>
Date:   Wed Aug 9 23:12:56 2023 -0400

    virtio-net: set queues after driver_ok

    Commit 25266128fe16 ("virtio-net: fix race between set queues and
    probe") tries to fix the race between set queues and probe by calling
    _virtnet_set_queues() before DRIVER_OK is set. This violates virtio
    spec. Fixing this by setting queues after virtio_device_ready().

    Note that rtnl needs to be held for userspace requests to change the
    number of queues. So we are serialized in this way.

    Fixes: 25266128fe16 ("virtio-net: fix race between set queues and probe")
    Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
    Acked-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Switch the default kernel back to another version (e.g., 5.15.0-79-generic).

Note: Starting with 5.15.0-90-generic, the official Ubuntu kernel has the issue fixed.
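A guest can check whether it is running one of the two affected Ubuntu kernel releases before loading the virtio modules. This is a minimal sketch that matches only the release strings named above; the function name is_affected_kernel is hypothetical.

```shell
#!/bin/sh
# Succeed if the given kernel release string is one of the Ubuntu kernels
# named above as triggering the virtio-net probe deadlock.
is_affected_kernel() {
    case "$1" in
        5.15.0-88-generic|5.15.0-89-generic) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (guest OS):
#   is_affected_kernel "$(uname -r)" && echo "switch to another kernel first"
```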

There are multiple ways to switch the default kernel. The following is only one example:

Note: Users must have root permissions before proceeding.