Date: 2026-02-02

To query the packet counters, use the stats command.

[dpu]# virtnet stats [-h] {[-p PF] [-v VF] | [-u VUID]} [-q QUEUE_ID]

Info The --vuid option is mutually exclusive with --pf and --vf (as shown in the usage line above), and one of the two forms must be specified.

| Option | Abbr | Argument Type | Required | Description |
|---|---|---|---|---|
| --help | -h | N/A | No | Show the help message and exit |
| --pf | -p | Number | No | Unique device ID for the PF. Can be retrieved by using virtnet list. |
| --vf | -v | Number | No | Unique device ID for the VF. Can be retrieved by using virtnet list. |
| --vuid | -u | String | No | Unique device SN for the device (PF/VF). Can be retrieved by using virtnet list. |
| --queue_id | -q | Number | No | Queue index of the device RQs or SQs |

Note This command is recommended for obtaining all packet counter information. The existing packet counter information available through the virtnet list and virtnet query commands will be deprecated in the future.

The following command queries PF 0 and VQ 0 (i.e., RQ):

[dpu]# virtnet stats -p 0 -q 0

Output:

# virtnet stats -p 0 -q 0
{'pf': '0x0', 'queue_id': '0x0'}
{
  "device": {
    "pf_id": 0,
    "packet_counters": "Enabled",
    "queues-stats": [
      {
        "VQ Index": 0,
        "rx_64_or_less_octet_packets": 0,
        "rx_65_to_127_octet_packets": 259,
        "rx_128_to_255_octet_packets": 0,
        "rx_256_to_511_octet_packets": 0,
        "rx_512_to_1023_octet_packets": 0,
        "rx_1024_to_1522_octet_packets": 0,
        "rx_1523_to_2047_octet_packets": 0,
        "rx_2048_to_4095_octet_packets": 199,
        "rx_4096_to_8191_octet_packets": 0,
        "rx_8192_to_9022_octet_packets": 0,
        "received_desc": "4096",
        "completed_desc": "0",
        "bad_desc_errors": "0",
        "error_cqes": "0",
        "exceed_max_chain": "0",
        "invalid_buffer": "0",
        "batch_number": "64",
        "dma_q_used_number": "0",
        "handler_schd_number": "44",
        "aux_handler_schd_number": "43",
        "max_post_desc_number": "0",
        "total_bytes": "0",
        "err_handler_schd_num": "0",
        "rq_cq_max_count": "0",
        "rq_cq_period": "0",
        "rq_cq_period_mode": "1"
      }
    ]
  }
}

The output has two sections.

The first section, wrapped by device, contains the device details along with the enable state of the packet counter statistics.

| Entry | Type | Description |
|---|---|---|
| device | String | Entries under this section are per-device information |
| pf_id | String | Physical function ID |
| packet_counters | String | Packet counters feature: enabled/disabled |

The second section, wrapped by queues-stats, contains information for each receive VQ.

| Entry | Type | Description |
|---|---|---|
| VQ Index | Number | The VQ index starts at 0 (the first RQ) and continues up to the last SQ |
| rx_64_or_less_octet_packets | Number | The number of packets received with a size of 0 to 64 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_65_to_127_octet_packets | Number | The number of packets received with a size of 65 to 127 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_128_to_255_octet_packets | Number | The number of packets received with a size of 128 to 255 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_256_to_511_octet_packets | Number | The number of packets received with a size of 256 to 511 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_512_to_1023_octet_packets | Number | The number of packets received with a size of 512 to 1023 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_1024_to_1522_octet_packets | Number | The number of packets received with a size of 1024 to 1522 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_1523_to_2047_octet_packets | Number | The number of packets received with a size of 1523 to 2047 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_2048_to_4095_octet_packets | Number | The number of packets received with a size of 2048 to 4095 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_4096_to_8191_octet_packets | Number | The number of packets received with a size of 4096 to 8191 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| rx_8192_to_9022_octet_packets | Number | The number of packets received with a size of 8192 to 9022 bytes. Relevant for BlueField-3 RQ when packet counter is enabled. |
| received_desc | Number | Total number of descriptors received by the device on this VQ |
| completed_desc | Number | Total number of descriptors completed by the device on this VQ |
| bad_desc_errors | Number | Total number of bad descriptors received on this VQ |
| error_cqes | Number | Total number of error CQ entries on this VQ |
| exceed_max_chain | Number | Total number of chained descriptors received that exceed the max allowed chain by the device |
| invalid_buffer | Number | Total number of times the device tried to read or write a buffer that is not registered to the device |
| batch_number | Number | The number of RX descriptors for the last received packet. Relevant for BlueField-3. |
| dma_q_used_number | Number | The DMA q index used for this VQ. Relevant for BlueField-3. |
| handler_schd_number | Number | Scheduler number for this VQ. Relevant for BlueField-3. |
| aux_handler_schd_number | Number | Aux scheduler number for this VQ. Relevant for BlueField-3. |
| max_post_desc_number | Number | Maximum number of posted descriptors on this VQ. Relevant for DPA. |
| total_bytes | Number | Total number of bytes handled by this VQ. Relevant for BlueField-3. |
| rq_cq_max_count | Number | Event generation moderation counter of the queue. Relevant for RQ. |
| rq_cq_period | Number | Event generation moderation timer for the queue in 1 µsec granularity. Relevant for RQ. |
| rq_cq_period_mode | Number | Current period mode for RQ: 0x0 – default_mode – use device best defaults; 0x1 – upon_event – queue_period timer restarts upon event generation; 0x2 – upon_cqe – queue_period timer restarts upon completion generation |
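The per-queue size buckets described above can be post-processed with a few lines of script. The following is a minimal Python sketch, not part of the virtnet tooling: it assumes the JSON body of the stats output has already been captured as text and that the leading echo line printed by the CLI (the single-quoted dictionary) has been stripped. The helper name rx_histogram and the trimmed SAMPLE string are illustrative only.

```python
import json

# Trimmed copy of the JSON body from the sample stats output; a real script
# would capture this text from the CLI (capture mechanism not shown).
SAMPLE = """
{"device": {"pf_id": 0, "packet_counters": "Enabled", "queues-stats": [
  {"VQ Index": 0,
   "rx_64_or_less_octet_packets": 0,
   "rx_65_to_127_octet_packets": 259,
   "rx_2048_to_4095_octet_packets": 199,
   "received_desc": "4096",
   "completed_desc": "0"}
]}}
"""


def rx_histogram(stats):
    """Return {vq_index: {bucket_name: count}} keeping only non-zero buckets."""
    result = {}
    for queue in stats["device"]["queues-stats"]:
        buckets = {
            key: value
            for key, value in queue.items()
            if key.startswith("rx_") and key.endswith("_octet_packets") and value
        }
        result[queue["VQ Index"]] = buckets
    return result


hist = rx_histogram(json.loads(SAMPLE))
total_packets = sum(hist[0].values())
print(hist)
print("packets counted on VQ 0:", total_packets)
```

For the sample values, VQ 0 reports two non-empty buckets (65 to 127 bytes and 2048 to 4095 bytes).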



To query Rx VQ statistics, use the corresponding VQ index. For example, if three queues are configured, the Rx VQ is queue 0, the Tx VQ is queue 1, and the Ctrl VQ is queue 2.
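The index mapping in the three-queue example follows the virtqueue ordering defined by the virtio-net specification (receive and transmit queues interleaved, control queue last). The Python sketch below generalizes it to several queue pairs; it assumes the controller uses the same ordering for the -q argument, and the function name vq_layout is hypothetical.

```python
def vq_layout(num_queue_pairs, has_ctrl_vq=True):
    """Map VQ index -> role, following the virtio-net queue ordering.

    Assumption: the controller indexes VQs the way the virtio spec does
    (rx0, tx0, rx1, tx1, ..., ctrl), as the three-queue example suggests.
    """
    layout = {}
    for pair in range(num_queue_pairs):
        layout[2 * pair] = f"rx{pair}"
        layout[2 * pair + 1] = f"tx{pair}"
    if has_ctrl_vq:
        layout[2 * num_queue_pairs] = "ctrl"
    return layout


print(vq_layout(1))  # one queue pair plus the control VQ
print(vq_layout(2))  # two queue pairs plus the control VQ
```

With one queue pair, this reproduces the example: index 0 is Rx, index 1 is Tx, and index 2 is Ctrl.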

The following is the command to query PF 0, VF 0, and VQ 0 (i.e., Rx).

[dpu]# virtnet query -p 0 -v 0 -q 0

Output:

"enabled-queues-info": [
  {
    "index": "0",
    "size": "256",
    "msix_vector": "0x1",
    "enable": "1",
    "notify_offset": "0",
    "descriptor_address": "0xffffe000",
    "driver_address": "0xfffff000",
    "device_address": "0xfffff240",
    "received_desc": "256",
    "completed_desc": "19",
    "bad_desc_errors": "0",
    "error_cqes": "0",
    "exceed_max_chain": "0",
    "invalid_buffer": "0",
    "batch_number": "64",
    "dma_q_used_number": "0",
    "handler_schd_number": "4",
    "aux_handler_schd_number": "3",
    "max_post_desc_number": "0",
    "total_bytes": "6460",
    "rq_cq_max_count": "0",
    "rq_cq_period": "0",
    "rq_cq_period_mode": "1"
  }

The following are some of the important VQ counters:

| Counter Name | Description |
|---|---|
| total_bytes | Number of bytes received |
| received_desc | Number of available descriptors received by the device |
| completed_desc | Number of available descriptors completed by the device |
| error_cqes | Number of error CQEs received on the queue |
| bad_desc_errors | Number of bad descriptors received |
| exceed_max_chain | Number of chained descriptors received that exceed the max allowed chain by the device |
| invalid_buffer | Number of times the device tried to read or write a buffer that is not registered to the device |
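As a rough illustration of how these counters might be read together, the Python sketch below summarizes one queue entry. It is an assumption-laden example rather than a documented procedure: the counters arrive as strings in the sample output, so they are converted to integers, and treating received_desc minus completed_desc as the number of descriptors still held by the device is an interpretation, not something the CLI reports. The names summarize and ERROR_COUNTERS are hypothetical.

```python
import json

# One queue entry, trimmed from the sample query output (values are strings).
SAMPLE_QUEUE = json.loads("""
{"index": "0", "size": "256",
 "received_desc": "256", "completed_desc": "19",
 "bad_desc_errors": "0", "error_cqes": "0",
 "exceed_max_chain": "0", "invalid_buffer": "0",
 "total_bytes": "6460"}
""")

ERROR_COUNTERS = ("error_cqes", "bad_desc_errors", "exceed_max_chain", "invalid_buffer")


def summarize(queue):
    """Convert string counters to integers and collect non-zero error counters."""
    errors = {name: int(queue[name]) for name in ERROR_COUNTERS}
    return {
        # Assumed interpretation: descriptors received but not yet completed.
        "outstanding_desc": int(queue["received_desc"]) - int(queue["completed_desc"]),
        "total_bytes": int(queue["total_bytes"]),
        "errors": {name: value for name, value in errors.items() if value},
    }


summary = summarize(SAMPLE_QUEUE)
print(summary)
```

An empty errors dictionary means none of the four error counters has incremented on that queue.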

When DPA is the data path provider, each RQ has its corresponding drop counter, which counts the number of packets dropped inside the DPA virtio RQs.

Info The drop could also happen from the uplink or SF.

The drop counter only increments (initial value being 0), and its value gets reset to 0 when disabled.

RQ drop counter can be enabled and disabled as follows (using VF 0 on PF 0):

[dpu]# virtnet modify -p 0 -v 0 device -dc enable
[dpu]# virtnet modify -p 0 -v 0 device -dc disable

Note The drop counter is attached to an RQ, so the RQ must be created first. This means that the virtio-net device should be probed by the driver on the host OS before running the commands above.

To query the drop counter value(s), run:

[dpu]# virtnet query -p 0 -v 0 | grep num_desc_drop_pkts

If a device has more than one RQ, the drop count is the sum of the values of all its RQs.

Note Relevant for BlueField-3 only.

The packet counter feature helps the user query the byte-wise packet counters for each Rx queue.

By default, byte-wise packet counters are disabled because they negatively impact performance. Enable the packet counter feature only when it is needed for debugging.

Packet counter can be enabled and disabled as follows (using VF 0 on PF 0):

[dpu]# virtnet modify -p 0 -v 0 device -pkt_cnt enable
[dpu]# virtnet modify -p 0 -v 0 device -pkt_cnt disable

When enabled, byte-wise packet counters are initialized to zero.

When disabled, the previous values are retained for debugging purposes. The command will still return these old, disabled counter values.

Note Packet counters are attached to an RQ. Thus, RQ must be created first. This means that the virtio-net device should be probed by the driver on the host OS before running the commands above.





Note Relevant for BlueField-3 only.

The health statistics are for displaying real-time health information of a specific device.

Output example (using VF 0 on PF 0):

[dpu]# virtnet health -p 0 -v 0 show
{
  "pf_id": 0,
  "vf_id": 0,
  "type": "VF",
  "vuid": "MT2306XZ00BPVNETS0D0F2",
  "dev_status": {
    "value": "0xf",
    " 0": "ACK",
    " 1": "DRIVER",
    " 2": "DRIVER_OK",
    " 3": "FEATURES_OK"
  },
  "health_status": "Good",
  "health_recover_counter": 0,
  "dev_health_details": {
    "control_plane_errors": {
      "sf_rqt_update_err": 0,
      "sf_drop_create_err": 0,
      "sf_tir_create_err": 0,
      "steer_rx_domain_err": 0,
      "steer_rx_table_err": 0,
      "sf_flows_apply_err": 0,
      "aarfs_flow_init_err": 0,
      "vlan_flow_init_err": 0,
      "drop_cnt_config_err": 0
    },
    "data_plane_errors": {
      "sq_stall": 0,
      "dma_q_stall": 0,
      "spurious_db_invoke": 0,
      "aux_not_invoked": 0,
      "dma_q_errors": 0,
      "host_read_errors": 0
    }
  }
}

Where:

health_status represents the overall status of the device (Good or Fatal)

dev_health_details has two sections, control_plane_errors and data_plane_errors, as explained in the following table:

| Category | Counter Name | Description |
|---|---|---|
| Control Plane Errors | sf_rqt_update_err | Counter tallying receive queue table update failures |
| Control Plane Errors | sf_drop_create_err | Counter tallying drop RQ creation failures |
| Control Plane Errors | sf_tir_create_err | Counter tallying TIR create failures |
| Control Plane Errors | steer_rx_domain_err | Counter tallying RX steering rule creation failures |
| Control Plane Errors | steer_rx_table_err | Counter tallying RX table creation failures |
| Control Plane Errors | sf_flows_apply_err | Counter tallying packet flow rule creation failures |
| Control Plane Errors | aarfs_flow_init_err | Counter tallying packet flow initialization failures |
| Control Plane Errors | vlan_flow_init_err | Counter tallying VLAN flow rule initialization failures |
| Control Plane Errors | drop_cnt_config_err | Counter tallying drop counter configuration failures |
| Data Plane Errors | sq_stall | One or more network send queues stalled without getting completions. This stalls traffic for packets flowing over this VQ. |
| Data Plane Errors | dma_q_stall | A QP which is paired to itself issues a read request from the DPA to the host to read either the available index or the descriptor table. This request does not result in a completion and hangs in a loop waiting for a response. |
| Data Plane Errors | spurious_db_invoke | The doorbell handler is repeatedly invoked but the DPA finds no new data to be read and posted. This could be due to a faulty driver or an issue on the DPA side. |
| Data Plane Errors | aux_not_invoked | To speed up descriptor processing, an auxiliary execution unit (EU) is used if available. The primary thread invokes this EU and waits for the expected thread to run on the auxiliary execution unit. If this EU is not invoked, the primary thread hangs. |
| Data Plane Errors | dma_q_errors | A QP which is paired to itself issues a read request from the DPA to the host to read either the available index or the descriptor table. This request results in an error and the QP becomes unavailable. An internal mechanism detects this error QP and recycles it for use at a later stage. |
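A monitoring script could scan both error sections for counters that have moved away from zero. The Python sketch below shows one way to do this under the assumption that the health output has been captured as JSON text; the function name nonzero_errors is hypothetical, and the sample is trimmed from the output shown earlier.

```python
import json

# Trimmed copy of the health output; every counter is zero on a healthy device.
SAMPLE_HEALTH = json.loads("""
{"health_status": "Good",
 "health_recover_counter": 0,
 "dev_health_details": {
   "control_plane_errors": {"sf_rqt_update_err": 0, "drop_cnt_config_err": 0},
   "data_plane_errors": {"sq_stall": 0, "dma_q_stall": 0, "dma_q_errors": 0}}}
""")


def nonzero_errors(health):
    """Return (section, counter, value) tuples for every non-zero counter."""
    found = []
    for section, counters in health["dev_health_details"].items():
        for name, value in counters.items():
            if value:
                found.append((section, name, value))
    return found


problems = nonzero_errors(SAMPLE_HEALTH)
print(SAMPLE_HEALTH["health_status"], problems)
```

An empty list together with a Good health_status indicates that no control-plane or data-plane error has been recorded for the device.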

Dynamic Interrupt Moderation (DIM) adjusts the interrupt moderation settings to optimize packet processing. For guest OS kernels older than version 6.8, DIM offloads this function to the DPU, reducing the interrupt rate from the guest OS.

By lowering the interrupt rate in high-bandwidth traffic scenarios, DIM enhances CPU utilization for both the hypervisor and guest VMs, while maintaining nearly the same bandwidth.

Note DIM is only supported on BlueField-3.

For example, the following table shows the benefit of using DIM:

| | Tx Interrupt Rate (K irq/s) | Rx Interrupt Rate (K irq/s) | Tx Throughput (Gb/s) | Rx Throughput (Gb/s) |
|---|---|---|---|---|
| DIM Enabled | 7.3 | 7.5 | 171 | 181 |
| DIM Disabled | 7.5 | 23.7 | 175 | 181 |

The following test parameters were used:

Guest OS kernel version – 5.11.0

Number of virtio-net devices – 1

Number of QPs – 31

Queue depth – 1024

MTU – 1500

Benchmark – iPerf with 31 streams
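To put the table into relative terms, the short Python calculation below computes the percentage change of each metric when DIM is enabled, using only the figures from the table. The helper name pct_change is illustrative.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (DIM disabled, DIM enabled) pairs taken from the table.
rx_irq = pct_change(23.7, 7.5)    # Rx interrupt rate, K irq/s
tx_irq = pct_change(7.5, 7.3)     # Tx interrupt rate, K irq/s
tx_tput = pct_change(175, 171)    # Tx throughput, Gb/s
rx_tput = pct_change(181, 181)    # Rx throughput, Gb/s

for label, value in [("Rx irq", rx_irq), ("Tx irq", tx_irq),
                     ("Tx throughput", tx_tput), ("Rx throughput", rx_tput)]:
    print(f"{label}: {value:+.1f}%")
```

In this test, the Rx interrupt rate drops by roughly 68% while Tx throughput changes by about 2% and Rx throughput is unchanged.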

DIM is a per-device configuration. To enable or disable it, use this command:

[dpu]# virtnet modify -p <pf> [-v <vf>] device -dim {enable | disable}

Configuration example:

1. Unload drivers from the guest-OS side:

[host]# modprobe -rv virtio_net && modprobe -rv virtio_pci

2. Enable DIM:

[dpu]# virtnet modify -p 0 device -dim enable
{'pf': '0x0', 'all': '0x0', 'subcmd': '0x0', 'dim_config': 'enable'}
{
  "errno": 0,
  "errstr": "Success"
}

Info Using disable disables DIM.

3. Load the drivers:

[host]# modprobe -v virtio_pci && modprobe -v virtio_net

4. Query the device to verify that DIM is enabled:

[dpu]# virtnet query -p 0 -b | grep -i dim
"dim": "enabled"

High availability (HA) is essential in network infrastructure to ensure continuous performance with minimal downtime, even during failures.

To support HA, the virtio-net-controller process creates the auxiliary processes virtio-net-emu and virtio-net-ha . The virtio-net-emu process handles primary controller functions, while virtio-net-ha manages HA. virtio-net-ha saves and oversees critical resources from virtio-net-emu and restores it to a working state if a failure occurs. The two processes communicate through IPC messages.

Note High availability is only supported on BlueField-3 and later.

The following table provides possible expected behaviors:

| Scenarios | Behavior | Downtime Per Device (sec) | Fallback Action |
|---|---|---|---|
| Virtio-net-emu process crashes (e.g., Segfault) | The virtio-net-ha process tries to automatically recover all devices | < 1 | The virtnet restart command if recovery failed |
| Device/VQ/SF create/destroy failures | HA makes sure the existing device is not affected | N/A | Retry or restart service |
| DPA command timeout | No action from HA; DPA is likely stuck | N/A | The virtnet restart command |

Jumbo MTU is critical for increasing the efficiency of Ethernet and network processing by reducing the protocol overhead (ratio of headers and payload size).

To enable support for jumbo MTU, run the following virtnet command:

[dpu]# virtnet modify -p 0 -v 0 device -t 9216

Info The example sets the MTU to 9216 for VF 0 on PF 0.
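The overhead reduction that motivates jumbo MTU can be estimated with a back-of-the-envelope calculation. The Python sketch below is illustrative only and rests on stated assumptions that are not part of this page: TCP over IPv4 without options (40 bytes of headers inside the MTU), and 38 bytes of Ethernet overhead per frame on the wire (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap), with no VLAN tag.

```python
# Assumed per-packet overheads (not taken from this document).
L3_L4_HEADERS = 40      # IPv4 (20) + TCP (20), no options; counted inside the MTU
ETH_WIRE_OVERHEAD = 38  # Ethernet header 14 + FCS 4 + preamble/SFD 8 + IFG 12


def payload_efficiency(mtu):
    """Fraction of wire bytes that carry TCP payload for full-size frames."""
    return (mtu - L3_L4_HEADERS) / (mtu + ETH_WIRE_OVERHEAD)


for mtu in (1500, 9216):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.4f}")
```

Under these assumptions, payload efficiency rises from roughly 94.9% at MTU 1500 to roughly 99.2% at MTU 9216; the larger gain in practice usually comes from processing fewer packets for the same amount of data.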

Jumbo MTU is only supported starting from the following version:

| | Release |
|---|---|
| Upstream VM kernel | 4.18.0-193.el8.x86_64 (VM Linux version supports big MTU after 4.11) |
| Ubuntu | DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04 |
| Virtnet controller | v1.7 or v1.6.26 |

To configure jumbo MTU (e.g., using VF 0 on PF 0):

1. Change the MTU of the uplink and SF representor from the BlueField:

[dpu]# ifconfig p0 mtu 9216
[dpu]# ifconfig en3f0pf0sf3000 mtu 9216

If a bond is configured, change the MTU of the bond rather than p0:

[dpu]# ifconfig bond0 mtu 9216
[dpu]# ifconfig en3f0pf0sf3000 mtu 9216

2. Restart the virtio-net-controller from the BlueField:

[dpu]# systemctl restart virtio-net-controller

3. Unload the virtio driver from the host OS:

[host]# modprobe -rv virtio-net

4. Change the corresponding device MTU on the BlueField:

[dpu]# virtnet modify -p 0 -v 0 device -t 9216

5. Reload the virtio driver from the host OS:

[host]# modprobe -v virtio-net

6. Check that the virtqueue MTU configuration is correct on the BlueField:

[dpu]# virtnet query -p 0 -v 0 --dbg_stats | grep jumbo_mtu
"jumbo_mtu": 1
"jumbo_mtu": 1

7. Change the MTU of the virtio-net interface from the host OS:

[host]# ifconfig <vnet> mtu 9216

It is common to use link aggregation (LAG) or bond interfaces to increase reliability, availability, or bandwidth of networking devices. Virtio-net devices support this mode via DPU-side LAG configurations.

Configuring the virtio-net-controller in LAG mode must follow a specific procedure due to the dependency on the mlx5 RDMA device:

1. Stop the virtio-net-controller to avoid resource leakage (which would be caused by LAG destroying the existing mlx5 RDMA device and creating a new bond RDMA device).

[dpu]# systemctl stop virtio-net-controller.service

2. Configure the LAG interface for two uplink interfaces from the DPU side. Refer to the "Link Aggregation" page for detailed steps.

Note The virtio-net-controller service starts by default. If the DPU is rebooted during LAG configuration, it is necessary to stop the controller before creating a bond interface from the DPU side.

3. Update the controller configuration file to use the bond interface.

[dpu]# cat /opt/mellanox/mlnx_virtnet/virtnet.conf
{
  "ib_dev_lag": "mlx5_bond_0",
  "ib_dev_for_static_pf": "mlx5_bond_0",
  "is_lag": 1,
}

Info Refer to page "Configuration File" for details.

4. Start the controller for the new configuration to take effect.

[dpu]# systemctl start virtio-net-controller.service

Virtio VF PCIe devices can be attached to the guest VM using the vhost acceleration software stack. This enables performing live migration of guest VMs.

This section provides the steps to enable VM live migration using virtio VF PCIe devices along with vhost acceleration software.

Minimum hypervisor kernel version – Linux kernel 5.15 (for VFIO SR-IOV support)

To use high-availability (the additional vfe-vhostd-ha service which can persist datapath when vfe-vhostd crashes), this kernel patch must be applied.

Vhost acceleration software stack is built using open-source BSD licensed DPDK.

To install vhost acceleration software:

1. Clone the software source code:

[host]# git clone https://github.com/Mellanox/dpdk-vhost-vfe

Info The latest release tag is vfe-24.10.0-rc2.

2. Build the software:

[host]# apt-get install libev-dev -y
[host]# apt-get install libev-libevent-dev -y
[host]# apt-get install uuid-dev -y
[host]# apt-get install libnuma-dev -y
[host]# meson build --debug -Denable_drivers=vdpa/virtio,common/virtio,common/virtio_mi,common/virtio_ha
[host]# ninja -C build install

To install QEMU:

Info Upstream QEMU later than 8.1 can be used, or the following NVIDIA QEMU.

1. Clone the NVIDIA QEMU sources and check out the stable commit from inside the cloned directory.

[host]# git clone git@github.com:Mellanox/qemu.git -b stable-8.1-presetup
[host]# cd qemu
[host]# git checkout 24aaba9255

Info Latest stable commit is 24aaba9255.

2. Build NVIDIA QEMU.

[host]# mkdir bin
[host]# cd bin
[host]# ../configure --target-list=x86_64-softmmu --enable-kvm
[host]# make -j24

1. Configure 1G huge pages:

[host]# mkdir /dev/hugepages1G
[host]# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
[host]# echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
[host]# echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

2. Enable qemu:commandline in the VM XML by adding the xmlns:qemu option:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

3. Assign a memory amount and use 1GB page size for huge pages in the VM XML:

<memory unit='GiB'>4</memory>
<currentMemory unit='GiB'>4</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1' unit='GiB'/>
  </hugepages>
</memoryBacking>

4. Set the memory access for the CPUs to be shared:

<cpu mode='custom' match='exact' check='partial'>
  <model fallback='allow'>Skylake-Server-IBRS</model>
  <numa>
    <cell id='0' cpus='0-1' memory='4' unit='GiB' memAccess='shared'/>
  </numa>
</cpu>

5. Add a virtio-net interface in the VM XML:

<qemu:commandline>
  <qemu:arg value='-chardev'/>
  <qemu:arg value='socket,id=char0,path=/tmp/vhost-net0,server=on'/>
  <qemu:arg value='-netdev'/>
  <qemu:arg value='type=vhost-user,id=vhost1,chardev=char0,queues=4'/>
  <qemu:arg value='-device'/>
  <qemu:arg value='virtio-net-pci,netdev=vhost1,mac=00:00:00:00:33:00,vectors=10,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024,mq=on,disable-legacy=on,disable-modern=off'/>
</qemu:commandline>

1. Bind the virtio PF devices to the vfio-pci driver:

[host]# modprobe vfio vfio_pci
[host]# echo 1 > /sys/module/vfio_pci/parameters/enable_sriov
[host]# echo 0x1af4 0x1041 > /sys/bus/pci/drivers/vfio-pci/new_id
[host]# echo 0x1af4 0x1042 > /sys/bus/pci/drivers/vfio-pci/new_id
[host]# echo <pf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
[host]# echo <vf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
[host]# echo <pf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind
[host]# echo <vf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind
[host]# lspci -vvv -s <pf_bdf> | grep "Kernel driver"
Kernel driver in use: vfio-pci
[host]# lspci -vvv -s <vf_bdf> | grep "Kernel driver"
Kernel driver in use: vfio-pci

Info Example of <pf_bdf> or <vf_bdf> format: 0000:af:00.3

2. Run the vhost acceleration software service by starting the vfe-vhostd service:

[host]# systemctl start vfe-vhostd

Info A log of the service can be viewed by running the following:

[host]# journalctl -u vfe-vhostd

3. Provision the virtio-net PF:

[host]# /usr/local/bin/vfe-vhost-cli mgmtpf -a <pf_bdf>

Wait for the virtio-net-controller to finish handling the PF FLR.

4. Enable SR-IOV and create a VF (or more):

[host]# echo 1 > /sys/bus/pci/devices/<pf_bdf>/sriov_numvfs
[host]# lspci | grep Virtio
0000:af:00.1 Ethernet controller: Red Hat, Inc. Virtio network device
0000:af:00.3 Ethernet controller: Red Hat, Inc. Virtio network device

5. Add a VF representor to the OVS bridge on the BlueField:

[dpu]# virtnet query -p 0 -v 0 | grep sf_rep_net_device
"sf_rep_net_device": "en3f0pf0sf3000",
[dpu]# ovs-vsctl add-port ovsbr1 en3f0pf0sf3000

6. Provision the virtio-net VF:

a. On the BlueField, change the VF MAC address or other device options:

[dpu]# virtnet modify -p 0 -v 0 device -m 00:00:00:00:33:00

b. Add the VF into vfe-dpdk:

[host]# /usr/local/bin/vfe-vhost-cli vf -a <vf_bdf> -v /tmp/vhost-net0

Note If SR-IOV is disabled and re-enabled, the user must re-provision the VFs. 00:00:00:00:33:00 is a virtual MAC address used in the VM XML.

[host]# virsh start <vm_name>





Running the vfe-vhostd-ha service allows the datapath to persist should vfe-vhostd crash:

[host]# systemctl start vfe-vhostd-ha





1. Prepare two identical hosts and perform the provisioning of the virtio device to DPDK on both.

2. Boot the VM on one server.

3. Perform the live migration to the destination server:

[host]# virsh migrate --verbose --live --persistent <vm_name> qemu+ssh://<dest_node_ip_addr>/system --unsafe

When finished with the virtio devices, use the following commands to remove them from DPDK:

[host]# /usr/local/bin/vfe-vhost-cli vf -r <vf_bdf>
[host]# /usr/local/bin/vfe-vhost-cli mgmtpf -r <pf_bdf>

Live update minimizes network interface downtime by performing online upgrade of the virtio-net controller without necessitating a full restart.

To perform a live update, the user must install a newer version of the controller either using the rpm or deb package (depending on the OS distro used). Run:

For Ubuntu/Debian:

[dpu]# dpkg --force-all -i virtio-net-controller-x.y.z-1.mlnx.aarch64.deb

For CentOS/RedHat:

[dpu]# rpm -Uvh virtio-net-controller-x.y.z-1.mlnx.aarch64.rpm --force

Before starting the live update, the following command can be used to check the version of the original and destination controllers:

[dpu]# virtnet version
{
  "Original Controller": "v24.10.13"
},
{
  "Destination Controller": "v24.10.16"
}





If no errors occur, issue the following command to start the live update process:

[dpu]# virtnet update -s

Note If an error indicates that the update command is unsupported, this means the controller version you are attempting to install is outdated. Reinstalling the correct version resolves the issue.





During the update process, the following command may be used to check the update status:

[dpu]# virtnet update -t

Example output:

{
  "status": "inactive",                 # updating status, whether live update is finished or ongoing
  "last live update status": "success", # last live update status
  "time_used (s)": 1.655439             # time cost for last live update
}

Note During the update, it is recommended to not issue any virtnet CLI command.

When the update process completes successfully, the command virtnet update status reflects the status accordingly.

Note If a device is actively migrating, the existing virtnet commands appear as "migrating" for that specific device so that the user can retry later.

Note When live update is in progress, hotplug/unplug and VF creation/deletion are not supported.

When negotiated with the driver, mergeable buffers is a mode where multiple descriptors are posted to fit a single jumbo-sized packet coming from the wire. This is a receive-side-only feature which helps improve performance in situations with a large MTU (e.g., 9K).

Enabling and using mergeable buffers requires updating the configuration file along with advertising feature bits from the controller side as described in the following subsections.

To enable or disable the mergeable Rx buffer feature, set the mrg_rxbuf attribute in the virtnet.conf configuration file to 1 or 0 respectively.

For example, to enable mergeable Rx buffer:

[dpu]# cat /opt/mellanox/mlnx_virtnet/virtnet.conf
{
  ...
  "mrg_rxbuf": 1
  ...
}

Note Updating the configuration file requires a restart of the virtio-net-controller.

Info Refer to "Configuration File" page for more information.





Mergeable buffer is a per-device feature.

Users must query a device to check if VIRTIO_F_MRG_RX_BUFFER is available. For example, the following PF 0 does not support mergeable buffer:

[dpu]# virtnet query -p 0 -b
{'all': '0x0', 'pf': '0x0', 'dbg_stats': '0x0', 'brief': '0x1', 'latency_stats': '0x0', 'stats_clear': '0x0'}
{
  "devices": [
    {
      "pf_id": 0,
      "transitional": 0,
      "vuid": "MT2251X00020VNETS1D0F0",
      "pci_bdf": "86:00.0",
      "pci_dev_id": "0x1041",
      "pci_vendor_id": "0x1af4",
      "pci_class_code": "0x20000",
      "pci_subsys_id": "0x1",
      "pci_subsys_vendor_id": "0x1af4",
      "pci_revision_id": "1",
      "pci_max_vfs": "0",
      "enabled_vfs": "0",
      "device_feature": {
        "value": "0x8900010300e7182f",
        " 0": "VIRTIO_NET_F_CSUM",
        " 1": "VIRTIO_NET_F_GUEST_CSUM",
        " 2": "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS",
        " 3": "VIRTIO_NET_F_MTU",
        " 5": "VIRTIO_NET_F_MAC",
        " 11": "VIRTIO_NET_F_HOST_TSO4",
        " 12": "VIRTIO_NET_F_HOST_TSO6",
        " 16": "VIRTIO_NET_F_STATUS",
        " 17": "VIRTIO_NET_F_CTRL_VQ",
        " 18": "VIRTIO_NET_F_CTRL_RX",
        " 21": "VIRTIO_NET_F_GUEST_ANNOUNCE",
        " 22": "VIRTIO_NET_F_MQ",
        " 23": "VIRTIO_NET_F_CTRL_MAC_ADDR",
        " 32": "VIRTIO_F_VERSION_1",
        " 33": "VIRTIO_F_IOMMU_PLATFORM",
        " 40": "VIRTIO_F_RING_RESET",
        " 56": "VIRTIO_NET_F_HOST_USO",
        " 59": "VIRTIO_NET_F_GUEST_HDRLEN",
        " 63": "VIRTIO_NET_F_SPEED_DUPLEX"
      },
      ...
}

To enable the feature:

1. Make sure there is no driver loaded from the guest-OS side:

[host]# modprobe -rv virtio_net && modprobe -rv virtio_pci

2. Set bit 15 to 1 in the feature bits, and modify the device:

[dpu]# virtnet modify -p 0 device -f 0x8900010300e7982f
{'pf': '0x0', 'all': '0x0', 'subcmd': '0x0', 'features': '0x8900010300e7982f'}
{
  "errno": 0,
  "errstr": "Success"
}

3. Load the drivers from the host:

[host]# modprobe -v virtio_pci && modprobe -v virtio_net

4. Query the device again, checking whether VIRTIO_F_MRG_RX_BUFFER is available. The following query shows VIRTIO_F_MRG_RX_BUFFER under device_feature and driver_feature. Mergeable buffer is now enabled on PF 0.

[dpu]# virtnet query -p 0 -b
{'all': '0x0', 'pf': '0x0', 'dbg_stats': '0x0', 'brief': '0x1', 'latency_stats': '0x0', 'stats_clear': '0x0'}
{
  "devices": [
    {
      "pf_id": 0,
      "transitional": 0,
      "vuid": "MT2251X00020VNETS0D0F1",
      "pci_bdf": "85:00.1",
      "pci_dev_id": "0x1041",
      "pci_vendor_id": "0x1af4",
      "pci_class_code": "0x20000",
      "pci_subsys_id": "0x1041",
      "pci_subsys_vendor_id": "0x1af4",
      "pci_revision_id": "1",
      "pci_max_vfs": "0",
      "enabled_vfs": "0",
      "device_feature": {
        "value": "0x8900032300e7982f",
        " 0": "VIRTIO_NET_F_CSUM",
        " 1": "VIRTIO_NET_F_GUEST_CSUM",
        " 2": "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS",
        " 3": "VIRTIO_NET_F_MTU",
        " 5": "VIRTIO_NET_F_MAC",
        " 11": "VIRTIO_NET_F_HOST_TSO4",
        " 12": "VIRTIO_NET_F_HOST_TSO6",
        " 15": "VIRTIO_F_MRG_RX_BUFFER",
        " 16": "VIRTIO_NET_F_STATUS",
        " 17": "VIRTIO_NET_F_CTRL_VQ",
        " 18": "VIRTIO_NET_F_CTRL_RX",
        " 21": "VIRTIO_NET_F_GUEST_ANNOUNCE",
        " 22": "VIRTIO_NET_F_MQ",
        " 23": "VIRTIO_NET_F_CTRL_MAC_ADDR",
        " 32": "VIRTIO_F_VERSION_1",
        " 33": "VIRTIO_F_IOMMU_PLATFORM",
        " 37": "VIRTIO_F_SR_IOV",
        " 40": "VIRTIO_F_RING_RESET",
        " 41": "VIRTIO_F_ADMIN_VQ",
        " 56": "VIRTIO_NET_F_HOST_USO",
        " 59": "VIRTIO_NET_F_GUEST_HDRLEN",
        " 63": "VIRTIO_NET_F_SPEED_DUPLEX"
      },
      "driver_feature": {
        "value": "0x8000002300e7982f",
        " 0": "VIRTIO_NET_F_CSUM",
        " 1": "VIRTIO_NET_F_GUEST_CSUM",
        " 2": "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS",
        " 3": "VIRTIO_NET_F_MTU",
        " 5": "VIRTIO_NET_F_MAC",
        " 11": "VIRTIO_NET_F_HOST_TSO4",
        " 12": "VIRTIO_NET_F_HOST_TSO6",
        " 15": "VIRTIO_F_MRG_RX_BUFFER",
        " 16": "VIRTIO_NET_F_STATUS",
        " 17": "VIRTIO_NET_F_CTRL_VQ",
        " 18": "VIRTIO_NET_F_CTRL_RX",
        " 21": "VIRTIO_NET_F_GUEST_ANNOUNCE",
        " 22": "VIRTIO_NET_F_MQ",
        " 23": "VIRTIO_NET_F_CTRL_MAC_ADDR",
        " 32": "VIRTIO_F_VERSION_1",
        " 33": "VIRTIO_F_IOMMU_PLATFORM",
        " 37": "VIRTIO_F_SR_IOV",
        " 63": "VIRTIO_NET_F_SPEED_DUPLEX"
      },
      ...
}
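The feature value passed to the modify command is the original device_feature value with bit 15 set. The Python sketch below reproduces that arithmetic on the values from the example; the helper names are hypothetical and the script does not talk to the controller.

```python
MRG_RXBUF_BIT = 15  # bit position shown next to VIRTIO_F_MRG_RX_BUFFER in the query output


def set_feature_bit(features, bit):
    """Return the feature mask with the given bit set."""
    return features | (1 << bit)


def enabled_bits(features):
    """List the positions of all set bits in a 64-bit feature mask."""
    return [bit for bit in range(64) if (features >> bit) & 1]


before = 0x8900010300e7182f          # device_feature value before the change
after = set_feature_bit(before, MRG_RXBUF_BIT)

print(hex(after))                    # value to pass to "virtnet modify ... -f"
print(enabled_bits(before))
```

The result, 0x8900010300e7982f, matches the value used in the modify command, and the bit positions decoded from the original value match the indices listed under device_feature in the first query.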

The number of descriptors per work queue entry depends on the MTU size. For best performance, it is recommended to not enable the feature if the MTU is set to the default value (1500).

Performance is expected to degrade with this feature when receiving small sized packets (e.g., 64 bytes) from the wire.

Mergeable buffer does not work with the packed VQ feature.

The mergeable Rx Buffer feature does not work with an MTU equal to 9216. The max MTU value is 9000.

Note NetDIM is only supported on BlueField-3.

Network dynamic interrupt moderation (netDIM) adjusts interrupt moderation settings to optimize packet processing. This feature offloads DIM to virtio PCIe devices, enabling interrupt moderation on the DPU for virtio-net devices that lack netDIM support in the guest kernel.

By reducing interrupt rates during high-bandwidth traffic, DIM improves CPU utilization for both the hypervisor and guest VMs while maintaining nearly the same bandwidth.

To enable or disable netDIM, use the following virtnet command:

[dpu]# virtnet modify -p <> -v <> device -netdim {enable,disable}

Note Enabling or disabling netDIM requires the driver not to be loaded.





NetDIM is enabled per-device.

To enable netDIM:

1. Make sure there is no driver loaded from the guest-OS side:

[host]# modprobe -rv virtio_net && modprobe -rv virtio_pci

2. Enable netDIM by using the virtnet command on the respective device:

[dpu]# virtnet modify -p 0 device -netdim enable
{'pf': '0x0', 'all': '0x0', 'subcmd': '0x0', 'net_dim_config': 'enable'}
{
  "errno": 0,
  "errstr": "Success"
}

3. Load the drivers:

[host]# modprobe -v virtio_pci && modprobe -v virtio_net

4. Query the device to check whether netDIM is enabled:

[dpu]# virtnet query -p 0 -b
{'all': '0x0', 'pf': '0x0', 'dbg_stats': '0x0', 'brief': '0x1', 'latency_stats': '0x0', 'stats_clear': '0x0'}
{
  "devices": [
    {
      "pf_id": 0,
      "function_type": "static PF",
      "transitional": 0,
      ...
      ...
      "aarfs": "disabled",
      "netdim": "enabled"
    }
  ]
}

The virtio-net driver can configure the number of combined channels via ethtool. This determines how many virtqueues (VQs) can be used for the netdev. Normally, more VQs result in better overall throughput when multi-threaded (e.g., iPerf with multiple streams).

[host]# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:        n/a
TX:        n/a
Other:     n/a
Combined:  31
Current hardware settings:
RX:        n/a
TX:        n/a
Other:     n/a
Combined:  15

Therefore, it is common to pick a larger number of channels (up to the pre-set maximum) using the following command.

Tip Normally, configuring the combined number of channels to be the same as the number of CPUs available on the guest OS will yield good performance.

[host]# ethtool -L eth0 combined 31
[host]# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:        n/a
TX:        n/a
Other:     n/a
Combined:  31
Current hardware settings:
RX:        n/a
TX:        n/a
Other:     n/a
Combined:  31
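The tip above amounts to choosing the smaller of the guest CPU count and the pre-set maximum. A minimal sketch of that choice follows; the function name `pick_combined_channels` is hypothetical, and the value 31 is taken from the sample ethtool output rather than read from a real device.

```python
import os

def pick_combined_channels(preset_max: int, cpu_count: int) -> int:
    """Use one channel per CPU, capped by the device's pre-set maximum."""
    return max(1, min(preset_max, cpu_count))

# os.cpu_count() may return None, so fall back to a single CPU.
cpus = os.cpu_count() or 1
print(f"ethtool -L eth0 combined {pick_combined_channels(31, cpus)}")
```

The script only prints the command to run on the guest; it does not invoke ethtool itself.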





To reach the best performance, each TX/RX queue must have an assigned MSIX. Check the information of a particular device and make sure num_queues is less than num_msix.

[dpu]# virtnet query -p 0 -b | grep -i num_
"num_msix": "64",
"num_queues": "8",

If num_queues is greater than num_msix, it is necessary to change mlxconfig to reserve more MSIX than queues. The number of MSIX is determined by VIRTIO_NET_EMULATION_NUM_VF_MSIX and VIRTIO_NET_EMULATION_NUM_MSIX. Please refer to the "Virtio-net Deployment" page for more information.
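The comparison between num_queues and num_msix can be automated when auditing many devices. The sketch below assumes the grep output has been captured as text in the form shown above; `parse_counts` and `every_queue_has_msix` are illustrative helper names.

```python
import re

# Sample text modeled on the grep output shown above.
SAMPLE = '"num_msix": "64",\n"num_queues": "8",\n'

def parse_counts(text: str) -> dict:
    """Extract the num_* fields and convert their quoted values to integers."""
    return {key: int(val) for key, val in re.findall(r'"(num_\w+)":\s*"(\d+)"', text)}

def every_queue_has_msix(text: str) -> bool:
    """Apply the rule from the text: num_queues must be less than num_msix."""
    counts = parse_counts(text)
    return counts["num_queues"] < counts["num_msix"]

print(every_queue_has_msix(SAMPLE))
```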

By default, queue depth is set to 256. It is common to use a larger queue depth (e.g., 1024). This cannot be requested from the driver side but must be done from the device side.

Refer to the "Virtnet CLI Commands" page to learn how to modify device max_queue_size .

To improve performance, the user can use jumbo MTU. Refer to the "Jumbo MTU" page for information regarding MTU configuration.

Recovery is critical for status restoration (both control plane and data plane) for cases such as controller restart, live update, or live migration.

The recovery process relies on JSON files stored in /opt/mellanox/mlnx_virtnet/recovery , where each device (either PF or VF) has a corresponding file named after its unique VUID.

The following entries are saved to the recovery file and restored when necessary:

| Entry | Type | Description |
| --- | --- | --- |
| port_ib_dev | String | RDMA device name the virtio-net device is created on |
| pf_id | Number | ID of PF |
| vf_id | Number | ID of VF, valid for VF only |
| function_type | String | PF or VF |
| bdf_raw | Number | Virtio-net device bus:device:function in uint16 type |
| device_type | String | Static or hotplug (only for PF) |
| mac | String | MAC address of device |
| pf_num | Number | PCIe function number |
| sf_num | Number | SF number which was used for this virtio-net device |
| mq | Number | Number of multi-queue created for this virtio-net device |

An example of recovery file for a hotplug PF device:

{
  "port_ib_dev": "mlx5_0",
  "pf_id": 0,
  "function_type": "pf",
  "bdf_raw": 57611,
  "device_type": "hotplug",
  "mac": "0c:c4:7a:ff:22:93",
  "pf_num": 0,
  "sf_num": 2000,
  "mq": 3
}
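The bdf_raw field packs bus:device:function into a 16-bit number. The sketch below decodes it under the assumption that the controller uses the standard PCI layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0); the source only states that the value is a uint16, so treat the layout as an assumption. `decode_bdf` is an illustrative name.

```python
def decode_bdf(bdf_raw: int) -> str:
    """Decode a 16-bit BDF, assuming the standard PCI bit layout."""
    bus = (bdf_raw >> 8) & 0xFF      # bits 15:8
    device = (bdf_raw >> 3) & 0x1F   # bits 7:3
    function = bdf_raw & 0x7         # bits 2:0
    return f"{bus:02x}:{device:02x}.{function:x}"

# Value taken from the example recovery file above.
print(decode_bdf(57611))  # e1:01.3
```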

Depending on the actions of the BlueField or host, recovery may or may not be performed. Please refer to the following table for individual scenarios:

| Device (rows) / DPU and host actions (columns) | Restart Controller | Live Update | Hot Unplug | Destroy VFs | Unload Driver | Power Cycle Host & DPU | Warm Reboot | Live Migration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Static PF | Recover | Recover | N/A | N/A | Recover | No recover | Recover | Recover |
| Hotplug PF | Recover | Recover | No recover | N/A | Recover | No recover | Recover | Recover |
| VF | Recover | Recover | N/A | Recovery file deleted | No recover | No recover | No recover | Recover |
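For scripted pre-checks, the recovery matrix can be encoded as a lookup table. This is only a transcription of the table above into Python, not an interface offered by the controller; the names `ACTIONS`, `MATRIX`, and `recovery_outcome` are illustrative.

```python
ACTIONS = [
    "Restart Controller", "Live Update", "Hot Unplug", "Destroy VFs",
    "Unload Driver", "Power Cycle Host & DPU", "Warm Reboot", "Live Migration",
]

# One row per device type, in the same column order as ACTIONS.
MATRIX = {
    "Static PF": ["Recover", "Recover", "N/A", "N/A",
                  "Recover", "No recover", "Recover", "Recover"],
    "Hotplug PF": ["Recover", "Recover", "No recover", "N/A",
                   "Recover", "No recover", "Recover", "Recover"],
    "VF": ["Recover", "Recover", "N/A", "Recovery file deleted",
           "No recover", "No recover", "No recover", "Recover"],
}

def recovery_outcome(device: str, action: str) -> str:
    """Look up what happens to a device's state after the given action."""
    return MATRIX[device][ACTIONS.index(action)]

print(recovery_outcome("VF", "Destroy VFs"))
```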

Note These recovery files are internal to the controller and should not be modified.

Note Controller recovery is enabled by default and does not need user configuration or intervention. When the mlxconfig settings used by the controller take effect, the newly started controller service automatically deletes all recovery files.

A transitional device is a virtio device which supports drivers conforming to virtio specification 1.x and legacy drivers operating under virtio specification 0.95 (i.e., legacy mode) so servers with old Linux kernels can still utilize virtio-based technology.

Info Currently, only transitional VF devices are supported.

Note Host kernel version must be newer than v6.9.

Note When using this feature, vfe-vdpa-dpdk solutions cannot be used anymore, including vfe-vdpa-dpdk live migration.

Note Libvirt does not support the virtio_vfio_pci kernel driver. Use the QEMU command line to start the VM instead.

1. Configure virtio-net SR-IOV. Refer to "Virtio-net Deployment" for details.

2. Modify the configuration file to add the "lm_prov": "kernel" option:

[dpu]# cat /opt/mellanox/mlnx_virtnet/virtnet.conf
{
  ...
  "lm_prov": "kernel",
  ...
}

3. Restart the virtio-net controller for the configuration to take effect:

[dpu]# systemctl restart virtio-net-controller.service

4. Create virtio-net VF devices on the host:

[host]# modprobe -v virtio_pci
[host]# modprobe -v virtio_net
[host]# echo <vf_num> > /sys/bus/pci/devices/<pf_bdf>/sriov_numvfs

5. Bind the VF devices with the virtio_vfio_pci kernel driver:

[host]# echo <vf_bdf> > /sys/bus/pci/devices/<vf_bdf>/driver/unbind
[host]# echo 0x1af4 0x1041 > /sys/bus/pci/drivers/virtio_vfio_pci/new_id
[host]# modprobe -v virtio_vfio_pci
[host]# lspci -s <vf_bdf> -vvv | grep -i virtio_vfio_pci
Kernel driver in use: virtio_vfio_pci

6. Add the following option to the QEMU command line to pass the VF device through to the VM:

-device vfio-pci,host=<vf_bdf>,id=hostdev0,bus=pci.<#BUS_IN_VM>,addr=<#FUNC_IN_VM>

7. Load the virtio-net driver in legacy mode inside the VM:

[vm]# modprobe -v virtio_pci force_legacy=1
[vm]# modprobe -v virtio_net
[vm]# lspci -s <vf_bdf_in_vm> -n
00:0a.0 0200: 1af4:1000

8. Verify that the VF is a transitional device:

[dpu]# virtnet query -p <pf_id> -v <vf_id> | grep transitional
"transitional": 1,
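The `lspci -n` output in the steps above shows the vendor:device ID changing from 1af4:1041 on the host to 1af4:1000 inside the VM. In the virtio specification, 0x1000 is the transitional network device ID and 0x1041 is the modern (non-transitional) one. A minimal sketch for classifying such an ID string follows; `classify_virtio_net` is an illustrative helper and not part of any tool mentioned here.

```python
VIRTIO_VENDOR_ID = 0x1AF4
TRANSITIONAL_NET_ID = 0x1000  # transitional virtio network device
MODERN_NET_ID = 0x1041        # 0x1040 + virtio device type 1 (network)

def classify_virtio_net(pci_id: str) -> str:
    """Classify a 'vendor:device' string such as the one printed by lspci -n."""
    vendor_str, device_str = pci_id.split(":")
    vendor, device = int(vendor_str, 16), int(device_str, 16)
    if vendor != VIRTIO_VENDOR_ID:
        return "not virtio"
    if device == TRANSITIONAL_NET_ID:
        return "transitional"
    if device == MODERN_NET_ID:
        return "modern"
    return "other virtio device"

print(classify_virtio_net("1af4:1000"))
```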

In the virtio-net controller, each VF gets the same number of MSIX and virtqueues (VQs) so that each data VQ has an MSIX assigned. This means that changing the number of MSIX updates the number of VQs.

By default, each VF is assigned the same number of MSIX. The default number is determined by the minimum of NUM_VF_MSIX and VIRTIO_NET_EMULATION_NUM_MSIX.

Using dynamic VF MSIX, a VF can be assigned with more MSIX/queues than its default. MSIX hardware resources of all VF devices are managed by PF via a shared MSIX pool. The user can reduce the MSIX of one VF, thus releasing its MSIX resources to the shared pool. On the other hand, another VF can be assigned with more MSIX than its default to gain more performance.

VIRTIO_NET_EMULATION_NUM_VF_MSIX is available to set the MSIX number of the emulation VF device. For the emulation VF device, use the new configuration VIRTIO_NET_EMULATION_NUM_VF_MSIX instead of the old configuration NUM_VF_MSIX.

If VIRTIO_NET_EMULATION_NUM_VF_MSIX != 0, VIRTIO_NET_EMULATION_NUM_MSIX is used for the PF only, and the VF uses VIRTIO_NET_EMULATION_NUM_VF_MSIX. For example, to configure the default MSIX number for a VF to 32:

[dpu]# mlxconfig -y -d 03:00.0 s VIRTIO_NET_EMULATION_NUM_MSIX=32 VIRTIO_NET_EMULATION_NUM_VF_MSIX=32

If VIRTIO_NET_EMULATION_NUM_VF_MSIX == 0, VIRTIO_NET_EMULATION_NUM_MSIX is used for both the PF and the VF.
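The two cases above can be summarized as a small selection rule. The sketch below models only the rule as stated for VIRTIO_NET_EMULATION_NUM_VF_MSIX and VIRTIO_NET_EMULATION_NUM_MSIX; it does not read mlxconfig, and `effective_msix` is a hypothetical helper name.

```python
from typing import Tuple

def effective_msix(num_msix: int, num_vf_msix: int) -> Tuple[int, int]:
    """Return (pf_msix, vf_msix) for the given mlxconfig values.

    num_msix    models VIRTIO_NET_EMULATION_NUM_MSIX
    num_vf_msix models VIRTIO_NET_EMULATION_NUM_VF_MSIX
    """
    if num_vf_msix != 0:
        # The VF has its own setting; NUM_MSIX applies to the PF only.
        return num_msix, num_vf_msix
    # No VF-specific setting: NUM_MSIX applies to both PF and VF.
    return num_msix, num_msix

print(effective_msix(64, 16))
print(effective_msix(64, 0))
```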

The default number of MSIX for each VF is determined by minimum(NUM_VF_MSIX, VIRTIO_NET_EMULATION_NUM_MSIX) . For example, to configure the default MSIX number for a VF to 32:

[dpu]# mlxconfig -y -d 03:00.0 s VIRTIO_NET_EMULATION_NUM_MSIX=32 NUM_VF_MSIX=32

Power cycle the BlueField and host for the mlxconfig changes to take effect.

The MSIX pool for VFs is managed by their PF. To check the shared pool size, run the following command (using PF 0 as an example):

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 8 | grep -i msix_num_pool_size

By default, the shared pool is empty (size 0), since all MSIX resources have already been allocated to VFs evenly. Upon reducing the MSIX of one or more VFs, the reduced MSIX is released back to the pool.

However, the number of MSIX that can be assigned to a given VF is also bound by its capabilities. To check those caps, run the following commands:

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 10 | grep -i max_msix_num
[dpu]# virtnet list | grep -i '"pf_id": 0' -A 10 | grep -i min_msix_num

To check the currently assigned number of MSIX, run the following command:

[dpu]# virtnet query -p 0 -v 0 | grep num_msix

If num_msix is less than the max_msix_num cap, more MSIX can be assigned to the VF.

To allocate more MSIX to one VF, there must be MSIX available in the pool. This is achieved by reducing the MSIX of one or more other VFs.

The following example shows the steps to reallocate MSIX from VF1 to VF0, assuming that each VF has 32 MSIX available as default:

1. Unbind both VF devices from the host driver:

[host]# echo <vf0_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
[host]# echo <vf1_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind

2. Reduce the MSIX of VF1:

[dpu]# virtnet modify -p 0 -v 1 device -n 4

3. Check the pool size of PF0 and confirm that the reduced MSIX are added to the shared pool:

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 8 | grep -i msix_num_pool_size

4. Increase the MSIX of VF0:

[dpu]# virtnet modify -p 0 -v 0 device -n 48

5. Check the MSIX of VF0:

[dpu]# virtnet query -p 0 -v 0 | grep -i num_msix

6. Bind both VF devices to the host driver:

[host]# echo <vf0_bdf> > /sys/bus/pci/drivers/virtio-pci/bind
[host]# echo <vf1_bdf> > /sys/bus/pci/drivers/virtio-pci/bind

Note The number of MSIX must be an even number greater than 4.

MSIX and QP configuration is mutually exclusive (i.e., only one of them can be configured at a time). For example, the following modify command should result in failure:

[dpu]# virtnet modify -p 0 -v 1 device -qp 2 -n 6

To use a VF, make sure to assign a valid MSIX number:

[dpu]# virtnet modify -p 0 -v 1 device -n 10

The minimum number of MSIX resources required for the VF to load the host driver is 4 if VIRTIO_NET_F_CTRL_VQ is negotiated, or 2 if it is not.

The MSIX resources of a VF can be reduced to 0, but doing so prevents the VF from functioning:

[dpu]# virtnet modify -p 0 -v 1 device -n 0
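The pool accounting in the reallocation example can be followed with a small model. This is a simplified sketch under stated assumptions: two VFs start with 32 MSIX each, the pool starts empty, and the per-VF min/max caps are ignored. The class `MsixPool` is illustrative and does not reflect the controller's internal implementation.

```python
class MsixPool:
    """Toy model of a PF-managed shared MSIX pool (caps are not modeled)."""

    def __init__(self, per_vf_default: int, num_vfs: int) -> None:
        self.assigned = {vf: per_vf_default for vf in range(num_vfs)}
        self.pool_size = 0  # everything is handed out evenly at start

    def modify(self, vf: int, new_msix: int) -> None:
        delta = new_msix - self.assigned[vf]
        if delta > self.pool_size:
            raise ValueError("not enough MSIX available in the shared pool")
        self.pool_size -= delta  # a negative delta returns MSIX to the pool
        self.assigned[vf] = new_msix

pool = MsixPool(per_vf_default=32, num_vfs=2)
pool.modify(1, 4)    # VF1: 32 -> 4 releases 28 MSIX to the pool
pool.modify(0, 48)   # VF0: 32 -> 48 takes 16 MSIX from the pool
print(pool.pool_size)  # 12
```

Reversing the two modify calls raises an error in this model, which mirrors why the steps above reduce VF1 before increasing VF0.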

The number of queue pairs (QPs) is the number of data virtqueue (VQ) pairs. Each VQ pair has one transmit (TX) queue and one receive (RX) queue. These pairs are dedicated to handling data traffic and do not include control or admin VQs.

The QP pool for VFs is managed by their PF.

To check the shared pool size, run the following command (using PF 0 as example):

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 13 | grep -i qp_pool_size

By default, the shared pool size is empty (0), since all QP resources have already been allocated to VFs evenly. Upon reducing the QP of one or more VFs, the reduced QP is released back into the pool.

However, the number of QPs assignable to a VF depends on its supported capabilities. To verify these capabilities, run the following command:

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 12 | grep -i max_num_of_qp
[dpu]# virtnet list | grep -i '"pf_id": 0' -A 12 | grep -i min_num_of_qp

To check the currently assigned number of QPs, run the following command:

[dpu]# virtnet query -p 0 -v 0 | grep max_queue_pairs

If max_queue_pairs is less than the max_num_of_qp cap, then more QPs can be assigned to the VF.

To allocate more QPs to one VF, there should be QPs available from the pool as explained in the previous section.

The following example illustrates the process of reallocating a QP from VF1 to VF0, assuming that each VF initially has 32 QPs available by default:

1. Unbind both VF devices from the host driver:

[host]# echo <vf0_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
[host]# echo <vf1_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind

2. Reduce the number of QPs VF1 has:

[dpu]# virtnet modify -p 0 -v 1 device -qp 1

3. Check the pool size of PF0 and confirm that the reduced number of QPs are added to the shared pool:

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 13 | grep -i qp_pool_size

4. Increase the number of QPs VF0 has:

[dpu]# virtnet modify -p 0 -v 0 device -qp 23

5. Check the number of QPs VF0 has:

[dpu]# virtnet query -p 0 -v 0 | grep -i max_queue_pairs

6. Bind both VF devices to the host driver:

[host]# echo <vf0_bdf> > /sys/bus/pci/drivers/virtio-pci/bind
[host]# echo <vf1_bdf> > /sys/bus/pci/drivers/virtio-pci/bind

Note The number of QPs must be greater than 0.

QP and MSIX configuration is mutually exclusive (i.e., only one of them can be configured at a time). For example, the following modify command should result in failure:

[dpu]# virtnet modify -p 0 -v 1 device -qp 2 -n 6

To use a VF, assign it with a valid QP number:

[dpu]# virtnet modify -p 0 -v 1 device -n 4

The minimum number of QP resources which allows the VF to load the host driver is 1.

The QP resources of a VF can be reduced to 0. However, the VF would not be functional in this case:

[dpu]# virtnet modify -p 0 -v 1 device -qp 0
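Because -qp and -n are mutually exclusive, a wrapper script can reject the invalid combination before calling the CLI. The sketch below only assembles the argument list using the flags shown in this section; `build_modify_args` is a hypothetical helper, and the command is printed rather than executed.

```python
from typing import List, Optional

def build_modify_args(pf: int, vf: int, msix: Optional[int] = None,
                      qp: Optional[int] = None) -> List[str]:
    """Build a 'virtnet modify' argument list with exactly one of -n or -qp."""
    if (msix is None) == (qp is None):
        raise ValueError("specify exactly one of msix (-n) or qp (-qp)")
    args = ["virtnet", "modify", "-p", str(pf), "-v", str(vf), "device"]
    if msix is not None:
        args += ["-n", str(msix)]
    else:
        args += ["-qp", str(qp)]
    return args

print(" ".join(build_modify_args(0, 1, qp=2)))
```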

Virtqueues (VQs) are the mechanism for bulk data transport on virtio devices. Each device can have zero or more VQs.

VQs can be in one of the following modes:

Split

Packed

Warning When changing the supported VQ types, make sure to unload the guest driver first so the device can modify the supported feature bits.

Split is currently the default VQ type. The split VQ format is the only format supported by version 1.0 of the virtio spec.

In split VQ mode, each VQ is separated into three parts:

Descriptor table – occupies the descriptor area

Available ring – occupies the driver area

Used ring – occupies the device area

Each of these parts is physically contiguous in guest memory. Split VQ has a very simple design, but its sparse memory usage puts pressure on CPU cache utilization and requires several PCIe transactions for each descriptor.

The following shows how the output of the virtnet list command appears when only split VQ mode is enabled:

"supported_virt_queue_types": {
  "value": "0x1",
  " 0": "SPLIT"
},

Packed VQ addresses the limitations of split VQ by merging the three rings into one location in guest memory. This mode allows for fewer PCIe transactions and better CPU cache utilization per descriptor access.

Info Packed VQ is supported from kernel 5.0 with the virtio-support-packed-ring commit from the guest OS.

Packed VQ mode can be enabled by defining packed_vq in the configuration file at /opt/mellanox/mlnx_virtnet/virtnet.conf.

The following is an example of a configuration file with packed_vq enabled:

{
  "single_port": 1,
  "packed_vq": 1,
  "sf_pool_percent": 0,
  "sf_pool_force_destroy": 0,
  "vf": {
    "mac_base": "CC:48:15:FF:00:00",
    "vfs_per_pf": 126
  }
}

The controller must be restarted after the configuration file is modified for the changes to take effect. Make sure to unload the virtio-net/virtio-pci drivers on the host and run:

[dpu]# systemctl restart virtio-net-controller.service

To check whether the configuration has taken effect and the controller supports packed VQ mode, run:

[dpu]# virtnet list

Check for PACKED in supported_virt_queue_types:

"supported_virt_queue_types": {
  "value": "0x3",
  " 0": "SPLIT",
  " 1": "PACKED"
},
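This check can be scripted against the JSON output. The sketch below assumes the supported_virt_queue_types object has been extracted from the `virtnet list` output into a standalone JSON string; the wrapping braces in `FRAGMENT` and the helper name `supported_vq_types` are illustrative.

```python
import json

# Fragment modeled on the output above, wrapped in braces to form valid JSON.
FRAGMENT = """
{
  "supported_virt_queue_types": {
    "value": "0x3",
    " 0": "SPLIT",
    " 1": "PACKED"
  }
}
"""

def supported_vq_types(list_json: str) -> set:
    """Collect the VQ type names, skipping the raw bitmask entry."""
    entry = json.loads(list_json)["supported_virt_queue_types"]
    return {name for key, name in entry.items() if key != "value"}

print("PACKED" in supported_vq_types(FRAGMENT))
```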

The virtio-net/virtio-pci drivers can be loaded at this point to create VQs in packed mode. Once the driver is loaded, verify that the device has packed VQ mode enabled by running the following command:

[dpu]# virtnet query -p <PFID> -v <VFID>

Check for VIRTIO_F_RING_PACKED in the driver features:

"driver_feature": {
  "value": "0x8930012700e7182f",
  " 0": "VIRTIO_NET_F_CSUM",
  " 1": "VIRTIO_NET_F_GUEST_CSUM",
  " 2": "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS",
  " 3": "VIRTIO_NET_F_MTU",
  " 5": "VIRTIO_NET_F_MAC",
  " 11": "VIRTIO_NET_F_HOST_TSO4",
  " 12": "VIRTIO_NET_F_HOST_TSO6",
  " 16": "VIRTIO_NET_F_STATUS",
  " 17": "VIRTIO_NET_F_CTRL_VQ",
  " 18": "VIRTIO_NET_F_CTRL_RX",
  " 21": "VIRTIO_NET_F_GUEST_ANNOUNCE",
  " 22": "VIRTIO_NET_F_MQ",
  " 23": "VIRTIO_NET_F_CTRL_MAC_ADDR",
  " 32": "VIRTIO_F_VERSION_1",
  " 33": "VIRTIO_F_IOMMU_PLATFORM",
  " 34": "VIRTIO_F_RING_PACKED",
  " 37": "VIRTIO_F_SR_IOV",
  " 40": "VIRTIO_F_RING_RESET",
  " 52": "VIRTIO_NET_F_VQ_NOTF_COAL",
  " 53": "VIRTIO_NET_F_NOTF_COAL",
  " 56": "VIRTIO_NET_F_HOST_USO",
  " 59": "VIRTIO_NET_F_GUEST_HDRLEN",
  " 63": "VIRTIO_NET_F_SPEED_DUPLEX"
},
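The "value" field is a 64-bit mask whose set bits match the numbered entries that follow it. The sketch below decodes the mask from the example output and tests bit 34 (VIRTIO_F_RING_PACKED); the helper name `set_bits` is illustrative.

```python
from typing import List

VIRTIO_F_RING_PACKED = 34  # bit index, as listed in the output above

def set_bits(value: int) -> List[int]:
    """Return the indices of all bits set in a 64-bit feature mask."""
    return [bit for bit in range(64) if (value >> bit) & 1]

# Mask copied from the driver_feature example above.
driver_feature = int("0x8930012700e7182f", 16)
print(set_bits(driver_feature))
print(VIRTIO_F_RING_PACKED in set_bits(driver_feature))  # True
```

The decoded indices correspond one-to-one with the numbered feature names printed by the query, so either representation can be used for the check.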

If there are VFs mapped to multiple VMs, it is possible for some devices to create VQs in packed mode and others in split mode, depending on the OS version and whether the driver supports the feature.

The following features are not currently supported when packed VQ is enabled:

Mergeable buffer

Jumbo MTU

UDP segmentation offload and RSS hash report

Per the virtio spec, the virtio device negotiates the supported features with the virtio driver when the driver probes the device. The final negotiated features are a subset of the features supported by the device.

From the controller's perspective, all feature bits that can be supported by a device are listed by virtnet list. Each individual virtio-net device is able to choose the feature bits it supports.
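Conceptually, the negotiated set is the intersection of what the device offers and what the driver requests. The sketch below illustrates that with bitmasks; the bit positions follow the driver_feature listing earlier in this section, the example masks are made up for illustration, and `negotiate` is a hypothetical helper rather than controller code.

```python
VIRTIO_NET_F_CSUM = 0
VIRTIO_NET_F_GUEST_CSUM = 1
VIRTIO_F_RING_PACKED = 34

def negotiate(device_features: int, driver_features: int) -> int:
    """Keep only the feature bits that both the device and the driver support."""
    return device_features & driver_features

# Illustrative masks: the device offers CSUM and RING_PACKED,
# while the driver asks for CSUM and GUEST_CSUM.
device = (1 << VIRTIO_NET_F_CSUM) | (1 << VIRTIO_F_RING_PACKED)
driver = (1 << VIRTIO_NET_F_CSUM) | (1 << VIRTIO_NET_F_GUEST_CSUM)

negotiated = negotiate(device, driver)
print(hex(negotiated))  # 0x1
```

Only VIRTIO_NET_F_CSUM survives here, which shows why a device configured for packed VQ can still end up with split VQs when the guest driver does not request the feature.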

The following is a list of the feature bits currently supported by the controller:

VIRTIO_NET_F_CSUM

VIRTIO_NET_F_GUEST_CSUM

VIRTIO_NET_F_CTRL_GUEST_OFFLOADS

VIRTIO_NET_F_MTU

VIRTIO_NET_F_MAC

VIRTIO_NET_F_HOST_TSO4

VIRTIO_NET_F_HOST_TSO6

VIRTIO_NET_F_MRG_RXBUF

VIRTIO_NET_F_STATUS

VIRTIO_NET_F_CTRL_VQ

VIRTIO_NET_F_CTRL_RX

VIRTIO_NET_F_CTRL_VLAN

VIRTIO_NET_F_GUEST_ANNOUNCE

VIRTIO_NET_F_MQ

VIRTIO_NET_F_CTRL_MAC_ADDR

VIRTIO_F_VERSION_1

VIRTIO_F_IOMMU_PLATFORM

VIRTIO_F_RING_PACKED

VIRTIO_F_ORDER_PLATFORM

VIRTIO_F_SR_IOV

VIRTIO_F_NOTIFICATION_DATA

VIRTIO_F_RING_RESET

VIRTIO_F_ADMIN_VQ

VIRTIO_NET_F_HOST_USO

VIRTIO_NET_F_HASH_REPORT

VIRTIO_NET_F_GUEST_HDRLEN

VIRTIO_NET_F_SPEED_DUPLEX

Info For more information on these bits, refer to the VIRTIO Version 1.2 Specifications.



