Appendix#
ATS and ACS configuration#
To achieve maximum GPUDirect RDMA performance in a VM, PCIe Address Translation Services (ATS) must be enabled on the NIC, and PCIe Access Control Services (ACS) settings on PCIe switches and root ports must be configured to allow ATS to work.
Enabling ATS on NVIDIA NICs#
To enable ATS, first make sure the Mellanox Support Tool Kit (MST) is running with sudo mst start, then use the following commands, one per NIC, to check and/or set the ATS status:
To check the current state:
# mlxconfig -d /dev/mst/mt4129_pciconf0 q | grep ATS_ENABLED
        ATS_ENABLED                         False(0)
In the example above, the first NIC (pciconf0) has ATS disabled, which is the default setting. Each NIC should be checked individually.
To enable it on a single NIC, use the following command:
# mlxconfig -d /dev/mst/mt4129_pciconf0 set ATS_ENABLED=true
After running the command above for all NICs, a host reboot is necessary. Keep in mind that if the NIC is already in passthrough mode, rebooting the VM alone will not suffice to complete this change; the host must be rebooted in order to fully reinitialise the networking card.
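Because each NIC has to be checked and set individually, the per-NIC commands above can be wrapped in a loop. The following is a minimal sketch rather than a supported tool: the function names are hypothetical, the glob /dev/mst/*_pciconf* is an assumption based on the device names shown above, and the -y flag is used so that mlxconfig does not stop for confirmation on every device.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, and the device glob below is an
# assumption based on the /dev/mst/mt4129_pciconf0 name used in the examples.

# Print the ATS_ENABLED value found in `mlxconfig ... q` output read from stdin,
# for example "False(0)". Prints nothing if the field is absent.
ats_state() {
    awk '$1 == "ATS_ENABLED" { print $2 }'
}

# Query every MST device and enable ATS only where it is reported as disabled.
enable_ats_everywhere() {
    local dev state
    for dev in /dev/mst/*_pciconf*; do
        [ -e "$dev" ] || continue
        state=$(mlxconfig -d "$dev" q | ats_state)
        echo "$dev: ATS_ENABLED=${state:-unknown}"
        case "$state" in
            False*) mlxconfig -y -d "$dev" set ATS_ENABLED=true ;;
        esac
    done
}
```

Run enable_ats_everywhere as root after sudo mst start; the host reboot described above is still required afterwards.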
When ATS is enabled on the NIC, the Address Translation Service capability should show Enable+ in lspci output:
1a:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
        Subsystem: Mellanox Technologies MT2910 Family [ConnectX-7]
...
        Capabilities: [480 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
Alternately, check the status on all NVIDIA NICs in the system using the following command:
user@hgx1:~$ sudo lspci -d 15b3: -vvv | grep ATSCtl
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
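On a host with many NICs it can be easier to count the enabled and disabled entries than to read them one by one. A small sketch, assuming the ATSCtl line format shown above; the function name ats_summary is hypothetical.

```shell
#!/bin/bash
# Summarise ATS state from `lspci -vvv` output read on stdin.
# Prints "enabled=<n> disabled=<m>", counting one ATSCtl line per function.
ats_summary() {
    awk '/ATSCtl:/ { if ($0 ~ /Enable\+/) on++; else off++ }
         END { printf "enabled=%d disabled=%d\n", on, off }'
}

# Usage on the host (15b3 is the vendor ID used in the command above):
#   sudo lspci -d 15b3: -vvv | ats_summary
```

Any non-zero disabled count means at least one NIC still needs ATS_ENABLED set and a host reboot.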
Configuring ACS#
The following ACS settings should be configured on PCIe root ports and switch downstream ports to allow use of ATS by NICs:
SrcValid (Source Validation) - enabled
TransBlk (Translation Blocking) - disabled
ReqRedir (P2P Request Redirection) - enabled
CmpltRedir (P2P Completion Redirection) - enabled
UpstreamFwd (Upstream Forwarding) - enabled
EgressCtrl (Egress Control) - disabled
DirectTrans (Direct Translated P2P) - enabled
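These seven settings map onto individual bits of the 16-bit ACS Control register. The sketch below assumes the standard bit order of that register (SrcValid in bit 0 through DirectTrans in bit 6, the same order in which lspci prints them) and computes the register value for a given set of enabled features; the function name acs_value is hypothetical. For the settings listed above it yields 0x5D, which is the value written by the script later in this section.

```shell
#!/bin/bash
# Compute the ACS Control register value for the features named as arguments.
# Assumption: bit positions follow the order lspci prints the flags,
# SrcValid = bit 0 ... DirectTrans = bit 6.
acs_value() {
    local names=(SrcValid TransBlk ReqRedir CmpltRedir UpstreamFwd EgressCtrl DirectTrans)
    local value=0 i want
    for want in "$@"; do
        for i in "${!names[@]}"; do
            if [ "${names[$i]}" = "$want" ]; then
                value=$(( value | (1 << i) ))
            fi
        done
    done
    printf '0x%02X\n' "$value"
}

# Features marked "enabled" in the list above; TransBlk and EgressCtrl stay off.
acs_value SrcValid ReqRedir CmpltRedir UpstreamFwd DirectTrans   # prints 0x5D
```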
These settings are exposed in the ACS Capability:
sudo lspci -vvv | less
...
17:00.0 PCI bridge: Broadcom / LSI PEX890xx PCIe Gen 5 Switch (rev b0) (prog-if 00 [Normal decode])
...
        Capabilities: [170 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [1f0 v1] Advanced Error Reporting
...
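To check a port against the required settings without reading the flags by eye, the ACSCtl line can be compared with the expected string. This is a sketch under the assumption that lspci prints the seven flags in the order shown above; the function name acs_ctl_ok is hypothetical.

```shell
#!/bin/bash
# Succeeds (exit status 0) if the first ACSCtl line read from stdin matches the
# settings listed in this section, and fails otherwise.
acs_ctl_ok() {
    local want="SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+"
    local got
    # Drop the "ACSCtl:" label and normalise whitespace between the flags.
    got=$(awk '/ACSCtl:/ { $1 = ""; sub(/^ /, ""); print; exit }')
    [ "$got" = "$want" ]
}

# Usage on the host, for the switch port shown above:
#   sudo lspci -s 17:00.0 -vvv | acs_ctl_ok && echo "ACS OK" || echo "ACS needs configuring"
```

The example output above, with every flag cleared, would fail this check.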
These ACS settings are applied on the virtualization host, not in guest VMs. The following script should be run each time the system is powered on or rebooted, as these settings are not retained across resets / reboots:
#!/bin/bash
#
# Copyright (c) 2020, NVIDIA Corporation.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the “Software”), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# Enable P2P specific ACS bits on every device that supports it

PLATFORM=$(dmidecode --string system-product-name)
logger "PLATFORM=${PLATFORM}"

# must be root to access extended PCI config space
if [ "$EUID" -ne 0 ]; then
    echo "ERROR: $0 must be run as root"
    exit 1
fi

for BDF in `lspci -d "*:*:*" | awk '{print $1}'`; do

    # skip if it doesn't support ACS
    setpci -v -s ${BDF} ECAP_ACS+0x6.w > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        # echo "${BDF} does not support ACS, skipping"
        continue
    fi

    logger "Enabling ACS on `lspci -s ${BDF}`"
    # read the current value, write 0x5D, then read it back
    setpci -v -s ${BDF} ECAP_ACS+0x6.w
    setpci -v -s ${BDF} ECAP_ACS+0x6.w=0x5D
    setpci -v -s ${BDF} ECAP_ACS+0x6.w
    if [ $? -ne 0 ]; then
        logger "Error enabling ACS on ${BDF}"
        continue
    fi
    NEW_VAL=`setpci -v -s ${BDF} ECAP_ACS+0x6.w | awk '{print $NF}'`
    # compare numerically so that case, zero padding and any 0x prefix in the
    # setpci output do not matter
    NEW_VAL=${NEW_VAL#0x}
    if [ "$((16#${NEW_VAL:-0}))" -ne "$((0x5D))" ]; then
        logger "Failed to Enable ACS on ${BDF}"
        continue
    fi
done
exit 0
Example physical topology map for HGX H200 8-GPU platform#
Output from lstopo -sv command run on the vGPU host:
Machine (2015GB total)
  Package L#0
    NUMANode L#0 (P#0 1007GB)
    L3 L#0 (105MB)
      L2 L#0 (2048KB) + L1d L#0 (48KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (2048KB) + L1d L#1 (48KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#2)
      L2 L#2 (2048KB) + L1d L#2 (48KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#4)
      L2 L#3 (2048KB) + L1d L#3 (48KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#6)
      L2 L#4 (2048KB) + L1d L#4 (48KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#8)
      L2 L#5 (2048KB) + L1d L#5 (48KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#10)
      L2 L#6 (2048KB) + L1d L#6 (48KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
      L2 L#7 (2048KB) + L1d L#7 (48KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#14)
      L2 L#8 (2048KB) + L1d L#8 (48KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#16)
      L2 L#9 (2048KB) + L1d L#9 (48KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#18)
      L2 L#10 (2048KB) + L1d L#10 (48KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#20)
      L2 L#11 (2048KB) + L1d L#11 (48KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#22)
      L2 L#12 (2048KB) + L1d L#12 (48KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#24)
      L2 L#13 (2048KB) + L1d L#13 (48KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#26)
      L2 L#14 (2048KB) + L1d L#14 (48KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#28)
      L2 L#15 (2048KB) + L1d L#15 (48KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#30)
      L2 L#16 (2048KB) + L1d L#16 (48KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#32)
      L2 L#17 (2048KB) + L1d L#17 (48KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#34)
      L2 L#18 (2048KB) + L1d L#18 (48KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#36)
      L2 L#19 (2048KB) + L1d L#19 (48KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#38)
      L2 L#20 (2048KB) + L1d L#20 (48KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#40)
      L2 L#21 (2048KB) + L1d L#21 (48KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#42)
      L2 L#22 (2048KB) + L1d L#22 (48KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#44)
      L2 L#23 (2048KB) + L1d L#23 (48KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#46)
      L2 L#24 (2048KB) + L1d L#24 (48KB) + L1i L#24 (32KB) + Core L#24 + PU L#24 (P#48)
      L2 L#25 (2048KB) + L1d L#25 (48KB) + L1i L#25 (32KB) + Core L#25 + PU L#25 (P#50)
      L2 L#26 (2048KB) + L1d L#26 (48KB) + L1i L#26 (32KB) + Core L#26 + PU L#26 (P#52)
      L2 L#27 (2048KB) + L1d L#27 (48KB) + L1i L#27 (32KB) + Core L#27 + PU L#27 (P#54)
      L2 L#28 (2048KB) + L1d L#28 (48KB) + L1i L#28 (32KB) + Core L#28 + PU L#28 (P#56)
      L2 L#29 (2048KB) + L1d L#29 (48KB) + L1i L#29 (32KB) + Core L#29 + PU L#29 (P#58)
      L2 L#30 (2048KB) + L1d L#30 (48KB) + L1i L#30 (32KB) + Core L#30 + PU L#30 (P#60)
      L2 L#31 (2048KB) + L1d L#31 (48KB) + L1i L#31 (32KB) + Core L#31 + PU L#31 (P#62)
      L2 L#32 (2048KB) + L1d L#32 (48KB) + L1i L#32 (32KB) + Core L#32 + PU L#32 (P#64)
      L2 L#33 (2048KB) + L1d L#33 (48KB) + L1i L#33 (32KB) + Core L#33 + PU L#33 (P#66)
      L2 L#34 (2048KB) + L1d L#34 (48KB) + L1i L#34 (32KB) + Core L#34 + PU L#34 (P#68)
      L2 L#35 (2048KB) + L1d L#35 (48KB) + L1i L#35 (32KB) + Core L#35 + PU L#35 (P#70)
      L2 L#36 (2048KB) + L1d L#36 (48KB) + L1i L#36 (32KB) + Core L#36 + PU L#36 (P#72)
      L2 L#37 (2048KB) + L1d L#37 (48KB) + L1i L#37 (32KB) + Core L#37 + PU L#37 (P#74)
      L2 L#38 (2048KB) + L1d L#38 (48KB) + L1i L#38 (32KB) + Core L#38 + PU L#38 (P#76)
      L2 L#39 (2048KB) + L1d L#39 (48KB) + L1i L#39 (32KB) + Core L#39 + PU L#39 (P#78)
      L2 L#40 (2048KB) + L1d L#40 (48KB) + L1i L#40 (32KB) + Core L#40 + PU L#40 (P#80)
      L2 L#41 (2048KB) + L1d L#41 (48KB) + L1i L#41 (32KB) + Core L#41 + PU L#41 (P#82)
      L2 L#42 (2048KB) + L1d L#42 (48KB) + L1i L#42 (32KB) + Core L#42 + PU L#42 (P#84)
      L2 L#43 (2048KB) + L1d L#43 (48KB) + L1i L#43 (32KB) + Core L#43 + PU L#43 (P#86)
      L2 L#44 (2048KB) + L1d L#44 (48KB) + L1i L#44 (32KB) + Core L#44 + PU L#44 (P#88)
      L2 L#45 (2048KB) + L1d L#45 (48KB) + L1i L#45 (32KB) + Core L#45 + PU L#45 (P#90)
      L2 L#46 (2048KB) + L1d L#46 (48KB) + L1i L#46 (32KB) + Core L#46 + PU L#46 (P#92)
      L2 L#47 (2048KB) + L1d L#47 (48KB) + L1i L#47 (32KB) + Core L#47 + PU L#47 (P#94)
    HostBridge
      PCIBridge
        PCI 01:00.0 (NVMExp)
          Block(Disk) "nvme0c0n1"
      PCIBridge
        PCI 02:00.0 (Ethernet)
          Net "eno8303"
        PCI 02:00.1 (Ethernet)
          Net "eno8403"
      PCIBridge
        PCIBridge
          PCI 04:00.0 (VGA)
      PCI 00:18.0 (SATA)
      PCI 00:19.0 (SATA)
    HostBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI 18:00.0 (NVMExp)
              Block(Disk) "nvme1n1"
          PCIBridge
            PCI 19:00.0 (3D)
          PCIBridge
            PCI 1a:00.0 (InfiniBand)
              OpenFabrics "mlx5_0"
          PCIBridge
            PCI 1c:00.0 (SAS)
    HostBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI 3a:00.0 (NVMExp)
              Block(Disk) "nvme2n1"
          PCIBridge
            PCI 3b:00.0 (3D)
          PCIBridge
            PCI 3c:00.0 (InfiniBand)
              OpenFabrics "mlx5_1"
          PCIBridge
            PCI 3d:00.0 (SAS)
    HostBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI 4c:00.0 (3D)
          PCIBridge
            PCI 4d:00.0 (InfiniBand)
              OpenFabrics "mlx5_2"
          PCIBridge
            PCI 4e:00.0 (SAS)
    HostBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI 5d:00.0 (3D)
          PCIBridge
            PCI 5e:00.0 (InfiniBand)
              OpenFabrics "mlx5_3"
          PCIBridge
            PCI 5f:00.0 (SAS)
  Package L#1
    NUMANode L#1 (P#1 1008GB)
    L3 L#1 (105MB)
      L2 L#48 (2048KB) + L1d L#48 (48KB) + L1i L#48 (32KB) + Core L#48 + PU L#48 (P#1)
      L2 L#49 (2048KB) + L1d L#49 (48KB) + L1i L#49 (32KB) + Core L#49 + PU L#49 (P#3)
      L2 L#50 (2048KB) + L1d L#50 (48KB) + L1i L#50 (32KB) + Core L#50 + PU L#50 (P#5)
      L2 L#51 (2048KB) + L1d L#51 (48KB) + L1i L#51 (32KB) + Core L#51 + PU L#51 (P#7)
      L2 L#52 (2048KB) + L1d L#52 (48KB) + L1i L#52 (32KB) + Core L#52 + PU L#52 (P#9)
      L2 L#53 (2048KB) + L1d L#53 (48KB) + L1i L#53 (32KB) + Core L#53 + PU L#53 (P#11)
      L2 L#54 (2048KB) + L1d L#54 (48KB) + L1i L#54 (32KB) + Core L#54 + PU L#54 (P#13)
      L2 L#55 (2048KB) + L1d L#55 (48KB) + L1i L#55 (32KB) + Core L#55 + PU L#55 (P#15)
      L2 L#56 (2048KB) + L1d L#56 (48KB) + L1i L#56 (32KB) + Core L#56 + PU L#56 (P#17)
      L2 L#57 (2048KB) + L1d L#57 (48KB) + L1i L#57 (32KB) + Core L#57 + PU L#57 (P#19)
      L2 L#58 (2048KB) + L1d L#58 (48KB) + L1i L#58 (32KB) + Core L#58 + PU L#58 (P#21)
      L2 L#59 (2048KB) + L1d L#59 (48KB) + L1i L#59 (32KB) + Core L#59 + PU L#59 (P#23)
      L2 L#60 (2048KB) + L1d L#60 (48KB) + L1i L#60 (32KB) + Core L#60 + PU L#60 (P#25)
      L2 L#61 (2048KB) + L1d L#61 (48KB) + L1i L#61 (32KB) + Core L#61 + PU L#61 (P#27)
      L2 L#62 (2048KB) + L1d L#62 (48KB) + L1i L#62 (32KB) + Core L#62 + PU L#62 (P#29)
      L2 L#63 (2048KB) + L1d L#63 (48KB) + L1i L#63 (32KB) + Core L#63 + PU L#63 (P#31)
      L2 L#64 (2048KB) + L1d L#64 (48KB) + L1i L#64 (32KB) + Core L#64 + PU L#64 (P#33)
      L2 L#65 (2048KB) + L1d L#65 (48KB) + L1i L#65 (32KB) + Core L#65 + PU L#65 (P#35)
      L2 L#66 (2048KB) + L1d L#66 (48KB) + L1i L#66 (32KB) + Core L#66 + PU L#66 (P#37)
      L2 L#67 (2048KB) + L1d L#67 (48KB) + L1i L#67 (32KB) + Core L#67 + PU L#67 (P#39)
      L2 L#68 (2048KB) + L1d L#68 (48KB) + L1i L#68 (32KB) + Core L#68 + PU L#68 (P#41)
      L2 L#69 (2048KB) + L1d L#69 (48KB) + L1i L#69 (32KB) + Core L#69 + PU L#69 (P#43)
      L2 L#70 (2048KB) + L1d L#70 (48KB) + L1i L#70 (32KB) + Core L#70 + PU L#70 (P#45)
      L2 L#71 (2048KB) + L1d L#71 (48KB) + L1i L#71 (32KB) + Core L#71 + PU L#71 (P#47)
      L2 L#72 (2048KB) + L1d L#72 (48KB) + L1i L#72 (32KB) + Core L#72 + PU L#72 (P#49)
      L2 L#73 (2048KB) + L1d L#73 (48KB) + L1i L#73 (32KB) + Core L#73 + PU L#73 (P#51)
      L2 L#74 (2048KB) + L1d L#74 (48KB) + L1i L#74 (32KB) + Core L#74 + PU L#74 (P#53)
      L2 L#75 (2048KB) + L1d L#75 (48KB) + L1i L#75 (32KB) + Core L#75 + PU L#75 (P#55)
      L2 L#76 (2048KB) + L1d L#76 (48KB) + L1i L#76 (32KB) + Core L#76 + PU L#76 (P#57)
      L2 L#77 (2048KB) + L1d L#77 (48KB) + L1i L#77 (32KB) + Core L#77 + PU L#77 (P#59)
      L2 L#78 (2048KB) + L1d L#78 (48KB) + L1i L#78 (32KB) + Core L#78 + PU L#78 (P#61)
      L2 L#79 (2048KB) + L1d L#79 (48KB) + L1i L#79 (32KB) + Core L#79 + PU L#79 (P#63)
      L2 L#80 (2048KB) + L1d L#80 (48KB) + L1i L#80 (32KB) + Core L#80 + PU L#80 (P#65)
      L2 L#81 (2048KB) + L1d L#81 (48KB) + L1i L#81 (32KB) + Core L#81 + PU L#81 (P#67)
      L2 L#82 (2048KB) + L1d L#82 (48KB) + L1i L#82 (32KB) + Core L#82 + PU L#82 (P#69)
      L2 L#83 (2048KB) + L1d L#83 (48KB) + L1i L#83 (32KB) + Core L#83 + PU L#83 (P#71)
      L2 L#84 (2048KB) + L1d L#84 (48KB) + L1i L#84 (32KB) + Core L#84 + PU L#84 (P#73)
      L2 L#85 (2048KB) + L1d L#85 (48KB) + L1i L#85 (32KB) + Core L#85 + PU L#85 (P#75)
      L2 L#86 (2048KB) + L1d L#86 (48KB) + L1i L#86 (32KB) + Core L#86 + PU L#86 (P#77)
      L2 L#87 (2048KB) + L1d L#87 (48KB) + L1i L#87 (32KB) + Core L#87 + PU L#87 (P#79)
      L2 L#88 (2048KB) + L1d L#88 (48KB) + L1i L#88 (32KB) + Core L#88 + PU L#88 (P#81)
      L2 L#89 (2048KB) + L1d L#89 (48KB) + L1i L#89 (32KB) + Core L#89 + PU L#89 (P#83)
      L2 L#90 (2048KB) + L1d L#90 (48KB) + L1i L#90 (32KB) + Core L#90 + PU L#90 (P#85)
      L2 L#91 (2048KB) + L1d L#91 (48KB) + L1i L#91 (32KB) + Core L#91 + PU L#91 (P#87)
      L2 L#92 (2048KB) + L1d L#92 (48KB) + L1i L#92 (32KB) + Core L#92 + PU L#92 (P#89)
      L2 L#93 (2048KB) + L1d L#93 (48KB) + L1i L#93 (32KB) + Core L#93 + PU L#93 (P#91)
      L2 L#94 (2048KB) + L1d L#94 (48KB) + L1i L#94 (32KB) + Core L#94 + PU L#94 (P#93)
      L2 L#95 (2048KB) + L1d L#95 (48KB) + L1i L#95 (32KB) + Core L#95 + PU L#95 (P#95)
    HostBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI 9b:00.0 (3D)
          PCIBridge
            PCI 9c:00.0 (InfiniBand)
              OpenFabrics "mlx5_4"
          PCIBridge
            PCI 9e:00.0 (SAS)
    HostBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI bb:00.0 (3D)
          PCIBridge
            PCI bc:00.0 (InfiniBand)
              OpenFabrics "mlx5_5"
          PCIBridge
            PCI bd:00.0 (SAS)
    HostBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI cb:00.0 (3D)
          PCIBridge
            PCI cc:00.0 (InfiniBand)
              OpenFabrics "mlx5_6"
          PCIBridge
            PCI cd:00.0 (SAS)
    HostBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI db:00.0 (3D)
          PCIBridge
            PCI dc:00.0 (InfiniBand)
              OpenFabrics "mlx5_7"
          PCIBridge
            PCI dd:00.0 (SAS)
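In this topology each GPU (shown as a 3D controller) shares a host bridge with one InfiniBand NIC. A rough sketch for extracting those GPU and NIC pairs from lstopo output follows. It assumes one object per line, as lstopo prints it, and that the GPU is listed before the NIC under the same HostBridge, as in the sample above; the function name gpu_nic_pairs is hypothetical.

```shell
#!/bin/bash
# Print "<GPU BDF> <NIC BDF>" for each HostBridge that contains both a 3D
# controller and an InfiniBand device, reading `lstopo -sv` output from stdin.
# Assumption: the GPU line precedes the NIC line within a HostBridge.
gpu_nic_pairs() {
    awk '/HostBridge/              { gpu = "" }
         /\(3D\)/                  { gpu = $2 }
         /\(InfiniBand\)/ && gpu != "" { print gpu, $2 }'
}

# Usage on the host:
#   lstopo -sv | gpu_nic_pairs
```

For the sample above, the first pair reported would be 19:00.0 and 1a:00.0.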
Example libvirt domain XML for HGX H200 8-GPU platform#
Reference: libvirt: Domain XML format
This sample XML created on Ubuntu 22.04 is for the VM configured on the following physical topology, with the PCIe devices highlighted in green boxes passed through to the VM.

The VM is configured with 88 vCPUs arranged in two sockets, each with 44 cores. Each socket implements a single NUMA node with approximately 1TB of RAM. The virtual PCIe hierarchy is equivalent to the physical topology, with 4 PCIe switches on CPU socket 0 and 5 PCIe switches on CPU socket 1.
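The vCPU pinning in this sample follows a regular pattern: host CPU IDs alternate between the two packages in the lstopo output (even IDs on package 0, odd IDs on package 1), so each virtual socket is pinned to every second host CPU. The sketch below regenerates the vcpupin elements from that pattern; the function name gen_vcpupin is hypothetical, and the starting host CPUs (4 and 5) are taken from the sample XML.

```shell
#!/bin/bash
# Emit <vcpupin> elements for `count` vCPUs starting at `first_vcpu`, pinned to
# host CPUs first_cpu, first_cpu+2, first_cpu+4, ...
gen_vcpupin() {
    local first_vcpu=$1 first_cpu=$2 count=$3 i
    for (( i = 0; i < count; i++ )); do
        printf "<vcpupin vcpu='%d' cpuset='%d'/>\n" \
            $(( first_vcpu + i )) $(( first_cpu + 2 * i ))
    done
}

gen_vcpupin 0 4 44    # virtual socket 0: vCPUs 0-43  on host CPUs 4, 6, ..., 90
gen_vcpupin 44 5 44   # virtual socket 1: vCPUs 44-87 on host CPUs 5, 7, ..., 91
```

With this scheme host CPUs 0 to 3 and 92 to 95 are not pinned to any vCPU and remain available to the host.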
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<name>ubuntu-vm-numa0</name>
<uuid>ea260ed6-8344-49fc-952c-ec6c3de1364e</uuid>
<metadata>
  <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
    <libosinfo:os id="http://ubuntu.com/ubuntu/22.04"/>
  </libosinfo:libosinfo>
</metadata>
<memory unit='KiB'>1887436800</memory>
<currentMemory unit='KiB'>1887436800</currentMemory>
<vcpu placement='static'>88</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='4'/>
  <vcpupin vcpu='1' cpuset='6'/>
  <vcpupin vcpu='2' cpuset='8'/>
  <vcpupin vcpu='3' cpuset='10'/>
  <vcpupin vcpu='4' cpuset='12'/>
  <vcpupin vcpu='5' cpuset='14'/>
  <vcpupin vcpu='6' cpuset='16'/>
  <vcpupin vcpu='7' cpuset='18'/>
  <vcpupin vcpu='8' cpuset='20'/>
  <vcpupin vcpu='9' cpuset='22'/>
  <vcpupin vcpu='10' cpuset='24'/>
  <vcpupin vcpu='11' cpuset='26'/>
  <vcpupin vcpu='12' cpuset='28'/>
  <vcpupin vcpu='13' cpuset='30'/>
  <vcpupin vcpu='14' cpuset='32'/>
  <vcpupin vcpu='15' cpuset='34'/>
  <vcpupin vcpu='16' cpuset='36'/>
  <vcpupin vcpu='17' cpuset='38'/>
  <vcpupin vcpu='18' cpuset='40'/>
  <vcpupin vcpu='19' cpuset='42'/>
  <vcpupin vcpu='20' cpuset='44'/>
  <vcpupin vcpu='21' cpuset='46'/>
  <vcpupin vcpu='22' cpuset='48'/>
  <vcpupin vcpu='23' cpuset='50'/>
  <vcpupin vcpu='24' cpuset='52'/>
  <vcpupin vcpu='25' cpuset='54'/>
  <vcpupin vcpu='26' cpuset='56'/>
  <vcpupin vcpu='27' cpuset='58'/>
  <vcpupin vcpu='28' cpuset='60'/>
  <vcpupin vcpu='29' cpuset='62'/>
  <vcpupin vcpu='30' cpuset='64'/>
  <vcpupin vcpu='31' cpuset='66'/>
  <vcpupin vcpu='32' cpuset='68'/>
  <vcpupin vcpu='33' cpuset='70'/>
  <vcpupin vcpu='34' cpuset='72'/>
  <vcpupin vcpu='35' cpuset='74'/>
  <vcpupin vcpu='36' cpuset='76'/>
  <vcpupin vcpu='37' cpuset='78'/>
  <vcpupin vcpu='38' cpuset='80'/>
  <vcpupin vcpu='39' cpuset='82'/>
  <vcpupin vcpu='40' cpuset='84'/>
  <vcpupin vcpu='41' cpuset='86'/>
  <vcpupin vcpu='42' cpuset='88'/>
  <vcpupin vcpu='43' cpuset='90'/>
  <vcpupin vcpu='44' cpuset='5'/>
  <vcpupin vcpu='45' cpuset='7'/>
  <vcpupin vcpu='46' cpuset='9'/>
  <vcpupin vcpu='47' cpuset='11'/>
  <vcpupin vcpu='48' cpuset='13'/>
  <vcpupin vcpu='49' cpuset='15'/>
  <vcpupin vcpu='50' cpuset='17'/>
  <vcpupin vcpu='51' cpuset='19'/>
  <vcpupin vcpu='52' cpuset='21'/>
  <vcpupin vcpu='53' cpuset='23'/>
  <vcpupin vcpu='54' cpuset='25'/>
  <vcpupin vcpu='55' cpuset='27'/>
  <vcpupin vcpu='56' cpuset='29'/>
  <vcpupin vcpu='57' cpuset='31'/>
  <vcpupin vcpu='58' cpuset='33'/>
  <vcpupin vcpu='59' cpuset='35'/>
  <vcpupin vcpu='60' cpuset='37'/>
  <vcpupin vcpu='61' cpuset='39'/>
  <vcpupin vcpu='62' cpuset='41'/>
  <vcpupin vcpu='63' cpuset='43'/>
  <vcpupin vcpu='64' cpuset='45'/>
  <vcpupin vcpu='65' cpuset='47'/>
  <vcpupin vcpu='66' cpuset='49'/>
  <vcpupin vcpu='67' cpuset='51'/>
  <vcpupin vcpu='68' cpuset='53'/>
  <vcpupin vcpu='69' cpuset='55'/>
  <vcpupin vcpu='70' cpuset='57'/>
  <vcpupin vcpu='71' cpuset='59'/>
  <vcpupin vcpu='72' cpuset='61'/>
  <vcpupin vcpu='73' cpuset='63'/>
  <vcpupin vcpu='74' cpuset='65'/>
  <vcpupin vcpu='75' cpuset='67'/>
  <vcpupin vcpu='76' cpuset='69'/>
  <vcpupin vcpu='77' cpuset='71'/>
  <vcpupin vcpu='78' cpuset='73'/>
  <vcpupin vcpu='79' cpuset='75'/>
  <vcpupin vcpu='80' cpuset='77'/>
  <vcpupin vcpu='81' cpuset='79'/>
  <vcpupin vcpu='82' cpuset='81'/>
  <vcpupin vcpu='83' cpuset='83'/>
  <vcpupin vcpu='84' cpuset='85'/>
  <vcpupin vcpu='85' cpuset='87'/>
  <vcpupin vcpu='86' cpuset='89'/>
  <vcpupin vcpu='87' cpuset='91'/>
</cputune>
<resource>
  <partition>/machine</partition>
</resource>
<os>
  <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
  <loader readonly='yes' secure='no' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
  <nvram>/var/lib/libvirt/qemu/nvram/ubuntu-vm-numa0_VARS.fd</nvram>
  <boot dev='hd'/>
  <smbios mode='host'/>
</os>
<features>
  <acpi/>
  <apic/>
</features>
<cpu mode='host-passthrough' check='none' migratable='on'>
  <topology sockets='2' dies='1' cores='44' threads='1'/>
  <numa>
    <cell id='0' cpus='0-43' memory='943718400' unit='KiB'/>
    <cell id='1' cpus='44-87' memory='943718400' unit='KiB'/>
  </numa>
</cpu>
<numatune>
  <memory mode='strict' nodeset='0-1'/>
</numatune>
<clock offset='utc'>
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
  <suspend-to-mem enabled='no'/>
  <suspend-to-disk enabled='no'/>
</pm>
<devices>
  <emulator>/usr/bin/qemu-system-x86_64</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' discard='unmap'/>
    <source file='/var/lib/libvirt/images/ubuntu-vm-numa0.qcow2'/>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </disk>
  <controller type='usb' index='0' model='qemu-xhci' ports='15'>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='1' port='0x10'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='2' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='2' port='0x11'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
  </controller>
  <controller type='pci' index='3' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='3' port='0x12'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
  </controller>
  <controller type='pci' index='4' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='4' port='0x13'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
  </controller>
  <controller type='pci' index='5' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='5' port='0x14'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
  </controller>
  <controller type='pci' index='6' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='6' port='0x15'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
  </controller>
  <controller type='pci' index='7' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='7' port='0x16'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
  </controller>
  <controller type='pci' index='8' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='8' port='0x17'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
  </controller>
  <controller type='pci' index='9' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='9' port='0x18'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='10' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='10' port='0x19'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
  </controller>
  <controller type='pci' index='11' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='11' port='0x1a'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
  </controller>
  <controller type='pci' index='12' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='12' port='0x1b'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x3'/>
  </controller>
  <controller type='pci' index='13' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='13' port='0x1c'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x4'/>
  </controller>
  <controller type='pci' index='14' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='14' port='0x1d'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x5'/>
  </controller>

  <!-- PCI expander bus NUMA node 0 -->
  <controller type='pci' index='15' model='pcie-expander-bus'>
    <target busNr='0x20'>
      <node>0</node>
    </target>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x6'/>
  </controller>

  <!-- PCI expander bus NUMA node 1 -->
  <controller type='pci' index='16' model='pcie-expander-bus'>
    <target busNr='0x40'>
      <node>1</node>
    </target>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x7'/>
  </controller>

  <!-- 4 root ports on bus 15 (index of upstream expander bus) NUMA node 0 -->

  <controller type='pci' index='17' model='pcie-root-port'>
    <address type='pci' bus='15' slot='0x00' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='18' model='pcie-root-port'>
    <address type='pci' bus='15' slot='0x00' function='0x1'/>
  </controller>
  <controller type='pci' index='19' model='pcie-root-port'>
    <address type='pci' bus='15' slot='0x00' function='0x2'/>
  </controller>
  <controller type='pci' index='20' model='pcie-root-port'>
    <address type='pci' bus='15' slot='0x00' function='0x3'/>
  </controller>

  <!-- 4 port PCIe switch on bus 17 (index of upstream root port) / func 0 -->
  <controller type='pci' index='21' model='pcie-switch-upstream-port'>
    <address type='pci' bus='17' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='22' model='pcie-switch-downstream-port'>
    <address type='pci' bus='21' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='23' model='pcie-switch-downstream-port'>
    <address type='pci' bus='21' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='24' model='pcie-switch-downstream-port'>
    <address type='pci' bus='21' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='25' model='pcie-switch-downstream-port'>
    <address type='pci' bus='21' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 18 (index of upstream root port) / func 1 -->
  <controller type='pci' index='26' model='pcie-switch-upstream-port'>
    <address type='pci' bus='18' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='27' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='28' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='29' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='30' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 19 (index of upstream root port) / func 2 -->
  <controller type='pci' index='31' model='pcie-switch-upstream-port'>
    <address type='pci' bus='19' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='32' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='33' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='34' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='35' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 20 (index of upstream root port) / func 3 -->
  <controller type='pci' index='36' model='pcie-switch-upstream-port'>
    <address type='pci' bus='20' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='37' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='38' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='39' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='40' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x03' function='0x0'/>
  </controller>


  <!-- 5 root ports on bus 16 (index of upstream expander bus) NUMA node 1 -->

  <controller type='pci' index='41' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='42' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x1'/>
  </controller>
  <controller type='pci' index='43' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x2'/>
  </controller>
  <controller type='pci' index='44' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x3'/>
  </controller>
  <controller type='pci' index='45' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x4'/>
  </controller>

  <!-- 4 port PCIe switch on bus 41 (index of upstream root port) / func 0 -->
  <controller type='pci' index='46' model='pcie-switch-upstream-port'>
    <address type='pci' bus='41' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='47' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='48' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='49' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='50' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 42 (index of upstream root port) / func 1 -->
  <controller type='pci' index='51' model='pcie-switch-upstream-port'>
    <address type='pci' bus='42' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='52' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='53' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='54' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='55' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 43 (index of upstream root port) / func 2 -->
  <controller type='pci' index='56' model='pcie-switch-upstream-port'>
    <address type='pci' bus='43' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='57' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='58' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='59' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='60' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 44 (index of upstream root port) / func 3 -->
  <controller type='pci' index='61' model='pcie-switch-upstream-port'>
    <address type='pci' bus='44' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='62' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='63' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='64' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='65' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 45 (index of upstream root port) / func 4 -->
  <controller type='pci' index='66' model='pcie-switch-upstream-port'>
    <address type='pci' bus='45' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='67' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='68' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='69' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='70' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x03' function='0x0'/>
  </controller>


  <controller type='sata' index='0'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
  </controller>
  <controller type='virtio-serial' index='0'>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </controller>
  <interface type='bridge'>
    <mac address='52:54:00:e2:a3:d9'/>
    <source bridge='br0'/>
    <model type='virtio'/>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </interface>
  <serial type='pty'>
    <target type='isa-serial' port='0'>
      <model name='isa-serial'/>
    </target>
  </serial>
  <console type='pty'>
    <target type='serial' port='0'/>
  </console>
  <channel type='unix'>
    <target type='virtio' name='org.qemu.guest_agent.0'/>
    <address type='virtio-serial' controller='0' bus='0' port='1'/>
  </channel>
  <input type='tablet' bus='usb'>
    <address type='usb' bus='0' port='1'/>
  </input>
  <input type='keyboard' bus='usb'>
    <address type='usb' bus='0' port='2'/>
  </input>
  <input type='mouse' bus='ps2'/>
  <input type='keyboard' bus='ps2'/>
  <graphics type='spice' autoport='yes'>
    <listen type='address'/>
    <image compression='off'/>
  </graphics>
  <audio id='1' type='spice'/>
  <video>
    <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
  </video>

  <!-- GPUs, NICs, NVMe on socket 0 -->
<hostdev mode='subsystem' type='pci' managed='yes'> 470 <driver name='vfio'/> 471 <source> 472 <address domain='0x0000' bus='0x18' slot='0x00' function='0x0'/> 473 </source> 474 <address type='pci' domain='0x0000' bus='22' slot='0x00' function='0x0'/> 475 </hostdev> 476 <hostdev mode='subsystem' type='pci' managed='yes'> 477 <driver name='vfio'/> 478 <source> 479 <address domain='0x0000' bus='0x19' slot='0x00' function='0x0'/> 480 </source> 481 <address type='pci' domain='0x0000' bus='23' slot='0x00' function='0x0'/> 482 </hostdev> 483 <hostdev mode='subsystem' type='pci' managed='yes'> 484 <source> 485 <address domain='0x0000' bus='0x1a' slot='0x00' function='0x0'/> 486 </source> 487 <address type='pci' domain='0x0000' bus='24' slot='0x00' function='0x0'/> 488 </hostdev> 489 <hostdev mode='subsystem' type='pci' managed='yes'> 490 <driver name='vfio'/> 491 <source> 492 <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/> 493 </source> 494 <address type='pci' domain='0x0000' bus='27' slot='0x00' function='0x0'/> 495 </hostdev> 496 <hostdev mode='subsystem' type='pci' managed='yes'> 497 <source> 498 <address domain='0x0000' bus='0x3c' slot='0x00' function='0x0'/> 499 </source> 500 <address type='pci' domain='0x0000' bus='28' slot='0x00' function='0x0'/> 501 </hostdev> 502 <hostdev mode='subsystem' type='pci' managed='yes'> 503 <driver name='vfio'/> 504 <source> 505 <address domain='0x0000' bus='0x4c' slot='0x00' function='0x0'/> 506 </source> 507 <address type='pci' domain='0x0000' bus='32' slot='0x00' function='0x0'/> 508 </hostdev> 509 <hostdev mode='subsystem' type='pci' managed='yes'> 510 <source> 511 <address domain='0x0000' bus='0x4d' slot='0x00' function='0x0'/> 512 </source> 513 <address type='pci' domain='0x0000' bus='33' slot='0x00' function='0x0'/> 514 </hostdev> 515 <hostdev mode='subsystem' type='pci' managed='yes'> 516 <driver name='vfio'/> 517 <source> 518 <address domain='0x0000' bus='0x5d' slot='0x00' function='0x0'/> 519 </source> 520 
<address type='pci' domain='0x0000' bus='37' slot='0x00' function='0x0'/> 521 </hostdev> 522 <hostdev mode='subsystem' type='pci' managed='yes'> 523 <source> 524 <address domain='0x0000' bus='0x5e' slot='0x00' function='0x0'/> 525 </source> 526 <address type='pci' domain='0x0000' bus='38' slot='0x00' function='0x0'/> 527 </hostdev> 528 529 <!-- GPUs, NICs on socket 1 --> 530 531 <hostdev mode='subsystem' type='pci' managed='yes'> 532 <driver name='vfio'/> 533 <source> 534 <address domain='0x0000' bus='0x9b' slot='0x00' function='0x0'/> 535 </source> 536 <address type='pci' domain='0x0000' bus='52' slot='0x00' function='0x0'/> 537 </hostdev> 538 <hostdev mode='subsystem' type='pci' managed='yes'> 539 <source> 540 <address domain='0x0000' bus='0x9c' slot='0x00' function='0x0'/> 541 </source> 542 <address type='pci' domain='0x0000' bus='53' slot='0x00' function='0x0'/> 543 </hostdev> 544 <hostdev mode='subsystem' type='pci' managed='yes'> 545 <driver name='vfio'/> 546 <source> 547 <address domain='0x0000' bus='0xbb' slot='0x00' function='0x0'/> 548 </source> 549 <address type='pci' domain='0x0000' bus='57' slot='0x00' function='0x0'/> 550 </hostdev> 551 <hostdev mode='subsystem' type='pci' managed='yes'> 552 <source> 553 <address domain='0x0000' bus='0xbc' slot='0x00' function='0x0'/> 554 </source> 555 <address type='pci' domain='0x0000' bus='58' slot='0x00' function='0x0'/> 556 </hostdev> 557 <hostdev mode='subsystem' type='pci' managed='yes'> 558 <driver name='vfio'/> 559 <source> 560 <address domain='0x0000' bus='0xcb' slot='0x00' function='0x0'/> 561 </source> 562 <address type='pci' domain='0x0000' bus='62' slot='0x00' function='0x0'/> 563 </hostdev> 564 <hostdev mode='subsystem' type='pci' managed='yes'> 565 <source> 566 <address domain='0x0000' bus='0xcc' slot='0x00' function='0x0'/> 567 </source> 568 <address type='pci' domain='0x0000' bus='63' slot='0x00' function='0x0'/> 569 </hostdev> 570 <hostdev mode='subsystem' type='pci' managed='yes'> 571 <driver 
name='vfio'/> 572 <source> 573 <address domain='0x0000' bus='0xdb' slot='0x00' function='0x0'/> 574 </source> 575 <address type='pci' domain='0x0000' bus='67' slot='0x00' function='0x0'/> 576 </hostdev> 577 <hostdev mode='subsystem' type='pci' managed='yes'> 578 <source> 579 <address domain='0x0000' bus='0xdc' slot='0x00' function='0x0'/> 580 </source> 581 <address type='pci' domain='0x0000' bus='68' slot='0x00' function='0x0'/> 582 </hostdev> 583 584 <!-- NVswitches socket 1 --> 585 586 <hostdev mode='subsystem' type='pci' managed='yes'> 587 <source> 588 <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/> 589 </source> 590 <address type='pci' domain='0x0000' bus='47' slot='0x00' function='0x0'/> 591 </hostdev> 592 <hostdev mode='subsystem' type='pci' managed='yes'> 593 <source> 594 <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/> 595 </source> 596 <address type='pci' domain='0x0000' bus='48' slot='0x00' function='0x0'/> 597 </hostdev> 598 <hostdev mode='subsystem' type='pci' managed='yes'> 599 <source> 600 <address domain='0x0000' bus='0x85' slot='0x00' function='0x0'/> 601 </source> 602 <address type='pci' domain='0x0000' bus='49' slot='0x00' function='0x0'/> 603 </hostdev> 604 <hostdev mode='subsystem' type='pci' managed='yes'> 605 <source> 606 <address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/> 607 </source> 608 <address type='pci' domain='0x0000' bus='50' slot='0x00' function='0x0'/> 609 </hostdev> 610 611 <memballoon model='virtio'> 612 <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/> 613 </memballoon> 614 <rng model='virtio'> 615 <backend model='random'>/dev/urandom</backend> 616 <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/> 617 </rng> 618</devices> 619<seclabel type='dynamic' model='apparmor' relabel='yes'/> 620<qemu:commandline> 621 <qemu:arg value='-fw_cfg'/> 622 <qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=4718592'/> 623</qemu:commandline> 624</domain>
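In the listing above, each controller's `bus` value is the `index` of the controller it plugs into, and each passed-through device's guest address sits on a switch downstream port. With this many hand-numbered entries a typo is easy to make, so a quick consistency check before defining the domain may be worthwhile. The following is a minimal Python sketch, not part of any NVIDIA or libvirt tooling: the function name `find_dangling_buses` and the shortened `SAMPLE_XML` fragment are hypothetical, and the check assumes bus 0 (the PCIe root) always exists.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened fragment in the same style as the listing above.
SAMPLE_XML = """
<devices>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='16' model='pcie-expander-bus'>
    <address type='pci' bus='0' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='41' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='46' model='pcie-switch-upstream-port'>
    <address type='pci' bus='41' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='47' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x00' function='0x0'/>
  </controller>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='47' slot='0x00' function='0x0'/>
  </hostdev>
</devices>
"""


def find_dangling_buses(xml_text):
    """Return (label, bus) pairs whose guest PCI bus matches no controller index."""
    root = ET.fromstring(xml_text)

    # Collect the index of every PCI controller; base 0 accepts '31' and '0x1f'.
    pci_indexes = {0}  # assumption: bus 0 (pcie-root) is always present
    for ctrl in root.iter("controller"):
        if ctrl.get("type") == "pci":
            pci_indexes.add(int(ctrl.get("index"), 0))

    problems = []
    for elem in list(root.iter("controller")) + list(root.iter("hostdev")):
        # find() only looks at direct children, so for <hostdev> this is the
        # guest-side address, not the host address nested inside <source>.
        addr = elem.find("address")
        if addr is None or addr.get("type") != "pci":
            continue
        bus = int(addr.get("bus"), 0)
        if bus in pci_indexes:
            continue
        if elem.tag == "controller":
            label = f"controller index={elem.get('index')}"
        else:
            src = elem.find("source/address")
            label = f"hostdev source bus={src.get('bus')}"
        problems.append((label, bus))
    return problems


print(find_dangling_buses(SAMPLE_XML))  # prints []
```

An empty list means every guest-side bus number resolves to a defined controller. The check says nothing about whether the hierarchy matches the physical host; it only catches references to controllers that do not exist.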
NCCL#
NCCL (NVIDIA Collective Communications Library) is a communication layer widely used in distributed AI/ML training. It provides efficient multi-GPU and multi-node collective operations (such as all-reduce, all-gather, reduce-scatter, and broadcast) that underpin the training of large-scale deep learning models. NCCL offers routines optimized for GPU-to-GPU communication and hides that complexity from AI frameworks (e.g. TensorFlow, PyTorch). These routines take advantage of the underlying topology to minimize latency and maximize bandwidth utilization [2].
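To make the link between collectives and topology concrete, the sketch below simulates a ring all-reduce in plain Python. It is an illustration of one well-known bandwidth-efficient scheme, not NCCL's implementation and not necessarily the algorithm NCCL selects on a given system; the function name `ring_all_reduce` is hypothetical. Each simulated rank only ever sends to its ring neighbour, which is why the quality of the link between neighbouring devices matters so much.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a ring of len(buffers) ranks.

    buffers[r] is rank r's data, split into one value per rank (n chunks).
    Returns the per-rank buffers after the operation; the input is not modified.
    """
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("each rank needs exactly one chunk per rank")
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: after n-1 steps rank r holds the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all messages first so the sends happen "simultaneously".
        sends = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (chunk, value) in enumerate(sends):
            data[(r + 1) % n][chunk] += value

    # Phase 2, all-gather: circulate the reduced chunks around the ring.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, (chunk, value) in enumerate(sends):
            data[(r + 1) % n][chunk] = value

    return data


result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result)  # prints [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

Every step moves one chunk across every ring link at once, so the slowest link bounds the whole operation. A ring built over a mis-reported topology can route a hop through a slower path than necessary.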
Accurate topology information is a prerequisite for good NCCL performance. If the hypervisor obscures the physical hardware topology within a VM, NCCL will build suboptimal communication paths, which adds overhead and reduces effective bandwidth. For multi-node HGX server systems running distributed training or foundation model training workloads, suboptimal NCCL performance translates directly into wasted GPU cycles, longer training durations, and reduced overall efficiency.
The VM's NUMA and PCIe hierarchy must therefore mirror the underlying physical topology so that NCCL can detect and use the correct GPU, NIC, and NUMA relationships. This allows NCCL-based applications in a virtualized environment to approach bare-metal performance.
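One way to sanity-check this from inside the guest is to read the NUMA node that Linux reports for each passed-through device and compare the grouping with the physical host. The following Python sketch assumes a Linux guest with the standard sysfs layout under `/sys/bus/pci/devices`; the function name `devices_by_numa_node` is hypothetical, and the vendor filter covers only NVIDIA (`0x10de`) and Mellanox/NVIDIA networking (`0x15b3`) devices.

```python
from collections import defaultdict
from pathlib import Path

# PCI vendor IDs as they appear in the sysfs "vendor" attribute.
VENDORS = {"0x10de": "NVIDIA", "0x15b3": "Mellanox"}


def devices_by_numa_node(sysfs_root="/sys/bus/pci/devices"):
    """Group NVIDIA and Mellanox PCI devices by the NUMA node the kernel reports.

    Returns {numa_node: [(pci_address, vendor_name), ...]}. A node of -1
    means the kernel has no NUMA affinity information for that device.
    """
    groups = defaultdict(list)
    for dev in sorted(Path(sysfs_root).iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            node = int((dev / "numa_node").read_text().strip())
        except (OSError, ValueError):
            continue  # skip entries that lack readable attributes
        if vendor in VENDORS:
            groups[node].append((dev.name, VENDORS[vendor]))
    return dict(groups)
```

Calling `devices_by_numa_node()` in the VM and on the host should yield the same pairing of GPUs and NICs per node if the virtual topology mirrors the physical one. Devices reported on node -1, or all devices on a single node, suggest the guest NUMA layout is not being exposed as intended.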