Appendix
ATS and ACS configuration
To achieve maximum GPUDirect RDMA performance in a VM, PCIe Address Translation Services (ATS) must be enabled on the NIC, and PCIe Access Control Services (ACS) settings on PCIe switches and root ports must be configured to allow ATS to work.
Enabling ATS on NVIDIA NICs
To enable ATS, first make sure the Mellanox Software Tools (MST) service is running (sudo mst start), then use the following commands, once per NIC, to check and, if necessary, set the ATS status:
To check the current state:
# mlxconfig -d /dev/mst/mt4129_pciconf0 q | grep ATS_ENABLED
    ATS_ENABLED False(0)
In the example above, the first NIC (pciconf0) has ATS disabled, which is the default setting. Each NIC should be checked individually.
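The per-NIC check can be scripted. The following is a minimal sketch, assuming the MST devices appear under /dev/mst with names containing _pciconf (as in the example above) and that mlxconfig prints ATS_ENABLED and its value on a single line. The helper name ats_state is illustrative and is not part of the NVIDIA tools.

```shell
#!/bin/bash
# Extract the ATS_ENABLED value from `mlxconfig -d <device> q` output read on stdin.
# ats_state is a hypothetical helper name used only for this sketch.
ats_state() {
    awk '$1 == "ATS_ENABLED" { print $2 }'
}

# Query every MST PCI config device, if the tools are installed and started.
# The /dev/mst/*_pciconf* pattern is an assumption based on the example device name.
if command -v mlxconfig >/dev/null 2>&1; then
    for dev in /dev/mst/*_pciconf*; do
        [ -e "$dev" ] || continue
        printf '%s: %s\n' "$dev" "$(mlxconfig -d "$dev" q | ats_state)"
    done
fi
```

Run it as root after sudo mst start. Any device reported as False(0) still needs ATS enabled.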
To enable it on a single NIC, use the following command:
# mlxconfig -d /dev/mst/mt4129_pciconf0 set ATS_ENABLED=true
After running the command above for all NICs, reboot the host. If a NIC is already passed through to a VM, rebooting the VM alone does not complete the change; the host itself must be rebooted to fully reinitialise the NIC.
When ATS is enabled on the NIC, the Address Translation Service capability should show Enable+ in lspci output:
1a:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
    Subsystem: Mellanox Technologies MT2910 Family [ConnectX-7]
...
    Capabilities: [480 v1] Address Translation Service (ATS)
        ATSCap: Invalidate Queue Depth: 00
        ATSCtl: Enable+, Smallest Translation Unit: 00
    Kernel driver in use: mlx5_core
    Kernel modules: mlx5_core
Alternatively, check the status on all NVIDIA NICs in the system using the following command:
user@hgx1:~$ sudo lspci -d 15b3: -vvv | grep ATSCtl
    ATSCtl: Enable+, Smallest Translation Unit: 00
    ATSCtl: Enable+, Smallest Translation Unit: 00
    ATSCtl: Enable+, Smallest Translation Unit: 00
    ATSCtl: Enable+, Smallest Translation Unit: 00
    ATSCtl: Enable+, Smallest Translation Unit: 00
    ATSCtl: Enable+, Smallest Translation Unit: 00
    ATSCtl: Enable+, Smallest Translation Unit: 00
    ATSCtl: Enable+, Smallest Translation Unit: 00
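On a host with many NICs it can be easier to summarise the ATSCtl lines than to read them one by one. The sketch below assumes the lspci output format shown above. The function name count_ats is illustrative only.

```shell
#!/bin/bash
# Count ATSCtl lines (read on stdin) that show Enable+ versus anything else.
# count_ats is a hypothetical helper name used only for this sketch.
count_ats() {
    awk '/ATSCtl:/ { if ($0 ~ /Enable[+]/) on++; else off++ }
         END { printf "enabled=%d disabled=%d\n", on, off }'
}

# Demonstration on sample text; on a real host, pipe lspci output in instead.
printf '%s\n' \
    "ATSCtl: Enable+, Smallest Translation Unit: 00" \
    "ATSCtl: Enable-, Smallest Translation Unit: 00" | count_ats
```

On the host this would be used as sudo lspci -d 15b3: -vvv | count_ats. A non-zero disabled count means at least one NIC still has ATS off.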
Configuring ACS
The following ACS settings should be configured on PCIe root ports and switch downstream ports to allow use of ATS by NICs:
SrcValid (Source Validation) - enabled
TransBlk (Translation Blocking) - disabled
ReqRedir (P2P Request Redirection) - enabled
CmpltRedir (P2P Completion Redirection) - enabled
UpstreamFwd (Upstream Forwarding) - enabled
EgressCtrl (Egress Control) - disabled
DirectTrans (Direct Translated P2P) - enabled
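These settings map onto individual bits of the 16-bit ACS Control register, so the whole list can be written as a single value. The sketch below assumes the bit order defined for that register in the PCIe specification (SrcValid is bit 0 through DirectTrans at bit 6). The variable and function names are illustrative.

```shell
#!/bin/bash
# Bit positions in the PCIe ACS Control register.
SRC_VALID=0      # Source Validation
TRANS_BLK=1      # Translation Blocking
REQ_REDIR=2      # P2P Request Redirection
CMPLT_REDIR=3    # P2P Completion Redirection
UPSTREAM_FWD=4   # Upstream Forwarding
EGRESS_CTRL=5    # Egress Control
DIRECT_TRANS=6   # Direct Translated P2P

# Build a register value with the given bit positions set.
# acs_mask is a hypothetical helper name used only for this sketch.
acs_mask() {
    local mask=0 bit
    for bit in "$@"; do
        mask=$(( mask | (1 << bit) ))
    done
    printf '0x%02X\n' "$mask"
}

# Enabled: SrcValid, ReqRedir, CmpltRedir, UpstreamFwd, DirectTrans.
# Disabled (left clear): TransBlk, EgressCtrl.
acs_mask "$SRC_VALID" "$REQ_REDIR" "$CMPLT_REDIR" "$UPSTREAM_FWD" "$DIRECT_TRANS"
```

The result, 0x5D, is the value that has to be written to the ACS Control register of each root port and switch downstream port.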
These settings are exposed in the ACS Capability:
sudo lspci -vvv | less
...
17:00.0 PCI bridge: Broadcom / LSI PEX890xx PCIe Gen 5 Switch (rev b0) (prog-if 00 [Normal decode])
...
    Capabilities: [170 v1] Access Control Services
        ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
        ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    Capabilities: [1f0 v1] Advanced Error Reporting
...
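In the example above every ACSCtl flag is still cleared, so the port has not yet been configured. A small check such as the following can compare an ACSCtl line against the settings listed earlier. It is a sketch that assumes the lspci flag names shown above. EXPECTED and check_acsctl are illustrative names.

```shell
#!/bin/bash
# Flags the ACSCtl line should show once the recommended settings are applied.
EXPECTED="SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+"

# Print every expected flag that is missing from the given ACSCtl line.
# Returns 0 when all flags match and 1 otherwise.
# check_acsctl is a hypothetical helper name used only for this sketch.
check_acsctl() {
    local line=" $1 " flag status=0
    for flag in $EXPECTED; do
        case "$line" in
            *" $flag "*) ;;
            *) echo "expected $flag"; status=1 ;;
        esac
    done
    return "$status"
}

# The unconfigured ACSCtl line from the example above reports five mismatches.
check_acsctl "ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-" || true
```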
These ACS settings are applied on the virtualization host, not in guest VMs. The following script should be run each time the system is powered on or rebooted, as these settings are not retained across resets / reboots:
#!/bin/bash
#
# Copyright (c) 2020, NVIDIA Corporation.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the "Software"), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# Enable P2P specific ACS bits on every device that supports it

PLATFORM=$(dmidecode --string system-product-name)
logger "PLATFORM=${PLATFORM}"

# must be root to access extended PCI config space
if [ "$EUID" -ne 0 ]; then
    echo "ERROR: $0 must be run as root"
    exit 1
fi

for BDF in `lspci -d "*:*:*" | awk '{print $1}'`; do

    # skip if it doesn't support ACS
    setpci -v -s ${BDF} ECAP_ACS+0x6.w > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        # echo "${BDF} does not support ACS, skipping"
        continue
    fi

    logger "Enabling ACS on `lspci -s ${BDF}`"
    setpci -v -s ${BDF} ECAP_ACS+0x6.w
    # 0x5D sets SrcValid, ReqRedir, CmpltRedir, UpstreamFwd and DirectTrans
    setpci -v -s ${BDF} ECAP_ACS+0x6.w=0x5D
    setpci -v -s ${BDF} ECAP_ACS+0x6.w
    if [ $? -ne 0 ]; then
        logger "Error enabling ACS on ${BDF}"
        continue
    fi
    # setpci prints the word as four bare hex digits, so compare against 005d
    NEW_VAL=`setpci -v -s ${BDF} ECAP_ACS+0x6.w | awk '{print $NF}'`
    if [ "${NEW_VAL}" != "005d" ]; then
        logger "Failed to Enable ACS on ${BDF}"
        continue
    fi
done
exit 0
Example physical topology map for HGX H200 8-GPU platform
Output from lstopo -sv command run on the vGPU host:
Machine (2015GB total)
Package L#0
 NUMANode L#0 (P#0 1007GB)
 L3 L#0 (105MB)
 L2 L#0 (2048KB) + L1d L#0 (48KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
 L2 L#1 (2048KB) + L1d L#1 (48KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#2)
 L2 L#2 (2048KB) + L1d L#2 (48KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#4)
 L2 L#3 (2048KB) + L1d L#3 (48KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#6)
 L2 L#4 (2048KB) + L1d L#4 (48KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#8)
 L2 L#5 (2048KB) + L1d L#5 (48KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#10)
 L2 L#6 (2048KB) + L1d L#6 (48KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
 L2 L#7 (2048KB) + L1d L#7 (48KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#14)
 L2 L#8 (2048KB) + L1d L#8 (48KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#16)
 L2 L#9 (2048KB) + L1d L#9 (48KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#18)
 L2 L#10 (2048KB) + L1d L#10 (48KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#20)
 L2 L#11 (2048KB) + L1d L#11 (48KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#22)
 L2 L#12 (2048KB) + L1d L#12 (48KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#24)
 L2 L#13 (2048KB) + L1d L#13 (48KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#26)
 L2 L#14 (2048KB) + L1d L#14 (48KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#28)
 L2 L#15 (2048KB) + L1d L#15 (48KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#30)
 L2 L#16 (2048KB) + L1d L#16 (48KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#32)
 L2 L#17 (2048KB) + L1d L#17 (48KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#34)
 L2 L#18 (2048KB) + L1d L#18 (48KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#36)
 L2 L#19 (2048KB) + L1d L#19 (48KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#38)
 L2 L#20 (2048KB) + L1d L#20 (48KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#40)
 L2 L#21 (2048KB) + L1d L#21 (48KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#42)
 L2 L#22 (2048KB) + L1d L#22 (48KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#44)
 L2 L#23 (2048KB) + L1d L#23 (48KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#46)
 L2 L#24 (2048KB) + L1d L#24 (48KB) + L1i L#24 (32KB) + Core L#24 + PU L#24 (P#48)
 L2 L#25 (2048KB) + L1d L#25 (48KB) + L1i L#25 (32KB) + Core L#25 + PU L#25 (P#50)
 L2 L#26 (2048KB) + L1d L#26 (48KB) + L1i L#26 (32KB) + Core L#26 + PU L#26 (P#52)
 L2 L#27 (2048KB) + L1d L#27 (48KB) + L1i L#27 (32KB) + Core L#27 + PU L#27 (P#54)
 L2 L#28 (2048KB) + L1d L#28 (48KB) + L1i L#28 (32KB) + Core L#28 + PU L#28 (P#56)
 L2 L#29 (2048KB) + L1d L#29 (48KB) + L1i L#29 (32KB) + Core L#29 + PU L#29 (P#58)
 L2 L#30 (2048KB) + L1d L#30 (48KB) + L1i L#30 (32KB) + Core L#30 + PU L#30 (P#60)
 L2 L#31 (2048KB) + L1d L#31 (48KB) + L1i L#31 (32KB) + Core L#31 + PU L#31 (P#62)
 L2 L#32 (2048KB) + L1d L#32 (48KB) + L1i L#32 (32KB) + Core L#32 + PU L#32 (P#64)
 L2 L#33 (2048KB) + L1d L#33 (48KB) + L1i L#33 (32KB) + Core L#33 + PU L#33 (P#66)
 L2 L#34 (2048KB) + L1d L#34 (48KB) + L1i L#34 (32KB) + Core L#34 + PU L#34 (P#68)
 L2 L#35 (2048KB) + L1d L#35 (48KB) + L1i L#35 (32KB) + Core L#35 + PU L#35 (P#70)
 L2 L#36 (2048KB) + L1d L#36 (48KB) + L1i L#36 (32KB) + Core L#36 + PU L#36 (P#72)
 L2 L#37 (2048KB) + L1d L#37 (48KB) + L1i L#37 (32KB) + Core L#37 + PU L#37 (P#74)
 L2 L#38 (2048KB) + L1d L#38 (48KB) + L1i L#38 (32KB) + Core L#38 + PU L#38 (P#76)
 L2 L#39 (2048KB) + L1d L#39 (48KB) + L1i L#39 (32KB) + Core L#39 + PU L#39 (P#78)
 L2 L#40 (2048KB) + L1d L#40 (48KB) + L1i L#40 (32KB) + Core L#40 + PU L#40 (P#80)
 L2 L#41 (2048KB) + L1d L#41 (48KB) + L1i L#41 (32KB) + Core L#41 + PU L#41 (P#82)
 L2 L#42 (2048KB) + L1d L#42 (48KB) + L1i L#42 (32KB) + Core L#42 + PU L#42 (P#84)
 L2 L#43 (2048KB) + L1d L#43 (48KB) + L1i L#43 (32KB) + Core L#43 + PU L#43 (P#86)
 L2 L#44 (2048KB) + L1d L#44 (48KB) + L1i L#44 (32KB) + Core L#44 + PU L#44 (P#88)
 L2 L#45 (2048KB) + L1d L#45 (48KB) + L1i L#45 (32KB) + Core L#45 + PU L#45 (P#90)
 L2 L#46 (2048KB) + L1d L#46 (48KB) + L1i L#46 (32KB) + Core L#46 + PU L#46 (P#92)
 L2 L#47 (2048KB) + L1d L#47 (48KB) + L1i L#47 (32KB) + Core L#47 + PU L#47 (P#94)
 HostBridge
 PCIBridge
 PCI 01:00.0 (NVMExp)
 Block(Disk) "nvme0c0n1"
 PCIBridge
 PCI 02:00.0 (Ethernet)
 Net "eno8303"
 PCI 02:00.1 (Ethernet)
 Net "eno8403"
 PCIBridge
 PCIBridge
 PCI 04:00.0 (VGA)
 PCI 00:18.0 (SATA)
 PCI 00:19.0 (SATA)
 HostBridge
 PCIBridge
 PCIBridge
 PCIBridge
 PCI 18:00.0 (NVMExp)
 Block(Disk) "nvme1n1"
 PCIBridge
 PCI 19:00.0 (3D)
 PCIBridge
 PCI 1a:00.0 (InfiniBand)
 OpenFabrics "mlx5_0"
 PCIBridge
 PCI 1c:00.0 (SAS)
 HostBridge
 PCIBridge
 PCIBridge
 PCIBridge
 PCI 3a:00.0 (NVMExp)
 Block(Disk) "nvme2n1"
 PCIBridge
 PCI 3b:00.0 (3D)
 PCIBridge
 PCI 3c:00.0 (InfiniBand)
 OpenFabrics "mlx5_1"
 PCIBridge
 PCI 3d:00.0 (SAS)
 HostBridge
 PCIBridge
 PCIBridge
 PCIBridge
 PCI 4c:00.0 (3D)
 PCIBridge
 PCI 4d:00.0 (InfiniBand)
 OpenFabrics "mlx5_2"
 PCIBridge
 PCI 4e:00.0 (SAS)
 HostBridge
 PCIBridge
 PCIBridge
 PCIBridge
 PCI 5d:00.0 (3D)
 PCIBridge
 PCI 5e:00.0 (InfiniBand)
 OpenFabrics "mlx5_3"
 PCIBridge
 PCI 5f:00.0 (SAS)
Package L#1
 NUMANode L#1 (P#1 1008GB)
 L3 L#1 (105MB)
 L2 L#48 (2048KB) + L1d L#48 (48KB) + L1i L#48 (32KB) + Core L#48 + PU L#48 (P#1)
 L2 L#49 (2048KB) + L1d L#49 (48KB) + L1i L#49 (32KB) + Core L#49 + PU L#49 (P#3)
 L2 L#50 (2048KB) + L1d L#50 (48KB) + L1i L#50 (32KB) + Core L#50 + PU L#50 (P#5)
 L2 L#51 (2048KB) + L1d L#51 (48KB) + L1i L#51 (32KB) + Core L#51 + PU L#51 (P#7)
 L2 L#52 (2048KB) + L1d L#52 (48KB) + L1i L#52 (32KB) + Core L#52 + PU L#52 (P#9)
 L2 L#53 (2048KB) + L1d L#53 (48KB) + L1i L#53 (32KB) + Core L#53 + PU L#53 (P#11)
 L2 L#54 (2048KB) + L1d L#54 (48KB) + L1i L#54 (32KB) + Core L#54 + PU L#54 (P#13)
 L2 L#55 (2048KB) + L1d L#55 (48KB) + L1i L#55 (32KB) + Core L#55 + PU L#55 (P#15)
 L2 L#56 (2048KB) + L1d L#56 (48KB) + L1i L#56 (32KB) + Core L#56 + PU L#56 (P#17)
 L2 L#57 (2048KB) + L1d L#57 (48KB) + L1i L#57 (32KB) + Core L#57 + PU L#57 (P#19)
 L2 L#58 (2048KB) + L1d L#58 (48KB) + L1i L#58 (32KB) + Core L#58 + PU L#58 (P#21)
 L2 L#59 (2048KB) + L1d L#59 (48KB) + L1i L#59 (32KB) + Core L#59 + PU L#59 (P#23)
 L2 L#60 (2048KB) + L1d L#60 (48KB) + L1i L#60 (32KB) + Core L#60 + PU L#60 (P#25)
 L2 L#61 (2048KB) + L1d L#61 (48KB) + L1i L#61 (32KB) + Core L#61 + PU L#61 (P#27)
 L2 L#62 (2048KB) + L1d L#62 (48KB) + L1i L#62 (32KB) + Core L#62 + PU L#62 (P#29)
 L2 L#63 (2048KB) + L1d L#63 (48KB) + L1i L#63 (32KB) + Core L#63 + PU L#63 (P#31)
 L2 L#64 (2048KB) + L1d L#64 (48KB) + L1i L#64 (32KB) + Core L#64 + PU L#64 (P#33)
 L2 L#65 (2048KB) + L1d L#65 (48KB) + L1i L#65 (32KB) + Core L#65 + PU L#65 (P#35)
 L2 L#66 (2048KB) + L1d L#66 (48KB) + L1i L#66 (32KB) + Core L#66 + PU L#66 (P#37)
 L2 L#67 (2048KB) + L1d L#67 (48KB) + L1i L#67 (32KB) + Core L#67 + PU L#67 (P#39)
 L2 L#68 (2048KB) + L1d L#68 (48KB) + L1i L#68 (32KB) + Core L#68 + PU L#68 (P#41)
 L2 L#69 (2048KB) + L1d L#69 (48KB) + L1i L#69 (32KB) + Core L#69 + PU L#69 (P#43)
 L2 L#70 (2048KB) + L1d L#70 (48KB) + L1i L#70 (32KB) + Core L#70 + PU L#70 (P#45)
 L2 L#71 (2048KB) + L1d L#71 (48KB) + L1i L#71 (32KB) + Core L#71 + PU L#71 (P#47)
 L2 L#72 (2048KB) + L1d L#72 (48KB) + L1i L#72 (32KB) + Core L#72 + PU L#72 (P#49)
 L2 L#73 (2048KB) + L1d L#73 (48KB) + L1i L#73 (32KB) + Core L#73 + PU L#73 (P#51)
 L2 L#74 (2048KB) + L1d L#74 (48KB) + L1i L#74 (32KB) + Core L#74 + PU L#74 (P#53)
 L2 L#75 (2048KB) + L1d L#75 (48KB) + L1i L#75 (32KB) + Core L#75 + PU L#75 (P#55)
 L2 L#76 (2048KB) + L1d L#76 (48KB) + L1i L#76 (32KB) + Core L#76 + PU L#76 (P#57)
 L2 L#77 (2048KB) + L1d L#77 (48KB) + L1i L#77 (32KB) + Core L#77 + PU L#77 (P#59)
 L2 L#78 (2048KB) + L1d L#78 (48KB) + L1i L#78 (32KB) + Core L#78 + PU L#78 (P#61)
 L2 L#79 (2048KB) + L1d L#79 (48KB) + L1i L#79 (32KB) + Core L#79 + PU L#79 (P#63)
 L2 L#80 (2048KB) + L1d L#80 (48KB) + L1i L#80 (32KB) + Core L#80 + PU L#80 (P#65)
 L2 L#81 (2048KB) + L1d L#81 (48KB) + L1i L#81 (32KB) + Core L#81 + PU L#81 (P#67)
 L2 L#82 (2048KB) + L1d L#82 (48KB) + L1i L#82 (32KB) + Core L#82 + PU L#82 (P#69)
 L2 L#83 (2048KB) + L1d L#83 (48KB) + L1i L#83 (32KB) + Core L#83 + PU L#83 (P#71)
 L2 L#84 (2048KB) + L1d L#84 (48KB) + L1i L#84 (32KB) + Core L#84 + PU L#84 (P#73)
 L2 L#85 (2048KB) + L1d L#85 (48KB) + L1i L#85 (32KB) + Core L#85 + PU L#85 (P#75)
 L2 L#86 (2048KB) + L1d L#86 (48KB) + L1i L#86 (32KB) + Core L#86 + PU L#86 (P#77)
 L2 L#87 (2048KB) + L1d L#87 (48KB) + L1i L#87 (32KB) + Core L#87 + PU L#87 (P#79)
 L2 L#88 (2048KB) + L1d L#88 (48KB) + L1i L#88 (32KB) + Core L#88 + PU L#88 (P#81)
 L2 L#89 (2048KB) + L1d L#89 (48KB) + L1i L#89 (32KB) + Core L#89 + PU L#89 (P#83)
 L2 L#90 (2048KB) + L1d L#90 (48KB) + L1i L#90 (32KB) + Core L#90 + PU L#90 (P#85)
 L2 L#91 (2048KB) + L1d L#91 (48KB) + L1i L#91 (32KB) + Core L#91 + PU L#91 (P#87)
 L2 L#92 (2048KB) + L1d L#92 (48KB) + L1i L#92 (32KB) + Core L#92 + PU L#92 (P#89)
 L2 L#93 (2048KB) + L1d L#93 (48KB) + L1i L#93 (32KB) + Core L#93 + PU L#93 (P#91)
 L2 L#94 (2048KB) + L1d L#94 (48KB) + L1i L#94 (32KB) + Core L#94 + PU L#94 (P#93)
 L2 L#95 (2048KB) + L1d L#95 (48KB) + L1i L#95 (32KB) + Core L#95 + PU L#95 (P#95)
 HostBridge
 PCIBridge
 PCIBridge
 PCIBridge
 PCI 9b:00.0 (3D)
 PCIBridge
 PCI 9c:00.0 (InfiniBand)
 OpenFabrics "mlx5_4"
 PCIBridge
 PCI 9e:00.0 (SAS)
 HostBridge
 PCIBridge
 PCIBridge
 PCIBridge
 PCI bb:00.0 (3D)
 PCIBridge
 PCI bc:00.0 (InfiniBand)
 OpenFabrics "mlx5_5"
 PCIBridge
 PCI bd:00.0 (SAS)
 HostBridge
 PCIBridge
 PCIBridge
 PCIBridge
 PCI cb:00.0 (3D)
 PCIBridge
 PCI cc:00.0 (InfiniBand)
 OpenFabrics "mlx5_6"
 PCIBridge
 PCI cd:00.0 (SAS)
 HostBridge
 PCIBridge
 PCIBridge
 PCIBridge
 PCI db:00.0 (3D)
 PCIBridge
 PCI dc:00.0 (InfiniBand)
 OpenFabrics "mlx5_7"
 PCIBridge
 PCI dd:00.0 (SAS)
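In the listing above, each GPU (3D) shares a host bridge with one InfiniBand NIC, which is the pairing that matters for GPUDirect RDMA. The following sketch extracts those pairs from lstopo output. It assumes the line format shown above and at most one GPU and one NIC per host bridge. The function name pair_gpu_nic is illustrative.

```shell
#!/bin/bash
# Read lstopo output on stdin and print "<GPU address> <NIC address>" for each
# HostBridge that contains both a 3D controller and an InfiniBand device.
# pair_gpu_nic is a hypothetical helper name used only for this sketch.
pair_gpu_nic() {
    awk '
        function emit() {
            if (gpu != "" && nic != "") print gpu, nic
            gpu = ""; nic = ""
        }
        $1 == "HostBridge"                   { emit() }
        $1 == "PCI" && $3 == "(3D)"          { gpu = $2 }
        $1 == "PCI" && $3 == "(InfiniBand)"  { nic = $2 }
        END                                  { emit() }
    '
}

# Demonstration on two host bridges taken from the listing above.
printf '%s\n' \
    "HostBridge" "PCIBridge" "PCI 19:00.0 (3D)" "PCIBridge" "PCI 1a:00.0 (InfiniBand)" \
    "HostBridge" "PCIBridge" "PCI 3b:00.0 (3D)" "PCIBridge" "PCI 3c:00.0 (InfiniBand)" \
    | pair_gpu_nic
```

On the host this would be used as lstopo -sv | pair_gpu_nic. Host bridges without both device types, such as the one carrying the boot NVMe and Ethernet ports, produce no output.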
Example libvirt domain XML for HGX H200 8-GPU platform
Reference: libvirt: Domain XML format
This sample XML created on Ubuntu 22.04 is for the VM configured on the following physical topology, with the PCIe devices highlighted in green boxes passed through to the VM.
The VM is configured with 88 vCPUs arranged in two sockets, each with 44 cores. Each socket implements a single NUMA node with 900 GiB of RAM (943718400 KiB, approximately 1 TB). The virtual PCIe hierarchy is equivalent to the physical topology, with 4 PCIe switches on CPU socket 0 and 5 PCIe switches on CPU socket 1.
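The vCPU pinning in the XML that follows is regular: guest socket 0 is pinned to even-numbered host CPUs starting at 4 (NUMA node 0 in the lstopo output), and guest socket 1 to odd-numbered host CPUs starting at 5 (NUMA node 1). A sketch such as the following can generate the vcpupin elements rather than typing them by hand. The function name gen_vcpupin is illustrative, and the starting CPUs are simply those used in this example.

```shell
#!/bin/bash
# Emit <vcpupin> elements for `count` vCPUs starting at `first_vcpu`,
# pinned to every second host CPU starting at `first_cpu`.
# gen_vcpupin is a hypothetical helper name used only for this sketch.
gen_vcpupin() {
    local first_vcpu=$1 first_cpu=$2 count=$3 i
    for (( i = 0; i < count; i++ )); do
        printf "<vcpupin vcpu='%d' cpuset='%d'/>\n" \
            $(( first_vcpu + i )) $(( first_cpu + 2 * i ))
    done
}

gen_vcpupin 0 4 44     # guest socket 0: even host CPUs 4 to 90
gen_vcpupin 44 5 44    # guest socket 1: odd host CPUs 5 to 91
```

Host CPUs 0 to 3 and 92 to 95 are left unpinned in this example, which keeps them available to the host.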
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<name>ubuntu-vm-numa0</name>
<uuid>ea260ed6-8344-49fc-952c-ec6c3de1364e</uuid>
<metadata>
  <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
    <libosinfo:os id="http://ubuntu.com/ubuntu/22.04"/>
  </libosinfo:libosinfo>
</metadata>
<memory unit='KiB'>1887436800</memory>
<currentMemory unit='KiB'>1887436800</currentMemory>
<vcpu placement='static'>88</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='4'/>
  <vcpupin vcpu='1' cpuset='6'/>
  <vcpupin vcpu='2' cpuset='8'/>
  <vcpupin vcpu='3' cpuset='10'/>
  <vcpupin vcpu='4' cpuset='12'/>
  <vcpupin vcpu='5' cpuset='14'/>
  <vcpupin vcpu='6' cpuset='16'/>
  <vcpupin vcpu='7' cpuset='18'/>
  <vcpupin vcpu='8' cpuset='20'/>
  <vcpupin vcpu='9' cpuset='22'/>
  <vcpupin vcpu='10' cpuset='24'/>
  <vcpupin vcpu='11' cpuset='26'/>
  <vcpupin vcpu='12' cpuset='28'/>
  <vcpupin vcpu='13' cpuset='30'/>
  <vcpupin vcpu='14' cpuset='32'/>
  <vcpupin vcpu='15' cpuset='34'/>
  <vcpupin vcpu='16' cpuset='36'/>
  <vcpupin vcpu='17' cpuset='38'/>
  <vcpupin vcpu='18' cpuset='40'/>
  <vcpupin vcpu='19' cpuset='42'/>
  <vcpupin vcpu='20' cpuset='44'/>
  <vcpupin vcpu='21' cpuset='46'/>
  <vcpupin vcpu='22' cpuset='48'/>
  <vcpupin vcpu='23' cpuset='50'/>
  <vcpupin vcpu='24' cpuset='52'/>
  <vcpupin vcpu='25' cpuset='54'/>
  <vcpupin vcpu='26' cpuset='56'/>
  <vcpupin vcpu='27' cpuset='58'/>
  <vcpupin vcpu='28' cpuset='60'/>
  <vcpupin vcpu='29' cpuset='62'/>
  <vcpupin vcpu='30' cpuset='64'/>
  <vcpupin vcpu='31' cpuset='66'/>
  <vcpupin vcpu='32' cpuset='68'/>
  <vcpupin vcpu='33' cpuset='70'/>
  <vcpupin vcpu='34' cpuset='72'/>
  <vcpupin vcpu='35' cpuset='74'/>
  <vcpupin vcpu='36' cpuset='76'/>
  <vcpupin vcpu='37' cpuset='78'/>
  <vcpupin vcpu='38' cpuset='80'/>
  <vcpupin vcpu='39' cpuset='82'/>
  <vcpupin vcpu='40' cpuset='84'/>
  <vcpupin vcpu='41' cpuset='86'/>
  <vcpupin vcpu='42' cpuset='88'/>
  <vcpupin vcpu='43' cpuset='90'/>
  <vcpupin vcpu='44' cpuset='5'/>
  <vcpupin vcpu='45' cpuset='7'/>
  <vcpupin vcpu='46' cpuset='9'/>
  <vcpupin vcpu='47' cpuset='11'/>
  <vcpupin vcpu='48' cpuset='13'/>
  <vcpupin vcpu='49' cpuset='15'/>
  <vcpupin vcpu='50' cpuset='17'/>
  <vcpupin vcpu='51' cpuset='19'/>
  <vcpupin vcpu='52' cpuset='21'/>
  <vcpupin vcpu='53' cpuset='23'/>
  <vcpupin vcpu='54' cpuset='25'/>
  <vcpupin vcpu='55' cpuset='27'/>
  <vcpupin vcpu='56' cpuset='29'/>
  <vcpupin vcpu='57' cpuset='31'/>
  <vcpupin vcpu='58' cpuset='33'/>
  <vcpupin vcpu='59' cpuset='35'/>
  <vcpupin vcpu='60' cpuset='37'/>
  <vcpupin vcpu='61' cpuset='39'/>
  <vcpupin vcpu='62' cpuset='41'/>
  <vcpupin vcpu='63' cpuset='43'/>
  <vcpupin vcpu='64' cpuset='45'/>
  <vcpupin vcpu='65' cpuset='47'/>
  <vcpupin vcpu='66' cpuset='49'/>
  <vcpupin vcpu='67' cpuset='51'/>
  <vcpupin vcpu='68' cpuset='53'/>
  <vcpupin vcpu='69' cpuset='55'/>
  <vcpupin vcpu='70' cpuset='57'/>
  <vcpupin vcpu='71' cpuset='59'/>
  <vcpupin vcpu='72' cpuset='61'/>
  <vcpupin vcpu='73' cpuset='63'/>
  <vcpupin vcpu='74' cpuset='65'/>
  <vcpupin vcpu='75' cpuset='67'/>
  <vcpupin vcpu='76' cpuset='69'/>
  <vcpupin vcpu='77' cpuset='71'/>
  <vcpupin vcpu='78' cpuset='73'/>
  <vcpupin vcpu='79' cpuset='75'/>
  <vcpupin vcpu='80' cpuset='77'/>
  <vcpupin vcpu='81' cpuset='79'/>
  <vcpupin vcpu='82' cpuset='81'/>
  <vcpupin vcpu='83' cpuset='83'/>
  <vcpupin vcpu='84' cpuset='85'/>
  <vcpupin vcpu='85' cpuset='87'/>
  <vcpupin vcpu='86' cpuset='89'/>
  <vcpupin vcpu='87' cpuset='91'/>
</cputune>
<resource>
  <partition>/machine</partition>
</resource>
<os>
  <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
  <loader readonly='yes' secure='no' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
  <nvram>/var/lib/libvirt/qemu/nvram/ubuntu-vm-numa0_VARS.fd</nvram>
  <boot dev='hd'/>
  <smbios mode='host'/>
</os>
<features>
  <acpi/>
  <apic/>
</features>
<cpu mode='host-passthrough' check='none' migratable='on'>
  <topology sockets='2' dies='1' cores='44' threads='1'/>
  <numa>
    <cell id='0' cpus='0-43' memory='943718400' unit='KiB'/>
    <cell id='1' cpus='44-87' memory='943718400' unit='KiB'/>
  </numa>
</cpu>
<numatune>
  <memory mode='strict' nodeset='0-1'/>
</numatune>
<clock offset='utc'>
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
  <suspend-to-mem enabled='no'/>
  <suspend-to-disk enabled='no'/>
</pm>
<devices>
  <emulator>/usr/bin/qemu-system-x86_64</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' discard='unmap'/>
    <source file='/var/lib/libvirt/images/ubuntu-vm-numa0.qcow2'/>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </disk>
  <controller type='usb' index='0' model='qemu-xhci' ports='15'>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='1' port='0x10'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='2' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='2' port='0x11'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
  </controller>
  <controller type='pci' index='3' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='3' port='0x12'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
  </controller>
  <controller type='pci' index='4' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='4' port='0x13'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
  </controller>
  <controller type='pci' index='5' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='5' port='0x14'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
  </controller>
  <controller type='pci' index='6' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='6' port='0x15'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
  </controller>
  <controller type='pci' index='7' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='7' port='0x16'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
  </controller>
  <controller type='pci' index='8' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='8' port='0x17'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
  </controller>
  <controller type='pci' index='9' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='9' port='0x18'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='10' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='10' port='0x19'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
  </controller>
  <controller type='pci' index='11' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='11' port='0x1a'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
  </controller>
  <controller type='pci' index='12' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='12' port='0x1b'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x3'/>
  </controller>
  <controller type='pci' index='13' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='13' port='0x1c'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x4'/>
  </controller>
  <controller type='pci' index='14' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='14' port='0x1d'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x5'/>
  </controller>

  <!-- PCI expander bus NUMA node 0 -->
  <controller type='pci' index='15' model='pcie-expander-bus'>
    <target busNr='0x20'>
      <node>0</node>
    </target>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x6'/>
  </controller>

  <!-- PCI expander bus NUMA node 1 -->
  <controller type='pci' index='16' model='pcie-expander-bus'>
    <target busNr='0x40'>
      <node>1</node>
    </target>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x7'/>
  </controller>

  <!-- 4 root ports on bus 15 (index of upstream expander bus) NUMA node 0 -->

  <controller type='pci' index='17' model='pcie-root-port'>
    <address type='pci' bus='15' slot='0x00' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='18' model='pcie-root-port'>
    <address type='pci' bus='15' slot='0x00' function='0x1'/>
  </controller>
  <controller type='pci' index='19' model='pcie-root-port'>
    <address type='pci' bus='15' slot='0x00' function='0x2'/>
  </controller>
  <controller type='pci' index='20' model='pcie-root-port'>
    <address type='pci' bus='15' slot='0x00' function='0x3'/>
  </controller>

  <!-- 4 port PCIe switch on bus 17 (index of upstream root port) / func 0 -->
  <controller type='pci' index='21' model='pcie-switch-upstream-port'>
    <address type='pci' bus='17' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='22' model='pcie-switch-downstream-port'>
    <address type='pci' bus='21' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='23' model='pcie-switch-downstream-port'>
    <address type='pci' bus='21' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='24' model='pcie-switch-downstream-port'>
    <address type='pci' bus='21' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='25' model='pcie-switch-downstream-port'>
    <address type='pci' bus='21' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 18 (index of upstream root port) / func 1 -->
  <controller type='pci' index='26' model='pcie-switch-upstream-port'>
    <address type='pci' bus='18' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='27' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='28' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='29' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='30' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 19 (index of upstream root port) / func 2 -->
  <controller type='pci' index='31' model='pcie-switch-upstream-port'>
    <address type='pci' bus='19' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='32' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='33' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='34' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='35' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 20 (index of upstream root port) / func 3 -->
  <controller type='pci' index='36' model='pcie-switch-upstream-port'>
    <address type='pci' bus='20' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='37' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='38' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='39' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='40' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x03' function='0x0'/>
  </controller>

  <!-- 5 root ports on bus 16 (index of upstream expander bus) NUMA node 1 -->

  <controller type='pci' index='41' model='pcie-root-port'>
323 <address type='pci' bus='16' slot='0x00' function='0x0' multifunction='on'/>
324 </controller>
325 <controller type='pci' index='42' model='pcie-root-port'>
326 <address type='pci' bus='16' slot='0x00' function='0x1'/>
327 </controller>
328 <controller type='pci' index='43' model='pcie-root-port'>
329 <address type='pci' bus='16' slot='0x00' function='0x2'/>
  </controller>
  <controller type='pci' index='44' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x3'/>
  </controller>
  <controller type='pci' index='45' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x4'/>
  </controller>

  <!-- 4 port PCIe switch on bus 41 (index of upstream root port) / func 0 -->
  <controller type='pci' index='46' model='pcie-switch-upstream-port'>
    <address type='pci' bus='41' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='47' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='48' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='49' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='50' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 42 (index of upstream root port) / func 1 -->
  <controller type='pci' index='51' model='pcie-switch-upstream-port'>
    <address type='pci' bus='42' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='52' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='53' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='54' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='55' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 43 (index of upstream root port) / func 2 -->
  <controller type='pci' index='56' model='pcie-switch-upstream-port'>
    <address type='pci' bus='43' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='57' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='58' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='59' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='60' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 44 (index of upstream root port) / func 3 -->
  <controller type='pci' index='61' model='pcie-switch-upstream-port'>
    <address type='pci' bus='44' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='62' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='63' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='64' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='65' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x03' function='0x0'/>
  </controller>

  <!-- 4 port PCIe switch on bus 45 (index of upstream root port) / func 4 -->
  <controller type='pci' index='66' model='pcie-switch-upstream-port'>
    <address type='pci' bus='45' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='67' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='68' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='69' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='70' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x03' function='0x0'/>
  </controller>

  <controller type='sata' index='0'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
  </controller>
  <controller type='virtio-serial' index='0'>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </controller>
  <interface type='bridge'>
    <mac address='52:54:00:e2:a3:d9'/>
    <source bridge='br0'/>
    <model type='virtio'/>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </interface>
  <serial type='pty'>
    <target type='isa-serial' port='0'>
      <model name='isa-serial'/>
    </target>
  </serial>
  <console type='pty'>
    <target type='serial' port='0'/>
  </console>
  <channel type='unix'>
    <target type='virtio' name='org.qemu.guest_agent.0'/>
    <address type='virtio-serial' controller='0' bus='0' port='1'/>
  </channel>
  <input type='tablet' bus='usb'>
    <address type='usb' bus='0' port='1'/>
  </input>
  <input type='keyboard' bus='usb'>
    <address type='usb' bus='0' port='2'/>
  </input>
  <input type='mouse' bus='ps2'/>
  <input type='keyboard' bus='ps2'/>
  <graphics type='spice' autoport='yes'>
    <listen type='address'/>
    <image compression='off'/>
  </graphics>
  <audio id='1' type='spice'/>
  <video>
    <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
  </video>

  <!-- GPUs, NICs, NVMe on socket 0 -->

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x18' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='22' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x19' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='23' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x1a' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='24' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='27' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x3c' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='28' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x4c' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='32' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x4d' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='33' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x5d' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='37' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x5e' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='38' slot='0x00' function='0x0'/>
  </hostdev>

  <!-- GPUs, NICs on socket 1 -->

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x9b' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='52' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x9c' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='53' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0xbb' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='57' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0xbc' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='58' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0xcb' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='62' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0xcc' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='63' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0xdb' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='67' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0xdc' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='68' slot='0x00' function='0x0'/>
  </hostdev>

  <!-- NVswitches socket 1 -->

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='47' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='48' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x85' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='49' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='50' slot='0x00' function='0x0'/>
  </hostdev>

  <memballoon model='virtio'>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
  </memballoon>
  <rng model='virtio'>
    <backend model='random'>/dev/urandom</backend>
    <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
  </rng>
</devices>
<seclabel type='dynamic' model='apparmor' relabel='yes'/>
<qemu:commandline>
  <qemu:arg value='-fw_cfg'/>
  <qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=4718592'/>
</qemu:commandline>
</domain>
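The listing above encodes the virtual PCIe hierarchy through bus references: the `bus` in a controller's `<address>` is the index of the controller it plugs into, and the guest `<address>` of each passthrough device names the downstream port it sits behind. The following is a minimal Python sketch of how those references could be followed to print the chain above each passthrough device. The `SAMPLE` fragment is made up for illustration (it is not taken from the listing), the function names are hypothetical, and the sketch assumes that `bus` and `index` values are written either as plain decimal or as `0x`-prefixed hex.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened fragment that follows the same pattern as the listing.
SAMPLE = """
<devices>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='41' model='pcie-root-port'>
    <address type='pci' bus='0' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='46' model='pcie-switch-upstream-port'>
    <address type='pci' bus='41' slot='0x00' function='0x0'/>
  </controller>
  <controller type='pci' index='47' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x00' function='0x0'/>
  </controller>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='47' slot='0x00' function='0x0'/>
  </hostdev>
</devices>
"""


def controller_parents(root):
    """Map each PCI controller index to (model, index of the bus it plugs into)."""
    parents = {}
    for ctrl in root.iter("controller"):
        if ctrl.get("type") != "pci":
            continue
        addr = ctrl.find("address")
        # The root complex has no <address>, so it has no parent bus.
        parent_bus = int(addr.get("bus"), 0) if addr is not None else None
        parents[int(ctrl.get("index"), 0)] = (ctrl.get("model"), parent_bus)
    return parents


def guest_path(bus, parents):
    """Walk from a guest bus number up towards the root, collecting (index, model)."""
    path, seen = [], set()
    while bus in parents and bus not in seen:  # 'seen' guards against loops
        seen.add(bus)
        model, parent_bus = parents[bus]
        path.append((bus, model))
        bus = parent_bus
    return path


def hostdev_paths(xml_text):
    """Return {host bus:slot.function: guest-side controller chain} per hostdev."""
    root = ET.fromstring(xml_text.strip())
    parents = controller_parents(root)
    result = {}
    for dev in root.iter("hostdev"):
        src = dev.find("source/address")   # device address on the host
        guest = dev.find("address")        # device address inside the VM
        host_bdf = "%02x:%02x.%x" % (
            int(src.get("bus"), 0),
            int(src.get("slot"), 0),
            int(src.get("function"), 0),
        )
        result[host_bdf] = guest_path(int(guest.get("bus"), 0), parents)
    return result


for host_bdf, path in hostdev_paths(SAMPLE).items():
    chain = " -> ".join(f"{index}:{model}" for index, model in path)
    print(f"host {host_bdf} sits behind {chain}")
```

Running the same walk over a full domain definition makes it easier to spot a device whose guest address points at the wrong switch or root port before the VM is started.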
NCCL#
NCCL (NVIDIA Collective Communications Library) is a communication layer widely used in distributed AI/ML training. It provides efficient multi-GPU and multi-node collective operations (such as all-reduce, all-gather, reduce, and broadcast) that underpin the training algorithms for large-scale deep learning models. NCCL offers a library of routines optimized for GPU-to-GPU communication and hides that complexity from AI frameworks (e.g. TensorFlow, PyTorch). These routines take advantage of the underlying topology to minimize latency and maximize bandwidth utilization [2].
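To make the all-reduce collective concrete, the following is a minimal pure-Python simulation of a ring all-reduce, one of the commonly described algorithms for this operation. It does not call NCCL, and it is not a description of how NCCL chooses or implements its algorithms. The function name `ring_all_reduce` and the layout (one number per chunk, one chunk per rank) are assumptions made for illustration. The point of the sketch is that every rank only ever sends to its ring neighbour, which is why the quality of the link between neighbours matters.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a ring of len(buffers) ranks.

    Each rank's buffer must hold exactly one value per rank (one "chunk" each).
    Returns the buffers every rank ends up with.
    """
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("each buffer needs one chunk per rank")
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: after n-1 steps rank r holds the full sum
    # of chunk (r + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so the sends in a step are simultaneous.
        messages = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][chunk] for r, chunk in messages]
        for (r, chunk), value in zip(messages, payloads):
            data[(r + 1) % n][chunk] += value

    # Phase 2, all-gather: the finished chunks travel once around the ring.
    for step in range(n - 1):
        messages = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][chunk] for r, chunk in messages]
        for (r, chunk), value in zip(messages, payloads):
            data[(r + 1) % n][chunk] = value

    return data


result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result)  # every rank ends with [111, 222, 333]
```

In this scheme each rank exchanges data with its neighbour in every step, so a single slow hop between two neighbours limits the whole ring.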
Topology awareness is a prerequisite for running NCCL efficiently. If the hypervisor obscures the physical hardware topology within a VM, NCCL will create suboptimal communication paths, which adds overhead and reduces effective bandwidth. For multi-node HGX server systems designed for distributed training or foundation model training workloads, suboptimal NCCL performance translates directly into wasted GPU cycles, longer training times, and reduced overall efficiency.
The VM's NUMA and PCIe hierarchy must therefore mirror the underlying physical topology so that NCCL can detect and use the correct GPU, NIC, and NUMA relationships. This allows NCCL-based applications in a virtualized environment to approach bare-metal performance.
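One quick way to sanity-check the NUMA side of this from inside the guest is to read the `vendor` and `numa_node` attributes that Linux exposes for each PCI device under `/sys/bus/pci/devices`. The following is a minimal Python sketch under that assumption; the function name `devices_by_numa_node` is hypothetical, and the vendor IDs are `0x10de` (NVIDIA GPUs and NVSwitches) and `0x15b3` (NVIDIA/Mellanox NICs).

```python
import os

NVIDIA_VENDOR = "0x10de"    # GPUs and NVSwitches
MELLANOX_VENDOR = "0x15b3"  # ConnectX NICs


def devices_by_numa_node(sysfs_root="/sys/bus/pci/devices",
                         vendors=(NVIDIA_VENDOR, MELLANOX_VENDOR)):
    """Group PCI addresses of matching devices by the NUMA node the kernel reports.

    A node of -1 means the kernel has no NUMA affinity for that device.
    """
    groups = {}
    if not os.path.isdir(sysfs_root):
        return groups  # not a Linux system, or sysfs is not mounted
    for addr in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, addr)
        try:
            with open(os.path.join(dev_dir, "vendor")) as f:
                vendor = f.read().strip()
            with open(os.path.join(dev_dir, "numa_node")) as f:
                node = int(f.read().strip())
        except (OSError, ValueError):
            continue  # skip entries without readable attributes
        if vendor in vendors:
            groups.setdefault(node, []).append(addr)
    return groups


if __name__ == "__main__":
    for node, addrs in sorted(devices_by_numa_node().items()):
        print(f"NUMA node {node}: {', '.join(addrs)}")
```

If the guest topology mirrors the host, each GPU should appear on the same NUMA node as the NIC it is paired with on the physical system; devices reported on node -1, or all on a single node, suggest that the NUMA layout was not passed through to the VM.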