Appendix

ATS and ACS configuration

To achieve maximum GPUDirect RDMA performance in a VM, PCIe Address Translation Services (ATS) must be enabled on the NIC, and PCIe Access Control Services (ACS) settings on PCIe switches and root ports must be configured to allow ATS to work.

Enabling ATS on NVIDIA NICs

To enable ATS, first start the Mellanox Software Tools (MST) service with sudo mst start, then use the following commands, once per NIC, to check and set the ATS status.

To check the current state:

# mlxconfig -d /dev/mst/mt4129_pciconf0 q | grep ATS_ENABLED
        ATS_ENABLED                                 False(0)

In the example above, the first NIC (pciconf0) has ATS disabled, which is the default setting. Each NIC should be checked individually.

To enable it on a single NIC, use the following command:

# mlxconfig -d /dev/mst/mt4129_pciconf0 set ATS_ENABLED=true

After running the command above on every NIC, reboot the host. If a NIC is already passed through to a VM, rebooting the VM alone is not enough to apply the change; the host must be rebooted to fully reinitialise the NIC.
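Because the setting has to be applied NIC by NIC, the step lends itself to a loop. The following is a minimal sketch rather than a supported tool: the function name enable_ats_all, its dry-run argument, and the decision to skip secondary entries such as mt4129_pciconf0.1 are assumptions made for illustration. The -y flag tells mlxconfig to answer its confirmation prompt automatically.

```shell
# Hypothetical helper: enable ATS on every MST PCI config device.
# $1: directory holding the MST device nodes (default /dev/mst)
# $2: optional command prefix, e.g. "echo" for a dry run
enable_ats_all() {
    local dev_dir="${1:-/dev/mst}" runner="${2:-}" dev
    for dev in "$dev_dir"/*_pciconf*; do
        [ -e "$dev" ] || continue            # the glob matched nothing
        case "${dev##*/}" in
            *.*) continue ;;                 # assumption: skip extra functions such as pciconf0.1
        esac
        $runner mlxconfig -y -d "$dev" set ATS_ENABLED=true
    done
}

# Dry run (prints the commands instead of executing them):
#   enable_ats_all /dev/mst echo
# Real run, as root, after "mst start":
#   enable_ats_all
```

Running the dry-run form first makes it easy to confirm which devices would be touched before any firmware setting is changed.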

When ATS is enabled on the NIC, the Address Translation Service capability should show Enable+ in lspci output:

1a:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
        Subsystem: Mellanox Technologies MT2910 Family [ConnectX-7]
...
        Capabilities: [480 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

Alternatively, check the status on all NVIDIA NICs in the system using the following command:

user@hgx1:~$ sudo lspci -d 15b3: -vvv | grep ATSCtl
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
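When many NICs are installed, it can be easier to count the exceptions than to read every line. The sketch below is one way to do that; the function name ats_disabled_count is hypothetical, and it assumes lspci prints the disabled state as "Enable-", mirroring the "Enable+" shown above.

```shell
# Hypothetical helper: count ATSCtl lines that report ATS as disabled.
# Reads "lspci -vvv" output on stdin and prints the number of "Enable-" entries.
ats_disabled_count() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -c 'ATSCtl:[[:space:]]*Enable-' || true
}

# Usage on the host:
#   sudo lspci -d 15b3: -vvv | ats_disabled_count
```

A result of 0 means every NIC that exposes the ATS capability has it enabled.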

Configuring ACS

The following ACS settings should be configured on PCIe root ports and switch downstream ports to allow use of ATS by NICs:

  • SrcValid (Source Validation) - enabled

  • TransBlk (Translation Blocking) - disabled

  • ReqRedir (P2P Request Redirection) - enabled

  • CmpltRedir (P2P Completion Redirection) - enabled

  • UpstreamFwd (Upstream Forwarding) - enabled

  • EgressCtrl (Egress Control) - disabled

  • DirectTrans (Direct Translated P2P) - enabled
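The seven settings map onto bits 0 to 6 of the ACS Control register, in the order listed. As a small worked example, the sketch below (the function name acs_ctl_value is hypothetical) ORs together the bits for the settings that should be enabled, which gives the register value 0x5D.

```shell
# Hypothetical helper: build the ACS Control register value from the names
# of the settings that should be enabled. Bit positions follow the PCIe
# ACS Control register layout (bit 0 = SrcValid ... bit 6 = DirectTrans).
acs_ctl_value() {
    local value=0 name
    for name in "$@"; do
        case "$name" in
            SrcValid)    value=$(( value | 1 << 0 )) ;;
            TransBlk)    value=$(( value | 1 << 1 )) ;;
            ReqRedir)    value=$(( value | 1 << 2 )) ;;
            CmpltRedir)  value=$(( value | 1 << 3 )) ;;
            UpstreamFwd) value=$(( value | 1 << 4 )) ;;
            EgressCtrl)  value=$(( value | 1 << 5 )) ;;
            DirectTrans) value=$(( value | 1 << 6 )) ;;
            *) echo "unknown ACS setting: $name" >&2; return 1 ;;
        esac
    done
    printf '0x%02X\n' "$value"
}

# The five settings marked "enabled" in the list above:
acs_ctl_value SrcValid ReqRedir CmpltRedir UpstreamFwd DirectTrans   # prints 0x5D
```

TransBlk and EgressCtrl are left out of the argument list, so their bits stay clear.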

These settings are exposed in the ACS Capability:

sudo lspci -vvv | less
...
17:00.0 PCI bridge: Broadcom / LSI PEX890xx PCIe Gen 5 Switch (rev b0) (prog-if 00 [Normal decode])
...
        Capabilities: [170 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [1f0 v1] Advanced Error Reporting
...
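Scanning the full lspci listing by eye is error-prone on a system with many bridges. The following sketch filters it down to the devices whose ACSCtl line does not yet match the recommended settings. The function name acs_mismatches is hypothetical, and the parsing assumes the lspci layout shown above (a device header at column 0, followed by indented capability lines).

```shell
# Hypothetical helper: list devices whose ACSCtl flags differ from the
# recommended settings. Reads "lspci -vvv" output on stdin.
acs_mismatches() {
    awk '
        /^[0-9a-f]/ { dev = $1 }      # device header, e.g. "17:00.0 PCI bridge: ..."
        /ACSCtl:/ {
            want = "SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+"
            sub(/^.*ACSCtl:[ \t]*/, "")   # keep only the flag list
            sub(/[ \t]+$/, "")
            n = split(want, w, " ")
            bad = ""
            for (i = 1; i <= n; i++)
                if (index(" " $0 " ", " " w[i] " ") == 0)
                    bad = bad " " w[i]
            if (bad != "") print dev ": missing" bad
        }
    '
}

# Usage on the host:
#   sudo lspci -vvv | acs_mismatches
```

For the unconfigured switch port in the example output, this reports that SrcValid+, ReqRedir+, CmpltRedir+, UpstreamFwd+ and DirectTrans+ are missing. Devices that do not implement every ACS bit will always be listed, so the result needs to be read alongside each device's ACSCap line.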

These ACS settings are applied on the virtualization host, not in guest VMs. The following script should be run each time the system is powered on or rebooted, as these settings are not retained across resets / reboots:

#!/bin/bash
#
# Copyright (c) 2020, NVIDIA Corporation.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the “Software”), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# Enable P2P specific ACS bits on every device that supports it

# must be root to access extended PCI config space
if [ "$EUID" -ne 0 ]; then
    echo "ERROR: $0 must be run as root"
    exit 1
fi

PLATFORM=$(dmidecode --string system-product-name)
logger "PLATFORM=${PLATFORM}"

for BDF in $(lspci -d "*:*:*" | awk '{print $1}'); do

    # skip if it doesn't support ACS
    if ! setpci -v -s "${BDF}" ECAP_ACS+0x6.w > /dev/null 2>&1; then
        # echo "${BDF} does not support ACS, skipping"
        continue
    fi

    logger "Enabling ACS on $(lspci -s "${BDF}")"
    # 0x5D = SrcValid | ReqRedir | CmpltRedir | UpstreamFwd | DirectTrans
    if ! setpci -v -s "${BDF}" ECAP_ACS+0x6.w=0x5D; then
        logger "Error enabling ACS on ${BDF}"
        continue
    fi
    # read the register back; setpci prints a word as four lowercase hex digits
    NEW_VAL=$(setpci -v -s "${BDF}" ECAP_ACS+0x6.w | awk '{print $NF}')
    if [ "${NEW_VAL}" != "005d" ]; then
        logger "Failed to enable ACS on ${BDF}"
        continue
    fi
done
exit 0
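Since the settings are lost on every reset, the script is typically hooked into the boot sequence. One possible approach, sketched here under several assumptions, is a oneshot systemd unit. The script path /usr/local/sbin/enable-acs.sh, the unit name enable-acs.service, the helper name write_acs_unit, and the ordering before libvirtd.service are all illustrative choices, not part of the original procedure.

```shell
# Hypothetical helper: write a oneshot systemd unit that runs the ACS script at boot.
# $1: absolute path of the ACS script (assumed: /usr/local/sbin/enable-acs.sh)
# $2: directory to write the unit into (default /etc/systemd/system)
write_acs_unit() {
    local script="${1:-/usr/local/sbin/enable-acs.sh}" unit_dir="${2:-/etc/systemd/system}"
    cat > "$unit_dir/enable-acs.service" <<EOF
[Unit]
Description=Configure PCIe ACS bits for GPUDirect RDMA
# Assumption: VMs are started by libvirtd, so run before it
Before=libvirtd.service

[Service]
Type=oneshot
ExecStart=$script

[Install]
WantedBy=multi-user.target
EOF
    echo "$unit_dir/enable-acs.service"
}

# As root:
#   write_acs_unit /usr/local/sbin/enable-acs.sh
#   systemctl daemon-reload && systemctl enable enable-acs.service
```

The point of the ordering line is that ACS should already be configured when the first VM with passed-through devices starts; adjust it if VMs are launched by something other than libvirtd.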

Example physical topology map for HGX H200 8-GPU platform

Output from lstopo -sv command run on the vGPU host:

Machine (2015GB total)
Package L#0
    NUMANode L#0 (P#0 1007GB)
    L3 L#0 (105MB)
    L2 L#0 (2048KB) + L1d L#0 (48KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
    L2 L#1 (2048KB) + L1d L#1 (48KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#2)
    L2 L#2 (2048KB) + L1d L#2 (48KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#4)
    L2 L#3 (2048KB) + L1d L#3 (48KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#6)
    L2 L#4 (2048KB) + L1d L#4 (48KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#8)
    L2 L#5 (2048KB) + L1d L#5 (48KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#10)
    L2 L#6 (2048KB) + L1d L#6 (48KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
    L2 L#7 (2048KB) + L1d L#7 (48KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#14)
    L2 L#8 (2048KB) + L1d L#8 (48KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#16)
    L2 L#9 (2048KB) + L1d L#9 (48KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#18)
    L2 L#10 (2048KB) + L1d L#10 (48KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#20)
    L2 L#11 (2048KB) + L1d L#11 (48KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#22)
    L2 L#12 (2048KB) + L1d L#12 (48KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#24)
    L2 L#13 (2048KB) + L1d L#13 (48KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#26)
    L2 L#14 (2048KB) + L1d L#14 (48KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#28)
    L2 L#15 (2048KB) + L1d L#15 (48KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#30)
    L2 L#16 (2048KB) + L1d L#16 (48KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#32)
    L2 L#17 (2048KB) + L1d L#17 (48KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#34)
    L2 L#18 (2048KB) + L1d L#18 (48KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#36)
    L2 L#19 (2048KB) + L1d L#19 (48KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#38)
    L2 L#20 (2048KB) + L1d L#20 (48KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#40)
    L2 L#21 (2048KB) + L1d L#21 (48KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#42)
    L2 L#22 (2048KB) + L1d L#22 (48KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#44)
    L2 L#23 (2048KB) + L1d L#23 (48KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#46)
    L2 L#24 (2048KB) + L1d L#24 (48KB) + L1i L#24 (32KB) + Core L#24 + PU L#24 (P#48)
    L2 L#25 (2048KB) + L1d L#25 (48KB) + L1i L#25 (32KB) + Core L#25 + PU L#25 (P#50)
    L2 L#26 (2048KB) + L1d L#26 (48KB) + L1i L#26 (32KB) + Core L#26 + PU L#26 (P#52)
    L2 L#27 (2048KB) + L1d L#27 (48KB) + L1i L#27 (32KB) + Core L#27 + PU L#27 (P#54)
    L2 L#28 (2048KB) + L1d L#28 (48KB) + L1i L#28 (32KB) + Core L#28 + PU L#28 (P#56)
    L2 L#29 (2048KB) + L1d L#29 (48KB) + L1i L#29 (32KB) + Core L#29 + PU L#29 (P#58)
    L2 L#30 (2048KB) + L1d L#30 (48KB) + L1i L#30 (32KB) + Core L#30 + PU L#30 (P#60)
    L2 L#31 (2048KB) + L1d L#31 (48KB) + L1i L#31 (32KB) + Core L#31 + PU L#31 (P#62)
    L2 L#32 (2048KB) + L1d L#32 (48KB) + L1i L#32 (32KB) + Core L#32 + PU L#32 (P#64)
    L2 L#33 (2048KB) + L1d L#33 (48KB) + L1i L#33 (32KB) + Core L#33 + PU L#33 (P#66)
    L2 L#34 (2048KB) + L1d L#34 (48KB) + L1i L#34 (32KB) + Core L#34 + PU L#34 (P#68)
    L2 L#35 (2048KB) + L1d L#35 (48KB) + L1i L#35 (32KB) + Core L#35 + PU L#35 (P#70)
    L2 L#36 (2048KB) + L1d L#36 (48KB) + L1i L#36 (32KB) + Core L#36 + PU L#36 (P#72)
    L2 L#37 (2048KB) + L1d L#37 (48KB) + L1i L#37 (32KB) + Core L#37 + PU L#37 (P#74)
    L2 L#38 (2048KB) + L1d L#38 (48KB) + L1i L#38 (32KB) + Core L#38 + PU L#38 (P#76)
    L2 L#39 (2048KB) + L1d L#39 (48KB) + L1i L#39 (32KB) + Core L#39 + PU L#39 (P#78)
    L2 L#40 (2048KB) + L1d L#40 (48KB) + L1i L#40 (32KB) + Core L#40 + PU L#40 (P#80)
    L2 L#41 (2048KB) + L1d L#41 (48KB) + L1i L#41 (32KB) + Core L#41 + PU L#41 (P#82)
    L2 L#42 (2048KB) + L1d L#42 (48KB) + L1i L#42 (32KB) + Core L#42 + PU L#42 (P#84)
    L2 L#43 (2048KB) + L1d L#43 (48KB) + L1i L#43 (32KB) + Core L#43 + PU L#43 (P#86)
    L2 L#44 (2048KB) + L1d L#44 (48KB) + L1i L#44 (32KB) + Core L#44 + PU L#44 (P#88)
    L2 L#45 (2048KB) + L1d L#45 (48KB) + L1i L#45 (32KB) + Core L#45 + PU L#45 (P#90)
    L2 L#46 (2048KB) + L1d L#46 (48KB) + L1i L#46 (32KB) + Core L#46 + PU L#46 (P#92)
    L2 L#47 (2048KB) + L1d L#47 (48KB) + L1i L#47 (32KB) + Core L#47 + PU L#47 (P#94)
    HostBridge
    PCIBridge
        PCI 01:00.0 (NVMExp)
        Block(Disk) "nvme0c0n1"
    PCIBridge
        PCI 02:00.0 (Ethernet)
        Net "eno8303"
        PCI 02:00.1 (Ethernet)
        Net "eno8403"
    PCIBridge
        PCIBridge
        PCI 04:00.0 (VGA)
    PCI 00:18.0 (SATA)
    PCI 00:19.0 (SATA)
    HostBridge
    PCIBridge
        PCIBridge
        PCIBridge
            PCI 18:00.0 (NVMExp)
            Block(Disk) "nvme1n1"
        PCIBridge
            PCI 19:00.0 (3D)
        PCIBridge
            PCI 1a:00.0 (InfiniBand)
            OpenFabrics "mlx5_0"
        PCIBridge
            PCI 1c:00.0 (SAS)
    HostBridge
    PCIBridge
        PCIBridge
        PCIBridge
            PCI 3a:00.0 (NVMExp)
            Block(Disk) "nvme2n1"
        PCIBridge
            PCI 3b:00.0 (3D)
        PCIBridge
            PCI 3c:00.0 (InfiniBand)
            OpenFabrics "mlx5_1"
        PCIBridge
            PCI 3d:00.0 (SAS)
    HostBridge
    PCIBridge
        PCIBridge
        PCIBridge
            PCI 4c:00.0 (3D)
        PCIBridge
            PCI 4d:00.0 (InfiniBand)
            OpenFabrics "mlx5_2"
        PCIBridge
            PCI 4e:00.0 (SAS)
    HostBridge
    PCIBridge
        PCIBridge
        PCIBridge
            PCI 5d:00.0 (3D)
        PCIBridge
            PCI 5e:00.0 (InfiniBand)
            OpenFabrics "mlx5_3"
        PCIBridge
            PCI 5f:00.0 (SAS)
Package L#1
    NUMANode L#1 (P#1 1008GB)
    L3 L#1 (105MB)
    L2 L#48 (2048KB) + L1d L#48 (48KB) + L1i L#48 (32KB) + Core L#48 + PU L#48 (P#1)
    L2 L#49 (2048KB) + L1d L#49 (48KB) + L1i L#49 (32KB) + Core L#49 + PU L#49 (P#3)
    L2 L#50 (2048KB) + L1d L#50 (48KB) + L1i L#50 (32KB) + Core L#50 + PU L#50 (P#5)
    L2 L#51 (2048KB) + L1d L#51 (48KB) + L1i L#51 (32KB) + Core L#51 + PU L#51 (P#7)
    L2 L#52 (2048KB) + L1d L#52 (48KB) + L1i L#52 (32KB) + Core L#52 + PU L#52 (P#9)
    L2 L#53 (2048KB) + L1d L#53 (48KB) + L1i L#53 (32KB) + Core L#53 + PU L#53 (P#11)
    L2 L#54 (2048KB) + L1d L#54 (48KB) + L1i L#54 (32KB) + Core L#54 + PU L#54 (P#13)
    L2 L#55 (2048KB) + L1d L#55 (48KB) + L1i L#55 (32KB) + Core L#55 + PU L#55 (P#15)
    L2 L#56 (2048KB) + L1d L#56 (48KB) + L1i L#56 (32KB) + Core L#56 + PU L#56 (P#17)
    L2 L#57 (2048KB) + L1d L#57 (48KB) + L1i L#57 (32KB) + Core L#57 + PU L#57 (P#19)
    L2 L#58 (2048KB) + L1d L#58 (48KB) + L1i L#58 (32KB) + Core L#58 + PU L#58 (P#21)
    L2 L#59 (2048KB) + L1d L#59 (48KB) + L1i L#59 (32KB) + Core L#59 + PU L#59 (P#23)
    L2 L#60 (2048KB) + L1d L#60 (48KB) + L1i L#60 (32KB) + Core L#60 + PU L#60 (P#25)
    L2 L#61 (2048KB) + L1d L#61 (48KB) + L1i L#61 (32KB) + Core L#61 + PU L#61 (P#27)
    L2 L#62 (2048KB) + L1d L#62 (48KB) + L1i L#62 (32KB) + Core L#62 + PU L#62 (P#29)
    L2 L#63 (2048KB) + L1d L#63 (48KB) + L1i L#63 (32KB) + Core L#63 + PU L#63 (P#31)
    L2 L#64 (2048KB) + L1d L#64 (48KB) + L1i L#64 (32KB) + Core L#64 + PU L#64 (P#33)
    L2 L#65 (2048KB) + L1d L#65 (48KB) + L1i L#65 (32KB) + Core L#65 + PU L#65 (P#35)
    L2 L#66 (2048KB) + L1d L#66 (48KB) + L1i L#66 (32KB) + Core L#66 + PU L#66 (P#37)
    L2 L#67 (2048KB) + L1d L#67 (48KB) + L1i L#67 (32KB) + Core L#67 + PU L#67 (P#39)
    L2 L#68 (2048KB) + L1d L#68 (48KB) + L1i L#68 (32KB) + Core L#68 + PU L#68 (P#41)
    L2 L#69 (2048KB) + L1d L#69 (48KB) + L1i L#69 (32KB) + Core L#69 + PU L#69 (P#43)
    L2 L#70 (2048KB) + L1d L#70 (48KB) + L1i L#70 (32KB) + Core L#70 + PU L#70 (P#45)
    L2 L#71 (2048KB) + L1d L#71 (48KB) + L1i L#71 (32KB) + Core L#71 + PU L#71 (P#47)
    L2 L#72 (2048KB) + L1d L#72 (48KB) + L1i L#72 (32KB) + Core L#72 + PU L#72 (P#49)
    L2 L#73 (2048KB) + L1d L#73 (48KB) + L1i L#73 (32KB) + Core L#73 + PU L#73 (P#51)
    L2 L#74 (2048KB) + L1d L#74 (48KB) + L1i L#74 (32KB) + Core L#74 + PU L#74 (P#53)
    L2 L#75 (2048KB) + L1d L#75 (48KB) + L1i L#75 (32KB) + Core L#75 + PU L#75 (P#55)
    L2 L#76 (2048KB) + L1d L#76 (48KB) + L1i L#76 (32KB) + Core L#76 + PU L#76 (P#57)
    L2 L#77 (2048KB) + L1d L#77 (48KB) + L1i L#77 (32KB) + Core L#77 + PU L#77 (P#59)
    L2 L#78 (2048KB) + L1d L#78 (48KB) + L1i L#78 (32KB) + Core L#78 + PU L#78 (P#61)
    L2 L#79 (2048KB) + L1d L#79 (48KB) + L1i L#79 (32KB) + Core L#79 + PU L#79 (P#63)
    L2 L#80 (2048KB) + L1d L#80 (48KB) + L1i L#80 (32KB) + Core L#80 + PU L#80 (P#65)
    L2 L#81 (2048KB) + L1d L#81 (48KB) + L1i L#81 (32KB) + Core L#81 + PU L#81 (P#67)
    L2 L#82 (2048KB) + L1d L#82 (48KB) + L1i L#82 (32KB) + Core L#82 + PU L#82 (P#69)
    L2 L#83 (2048KB) + L1d L#83 (48KB) + L1i L#83 (32KB) + Core L#83 + PU L#83 (P#71)
    L2 L#84 (2048KB) + L1d L#84 (48KB) + L1i L#84 (32KB) + Core L#84 + PU L#84 (P#73)
    L2 L#85 (2048KB) + L1d L#85 (48KB) + L1i L#85 (32KB) + Core L#85 + PU L#85 (P#75)
    L2 L#86 (2048KB) + L1d L#86 (48KB) + L1i L#86 (32KB) + Core L#86 + PU L#86 (P#77)
    L2 L#87 (2048KB) + L1d L#87 (48KB) + L1i L#87 (32KB) + Core L#87 + PU L#87 (P#79)
    L2 L#88 (2048KB) + L1d L#88 (48KB) + L1i L#88 (32KB) + Core L#88 + PU L#88 (P#81)
    L2 L#89 (2048KB) + L1d L#89 (48KB) + L1i L#89 (32KB) + Core L#89 + PU L#89 (P#83)
    L2 L#90 (2048KB) + L1d L#90 (48KB) + L1i L#90 (32KB) + Core L#90 + PU L#90 (P#85)
    L2 L#91 (2048KB) + L1d L#91 (48KB) + L1i L#91 (32KB) + Core L#91 + PU L#91 (P#87)
    L2 L#92 (2048KB) + L1d L#92 (48KB) + L1i L#92 (32KB) + Core L#92 + PU L#92 (P#89)
    L2 L#93 (2048KB) + L1d L#93 (48KB) + L1i L#93 (32KB) + Core L#93 + PU L#93 (P#91)
    L2 L#94 (2048KB) + L1d L#94 (48KB) + L1i L#94 (32KB) + Core L#94 + PU L#94 (P#93)
    L2 L#95 (2048KB) + L1d L#95 (48KB) + L1i L#95 (32KB) + Core L#95 + PU L#95 (P#95)
    HostBridge
    PCIBridge
        PCIBridge
        PCIBridge
            PCI 9b:00.0 (3D)
        PCIBridge
            PCI 9c:00.0 (InfiniBand)
            OpenFabrics "mlx5_4"
        PCIBridge
            PCI 9e:00.0 (SAS)
    HostBridge
    PCIBridge
        PCIBridge
        PCIBridge
            PCI bb:00.0 (3D)
        PCIBridge
            PCI bc:00.0 (InfiniBand)
            OpenFabrics "mlx5_5"
        PCIBridge
            PCI bd:00.0 (SAS)
    HostBridge
    PCIBridge
        PCIBridge
        PCIBridge
            PCI cb:00.0 (3D)
        PCIBridge
            PCI cc:00.0 (InfiniBand)
            OpenFabrics "mlx5_6"
        PCIBridge
            PCI cd:00.0 (SAS)
    HostBridge
    PCIBridge
        PCIBridge
        PCIBridge
            PCI db:00.0 (3D)
        PCIBridge
            PCI dc:00.0 (InfiniBand)
            OpenFabrics "mlx5_7"
        PCIBridge
            PCI dd:00.0 (SAS)
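For passthrough planning, the most useful fact in a map like this is which NIC sits behind the same host bridge as which GPU. The sketch below extracts those pairs from lstopo output. The function name gpu_nic_pairs is hypothetical, and the parsing assumes the short "PCI <address> (<class>)" line format shown in this example, with one GPU and one NIC per host bridge.

```shell
# Hypothetical helper: print "<GPU address> <NIC address> <RDMA device>" for
# every host bridge that contains both a 3D controller and an InfiniBand NIC.
# Reads lstopo text output on stdin.
gpu_nic_pairs() {
    awk '
        function emit() {
            if (gpu != "" && nic != "") print gpu, nic, ib
            gpu = ""; nic = ""; ib = ""
        }
        /HostBridge/ || /Package/ { emit() }      # a new bridge or socket starts
        /PCI .*\(3D\)/            { gpu = $2 }
        /PCI .*\(InfiniBand\)/    { nic = $2 }
        /OpenFabrics/             { ib = $2; gsub(/"/, "", ib) }
        END                       { emit() }
    '
}

# Usage on the host:
#   lstopo -sv | gpu_nic_pairs
```

For a host bridge that contains PCI 19:00.0 (3D), PCI 1a:00.0 (InfiniBand) and OpenFabrics "mlx5_0", the function prints "19:00.0 1a:00.0 mlx5_0". Host bridges without a GPU, such as the one carrying the boot NVMe and Ethernet ports, produce no output.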

Example libvirt domain XML for HGX H200 8-GPU platform

Reference: libvirt: Domain XML format

This sample XML, created on Ubuntu 22.04, defines a VM on the following physical topology. The PCIe devices highlighted in green boxes are passed through to the VM.

_images/appendix1.png

The VM is configured with 88 vCPUs arranged in two sockets, each with 44 cores. Each socket implements a single NUMA node with 900 GiB (943718400 KiB) of RAM, for 1800 GiB in total. The virtual PCIe hierarchy is equivalent to the physical topology, with 4 PCIe switches on CPU socket 0 and 5 PCIe switches on CPU socket 1.
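The vCPU pinning in the sample XML follows a regular pattern that is easier to generate than to type: vCPUs 0-43 are pinned to the even host CPUs 4-90, which belong to socket 0 in the lstopo output, and vCPUs 44-87 to the odd host CPUs 5-91 on socket 1. Host CPUs 0-3 and 92-95 are not used for pinning. The sketch below reproduces those vcpupin elements; the function name gen_vcpupin is hypothetical, and the constants apply only to a host with this even/odd CPU numbering.

```shell
# Hypothetical helper: emit the <vcpupin> elements used in the sample domain XML.
# Assumes even host CPU numbers are on socket 0 and odd ones on socket 1.
gen_vcpupin() {
    local per_node=44 first_cpu=4 node=0 i
    while [ "$node" -lt 2 ]; do
        i=0
        while [ "$i" -lt "$per_node" ]; do
            # node 0 -> host CPUs 4, 6, ... 90; node 1 -> host CPUs 5, 7, ... 91
            printf "    <vcpupin vcpu='%d' cpuset='%d'/>\n" \
                "$(( node * per_node + i ))" "$(( first_cpu + node + 2 * i ))"
            i=$(( i + 1 ))
        done
        node=$(( node + 1 ))
    done
}

# Usage: paste the output between <cputune> and </cputune>
#   gen_vcpupin
```

Keeping each guest NUMA cell on a single host socket in this way matches the two-cell layout declared in the cpu element of the sample.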

  1<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  2<name>ubuntu-vm-numa0</name>
  3<uuid>ea260ed6-8344-49fc-952c-ec6c3de1364e</uuid>
  4<metadata>
  5    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
  6    <libosinfo:os id="http://ubuntu.com/ubuntu/22.04"/>
  7    </libosinfo:libosinfo>
  8</metadata>
  9<memory unit='KiB'>1887436800</memory>
 10<currentMemory unit='KiB'>1887436800</currentMemory>
 11<vcpu placement='static'>88</vcpu>
 12<cputune>
 13    <vcpupin vcpu='0' cpuset='4'/>
 14    <vcpupin vcpu='1' cpuset='6'/>
 15    <vcpupin vcpu='2' cpuset='8'/>
 16    <vcpupin vcpu='3' cpuset='10'/>
 17    <vcpupin vcpu='4' cpuset='12'/>
 18    <vcpupin vcpu='5' cpuset='14'/>
 19    <vcpupin vcpu='6' cpuset='16'/>
 20    <vcpupin vcpu='7' cpuset='18'/>
 21    <vcpupin vcpu='8' cpuset='20'/>
 22    <vcpupin vcpu='9' cpuset='22'/>
 23    <vcpupin vcpu='10' cpuset='24'/>
 24    <vcpupin vcpu='11' cpuset='26'/>
 25    <vcpupin vcpu='12' cpuset='28'/>
 26    <vcpupin vcpu='13' cpuset='30'/>
 27    <vcpupin vcpu='14' cpuset='32'/>
 28    <vcpupin vcpu='15' cpuset='34'/>
 29    <vcpupin vcpu='16' cpuset='36'/>
 30    <vcpupin vcpu='17' cpuset='38'/>
 31    <vcpupin vcpu='18' cpuset='40'/>
 32    <vcpupin vcpu='19' cpuset='42'/>
 33    <vcpupin vcpu='20' cpuset='44'/>
 34    <vcpupin vcpu='21' cpuset='46'/>
 35    <vcpupin vcpu='22' cpuset='48'/>
 36    <vcpupin vcpu='23' cpuset='50'/>
 37    <vcpupin vcpu='24' cpuset='52'/>
 38    <vcpupin vcpu='25' cpuset='54'/>
 39    <vcpupin vcpu='26' cpuset='56'/>
 40    <vcpupin vcpu='27' cpuset='58'/>
 41    <vcpupin vcpu='28' cpuset='60'/>
 42    <vcpupin vcpu='29' cpuset='62'/>
 43    <vcpupin vcpu='30' cpuset='64'/>
 44    <vcpupin vcpu='31' cpuset='66'/>
 45    <vcpupin vcpu='32' cpuset='68'/>
 46    <vcpupin vcpu='33' cpuset='70'/>
 47    <vcpupin vcpu='34' cpuset='72'/>
 48    <vcpupin vcpu='35' cpuset='74'/>
 49    <vcpupin vcpu='36' cpuset='76'/>
 50    <vcpupin vcpu='37' cpuset='78'/>
 51    <vcpupin vcpu='38' cpuset='80'/>
 52    <vcpupin vcpu='39' cpuset='82'/>
 53    <vcpupin vcpu='40' cpuset='84'/>
 54    <vcpupin vcpu='41' cpuset='86'/>
 55    <vcpupin vcpu='42' cpuset='88'/>
 56    <vcpupin vcpu='43' cpuset='90'/>
 57    <vcpupin vcpu='44' cpuset='5'/>
 58    <vcpupin vcpu='45' cpuset='7'/>
 59    <vcpupin vcpu='46' cpuset='9'/>
 60    <vcpupin vcpu='47' cpuset='11'/>
 61    <vcpupin vcpu='48' cpuset='13'/>
 62    <vcpupin vcpu='49' cpuset='15'/>
 63    <vcpupin vcpu='50' cpuset='17'/>
 64    <vcpupin vcpu='51' cpuset='19'/>
 65    <vcpupin vcpu='52' cpuset='21'/>
 66    <vcpupin vcpu='53' cpuset='23'/>
 67    <vcpupin vcpu='54' cpuset='25'/>
 68    <vcpupin vcpu='55' cpuset='27'/>
 69    <vcpupin vcpu='56' cpuset='29'/>
 70    <vcpupin vcpu='57' cpuset='31'/>
 71    <vcpupin vcpu='58' cpuset='33'/>
 72    <vcpupin vcpu='59' cpuset='35'/>
 73    <vcpupin vcpu='60' cpuset='37'/>
 74    <vcpupin vcpu='61' cpuset='39'/>
 75    <vcpupin vcpu='62' cpuset='41'/>
 76    <vcpupin vcpu='63' cpuset='43'/>
 77    <vcpupin vcpu='64' cpuset='45'/>
 78    <vcpupin vcpu='65' cpuset='47'/>
 79    <vcpupin vcpu='66' cpuset='49'/>
 80    <vcpupin vcpu='67' cpuset='51'/>
 81    <vcpupin vcpu='68' cpuset='53'/>
 82    <vcpupin vcpu='69' cpuset='55'/>
 83    <vcpupin vcpu='70' cpuset='57'/>
 84    <vcpupin vcpu='71' cpuset='59'/>
 85    <vcpupin vcpu='72' cpuset='61'/>
 86    <vcpupin vcpu='73' cpuset='63'/>
 87    <vcpupin vcpu='74' cpuset='65'/>
 88    <vcpupin vcpu='75' cpuset='67'/>
 89    <vcpupin vcpu='76' cpuset='69'/>
 90    <vcpupin vcpu='77' cpuset='71'/>
 91    <vcpupin vcpu='78' cpuset='73'/>
 92    <vcpupin vcpu='79' cpuset='75'/>
 93    <vcpupin vcpu='80' cpuset='77'/>
 94    <vcpupin vcpu='81' cpuset='79'/>
 95    <vcpupin vcpu='82' cpuset='81'/>
 96    <vcpupin vcpu='83' cpuset='83'/>
 97    <vcpupin vcpu='84' cpuset='85'/>
 98    <vcpupin vcpu='85' cpuset='87'/>
 99    <vcpupin vcpu='86' cpuset='89'/>
100    <vcpupin vcpu='87' cpuset='91'/>
101</cputune>
102<resource>
103    <partition>/machine</partition>
104</resource>
105<os>
106    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
107    <loader readonly='yes' secure='no' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
108    <nvram>/var/lib/libvirt/qemu/nvram/ubuntu-vm-numa0_VARS.fd</nvram>
109    <boot dev='hd'/>
110    <smbios mode='host'/>
111</os>
112<features>
113    <acpi/>
114    <apic/>
115</features>
116<cpu mode='host-passthrough' check='none' migratable='on'>
117    <topology sockets='2' dies='1' cores='44' threads='1'/>
118    <numa>
119    <cell id='0' cpus='0-43' memory='943718400' unit='KiB'/>
120    <cell id='1' cpus='44-87' memory='943718400' unit='KiB'/>
121    </numa>
122</cpu>
123<numatune>
124    <memory mode='strict' nodeset='0-1'/>
125</numatune>
126<clock offset='utc'>
127    <timer name='rtc' tickpolicy='catchup'/>
128    <timer name='pit' tickpolicy='delay'/>
129    <timer name='hpet' present='no'/>
130</clock>
131<on_poweroff>destroy</on_poweroff>
132<on_reboot>restart</on_reboot>
133<on_crash>destroy</on_crash>
134<pm>
135    <suspend-to-mem enabled='no'/>
136    <suspend-to-disk enabled='no'/>
137</pm>
138<devices>
139    <emulator>/usr/bin/qemu-system-x86_64</emulator>
140    <disk type='file' device='disk'>
141    <driver name='qemu' type='qcow2' discard='unmap'/>
142    <source file='/var/lib/libvirt/images/ubuntu-vm-numa0.qcow2'/>
143    <target dev='vda' bus='virtio'/>
144    <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
145    </disk>
146    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
147    <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
148    </controller>
149    <controller type='pci' index='0' model='pcie-root'/>
150    <controller type='pci' index='1' model='pcie-root-port'>
151    <model name='pcie-root-port'/>
152    <target chassis='1' port='0x10'/>
153    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
154    </controller>
155    <controller type='pci' index='2' model='pcie-root-port'>
156    <model name='pcie-root-port'/>
157    <target chassis='2' port='0x11'/>
158    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
159    </controller>
160    <controller type='pci' index='3' model='pcie-root-port'>
161    <model name='pcie-root-port'/>
162    <target chassis='3' port='0x12'/>
163    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
164    </controller>
165    <controller type='pci' index='4' model='pcie-root-port'>
166    <model name='pcie-root-port'/>
167    <target chassis='4' port='0x13'/>
168    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
169    </controller>
170    <controller type='pci' index='5' model='pcie-root-port'>
171    <model name='pcie-root-port'/>
172    <target chassis='5' port='0x14'/>
173    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
174    </controller>
175    <controller type='pci' index='6' model='pcie-root-port'>
176    <model name='pcie-root-port'/>
177    <target chassis='6' port='0x15'/>
178    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
179    </controller>
180    <controller type='pci' index='7' model='pcie-root-port'>
181    <model name='pcie-root-port'/>
182    <target chassis='7' port='0x16'/>
183    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
184    </controller>
185    <controller type='pci' index='8' model='pcie-root-port'>
186    <model name='pcie-root-port'/>
187    <target chassis='8' port='0x17'/>
188    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
189    </controller>
190    <controller type='pci' index='9' model='pcie-root-port'>
191    <model name='pcie-root-port'/>
192    <target chassis='9' port='0x18'/>
193    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
194    </controller>
195    <controller type='pci' index='10' model='pcie-root-port'>
196    <model name='pcie-root-port'/>
197    <target chassis='10' port='0x19'/>
198    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
199    </controller>
200    <controller type='pci' index='11' model='pcie-root-port'>
201    <model name='pcie-root-port'/>
202    <target chassis='11' port='0x1a'/>
203    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
204    </controller>
205    <controller type='pci' index='12' model='pcie-root-port'>
206    <model name='pcie-root-port'/>
207    <target chassis='12' port='0x1b'/>
208    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x3'/>
209    </controller>
210    <controller type='pci' index='13' model='pcie-root-port'>
211    <model name='pcie-root-port'/>
212    <target chassis='13' port='0x1c'/>
213    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x4'/>
214    </controller>
215    <controller type='pci' index='14' model='pcie-root-port'>
216    <model name='pcie-root-port'/>
217    <target chassis='14' port='0x1d'/>
218    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x5'/>
219    </controller>
220
221    <!-- PCI expander bus NUMA node 0 -->
222    <controller type='pci' index='15' model='pcie-expander-bus'>
223    <target busNr='0x20'>
224        <node>0</node>
225    </target>
226    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x6'/>
227    </controller>
228
229    <!-- PCI expander bus NUMA node 1 -->
230    <controller type='pci' index='16' model='pcie-expander-bus'>
231    <target busNr='0x40'>
232        <node>1</node>
233    </target>
234    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x7'/>
235    </controller>
236
237    <!-- 4 root ports on bus 15 (index of upstream expander bus) NUMA node 0 -->
238
239    <controller type='pci' index='17' model='pcie-root-port'>
240    <address type='pci' bus='15' slot='0x00' function='0x0' multifunction='on'/>
241    </controller>
242    <controller type='pci' index='18' model='pcie-root-port'>
243    <address type='pci' bus='15' slot='0x00' function='0x1'/>
244    </controller>
245    <controller type='pci' index='19' model='pcie-root-port'>
246    <address type='pci' bus='15' slot='0x00' function='0x2'/>
247    </controller>
248    <controller type='pci' index='20' model='pcie-root-port'>
249    <address type='pci' bus='15' slot='0x00' function='0x3'/>
250    </controller>
251
252    <!-- 4 port PCIe switch on bus 17 (index of upstream root port) / func 0 -->
253    <controller type='pci' index='21' model='pcie-switch-upstream-port'>
254    <address type='pci' bus='17' slot='0x00' function='0x0'/>
255    </controller>
256    <controller type='pci' index='22' model='pcie-switch-downstream-port'>
257    <address type='pci' bus='21' slot='0x00' function='0x0'/>
258    </controller>
259    <controller type='pci' index='23' model='pcie-switch-downstream-port'>
260    <address type='pci' bus='21' slot='0x01' function='0x0'/>
261    </controller>
262    <controller type='pci' index='24' model='pcie-switch-downstream-port'>
263    <address type='pci' bus='21' slot='0x02' function='0x0'/>
264    </controller>
265    <controller type='pci' index='25' model='pcie-switch-downstream-port'>
266    <address type='pci' bus='21' slot='0x03' function='0x0'/>
267    </controller>
268
269    <!-- 4 port PCIe switch on bus 18 (index of upstream root port) / func 1 -->
270    <controller type='pci' index='26' model='pcie-switch-upstream-port'>
271    <address type='pci' bus='18' slot='0x00' function='0x0'/>
272    </controller>
273    <controller type='pci' index='27' model='pcie-switch-downstream-port'>
274    <address type='pci' bus='26' slot='0x00' function='0x0'/>
275    </controller>
276    <controller type='pci' index='28' model='pcie-switch-downstream-port'>
277    <address type='pci' bus='26' slot='0x01' function='0x0'/>
278    </controller>
279    <controller type='pci' index='29' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='30' model='pcie-switch-downstream-port'>
    <address type='pci' bus='26' slot='0x03' function='0x0'/>
    </controller>

    <!-- 4 port PCIe switch on bus 19 (index of upstream root port) / func 2 -->
    <controller type='pci' index='31' model='pcie-switch-upstream-port'>
    <address type='pci' bus='19' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='32' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='33' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='34' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='35' model='pcie-switch-downstream-port'>
    <address type='pci' bus='31' slot='0x03' function='0x0'/>
    </controller>

    <!-- 4 port PCIe switch on bus 20 (index of upstream root port) / func 3 -->
    <controller type='pci' index='36' model='pcie-switch-upstream-port'>
    <address type='pci' bus='20' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='37' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='38' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='39' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='40' model='pcie-switch-downstream-port'>
    <address type='pci' bus='36' slot='0x03' function='0x0'/>
    </controller>


    <!-- 5 root ports on bus 16 (index of upstream expander bus) NUMA node 1 -->

    <controller type='pci' index='41' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='42' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x1'/>
    </controller>
    <controller type='pci' index='43' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x2'/>
    </controller>
    <controller type='pci' index='44' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x3'/>
    </controller>
    <controller type='pci' index='45' model='pcie-root-port'>
    <address type='pci' bus='16' slot='0x00' function='0x4'/>
    </controller>

    <!-- 4 port PCIe switch on bus 41 (index of upstream root port) / func 0 -->
    <controller type='pci' index='46' model='pcie-switch-upstream-port'>
    <address type='pci' bus='41' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='47' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='48' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='49' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='50' model='pcie-switch-downstream-port'>
    <address type='pci' bus='46' slot='0x03' function='0x0'/>
    </controller>

    <!-- 4 port PCIe switch on bus 42 (index of upstream root port) / func 1 -->
    <controller type='pci' index='51' model='pcie-switch-upstream-port'>
    <address type='pci' bus='42' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='52' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='53' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='54' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='55' model='pcie-switch-downstream-port'>
    <address type='pci' bus='51' slot='0x03' function='0x0'/>
    </controller>

    <!-- 4 port PCIe switch on bus 43 (index of upstream root port) / func 2 -->
    <controller type='pci' index='56' model='pcie-switch-upstream-port'>
    <address type='pci' bus='43' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='57' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='58' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='59' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='60' model='pcie-switch-downstream-port'>
    <address type='pci' bus='56' slot='0x03' function='0x0'/>
    </controller>

    <!-- 4 port PCIe switch on bus 44 (index of upstream root port) / func 3 -->
    <controller type='pci' index='61' model='pcie-switch-upstream-port'>
    <address type='pci' bus='44' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='62' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='63' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='64' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='65' model='pcie-switch-downstream-port'>
    <address type='pci' bus='61' slot='0x03' function='0x0'/>
    </controller>

    <!-- 4 port PCIe switch on bus 45 (index of upstream root port) / func 4 -->
    <controller type='pci' index='66' model='pcie-switch-upstream-port'>
    <address type='pci' bus='45' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='67' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='68' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='69' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='70' model='pcie-switch-downstream-port'>
    <address type='pci' bus='66' slot='0x03' function='0x0'/>
    </controller>


    <controller type='sata' index='0'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='virtio-serial' index='0'>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <interface type='bridge'>
    <mac address='52:54:00:e2:a3:d9'/>
    <source bridge='br0'/>
    <model type='virtio'/>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
    <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
    </target>
    </serial>
    <console type='pty'>
    <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
    <target type='virtio' name='org.qemu.guest_agent.0'/>
    <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
    <address type='usb' bus='0' port='1'/>
    </input>
    <input type='keyboard' bus='usb'>
    <address type='usb' bus='0' port='2'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
    <listen type='address'/>
    <image compression='off'/>
    </graphics>
    <audio id='1' type='spice'/>
    <video>
    <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>

    <!-- GPUs, NICs, NVMe on socket 0 -->

    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0x18' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='22' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0x19' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='23' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x1a' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='24' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='27' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x3c' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='28' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0x4c' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='32' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x4d' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='33' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0x5d' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='37' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x5e' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='38' slot='0x00' function='0x0'/>
    </hostdev>

    <!-- GPUs, NICs on socket 1 -->

    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0x9b' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='52' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x9c' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='53' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0xbb' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='57' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0xbc' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='58' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0xcb' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='62' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0xcc' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='63' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0xdb' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='67' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0xdc' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='68' slot='0x00' function='0x0'/>
    </hostdev>

    <!-- NVSwitches on socket 1 -->

    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='47' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='48' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x85' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='49' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='50' slot='0x00' function='0x0'/>
    </hostdev>

    <memballoon model='virtio'>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
    <backend model='random'>/dev/urandom</backend>
    <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </rng>
</devices>
<seclabel type='dynamic' model='apparmor' relabel='yes'/>
<qemu:commandline>
    <qemu:arg value='-fw_cfg'/>
    <qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=4718592'/>
</qemu:commandline>
</domain>
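
A hand-written hierarchy of this size is easy to get subtly wrong, so it can help to count the controllers and passthrough devices in the domain XML before starting the VM. The following is a minimal sketch, not part of the reference configuration: `summarize_domain_pci` is a hypothetical helper name, and it assumes one element per line with single-quoted attributes, as in the listing above. It is a line-based count, not a general XML parser.

```shell
# Print how many PCIe controllers of each model and how many <hostdev>
# entries a libvirt domain XML file contains.
# Assumption: one element per line, single-quoted attribute values.
summarize_domain_pci() {
    xml="$1"
    for model in pcie-root-port \
                 pcie-switch-upstream-port \
                 pcie-switch-downstream-port; do
        # grep -c prints the number of matching lines (0 if none).
        printf '%s: %s\n' "$model" "$(grep -c "model='$model'" "$xml")"
    done
    printf 'hostdev: %s\n' "$(grep -c '<hostdev ' "$xml")"
}
```

For example, `summarize_domain_pci domain.xml` (where `domain.xml` is a placeholder for the saved definition) prints one count per controller model plus the number of `<hostdev>` entries. These can be compared against the number of physical switches, GPUs, NICs, and NVSwitches that are meant to be passed through.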

NCCL#

NCCL (NVIDIA Collective Communications Library) is a communication library widely used in distributed AI/ML training. It provides efficient multi-GPU and multi-node collective operations (such as all-reduce, all-gather, reduce-scatter, and broadcast) that underpin the training of large-scale deep learning models. NCCL's routines are optimized for GPU-to-GPU communication and hide that complexity from AI frameworks (e.g. TensorFlow, PyTorch). They take advantage of the underlying topology to minimize latency and maximize bandwidth utilization [2].

Accurate topology information is a prerequisite for running NCCL efficiently. If the hypervisor obscures the physical hardware topology within a VM, NCCL will build suboptimal communication paths, which adds overhead and reduces effective bandwidth. For multi-node HGX systems running distributed training or foundation model training workloads, suboptimal NCCL performance translates directly into wasted GPU cycles, longer training times, and reduced overall efficiency.
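
One way to see which topology NCCL detected inside the VM is to run a workload with NCCL's debug output enabled and ask it to dump the detected topology to a file. The sketch below wraps an arbitrary command with the `NCCL_DEBUG`, `NCCL_DEBUG_SUBSYS`, and `NCCL_TOPO_DUMP_FILE` environment variables. The wrapper name `run_with_nccl_topo_dump` is hypothetical, and the choice of subsystems is an assumption about what is useful here rather than a required setting.

```shell
# Run a command with NCCL topology logging enabled.
# Usage: run_with_nccl_topo_dump <dump-file> <command> [args...]
run_with_nccl_topo_dump() {
    dump_file="$1"
    shift
    # NCCL_DEBUG=INFO          : print informational messages
    # NCCL_DEBUG_SUBSYS        : limit them to init and topology/graph search
    # NCCL_TOPO_DUMP_FILE      : write the detected topology (XML) to this path
    env NCCL_DEBUG=INFO \
        NCCL_DEBUG_SUBSYS=INIT,GRAPH \
        NCCL_TOPO_DUMP_FILE="$dump_file" \
        "$@"
}
```

For instance, assuming the `all_reduce_perf` binary from the nccl-tests suite has been built in the current directory, `run_with_nccl_topo_dump /tmp/nccl_topo.xml ./all_reduce_perf -g 8` would run it on eight GPUs and leave the detected topology in `/tmp/nccl_topo.xml` for comparison with a bare-metal run.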

The VM's NUMA topology and PCIe hierarchy must therefore mirror the underlying physical topology so that NCCL can detect and use the correct GPU, NIC, and NUMA relationships. This allows NCCL-based applications in a virtualized environment to approach bare-metal performance.
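
A quick way to check the NUMA side of this from inside the guest is to read the NUMA node that the kernel reports for each passed-through device in sysfs. The sketch below is illustrative only: `list_pci_numa` is a hypothetical helper, the optional directory argument exists purely so the function can be pointed at a test tree, and it filters on PCI vendor IDs `0x10de` (NVIDIA GPUs and NVSwitches) and `0x15b3` (NVIDIA/Mellanox NICs).

```shell
# List NVIDIA (0x10de) and Mellanox (0x15b3) PCI devices together with the
# NUMA node the kernel reports for each one.
# Optional argument: sysfs PCI device directory (default /sys/bus/pci/devices).
list_pci_numa() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries without a readable vendor file (or an unmatched glob).
        [ -r "$dev/vendor" ] || continue
        vendor=$(cat "$dev/vendor")
        case "$vendor" in
            0x10de|0x15b3)
                # numa_node is -1 when the kernel has no NUMA information.
                node=$(cat "$dev/numa_node" 2>/dev/null || echo unknown)
                printf '%s vendor=%s numa_node=%s\n' \
                    "$(basename "$dev")" "$vendor" "$node"
                ;;
        esac
    done
}
```

Running `list_pci_numa` in the VM and on the host should show each GPU and its paired NIC on the same NUMA node in both cases. A value of `-1` for every device in the guest suggests that the NUMA association was not propagated into the VM. The output of `nvidia-smi topo -m` can be compared in the same way.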