PCIe - GPU Bandwidth Plugin

Overview

The PCIe plugin stresses communication between the host and the GPUs, as well as among the GPUs on the system. It checks for peer-to-peer (P2P) correctness and for any errors or replays while writing data, and it can be used to measure bandwidth and latency between the GPUs and the host.

Preconditions

None

Sub tests

The plugin consists of several sub tests, each measuring a different aspect of bandwidth or latency. Each sub test comes as a pair of otherwise identical tests: either pinned/unpinned or P2P enabled/P2P disabled. Pinned/unpinned tests use pinned or unpinned host memory, respectively, when copying data between the host and the GPUs.

This plugin uses NVLink to communicate between GPUs when possible. Otherwise, communication between GPUs occurs over PCIe.

Each sub test is identified by a tag, which is used both to specify configuration parameters for the sub test and to label its output stats. P2P enabled/P2P disabled tests enable or disable GPUs writing to and reading from each other directly rather than through the PCIe bus.

Sub Test Tag	Pinned/Unpinned or P2P Enabled/P2P Disabled	Description
h2d_d2h_single_pinned	Pinned	Device <-> Host Bandwidth, one GPU at a time
h2d_d2h_single_unpinned	Unpinned	Device <-> Host Bandwidth, one GPU at a time
h2d_d2h_latency_pinned	Pinned	Device <-> Host Latency, one GPU at a time
h2d_d2h_latency_unpinned	Unpinned	Device <-> Host Latency, one GPU at a time
p2p_bw_p2p_enabled	P2P Enabled	Device <-> Device Bandwidth, one GPU pair at a time
p2p_bw_p2p_disabled	P2P Disabled	Device <-> Device Bandwidth, one GPU pair at a time
p2p_bw_concurrent_p2p_enabled	P2P Enabled	Device <-> Device Bandwidth, concurrently, focusing on bandwidth between GPUs likely to be directly connected to each other -> for each (index / 2) and (index / 2)+1
p2p_bw_concurrent_p2p_disabled	P2P Disabled	Device <-> Device Bandwidth, concurrently, focusing on bandwidth between GPUs likely to be directly connected to each other -> for each (index / 2) and (index / 2)+1
1d_exch_bw_p2p_enabled	P2P Enabled	Device <-> Device Bandwidth, concurrently, focusing on bandwidth between GPUs, with every GPU sending either to the GPU with the next higher index (l2r) or to the GPU with the next lower index (r2l)
1d_exch_bw_p2p_disabled	P2P Disabled	Device <-> Device Bandwidth, concurrently, focusing on bandwidth between GPUs, with every GPU sending either to the GPU with the next higher index (l2r) or to the GPU with the next lower index (r2l)
p2p_latency_p2p_enabled	P2P Enabled	Device <-> Device Latency, one GPU pair at a time
p2p_latency_p2p_disabled	P2P Disabled	Device <-> Device Latency, one GPU pair at a time
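
The l2r and r2l directions of the 1d_exch_bw sub tests can be illustrated with a small Python sketch. This is not the plugin's code: the function name exchange_pairs is hypothetical, and the sketch assumes that the exchange does not wrap around, so the GPU at the end of the line has no receiver in that direction.

```python
def exchange_pairs(num_gpus, direction):
    """Return (sender, receiver) GPU index pairs for a 1D exchange.

    Assumption: no wraparound, so the last GPU sends nothing in "l2r"
    and GPU 0 sends nothing in "r2l".
    """
    if direction == "l2r":
        # Every GPU sends to the GPU with the next higher index.
        return [(i, i + 1) for i in range(num_gpus - 1)]
    if direction == "r2l":
        # Every GPU sends to the GPU with the next lower index.
        return [(i, i - 1) for i in range(1, num_gpus)]
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    print(exchange_pairs(4, "l2r"))
    print(exchange_pairs(4, "r2l"))
```

All pairs in one direction are meant to transfer at the same time, which is what distinguishes these sub tests from the one-pair-at-a-time p2p_bw tests.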

The following table lists the global parameters for the PCIe plugin.

Parameter Name	Type	Default	Description
test_pinned	Bool	True	Include subtests that test using pinned memory.
test_unpinned	Bool	True	Include subtests that test using unpinned memory.
test_p2p_on	Bool	True	Run relevant subtests with peer to peer (P2P) memory transfers between GPUs enabled.
test_p2p_off	Bool	True	Run relevant subtests with peer to peer (P2P) memory transfers between GPUs disabled.
max_pcie_replays	Float	80.0	Maximum number of PCIe replays to allow per GPU for the duration of this plugin. This is based on an expected replay rate of less than 8 per minute for PCIe Gen 3.0, assuming this plugin will run for less than a minute and allowing 10x as many replays before failure.
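
As a rough illustration of how the global parameters interact with the sub test tags, the following Python sketch filters the tags by their suffix and applies the replay limit. It is only a model of the documented behavior, not the plugin's implementation: GlobalParams, selected_subtests, and replays_within_limit are hypothetical names, the suffix-based matching is an assumption, and treating a replay count equal to the limit as passing is also an assumption.

```python
from dataclasses import dataclass

# Sub test tags as listed in the sub test table.
SUBTEST_TAGS = [
    "h2d_d2h_single_pinned", "h2d_d2h_single_unpinned",
    "h2d_d2h_latency_pinned", "h2d_d2h_latency_unpinned",
    "p2p_bw_p2p_enabled", "p2p_bw_p2p_disabled",
    "p2p_bw_concurrent_p2p_enabled", "p2p_bw_concurrent_p2p_disabled",
    "1d_exch_bw_p2p_enabled", "1d_exch_bw_p2p_disabled",
    "p2p_latency_p2p_enabled", "p2p_latency_p2p_disabled",
]


@dataclass
class GlobalParams:
    # Defaults taken from the global parameter table.
    test_pinned: bool = True
    test_unpinned: bool = True
    test_p2p_on: bool = True
    test_p2p_off: bool = True
    max_pcie_replays: float = 80.0


def selected_subtests(params):
    """Return the tags that would run under the given global flags."""
    # Assumption: each flag controls the tags ending in the matching suffix.
    flag_by_suffix = {
        "_pinned": params.test_pinned,
        "_unpinned": params.test_unpinned,
        "_p2p_enabled": params.test_p2p_on,
        "_p2p_disabled": params.test_p2p_off,
    }
    return [
        tag for tag in SUBTEST_TAGS
        if any(on and tag.endswith(suffix)
               for suffix, on in flag_by_suffix.items())
    ]


def replays_within_limit(replay_count, params):
    """True if a GPU's replay count does not exceed max_pcie_replays."""
    # Assumption: a count exactly equal to the limit still passes.
    return replay_count <= params.max_pcie_replays


if __name__ == "__main__":
    only_pinned_p2p_on = GlobalParams(test_unpinned=False, test_p2p_off=False)
    print(selected_subtests(only_pinned_p2p_on))
```

With the defaults every tag is selected; turning off test_unpinned and test_p2p_off leaves only the pinned and P2P enabled half of each pair.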

The following table lists the parameters that apply to specific sub tests of the PCIe plugin.

Parameter Name	Default	Sub Tests	Description
min_bandwidth	0	h2d_d2h_single_pinned, h2d_d2h_single_unpinned, h2d_d2h_concurrent_pinned, h2d_d2h_concurrent_unpinned	Minimum bandwidth in GB/s that must be reached for this sub-test to pass.
max_latency	100,000	h2d_d2h_latency_pinned, h2d_d2h_latency_unpinned	Latency in microseconds that cannot be exceeded for this sub-test to pass.
min_pci_generation	1.0	h2d_d2h_single_pinned, h2d_d2h_single_unpinned	Minimum PCI generation that the GPU must meet or exceed for this sub-test to pass.
min_pci_width	1.0	h2d_d2h_single_pinned, h2d_d2h_single_unpinned	Minimum PCI link width that the GPU must meet or exceed for this sub-test to pass. For example, 16x = 16.0.
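
The pass criteria in the table above can be summarized in a short Python sketch. It is a hedged model rather than the plugin's logic: check_subtest and the keys of the measured dictionary (bandwidth_gbps, latency_us, pci_generation, pci_width) are hypothetical, and the sketch simply compares each value that is present against the documented default or an override.

```python
# Defaults taken from the sub test parameter table.
DEFAULT_LIMITS = {
    "min_bandwidth": 0.0,         # GB/s
    "max_latency": 100_000.0,     # microseconds
    "min_pci_generation": 1.0,
    "min_pci_width": 1.0,         # e.g. 16x = 16.0
}


def check_subtest(measured, overrides=None):
    """Return a list of failure messages; an empty list means pass.

    `measured` holds hypothetical result keys; only the keys that are
    present are checked, since not every limit applies to every sub test.
    """
    limits = {**DEFAULT_LIMITS, **(overrides or {})}
    failures = []

    bandwidth = measured.get("bandwidth_gbps")
    if bandwidth is not None and bandwidth < limits["min_bandwidth"]:
        failures.append(f"bandwidth {bandwidth} GB/s is below min_bandwidth")

    latency = measured.get("latency_us")
    if latency is not None and latency > limits["max_latency"]:
        failures.append(f"latency {latency} us exceeds max_latency")

    generation = measured.get("pci_generation")
    if generation is not None and generation < limits["min_pci_generation"]:
        failures.append(f"PCI generation {generation} is below minimum")

    width = measured.get("pci_width")
    if width is not None and width < limits["min_pci_width"]:
        failures.append(f"PCI width {width} is below minimum")

    return failures


if __name__ == "__main__":
    result = {"bandwidth_gbps": 11.5, "pci_generation": 3.0, "pci_width": 8.0}
    print(check_subtest(result, overrides={"min_pci_width": 16.0}))
```

Because min_bandwidth defaults to 0 and the PCI minimums default to 1.0, these checks only become meaningful once the limits are raised to values appropriate for the system under test.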