PCIe - GPU Bandwidth Plugin
Overview
The PCIe plugin stresses communication between the host and the GPUs, as well as among the GPUs on the system. It checks peer-to-peer (P2P) correctness, detects errors or replays while writing data, and can be used to measure bandwidth and latency between the GPUs and the host.
Preconditions
None
Sub tests
The plugin consists of several sub tests that each measure a different aspect of bandwidth or latency. Each sub test has either a pinned/unpinned pair or a P2P enabled/P2P disabled pair of identical tests. Pinned/unpinned tests use either pinned or unpinned memory when copying data between the host and the GPUs.
This plugin uses NVLink to communicate between GPUs when possible. Otherwise, communication between GPUs occurs over PCIe.
Each sub test is represented by a tag that is used both to specify configuration parameters for the sub test and to label its output stats. P2P enabled/P2P disabled tests enable or disable GPUs writing to and reading from each other directly rather than through the PCIe bus.

| Sub Test Tag | Pinned/Unpinned or P2P Enabled/P2P Disabled | Description |
|---|---|---|
| h2d_d2h_single_pinned | Pinned | Device <-> Host bandwidth, one GPU at a time |
| h2d_d2h_single_unpinned | Unpinned | Device <-> Host bandwidth, one GPU at a time |
| h2d_d2h_latency_pinned | Pinned | Device <-> Host latency, one GPU at a time |
| h2d_d2h_latency_unpinned | Unpinned | Device <-> Host latency, one GPU at a time |
| p2p_bw_p2p_enabled | P2P Enabled | Device <-> Device bandwidth, one GPU pair at a time |
| p2p_bw_p2p_disabled | P2P Disabled | Device <-> Device bandwidth, one GPU pair at a time |
| p2p_bw_concurrent_p2p_enabled | P2P Enabled | Device <-> Device bandwidth, concurrently, focusing on bandwidth between GPUs likely to be directly connected to each other -> for each (index / 2) and (index / 2)+1 |
| p2p_bw_concurrent_p2p_disabled | P2P Disabled | Device <-> Device bandwidth, concurrently, focusing on bandwidth between GPUs likely to be directly connected to each other -> for each (index / 2) and (index / 2)+1 |
| 1d_exch_bw_p2p_enabled | P2P Enabled | Device <-> Device bandwidth, concurrently, focusing on bandwidth between GPUs, with every GPU sending either to the GPU with the next higher index (l2r) or to the GPU with the next lower index (r2l) |
| 1d_exch_bw_p2p_disabled | P2P Disabled | Device <-> Device bandwidth, concurrently, focusing on bandwidth between GPUs, with every GPU sending either to the GPU with the next higher index (l2r) or to the GPU with the next lower index (r2l) |
| p2p_latency_p2p_enabled | P2P Enabled | Device <-> Device latency, one GPU pair at a time |
| p2p_latency_p2p_disabled | P2P Disabled | Device <-> Device latency, one GPU pair at a time |

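
To make the 1d_exch_bw traffic pattern concrete, the following minimal Python sketch lists the (sender, receiver) GPU index pairs for each direction. The function name `exchange_pairs` is hypothetical, and the sketch assumes there is no wrap-around at the ends of the index range; the plugin's actual handling of the first and last GPU may differ.

```python
def exchange_pairs(num_gpus, direction):
    """Return (sender, receiver) GPU index pairs for a 1D exchange.

    "l2r": every GPU sends to the GPU with the next higher index.
    "r2l": every GPU sends to the GPU with the next lower index.
    Assumption (not from the plugin docs): no wrap-around, so the last
    GPU does not send in l2r and the first GPU does not send in r2l.
    """
    if direction == "l2r":
        return [(i, i + 1) for i in range(num_gpus - 1)]
    if direction == "r2l":
        return [(i, i - 1) for i in range(1, num_gpus)]
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    print(exchange_pairs(4, "l2r"))  # [(0, 1), (1, 2), (2, 3)]
    print(exchange_pairs(4, "r2l"))  # [(1, 0), (2, 1), (3, 2)]
```

All pairs in one direction transfer concurrently, which is what distinguishes this sub test from the one-pair-at-a-time p2p_bw sub tests.
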
The following table lists the global parameters for the PCIe plugin.

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| test_pinned | Bool | True | Include sub tests that test using pinned memory. |
| test_unpinned | Bool | True | Include sub tests that test using unpinned memory. |
| test_p2p_on | Bool | True | Run relevant sub tests with peer-to-peer (P2P) memory transfers between GPUs enabled. |
| test_p2p_off | Bool | True | Run relevant sub tests with peer-to-peer (P2P) memory transfers between GPUs disabled. |
| max_pcie_replays | Float | 80.0 | Maximum number of PCIe replays to allow per GPU for the duration of this plugin. This is based on an expected replay rate of less than 8 per minute for PCIe Gen 3.0, assuming this plugin will run for less than a minute and allowing 10x as many replays before failure. |

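
As an illustration of how the global parameters interact with the sub test tags, the Python sketch below filters the tags by the four boolean parameters and checks a replay count against max_pcie_replays. It is not the plugin's implementation: the names `selected_sub_tests` and `replays_within_limit` are hypothetical, the mapping from tag suffix to parameter is inferred from the tag names, and treating a count equal to the limit as passing is an assumption.

```python
# Sub test tags as listed in the sub test table.
SUB_TEST_TAGS = [
    "h2d_d2h_single_pinned", "h2d_d2h_single_unpinned",
    "h2d_d2h_latency_pinned", "h2d_d2h_latency_unpinned",
    "p2p_bw_p2p_enabled", "p2p_bw_p2p_disabled",
    "p2p_bw_concurrent_p2p_enabled", "p2p_bw_concurrent_p2p_disabled",
    "1d_exch_bw_p2p_enabled", "1d_exch_bw_p2p_disabled",
    "p2p_latency_p2p_enabled", "p2p_latency_p2p_disabled",
]

# Assumed mapping from tag suffix to the global parameter that gates it.
SUFFIX_TO_PARAM = {
    "_pinned": "test_pinned",
    "_unpinned": "test_unpinned",
    "_p2p_enabled": "test_p2p_on",
    "_p2p_disabled": "test_p2p_off",
}

# Documented defaults: all four variants are enabled.
DEFAULT_FLAGS = {name: True for name in SUFFIX_TO_PARAM.values()}


def selected_sub_tests(overrides=None):
    """Return the tags whose gating parameter is true."""
    flags = {**DEFAULT_FLAGS, **(overrides or {})}
    selected = []
    for tag in SUB_TEST_TAGS:
        for suffix, param in SUFFIX_TO_PARAM.items():
            if tag.endswith(suffix) and flags[param]:
                selected.append(tag)
    return selected


def replays_within_limit(replay_count, max_pcie_replays=80.0):
    """Assumption: a count equal to the limit still passes."""
    return replay_count <= max_pcie_replays


if __name__ == "__main__":
    print(selected_sub_tests({"test_unpinned": False, "test_p2p_off": False}))
    print(replays_within_limit(12))
```

With the defaults, every tag is selected; setting one flag to false drops the matching half of each pair.
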
The following table lists the parameters that apply to specific sub tests of the PCIe plugin.

| Parameter Name | Default | Sub Tests | Description |
|---|---|---|---|
| min_bandwidth | 0 | h2d_d2h_single_pinned, h2d_d2h_single_unpinned, h2d_d2h_concurrent_pinned, h2d_d2h_concurrent_unpinned | Minimum bandwidth in GB/s that must be reached for this sub-test to pass. |
| max_latency | 100,000 | h2d_d2h_latency_pinned, h2d_d2h_latency_unpinned | Latency in microseconds that cannot be exceeded for this sub-test to pass. |
| min_pci_generation | 1.0 | h2d_d2h_single_pinned, h2d_d2h_single_unpinned | Minimum allowed PCI generation that the GPU must be at or exceed for this sub-test to pass. |
| min_pci_width | 1.0 | h2d_d2h_single_pinned, h2d_d2h_single_unpinned | Minimum allowed PCI width that the GPU must be at or exceed for this sub-test to pass. For example, 16x = 16.0. |

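
The pass/fail rules described by these parameters can be sketched in Python as follows. This is an illustrative reading of the table, not the plugin's code: `check_sub_test` and the keys of the `measured` dictionary are hypothetical names, and the sketch assumes that a measured value exactly equal to a limit passes.

```python
# Documented defaults for the sub test parameters.
DEFAULT_LIMITS = {
    "min_bandwidth": 0.0,        # GB/s
    "max_latency": 100_000.0,    # microseconds
    "min_pci_generation": 1.0,
    "min_pci_width": 1.0,        # for example, 16.0 for a 16x link
}


def check_sub_test(measured, overrides=None):
    """Return a list of failure messages; an empty list means pass.

    `measured` holds only the values relevant to the sub test, using the
    hypothetical keys "bandwidth", "latency", "pci_generation", "pci_width".
    """
    limits = {**DEFAULT_LIMITS, **(overrides or {})}
    failures = []
    if "bandwidth" in measured and measured["bandwidth"] < limits["min_bandwidth"]:
        failures.append("bandwidth below min_bandwidth")
    if "latency" in measured and measured["latency"] > limits["max_latency"]:
        failures.append("latency above max_latency")
    if ("pci_generation" in measured
            and measured["pci_generation"] < limits["min_pci_generation"]):
        failures.append("PCI generation below min_pci_generation")
    if "pci_width" in measured and measured["pci_width"] < limits["min_pci_width"]:
        failures.append("PCI width below min_pci_width")
    return failures


if __name__ == "__main__":
    # A Gen 3 link that trained at 8x, checked against stricter limits.
    result = check_sub_test(
        {"bandwidth": 6.1, "pci_generation": 3.0, "pci_width": 8.0},
        {"min_bandwidth": 10.0, "min_pci_width": 16.0},
    )
    print(result)
```

Because min_bandwidth defaults to 0, the bandwidth check never fails unless a higher minimum is supplied.
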