PCIe - GPU Bandwidth Plugin

Overview

The PCIe plugin stresses communication between the host and the GPUs, as well as among the GPUs on the system. It checks peer-to-peer (P2P) correctness, watches for errors or replays while data is being written, and can be used to measure bandwidth and latency to and from the GPUs and the host.

Preconditions

None

Sub tests

The plugin consists of several sub tests that each measure a different aspect of bandwidth or latency. Each sub test comes as a pair of otherwise identical tests: either a pinned/unpinned pair or a P2P enabled/P2P disabled pair. Pinned/unpinned tests use either pinned or unpinned memory when copying data between the host and the GPUs.

This plugin uses NVLink to communicate between GPUs when possible. Otherwise, communication between GPUs occurs over PCIe.

Each sub test is identified by a tag that is used both to specify configuration parameters for the sub test and to label its output stats. P2P enabled/P2P disabled tests enable or disable GPUs writing and reading to and from each other directly rather than through the PCIe bus.

| Sub Test Tag | Pinned/Unpinned or P2P Enabled/P2P Disabled | Description |
|---|---|---|
| h2d_d2h_single_pinned | Pinned | Device <-> Host bandwidth, one GPU at a time |
| h2d_d2h_single_unpinned | Unpinned | Device <-> Host bandwidth, one GPU at a time |
| h2d_d2h_latency_pinned | Pinned | Device <-> Host latency, one GPU at a time |
| h2d_d2h_latency_unpinned | Unpinned | Device <-> Host latency, one GPU at a time |
| p2p_bw_p2p_enabled | P2P Enabled | Device <-> Device bandwidth, one GPU pair at a time |
| p2p_bw_p2p_disabled | P2P Disabled | Device <-> Device bandwidth, one GPU pair at a time |
| p2p_bw_concurrent_p2p_enabled | P2P Enabled | Device <-> Device bandwidth, concurrently, focusing on bandwidth between GPUs likely to be directly connected to each other -> for each (index / 2) and (index / 2)+1 |
| p2p_bw_concurrent_p2p_disabled | P2P Disabled | Device <-> Device bandwidth, concurrently, focusing on bandwidth between GPUs likely to be directly connected to each other -> for each (index / 2) and (index / 2)+1 |
| 1d_exch_bw_p2p_enabled | P2P Enabled | Device <-> Device bandwidth, concurrently, focusing on bandwidth between GPUs, with every GPU sending either to the GPU with the next higher index (l2r) or to the GPU with the next lower index (r2l) |
| 1d_exch_bw_p2p_disabled | P2P Disabled | Device <-> Device bandwidth, concurrently, focusing on bandwidth between GPUs, with every GPU sending either to the GPU with the next higher index (l2r) or to the GPU with the next lower index (r2l) |
| p2p_latency_p2p_enabled | P2P Enabled | Device <-> Device latency, one GPU pair at a time |
| p2p_latency_p2p_disabled | P2P Disabled | Device <-> Device latency, one GPU pair at a time |
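The pairing rules described for the concurrent and 1D exchange sub tests can be sketched in a few lines of Python. This is only an illustration, not the plugin's code: the function names are hypothetical, and the reading of "(index / 2) and (index / 2)+1" as adjacent pairs (0, 1), (2, 3), and so on is an assumption.

```python
def concurrent_pairs(num_gpus):
    """Adjacent GPU pairs (0, 1), (2, 3), ...

    Assumption: GPUs with neighbouring indices are the ones most likely
    to be directly connected. An unpaired trailing GPU is skipped.
    """
    return [(2 * i, 2 * i + 1) for i in range(num_gpus // 2)]


def exchange_pairs(num_gpus, direction):
    """(sender, receiver) pairs for a 1D exchange.

    'l2r': every GPU sends to the GPU with the next higher index.
    'r2l': every GPU sends to the GPU with the next lower index.
    GPUs at the end of the line have no neighbour in that direction.
    """
    if direction == "l2r":
        return [(i, i + 1) for i in range(num_gpus - 1)]
    if direction == "r2l":
        return [(i, i - 1) for i in range(1, num_gpus)]
    raise ValueError("direction must be 'l2r' or 'r2l'")


if __name__ == "__main__":
    print(concurrent_pairs(4))       # [(0, 1), (2, 3)]
    print(exchange_pairs(4, "l2r"))  # [(0, 1), (1, 2), (2, 3)]
    print(exchange_pairs(4, "r2l"))  # [(1, 0), (2, 1), (3, 2)]
```

In the concurrent case each GPU appears in at most one pair, whereas in the 1D exchange every interior GPU is both a sender and a receiver at the same time.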

The following table lists the global parameters for the PCIe plugin.

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| test_pinned | Bool | True | Include sub tests that use pinned memory. |
| test_unpinned | Bool | True | Include sub tests that use unpinned memory. |
| test_p2p_on | Bool | True | Run relevant sub tests with peer-to-peer (P2P) memory transfers between GPUs enabled. |
| test_p2p_off | Bool | True | Run relevant sub tests with peer-to-peer (P2P) memory transfers between GPUs disabled. |
| max_pcie_replays | Float | 80.0 | Maximum number of PCIe replays to allow per GPU for the duration of this plugin. This is based on an expected replay rate of less than 8 per minute for PCIe Gen 3.0, assuming this plugin will run for less than a minute and allowing 10x as many replays before failure. |
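As a rough illustration of how the global parameters interact, the sketch below filters the sub test tags by the four boolean switches and reproduces the arithmetic behind the max_pcie_replays default. The function names and the suffix-matching approach are hypothetical and are not the plugin's implementation. The expected rate of 8 replays per minute is an inference: it is the value that yields 80.0 with a one-minute run and a 10x allowance.

```python
# Sub test tags as listed in the sub test table.
SUB_TEST_TAGS = [
    "h2d_d2h_single_pinned", "h2d_d2h_single_unpinned",
    "h2d_d2h_latency_pinned", "h2d_d2h_latency_unpinned",
    "p2p_bw_p2p_enabled", "p2p_bw_p2p_disabled",
    "p2p_bw_concurrent_p2p_enabled", "p2p_bw_concurrent_p2p_disabled",
    "1d_exch_bw_p2p_enabled", "1d_exch_bw_p2p_disabled",
    "p2p_latency_p2p_enabled", "p2p_latency_p2p_disabled",
]

# Assumption: the tag suffix tells us which global switch governs the tag.
SUFFIX_TO_SWITCH = {
    "_unpinned": "test_unpinned",
    "_pinned": "test_pinned",
    "_p2p_enabled": "test_p2p_on",
    "_p2p_disabled": "test_p2p_off",
}


def selected_sub_tests(**switches):
    """Return the tags left enabled; every switch defaults to True."""
    settings = {name: True for name in SUFFIX_TO_SWITCH.values()}
    settings.update(switches)
    selected = []
    for tag in SUB_TEST_TAGS:
        for suffix, switch in SUFFIX_TO_SWITCH.items():
            if tag.endswith(suffix):
                if settings[switch]:
                    selected.append(tag)
                break
    return selected


def replay_budget(expected_per_minute=8.0, run_minutes=1.0, allowance=10.0):
    """Replay threshold = expected rate x run time x safety allowance."""
    return expected_per_minute * run_minutes * allowance


if __name__ == "__main__":
    print(selected_sub_tests(test_unpinned=False, test_p2p_off=False))
    print(replay_budget())
```

With all four switches left at their defaults, all twelve tags are selected. Turning a switch off removes one half of each affected pair.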

The following table lists the parameters to specific subtests for the PCIe plugin.

| Parameter Name | Default | Sub Tests | Description |
|---|---|---|---|
| min_bandwidth | 0 | h2d_d2h_single_pinned, h2d_d2h_single_unpinned, h2d_d2h_concurrent_pinned, h2d_d2h_concurrent_unpinned | Minimum bandwidth in GB/s that must be reached for this sub test to pass. |
| max_latency | 100,000 | h2d_d2h_latency_pinned, h2d_d2h_latency_unpinned | Latency in microseconds that cannot be exceeded for this sub test to pass. |
| min_pci_generation | 1.0 | h2d_d2h_single_pinned, h2d_d2h_single_unpinned | Minimum PCI generation that the GPU must meet or exceed for this sub test to pass. |
| min_pci_width | 1.0 | h2d_d2h_single_pinned, h2d_d2h_single_unpinned | Minimum PCI width that the GPU must meet or exceed for this sub test to pass. For example, 16x = 16.0. |
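The pass/fail rules in the table above can be expressed as a small checker. This is a sketch under stated assumptions, not the plugin's code: the Thresholds class, the check function, and the keys of the measured dictionary are all hypothetical names chosen for illustration. Only the parameter names and default values come from the table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Thresholds:
    """Per-sub-test limits, using the defaults from the table."""
    min_bandwidth: float = 0.0        # GB/s
    max_latency: float = 100_000.0    # microseconds
    min_pci_generation: float = 1.0
    min_pci_width: float = 1.0        # e.g. 16x = 16.0


def check(measured: Dict[str, float], limits: Thresholds) -> List[str]:
    """Return a list of failure messages; an empty list means a pass.

    Only the measurements present in `measured` are checked, since each
    parameter applies to a subset of the sub tests.
    """
    failures = []
    if "bandwidth" in measured and measured["bandwidth"] < limits.min_bandwidth:
        failures.append("bandwidth below min_bandwidth")
    if "latency" in measured and measured["latency"] > limits.max_latency:
        failures.append("latency above max_latency")
    if ("pci_generation" in measured
            and measured["pci_generation"] < limits.min_pci_generation):
        failures.append("PCI generation below min_pci_generation")
    if "pci_width" in measured and measured["pci_width"] < limits.min_pci_width:
        failures.append("PCI width below min_pci_width")
    return failures


if __name__ == "__main__":
    limits = Thresholds(min_bandwidth=10.0, min_pci_generation=3.0,
                        min_pci_width=16.0)
    # A GPU that trained at x8 instead of x16 fails only the width check.
    print(check({"bandwidth": 12.0, "pci_generation": 3.0, "pci_width": 8.0},
                limits))
```

Because the defaults are permissive (0 GB/s, generation 1.0, width 1.0), these checks only become meaningful once the thresholds are raised to match the expected hardware configuration.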