Validate PCI Switch System Topology
GDS requires a specific hardware topology to function. Verify the PCI topology to ensure that the GPUs and NICs are under the same PCIe switch. One way to do this is with lstopo, which is part of the hwloc package:
sudo apt install hwloc -y
lstopo --output-format png > lstopo.png
Example of a compatible topology from a DGX-A100, where the GPUs and the NICs are under the same switch:
![gds-03.png](https://docscontent.nvidia.com/dims4/default/3c59e1d/2147483647/strip/true/crop/624x555+0+0/resize/624x555!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018e-a47d-dedb-a79e-aefd1a130000%2Fai-enterprise%2Fdeployment-guide-bare-metal%2F0.1.0%2F_images%2Fgds-03.png)
Example of an incompatible topology, where the GPU (PCI 17:00.0) is attached directly to the CPU:
![gds-04.png](https://docscontent.nvidia.com/dims4/default/28b145a/2147483647/strip/true/crop/1027x790+0+0/resize/1027x790!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018e-a47d-dedb-a79e-aefd1a130000%2Fai-enterprise%2Fdeployment-guide-bare-metal%2F0.1.0%2F_images%2Fgds-04.png)
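On a headless server it can be convenient to cross-check the rendered image with text output. The sketch below is one possible way to do that and is not part of the original procedure: it prints the `nvidia-smi topo -m` connectivity matrix and the `lspci -tv` bus tree when those tools are present. The `classify_link` helper is hypothetical, and its reading of the matrix codes (treating `PIX` and `PXB` as paths that stay below the PCIe host bridge) is an assumption based on the legend that `nvidia-smi` prints, not an official GDS compatibility rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a link code from the `nvidia-smi topo -m` legend
# to a rough verdict. The verdict labels are illustrative assumptions.
classify_link() {
  case "$1" in
    # PIX: at most one PCIe bridge; PXB: several bridges, no host bridge.
    PIX|PXB) echo "switch-local" ;;
    # PHB, NODE, SYS: the path crosses a PCIe host bridge (typically the CPU).
    PHB|NODE|SYS) echo "crosses-host-bridge" ;;
    # Anything else (for example X or NVLink entries) is left unclassified.
    *) echo "other" ;;
  esac
}

# Print the GPU/NIC connectivity matrix if the NVIDIA driver tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi topo -m || true
fi

# Print the PCI bus tree so switch and root-port placement can be read as text.
if command -v lspci >/dev/null 2>&1; then
  lspci -tv || true
fi
```

For example, `classify_link PHB` prints `crosses-host-bridge`, which would correspond to a layout like the incompatible one shown above, where the path between the GPU and the NIC goes through the CPU. Whether NICs appear as rows in the matrix depends on the installed driver, so treat the output as a hint and confirm against the lstopo image.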