GDS requires a specific hardware topology to function. Verify the PCI topology to ensure that the GPUs and NICs are under the same PCIe switch. One way to do this is with lstopo:
sudo apt install hwloc -y
lstopo --output-format png > lstopo.png
Example of a compatible topology from a DGX A100, where the GPUs and the NICs are under the same switch:
![gds-03.png](https://docscontent.nvidia.com/dims4/default/3c59e1d/2147483647/strip/true/crop/624x555+0+0/resize/624x555!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018e-a47d-dedb-a79e-aefd1a130000%2Fai-enterprise%2Fdeployment-guide-bare-metal%2F0.1.0%2F_images%2Fgds-03.png)
Example of an incompatible topology, where the GPU (PCI 17:00.0) is directly attached to the CPU:
![gds-04.png](https://docscontent.nvidia.com/dims4/default/28b145a/2147483647/strip/true/crop/1027x790+0+0/resize/1027x790!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018e-a47d-dedb-a79e-aefd1a130000%2Fai-enterprise%2Fdeployment-guide-bare-metal%2F0.1.0%2F_images%2Fgds-04.png)
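On a headless host where viewing a PNG is inconvenient, the same question can be approximated from sysfs. The following is a minimal sketch, not part of the original procedure: the helper names `common_pci_ancestor` and `shares_pci_bridge` are hypothetical, and the sketch assumes that the resolved sysfs path of a PCI device lists every upstream bridge as a parent directory. A shared bridge usually corresponds to a shared switch, but the lstopo diagram remains the reference check.

```bash
#!/usr/bin/env bash
# Sketch: do two PCI devices sit below a common bridge, according to sysfs?
# Both helper names are hypothetical and exist only for this illustration.

# Print the deepest directory shared by the parents of two resolved sysfs paths.
common_pci_ancestor() {
    local -a pa pb
    local i common=""
    # Drop the device's own directory, then split the parent path on '/'.
    IFS='/' read -r -a pa <<< "${1%/*}"
    IFS='/' read -r -a pb <<< "${2%/*}"
    # Index 0 is the empty field before the leading '/', so start at 1.
    for ((i = 1; i < ${#pa[@]} && i < ${#pb[@]}; i++)); do
        [[ "${pa[i]}" == "${pb[i]}" ]] || break
        common+="/${pa[i]}"
    done
    printf '%s\n' "$common"
}

# Succeed only if the shared ancestor is itself a PCI function
# (domain:bus:device.function), i.e. a bridge or switch port
# rather than the root complex directory (pciDDDD:BB).
shares_pci_bridge() {
    local ancestor re='^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
    ancestor="$(common_pci_ancestor "$1" "$2")"
    [[ "${ancestor##*/}" =~ $re ]]
}
```

To use it, resolve each device first, for example `shares_pci_bridge "$(readlink -f /sys/bus/pci/devices/<GPU address>)" "$(readlink -f /sys/bus/pci/devices/<NIC address>)"`, substituting the full addresses (including the domain, such as `0000:`) reported by `lspci -D`. A non-zero exit status means the two devices only meet at the root complex or not at all, which matches the incompatible case shown above.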