PCIe Multi-GPU systems

On multi-GPU systems, the Triton server uses peer-to-peer memory copy to transfer data between GPUs whenever this feature is available; that is, when cudaDeviceCanAccessPeer() returns true.

However, on bare-metal Linux systems with a PCIe topology, peer-to-peer memory copy is not supported when IOMMU is enabled. For more information, refer to IOMMU on Linux. When IOMMU is enabled, we recommend setting it to passthrough (by setting the Linux kernel parameter iommu=pt) to ensure optimal performance.
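
As a quick illustration, the kernel parameter can also be checked directly on the command line of the running kernel. The following is a minimal sketch, not part of the official procedure: the helper name has_iommu_pt is hypothetical, and the sketch assumes that /proc/cmdline is readable, as it is on typical Linux systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): succeeds if the given kernel
# command line contains iommu=pt as a standalone, space-separated word.
has_iommu_pt() {
    case " $1 " in
        *" iommu=pt "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Inspect the command line that the running kernel was booted with.
if [ -r /proc/cmdline ] && has_iommu_pt "$(cat /proc/cmdline)"; then
    echo "iommu=pt is set on the running kernel"
else
    echo "iommu=pt is not set on the running kernel"
fi
```

Matching on a whole word avoids false positives from similar parameters such as intel_iommu=on.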

The following are the steps to set IOMMU to passthrough in GRUB:

  1. Determine whether IOMMU is enabled by running the following command:

    dmesg | grep -e DMAR -e IOMMU
    

    If the command produces no output, IOMMU is not enabled, and no further action is required.

    If the command produces output, IOMMU is enabled. Continue with the next step.

  2. Determine whether IOMMU is set to passthrough by running the following command:

    dmesg | grep -i -e 'iommu=pt' -e 'iommu.*passthrough'
    

    If the command produces output, IOMMU is already set to passthrough, and no further action is required.

    If the command produces no output, continue with the following steps.

  3. Open the /etc/default/grub file for editing and add iommu=pt to the GRUB_CMDLINE_LINUX option. For example:

    .....
    
    GRUB_CMDLINE_LINUX="crashkernel=auto quiet iommu=pt"
    
    .....
    
  4. Based on the system’s OS, use grub-mkconfig or grub2-mkconfig to generate the configuration file:

    • On systems with BIOS:

      grub-mkconfig -o /boot/grub/grub.cfg #on ubuntu, debian
      grub2-mkconfig -o /boot/grub2/grub.cfg #on centos, rockylinux
      
    • On systems with UEFI:

      grub-mkconfig -o /boot/efi/EFI/<os_name>/grub.cfg #on ubuntu, debian
      
      grub2-mkconfig -o /boot/efi/EFI/<os_name>/grub.cfg #on centos, rockylinux
      
         #replace <os_name> with ubuntu, centos, debian, or rocky
      
  5. On Ubuntu and Debian, if grub-mkconfig is not installed, install the grub-common package, which provides it:

    apt install grub-common
    
  6. Reboot the system:

    systemctl reboot
    
  7. Verify that IOMMU is set to passthrough by repeating step 2.
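
The manual edit in step 3 could also be scripted. The following is a minimal sketch under stated assumptions: the function name add_iommu_pt is hypothetical, the sed supports -E (GNU and BSD sed do), and the GRUB_CMDLINE_LINUX value is a double-quoted string on a single line, as in the example above. The sketch only transforms text from standard input; it does not modify /etc/default/grub.

```shell
#!/bin/sh
# Hypothetical helper: reads GRUB defaults on stdin and writes them to stdout,
# appending iommu=pt to GRUB_CMDLINE_LINUX unless it is already present.
add_iommu_pt() {
    sed -E '/^GRUB_CMDLINE_LINUX=".*"$/ { /[" ]iommu=pt[" ]/!s/"$/ iommu=pt"/; }'
}

# Demonstrate on a sample line instead of the real configuration file.
printf '%s\n' 'GRUB_CMDLINE_LINUX="crashkernel=auto quiet"' | add_iommu_pt
```

Because the function leaves a line that already contains iommu=pt unchanged, running it more than once has no further effect. Review the output before replacing the real file, and then continue with step 4 to regenerate the GRUB configuration.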