Overview
This document is intended for network operators responsible for bringing up InfiniBand (IB) clusters. It outlines the automation tools, required tests, and information needed when installing a new cluster. It also provides recommendations on how to obtain the inputs for these procedures and how to execute the bring-up operations effectively. The content is organized for easy reference.
To complete the bring-up process, follow the checklist at the following link.
Related Documentation
Document | Link |
ibdiagnet InfiniBand Fabric Diagnostic Tool User Manual | |
MLNX_OFED User Manual | https://docs.nvidia.com/networking/software/adapter-software/index.html#mlnx-ofed |
MLNX-OS User Manual | https://docs.nvidia.com/networking/software/switch-software/index.html#mlnx-os-infiniband |
MFT User Manual | https://docs.nvidia.com/networking/software/firmware-management/index.html#mft |
UFM User Manual | https://docs.nvidia.com/networking/software/management-software/index.html#nvidia-ufm |
HPC-X User Manual | https://docs.nvidia.com/networking/software/accelerator-software/index.html#hpc-x |
Document Revision History
For the list of changes made to this document, refer to Document Revision History.