NDT Plugin

The NDT plugin is a self-contained Docker container with REST API support, managed by UFM. It introduces the following capabilities:

    1. NDT topology comparison: Allows the user to compare the InfiniBand fabric managed by UFM against NDT files, which Microsoft uses to describe the network topology of InfiniBand clusters.

      • Verifies the IB fabric connectivity during cluster bring-up.

      • Verifies the specific parts of IB fabric after component replacements.

      • Automatically detects any changes in topology.

    2. Subnet Merger - Expansion of the fabric based on NDT topology files

      Allows users to gradually extend the InfiniBand fabric without disrupting the running fabric. The system administrator prepares the NDT topology files that describe the InfiniBand fabric extensions. A UI wizard then guides the user step by step through the actions needed to extend the topology.

      • The Subnet Merger tool verifies the fabric topology within a predefined NDT file, and reports issues encountered for immediate resolution.

      • Once the verification results are acceptable to the network administrator, the tool creates a topoconfig file to serve as input for OpenSM. This allows setting the physical port states of the designated boundary ports as desired (physical ports can be set as disabled or no-discover).

      • Once the topoconfig file is deployed, the IB network can be extended and verified for the next IB extension.

The NDT plugin can be deployed in the following ways:

  1. On UFM Appliance

  2. On UFM Software

For detailed instructions on how to deploy the NDT plugin refer to this page.

The following authentication types are supported:

  • basic (/ufmRest)

  • client (/ufmRestV2)

  • token (/ufmRestV3)
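As a minimal sketch, the mapping between authentication type and REST prefix listed above can be captured in a small helper that builds request URLs. The `/plugin/ndt` path segment, the host name, and the function name are assumptions made for illustration; consult the NDT Plugin REST API reference for the actual route.

```python
# Authentication type -> REST prefix, taken from the list above.
AUTH_PREFIXES = {
    "basic": "/ufmRest",
    "client": "/ufmRestV2",
    "token": "/ufmRestV3",
}

# ASSUMPTION: the plugin is exposed under this path segment.
PLUGIN_PATH = "/plugin/ndt"


def build_url(host: str, auth_type: str, endpoint: str) -> str:
    """Return the full URL for an NDT plugin endpoint such as '/list'."""
    try:
        prefix = AUTH_PREFIXES[auth_type]
    except KeyError:
        raise ValueError(f"unsupported authentication type: {auth_type!r}") from None
    if not endpoint.startswith("/"):
        endpoint = "/" + endpoint
    return f"https://{host}{prefix}{PLUGIN_PATH}{endpoint}"
```

For example, `build_url("ufm.example.com", "token", "/list")` selects the `/ufmRestV3` prefix, while an unknown authentication type raises `ValueError`.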

The following REST APIs are supported:

Topodiff

  • GET /help

  • GET /version

  • POST /upload_metadata

  • GET /list

  • POST /compare

  • POST /cancel

  • GET /reports

  • GET /reports/<report_id>

  • POST /delete

Subnet Merger

  • GET /merger_ndts_list

  • GET /merger_ndts_list/<ndt_file_name>

  • POST /merger_upload_ndt

  • POST /merger_verify_ndt

  • GET /merger_verify_ndt_reports

  • GET /merger_verify_ndt_reports/<report_id>

  • POST /merger_update_topoconfig

  • POST /merger_deploy_ndt_config

  • POST /merger_update_deploy_ndt_config

  • POST /merger_delete_ndt

  • GET /merger_deployed_ndt

  • POST /merger_create_topoconfig

For detailed information on how to interact with the NDT plugin, refer to the NVIDIA UFM Enterprise > Rest API > NDT Plugin REST API.
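The endpoint lists above can also be kept as data, for instance to reject an unsupported method and path combination on the client side before sending a request. The following is an illustrative sketch only; the helper names are hypothetical, and the assumption that a placeholder such as `<report_id>` matches exactly one path segment is not stated in the source.

```python
import re

# (method, path template) pairs copied from the two lists above.
TOPODIFF = [
    ("GET", "/help"), ("GET", "/version"), ("POST", "/upload_metadata"),
    ("GET", "/list"), ("POST", "/compare"), ("POST", "/cancel"),
    ("GET", "/reports"), ("GET", "/reports/<report_id>"), ("POST", "/delete"),
]
SUBNET_MERGER = [
    ("GET", "/merger_ndts_list"), ("GET", "/merger_ndts_list/<ndt_file_name>"),
    ("POST", "/merger_upload_ndt"), ("POST", "/merger_verify_ndt"),
    ("GET", "/merger_verify_ndt_reports"),
    ("GET", "/merger_verify_ndt_reports/<report_id>"),
    ("POST", "/merger_update_topoconfig"), ("POST", "/merger_deploy_ndt_config"),
    ("POST", "/merger_update_deploy_ndt_config"), ("POST", "/merger_delete_ndt"),
    ("GET", "/merger_deployed_ndt"), ("POST", "/merger_create_topoconfig"),
]


def _compile(template):
    """Turn '/reports/<report_id>' into a regex; a placeholder matches one segment."""
    literal_parts = re.split(r"<[^>]+>", template)
    return re.compile("[^/]+".join(re.escape(part) for part in literal_parts))


_ROUTES = [(method, _compile(path)) for method, path in TOPODIFF + SUBNET_MERGER]


def is_supported(method, path):
    """Return True if the method and path match one of the listed endpoints."""
    method = method.upper()
    return any(m == method and rx.fullmatch(path) for m, rx in _ROUTES)
```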

NDT is a CSV file containing data relevant to the IB fabric connectivity. The NDT plugin extracts the IB connectivity data based on the following fields:

  1. Start device

  2. Start port

  3. End device

  4. End port

  5. Link type

Switch to Switch NDT

By default, IB links are filtered by:

  • Link Type is Data

  • Start Device and End Device end with IBn, where n is a numeric value.

For TOR switches, Start port/End port field should be in the format Port N, where N is a numeric value.

For Director switches, Start port/End port should be in the format Blade N_Port i/j, where N is a leaf number, i is an internal ASIC number and j is a port number.

Examples:

Start Device            Start Port  End Device              End Port          Link Type
DSM07-0101-0702-01IB0   Port 21     DSM07-0101-0702-01IB1   Blade 2_Port 1/1  Data
DSM07-0101-0702-01IB0   Port 22     DSM07-0101-0702-01IB1   Blade 2_Port 1/1  Data
DSM07-0101-0702-01IB0   Port 23     DSM07-0101-0702-02IB1   Blade 3_Port 1/1  Data
DSM09-0101-0617-001IB2  Port 33     DSM09-0101-0721-001IB4  Port 1            Data
DSM09-0101-0617-001IB2  Port 34     DSM09-0101-0721-001IB4  Port 2            Data
DSM09-0101-0617-001IB2  Port 35     DSM09-0101-0721-001IB4  Port 3            Data
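The default switch-to-switch rules described above can be expressed as regular expressions. This is a sketch under stated assumptions, not the plugin's actual patterns (those are configurable in ndt.conf): the regexes, the function names, and the dictionary keys are hypothetical and only mirror the column names and formats given in this section.

```python
import re

DEVICE_RE = re.compile(r"IB\d+$")                       # device name ends with IBn
TOR_PORT_RE = re.compile(r"Port (\d+)")                 # "Port N"
DIRECTOR_PORT_RE = re.compile(r"Blade (\d+)_Port (\d+)/(\d+)")  # "Blade N_Port i/j"


def parse_port(text):
    """Parse a TOR or Director port label; return None if it matches neither."""
    m = TOR_PORT_RE.fullmatch(text)
    if m:
        return {"kind": "tor", "port": int(m.group(1))}
    m = DIRECTOR_PORT_RE.fullmatch(text)
    if m:
        leaf, asic, port = (int(g) for g in m.groups())
        return {"kind": "director", "leaf": leaf, "asic": asic, "port": port}
    return None


def is_switch_to_switch(row):
    """Apply the default filter: Link Type is Data and both devices end with IBn."""
    return (
        row["Link Type"] == "Data"
        and DEVICE_RE.search(row["Start Device"]) is not None
        and DEVICE_RE.search(row["End Device"]) is not None
    )
```

Applied to the first example row, `parse_port("Blade 2_Port 1/1")` yields leaf 2, ASIC 1, port 1, and `is_switch_to_switch` accepts the row because both device names end with `IB0` and `IB1`.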

Switch to Host NDT

An NDT file is a CSV file that contains more than the IB connectivity data.

The IB connectivity data is extracted from the following five fields:

  1. Start device

  2. Start port

  3. End device

  4. End port

  5. Link type

IB links should be filtered by the following:

  • Link type is "Data".

  • "Start Device" or "End Device" end with IBN, where N is a numeric value.

    • The other Port should be based on persistent naming convention: ibpXsYfZ, where X, Y and Z are numeric values.

For TOR switches, Start port/End port field will be in the format Port n, where n is a numeric value.

For Director switches, Start port/End port will be in the format Blade N_Port i/j, where N is a leaf number, i is an internal ASIC number and j is a port number.

Examples:

Start Device     Start Port                 End Device             End Port  Link Type
DSM071081704019  DSM071081704019 ibp11s0f0  DSM07-0101-0514-01IB0  Port 1    Data
DSM071081704019  DSM071081704019 ibp21s0f0  DSM07-0101-0514-01IB0  Port 2    Data
DSM071081704019  DSM071081704019 ibp75s0f0  DSM07-0101-0514-01IB0  Port 3    Data
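The switch-to-host rules above can be sketched the same way. The code below is illustrative only: the regexes and the function name are hypothetical, and it interprets "Start Device or End Device" as exactly one side being a switch, which is an assumption. Note that in the examples the host port field carries the host name followed by the persistent port name, so the pattern is matched at the end of the field.

```python
import re

SWITCH_RE = re.compile(r"IB\d+$")                    # switch name ends with IBN
HOST_PORT_RE = re.compile(r"ibp(\d+)s(\d+)f(\d+)$")  # persistent name ibpXsYfZ


def host_side(row):
    """Return 'Start' or 'End' for the host side of a switch-to-host link.

    Returns None when the row does not pass the filter described above.
    """
    if row["Link Type"] != "Data":
        return None
    start_is_switch = SWITCH_RE.search(row["Start Device"]) is not None
    end_is_switch = SWITCH_RE.search(row["End Device"]) is not None
    if start_is_switch == end_is_switch:
        # Both sides are switches, or neither is: not a switch-to-host link.
        return None
    side = "End" if start_is_switch else "Start"
    if HOST_PORT_RE.search(row[side + " Port"]) is None:
        return None
    return side
```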

Other

Comparison results are forwarded to syslog as events. Example of /var/log/messages content:

  Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT111090310019/SAT111090310019 ibp203s0f0 - SAT11-0101-0903-19IB0/15"

  Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT11-0101-0903-09IB0/27 - SAT11-0101-0905-01IB1-A/Blade 12_Port 1/9"

  Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT11-0101-0901-13IB0/23 - SAT11-0101-0903-01IB1-A/Blade 08_Port 2/13"

For detailed information about how to check syslog, please refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > UFM Commands > UFM Logs.
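Events such as the ones shown above can be picked out of /var/log/messages with a short parser. This is a sketch that assumes every event follows the layout of the examples (`NDT: <status> "<device>/<port> - <device>/<port>"`); only the "missing in UFM" status appears in this document, and the regular expression and names below are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the example lines; other message layouts are not covered.
EVENT_RE = re.compile(r'NDT: (?P<status>[^"]+?) "(?P<start>.+?) - (?P<end>.+)"\s*$')


class NdtEvent(NamedTuple):
    status: str
    start_device: str
    start_port: str
    end_device: str
    end_port: str


def parse_event(line: str) -> Optional[NdtEvent]:
    """Extract the status and both link endpoints from one syslog line."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    # Split on the first '/', because Director ports contain '/' themselves.
    start_device, _, start_port = m.group("start").partition("/")
    end_device, _, end_port = m.group("end").partition("/")
    return NdtEvent(m.group("status"), start_device, start_port, end_device, end_port)
```

Splitting on the first `/` keeps a Director port label such as `Blade 12_Port 1/9` intact.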

The minimum interval for periodic comparison is five minutes.

If a request fails, the response explains the error.

For example, the request “POST /compare” without NDTs uploaded will return the following:

Configurations can be found in “ufm/conf/ndt.conf”:

  • Log level (default: INFO)

  • Log size (default: 10240000)

  • Log file backup count (default: 5)

  • Reports number to save (default: 10)

  • NDT format check (default: enabled)

  • Switch to switch and host to switch patterns (default: see NDT format section)

For detailed information on how to export or import the configuration, refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > UFM Commands > UFM Configuration Management.
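For reference, the defaults listed above can be collected in one place. This is a sketch only: the field names are hypothetical (the actual key names in ndt.conf are not given here), the log size is assumed to be in bytes, and the worst-case estimate assumes the active log file plus each backup can grow to the full log size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NdtPluginDefaults:
    # Field names are hypothetical; only the values come from the list above.
    log_level: str = "INFO"
    log_size: int = 10240000       # unit not stated in the source; assumed bytes
    log_backup_count: int = 5
    reports_to_save: int = 10
    ndt_format_check: bool = True


def worst_case_log_bytes(cfg: NdtPluginDefaults) -> int:
    """Upper bound on log disk usage: the active file plus every backup.

    ASSUMPTION: rotation keeps `log_backup_count` old files of up to
    `log_size` each, in addition to the file currently being written.
    """
    return cfg.log_size * (cfg.log_backup_count + 1)
```

Under these assumptions the defaults give an upper bound of 6 × 10240000 = 61440000 bytes of log data.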

Logs can be found in “ufm/logs/ndt.log”.

For detailed information on how to generate a debug dump, refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > System Management > Configuration Management > File System.

The Subnet Merger tool uses an NDT file (in CSV format) that describes the InfiniBand fabric.

In addition to the standard NDT format, the file for the Subnet Merger tool contains two additional columns, "State" and "Domain". The relevant fields are:

  1. Start device

  2. Start port

  3. End device

  4. End port

  5. State

  6. Domain

For boundary ports, only "Start device" and "Start port" should be defined; "End device" and "End port" must be left empty. The following is an example of a simple NDT config file for the Subnet Merger tool:


rack #,U height,#Fields:StartDevice,StartPort,StartDeviceLocation,EndDevice,EndPort,EndDeviceLocation,U height_1,LinkType,Speed,_2,Cable Length,_3,_4,_5,_6,_7,State,Domain
,,MF0;r-ufm-sw13:MQM8700/U1,Port 1,,,,,,,,,,,,,,,Disabled,Boundary
,,MF0;r-ufm-sw13:MQM8700/U1,Port 30,,r-ufm55 mlx5_1,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 29,,r-ufm55 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 26,,r-ufm64 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 3,,,,,,,,,,,,,,,Disabled,Boundary

Switch r-ufm-sw13's port number 1 and port number 3 are designated as boundary ports, meaning they do not have any peer information. When an additional subnet is connected to the fabric, the NDT file for the "new" setup should be modified as follows:


rack #,U height,#Fields:StartDevice,StartPort,StartDeviceLocation,EndDevice,EndPort,EndDeviceLocation,U height_1,LinkType,Speed,_2,Cable Length,_3,_4,_5,_6,_7,State,Domain
,,NEMO-LEAF-2,Port 31,,r-ufm142 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 1,,NEMO-LEAF-2,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 30,,r-ufm55 mlx5_1,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 29,,r-ufm55 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,NEMO-LEAF-2,Port 11,,r-ufm57 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 26,,r-ufm64 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,NEMO-LEAF-2,Port 1,,MF0;r-ufm-sw13,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 3,,NEMO-LEAF-2,Port 3,,,,,,,,,,,,Active,In-Scope
,,NEMO-LEAF-2,Port 3,,MF0;r-ufm-sw13,Port 3,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 9,,,,,,,,,,,,,,,Disabled,Boundary

The previously designated boundary ports now have peers, and a new boundary port (number 9) has been established for the next merge.
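To make the boundary-port convention concrete, the following sketch reads a Subnet Merger NDT file with Python's `csv` module and lists the rows that have no peer and are marked as "Boundary". The column names come from the header in the examples above; the helper names are hypothetical, and the sample is rebuilt from that header with a subset of the example rows rather than copied verbatim.

```python
import csv
import io

# Column names as they appear in the header of the example NDT files.
HEADER = [
    "rack #", "U height", "#Fields:StartDevice", "StartPort",
    "StartDeviceLocation", "EndDevice", "EndPort", "EndDeviceLocation",
    "U height_1", "LinkType", "Speed", "_2", "Cable Length",
    "_3", "_4", "_5", "_6", "_7", "State", "Domain",
]


def make_row(start_device, start_port, state, domain, end_device="", end_port=""):
    """Build one CSV line; columns that are not supplied stay empty."""
    values = {
        "#Fields:StartDevice": start_device, "StartPort": start_port,
        "EndDevice": end_device, "EndPort": end_port,
        "State": state, "Domain": domain,
    }
    return ",".join(values.get(name, "") for name in HEADER)


SWITCH = "MF0;r-ufm-sw13:MQM8700/U1"
SAMPLE_NDT = "\n".join([
    ",".join(HEADER),
    make_row(SWITCH, "Port 1", "Disabled", "Boundary"),
    make_row(SWITCH, "Port 30", "Active", "In-Scope", "r-ufm55 mlx5_1", "Port 1"),
    make_row(SWITCH, "Port 3", "Disabled", "Boundary"),
])


def boundary_ports(ndt_text):
    """Return (device, port, state) for rows with no peer and Domain 'Boundary'."""
    found = []
    for row in csv.DictReader(io.StringIO(ndt_text)):
        has_peer = bool(row["EndDevice"]) or bool(row["EndPort"])
        if row["Domain"] == "Boundary" and not has_peer:
            found.append((row["#Fields:StartDevice"], row["StartPort"], row["State"]))
    return found
```

For the sample, `boundary_ports(SAMPLE_NDT)` reports ports 1 and 3 of switch r-ufm-sw13, matching the description of the first example.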

Subnet Merger Flow

  1. Upload a new NDT topology file that describes the desired topology. Before the new NDT topology file is deployed, it should be verified against the existing topology.

    After the verification, the plugin generates reports including information about:

    • Duplicated GUIDs

    • Miswired links

    • Non-existent links in the pre-defined NDT files

    • Links that exist in the fabric and not in the NDT file

  2. Based on the issues detected in the plugin reports, the network administrator changes the NDT file or the fabric. The verification process can be repeated as many times as necessary until the network administrator is satisfied with the results.

  3. If the NDT verification results are satisfactory, a topoconfig file is generated and can be deployed to the UFM server to be used as input for OpenSM.

  4. The IB fabric can be extended, if desired (repeat step 1).
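The report categories from step 1 can be illustrated with plain set operations. This is not the plugin's implementation: the data model (a link as a pair of `(device, port)` endpoints), the function names, and in particular the definition of "miswired" used here (an NDT link that is absent from the fabric while one of its endpoints is connected to a different peer) are assumptions made for the sketch.

```python
from collections import Counter


def normalize(link):
    """Order the two endpoints so that A-B and B-A compare equal."""
    a, b = link
    return (a, b) if a <= b else (b, a)


def verify(ndt_links, fabric_links, fabric_guids):
    """Compare expected (NDT) links with discovered (fabric) links."""
    ndt = {normalize(link) for link in ndt_links}
    fabric = {normalize(link) for link in fabric_links}

    # Peer actually seen in the fabric for every connected endpoint.
    fabric_peer = {}
    for a, b in fabric:
        fabric_peer[a] = b
        fabric_peer[b] = a

    miswired, missing_in_fabric = set(), set()
    for a, b in ndt - fabric:
        if a in fabric_peer or b in fabric_peer:
            miswired.add((a, b))           # endpoint is wired to another peer
        else:
            missing_in_fabric.add((a, b))  # link is not present at all

    duplicated = sorted(g for g, n in Counter(fabric_guids).items() if n > 1)
    return {
        "duplicated_guids": duplicated,
        "miswired": miswired,
        "missing_in_fabric": missing_in_fabric,
        "missing_in_ndt": fabric - ndt,
    }
```

Normalizing the endpoint order first means that a link listed as A to B in the NDT file and discovered as B to A in the fabric is treated as the same link.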

Subnet Merger UI

Bring-Up Merger Wizard

  1. Add the NDT plugin to UFM by loading the plugin's image through Settings->Plugins Management. A new item will appear in the main left navigator menu of the UFM labeled "Subnet Merger".

    subnet-manager-1.png

  2. Access "Subnet Merger" to initiate the bring-up wizard.

    subnet-manager-2.png

  3. The wizard guides you through the process, which contains the following steps:

    • Upload the initial NDT file and validate it.

      subnet-manager-3.png

      subnet-manager-4.png

    • Once you are satisfied with the results of the validation in the previous tab, proceed to deploy the file.

      subnet-manager-5.png

      subnet-manager-6.png

    This is the initial IB fabric state:

    1.png

New Subnet Merger

Once you have successfully deployed the initial NDT file, you can initiate a new merger process by clicking the "New Merger" button.

merger-wizard-1.png

  1. "Connect" Tab: Physically connect the new equipment and confirm the connection. Then click the button that opens the boundary ports and deploys the active file again.

    merger-wizard-2.png

  2. "Merge" Tab: Once the new equipment is connected and the boundary ports are updated, upload a new NDT file that includes both the current and newly added equipment, along with their boundary ports for future merges. Please note that you cannot merge the file if there are duplicate GUIDs in the report's results.

    merger-wizard-3.png

  3. After completing the merge wizard, you can extend the IB fabric further if necessary.

    merger-wizard-4.png

    This is the IB fabric state after the extension:

    2.png

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.