NDT Plugin
The NDT plugin is a self-contained Docker container with REST API support, managed by UFM. The NDT plugin introduces the following capabilities:
NDT topology comparison: Allows the user to compare the InfiniBand fabric managed by UFM against NDT files, which Microsoft uses to describe the network topology of InfiniBand clusters.
Verifies the IB fabric connectivity during cluster bring-up.
Verifies the specific parts of IB fabric after component replacements.
Automatically detects any changes in topology.
Subnet Merger - Expansion of the fabric based on NDT topology files
Allows users to gradually extend the InfiniBand fabric without causing any disruption to the running fabric. The system administrator should prepare the NDT topology files, which describe the InfiniBand fabric extensions. Then, an intuitive and user-friendly UI wizard facilitates the topology extension process with a step-by-step guidance for performing necessary actions.
The Subnet Merger tool verifies the fabric topology within a predefined NDT file, and reports issues encountered for immediate resolution.
Once the verification results are acceptable to the network administrator, the tool creates a topoconfig file that serves as input for OpenSM. This file sets the physical port states of the designated boundary ports as desired (physical ports can be set as disabled or no-discover).
Once the topoconfig file is deployed, the IB network can be extended and verified for the next IB extension.
The NDT plugin can be deployed in the following ways:
On UFM Appliance
On UFM Software
For detailed instructions on how to deploy the NDT plugin refer to this page.
The following authentication types are supported:
basic (/ufmRest)
client (/ufmRestV2)
token (/ufmRestV3)
The following REST APIs are supported:
Topodiff
GET /help
GET /version
POST /upload_metadata
GET /list
POST /compare
POST /cancel
GET /reports
GET /reports/<report_id>
POST /delete
Subnet Merger
GET /merger_ndts_list
GET /merger_ndts_list/<ndt_file_name>
POST /merger_upload_ndt
POST /merger_verify_ndt
GET /merger_verify_ndt_reports
GET /merger_verify_ndt_reports/<report_id>
POST /merger_update_topoconfig
POST /merger_deploy_ndt_config
POST /merger_update_deploy_ndt_config
POST /merger_delete_ndt
GET /merger_deployed_ndt
POST /merger_create_topoconfig
For detailed information on how to interact with NDT plugin, refer to the NVIDIA UFM Enterprise > Rest API > NDT Plugin REST API.
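As a rough illustration of how the authentication prefixes and the endpoint list fit together, the sketch below builds a method and URL pair for a request. The `/plugin/ndt` mount path, the host name, and the helper name `build_request` are assumptions made for this example; check the NDT Plugin REST API reference for the actual paths and payloads.

```python
# Prefixes taken from the list of supported authentication types.
AUTH_PREFIXES = {
    "basic": "/ufmRest",
    "client": "/ufmRestV2",
    "token": "/ufmRestV3",
}

# A subset of the endpoints listed above, mapped to their HTTP method.
ENDPOINT_METHODS = {
    "/help": "GET",
    "/version": "GET",
    "/list": "GET",
    "/compare": "POST",
    "/reports": "GET",
    "/merger_ndts_list": "GET",
    "/merger_verify_ndt": "POST",
}

# Assumption: plugin routes are mounted under "<prefix>/plugin/ndt".
PLUGIN_PATH = "/plugin/ndt"


def build_request(host, auth_type, endpoint):
    """Return (method, url) for a known NDT plugin endpoint (hypothetical helper)."""
    if auth_type not in AUTH_PREFIXES:
        raise ValueError(f"unsupported authentication type: {auth_type}")
    if endpoint not in ENDPOINT_METHODS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    url = f"https://{host}{AUTH_PREFIXES[auth_type]}{PLUGIN_PATH}{endpoint}"
    return ENDPOINT_METHODS[endpoint], url
```

The returned pair can be passed to any HTTP client together with the credentials that match the chosen authentication type.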
NDT is a CSV file containing data relevant to the IB fabric connectivity. The NDT plugin extracts the IB connectivity data based on the following fields:
Start device
Start port
End device
End port
Link type
Switch to Switch NDT
By default, IB links are filtered by:
Link Type is Data
Start Device and End Device end with IBn, where n is a numeric value.
For TOR switches, Start port/End port field should be in the format Port N, where N is a numeric value.
For Director switches, Start port/End port should be in the format Blade N_Port i/j, where N is a leaf number, i is an internal ASIC number and j is a port number.
Examples:
Start Device | Start Port | End Device | End Port | Link Type
DSM07-0101-0702-01IB0 | Port 21 | DSM07-0101-0702-01IB1 | Blade 2_Port 1/1 | Data
DSM07-0101-0702-01IB0 | Port 22 | DSM07-0101-0702-01IB1 | Blade 2_Port 1/1 | Data
DSM07-0101-0702-01IB0 | Port 23 | DSM07-0101-0702-02IB1 | Blade 3_Port 1/1 | Data
DSM09-0101-0617-001IB2 | Port 33 | DSM09-0101-0721-001IB4 | Port 1 | Data
DSM09-0101-0617-001IB2 | Port 34 | DSM09-0101-0721-001IB4 | Port 2 | Data
DSM09-0101-0617-001IB2 | Port 35 | DSM09-0101-0721-001IB4 | Port 3 | Data
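The default switch-to-switch filter rules can be approximated with a few regular expressions. This is a minimal sketch: the regexes and the function names are illustrative assumptions, not the patterns shipped in the plugin configuration, and each row is assumed to be a dict keyed by the five field names.

```python
import re

# Assumption: illustrative patterns, not the plugin's shipped defaults.
SWITCH_DEVICE_RE = re.compile(r"IB\d+$")                   # name ends with IBn
TOR_PORT_RE = re.compile(r"^Port \d+$")                    # "Port N"
DIRECTOR_PORT_RE = re.compile(r"^Blade \d+_Port \d+/\d+$")  # "Blade N_Port i/j"


def is_switch_port(port):
    """True for a TOR or Director switch port label."""
    return bool(TOR_PORT_RE.match(port) or DIRECTOR_PORT_RE.match(port))


def is_switch_to_switch(row):
    """Apply the switch-to-switch rules to one NDT row (a dict)."""
    return (
        row["Link Type"] == "Data"
        and SWITCH_DEVICE_RE.search(row["Start Device"]) is not None
        and SWITCH_DEVICE_RE.search(row["End Device"]) is not None
        and is_switch_port(row["Start Port"])
        and is_switch_port(row["End Port"])
    )
```

Rows that fail any of the checks would simply be skipped by a filter built on this function.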
Switch to Host NDT
An NDT is a CSV file that contains more than just IB connectivity data.
The IB connectivity data is extracted from the following five fields:
Start device
Start port
End device
End port
Link type
IB links should be filtered by the following:
Link type is "Data".
"Start Device" or "End Device" end with IBN, where N is a numeric value.
The other Port should be based on persistent naming convention: ibpXsYfZ, where X, Y and Z are numeric values.
For TOR switches, Start port/End port field will be in the format Port n, where n is a numeric value.
For Director switches, Start port/End port will be in the format Blade N_Port i/j, where N is a leaf number, i is an internal ASIC number and j is a port number.
Examples:
Start Device | Start Port | End Device | End Port | Link Type
DSM071081704019 | DSM071081704019 ibp11s0f0 | DSM07-0101-0514-01IB0 | Port 1 | Data
DSM071081704019 | DSM071081704019 ibp21s0f0 | DSM07-0101-0514-01IB0 | Port 2 | Data
DSM071081704019 | DSM071081704019 ibp75s0f0 | DSM07-0101-0514-01IB0 | Port 3 | Data
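The switch-to-host rules differ from the switch-to-switch case in that only one end is a switch, while the other end is a host port that follows the ibpXsYfZ naming convention. The sketch below checks both orientations of a row. The regexes and the function name are assumptions for illustration only; in the examples the host port field carries the host name before the port name, so the host pattern is matched at the end of the field.

```python
import re

# Assumption: illustrative patterns, not the plugin's shipped defaults.
SWITCH_DEVICE_RE = re.compile(r"IB\d+$")
SWITCH_PORT_RE = re.compile(r"^(Port \d+|Blade \d+_Port \d+/\d+)$")
HOST_PORT_RE = re.compile(r"ibp\d+s\d+f\d+$")


def is_switch_to_host(row):
    """True if one end is a switch port and the other end is a host port."""
    if row["Link Type"] != "Data":
        return False
    ends = [
        (row["Start Device"], row["Start Port"]),
        (row["End Device"], row["End Port"]),
    ]
    # Try both orientations: (switch, host) and (host, switch).
    for (sw_dev, sw_port), (_, host_port) in (ends, ends[::-1]):
        if (
            SWITCH_DEVICE_RE.search(sw_dev)
            and SWITCH_PORT_RE.match(sw_port)
            and HOST_PORT_RE.search(host_port)
        ):
            return True
    return False
```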
Other
Comparison results are forwarded to syslog as events. Example of /var/log/messages content:
Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT111090310019/SAT111090310019 ibp203s0f0 - SAT11-0101-0903-19IB0/15"
Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT11-0101-0903-09IB0/27 - SAT11-0101-0905-01IB1-A/Blade 12_Port 1/9"
Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT11-0101-0901-13IB0/23 - SAT11-0101-0903-01IB1-A/Blade 08_Port 2/13"
For detailed information about how to check syslog, please refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > UFM Commands > UFM Logs.
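For scripted monitoring, events like the ones above can be parsed out of the log. The following is a minimal sketch that assumes the message layout shown in the example lines (a status text followed by a quoted "device/port - device/port" pair); the function name is hypothetical. It splits each end at the first "/", so a device name that itself contains "/" would need different handling.

```python
import re

# Matches: NDT: <status text> "<start> - <end>" at the end of the line.
EVENT_RE = re.compile(r'NDT: (?P<status>[^"]+?) "(?P<link>.+)"$')


def parse_ndt_event(line):
    """Return a dict describing the event, or None if the line does not match."""
    match = EVENT_RE.search(line.rstrip())
    if match is None:
        return None
    left, sep, right = match.group("link").partition(" - ")
    if not sep:
        return None
    # Split at the first "/" only, so "Blade 12_Port 1/9" stays intact.
    start_dev, _, start_port = left.partition("/")
    end_dev, _, end_port = right.partition("/")
    return {
        "status": match.group("status").strip(),
        "start": (start_dev, start_port),
        "end": (end_dev, end_port),
    }
```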
The minimum interval for periodic comparison is five minutes.
If a request fails, the response includes an explanation of the error.
For example, the request “POST /compare” without NDTs uploaded will return the following:
response code: 400
Response:
{ "error": [ "No NDTs were uploaded for comparison" ] }
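A client can surface these explanations by reading the "error" list from the response body. The sketch below assumes that error responses follow the JSON shape shown above and falls back to the raw body or status code otherwise; the function name is hypothetical.

```python
import json


def extract_errors(status_code, body):
    """Return a list of error messages, or [] for a successful response."""
    if status_code < 400:
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON: report the raw text, or the status code if the body is empty.
        return [body.strip() or f"HTTP {status_code}"]
    errors = payload.get("error", []) if isinstance(payload, dict) else []
    if isinstance(errors, str):
        errors = [errors]
    return list(errors) or [f"HTTP {status_code}"]
```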
Configuration settings can be found in “ufm/conf/ndt.conf”:
Log level (default: INFO)
Log size (default: 10240000)
Log file backup count (default: 5)
Reports number to save (default: 10)
NDT format check (default: enabled)
Switch to switch and host to switch patterns (default: see NDT format section)
For detailed information on how to export or import the configuration, refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > UFM Commands > UFM Configuration Management.
Logs can be found in “ufm/logs/ndt.log”.
For detailed information on how to generate a debug dump, refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > System Management > Configuration Management > File System.
The Subnet Merger tool uses an NDT file (in CSV format) that contains the following data related to the InfiniBand fabric.
In addition to the standard NDT fields, the file for the Subnet Merger tool contains two additional columns, "State" and "Domain".
Start device
Start port
End device
End port
State
Domain
For boundary ports, only "Start device" and "Start Port" should be defined. Also, "End device" and "End Port" should be empty. The following is an example of a simple NDT config file for the subnet merger tool:
rack #,U height,#Fields:StartDevice,StartPort,StartDeviceLocation,EndDevice,EndPort,EndDeviceLocation,U height_1,LinkType,Speed,_2,Cable Length,_3,_4,_5,_6,_7,State,Domain
,,MF0;r-ufm-sw13:MQM8700/U1,Port 1,,,,,,,,,,,,,,,Disabled,Boundary
,,MF0;r-ufm-sw13:MQM8700/U1,Port 30,,r-ufm55 mlx5_1,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 29,,r-ufm55 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 26,,r-ufm64 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 3,,,,,,,,,,,,,,,Disabled,Boundary
Switch r-ufm-sw13's port number 1 and port number 3 are designated as boundary ports, meaning they do not have any peer information. When an additional subnet is connected to the fabric, the NDT file for the "new" setup should be modified as follows:
rack #,U height,#Fields:StartDevice,StartPort,StartDeviceLocation,EndDevice,EndPort,EndDeviceLocation,U height_1,LinkType,Speed,_2,Cable Length,_3,_4,_5,_6,_7,State,Domain
,,NEMO-LEAF-2,Port 31,,r-ufm142 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 1,,NEMO-LEAF-2,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 30,,r-ufm55 mlx5_1,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 29,,r-ufm55 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,NEMO-LEAF-2,Port 11,,r-ufm57 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 26,,r-ufm64 mlx5_0,Port 1,,,,,,,,,,,,Active,In-Scope
,,NEMO-LEAF-2,Port 1,,MF0;r-ufm-sw13,Port 1,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 3,,NEMO-LEAF-2,Port 3,,,,,,,,,,,,Active,In-Scope
,,NEMO-LEAF-2,Port 3,,MF0;r-ufm-sw13,Port 3,,,,,,,,,,,,Active,In-Scope
,,MF0;r-ufm-sw13:MQM8700/U1,Port 9,,,,,,,,,,,,,,,Disabled,Boundary
The previously designated boundary ports now have peers, and a new boundary port (number 9) has been established for the next merge.
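To inspect such a file before uploading it, the rows can be read with a standard CSV parser and split into boundary ports and regular links. This is a minimal sketch that assumes each record occupies a single CSV line with the header shown above; the function name and the trimmed sample are illustrative, and the plugin's own parsing may differ.

```python
import csv
import io

HEADER = (
    "rack #,U height,#Fields:StartDevice,StartPort,StartDeviceLocation,"
    "EndDevice,EndPort,EndDeviceLocation,U height_1,LinkType,Speed,_2,"
    "Cable Length,_3,_4,_5,_6,_7,State,Domain"
)

# Trimmed sample based on the first example; empty columns are padded explicitly.
SAMPLE = "\n".join([
    HEADER,
    ",,MF0;r-ufm-sw13:MQM8700/U1,Port 1" + "," * 15 + "Disabled,Boundary",
    ",,MF0;r-ufm-sw13:MQM8700/U1,Port 30,,r-ufm55 mlx5_1,Port 1"
    + "," * 12 + "Active,In-Scope",
    ",,MF0;r-ufm-sw13:MQM8700/U1,Port 3" + "," * 15 + "Disabled,Boundary",
])


def read_merger_ndt(text):
    """Return (boundary_ports, links) parsed from Subnet Merger NDT text."""
    reader = csv.reader(io.StringIO(text))
    header = [name.replace("#Fields:", "").strip() for name in next(reader)]
    col = {name: index for index, name in enumerate(header)}
    boundary, links = [], []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        row = row + [""] * (len(header) - len(row))  # pad short rows
        start = (row[col["StartDevice"]], row[col["StartPort"]])
        end = (row[col["EndDevice"]], row[col["EndPort"]])
        if row[col["Domain"]] == "Boundary":
            boundary.append(start)  # boundary ports have no peer
        else:
            links.append((start, end))
    return boundary, links
```

Calling `read_merger_ndt(SAMPLE)` yields two boundary ports (Port 1 and Port 3 of r-ufm-sw13) and one in-scope link.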
Subnet Merger Flow
1. Upload a new NDT topology file that describes the desired topology. Before the new NDT topology file is deployed, it should be verified against the existing topology.
2. After the verification, the plugin generates reports that include information about:
Duplicated GUIDs
Miswired links
Links defined in the NDT file that do not exist in the fabric
Links that exist in the fabric and not in the NDT file
3. Based on the issues detected in the plugin reports, the network administrator changes the NDT file or the fabric. The verification process can be repeated as many times as necessary until the network administrator is satisfied with the results.
4. If the NDT verification results are satisfactory, a topoconfig file is generated and can be deployed to the UFM server to be used as input for OpenSM.
5. The IB fabric can be extended further, if desired (repeat from step 1).
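The two "missing link" categories in the verification report amount to a set difference between the links described in the NDT file and the links discovered in the fabric. The sketch below shows that idea only. It is not the plugin's implementation, it does not cover duplicated GUIDs or miswired links, and it assumes each link is given as a pair of (device, port) tuples.

```python
def normalize(link):
    """Treat a link as undirected by sorting its two (device, port) ends."""
    return tuple(sorted(link))


def compare_links(ndt_links, fabric_links):
    """Return links present on one side only, in both directions."""
    ndt = {normalize(link) for link in ndt_links}
    fabric = {normalize(link) for link in fabric_links}
    return {
        "missing_in_fabric": sorted(ndt - fabric),
        "missing_in_ndt": sorted(fabric - ndt),
    }
```

Normalizing the link direction first avoids reporting a link as missing only because its start and end devices are swapped between the two sources.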
Subnet Merger UI
Bring-Up Merger Wizard
Add the NDT plugin to UFM by loading the plugin's image through Settings->Plugins Management. A new item will appear in the main left navigator menu of the UFM labeled "Subnet Merger".
Access "Subnet Merger" to initiate the bring-up wizard.
The wizard guides you through the following steps:
Upload the initial NDT file and validate it.
Once you are satisfied with the validation results in the previous tab, deploy the file.
Figure: initial IB fabric state.
New Subnet Merger
Once you have successfully deployed the initial NDT file, you can initiate a new merger process by clicking the "New Merger" button.
"Connect" Tab: Physically connect the new equipment and confirm the connection. Then, click the button that opens the boundary ports and deploys the active file again.
"Merge" Tab: Once the new equipment is connected and the boundary ports are updated, upload a new NDT file that includes both the current and newly added equipment, along with their boundary ports for future merges. Please note that you cannot merge the file if there are duplicate GUIDs in the report's results.
After completing the merge wizard, and if necessary, you can further proceed to extend the IB fabric.
This is the IB fabric state after the extension: