NDT Plugin

NVIDIA UFM Enterprise User Manual v6.11.1

NDT plugin is a self-contained Docker container with REST API support managed by UFM. NDT plugin provides NDT topo diff capability. This feature allows the user to compare IB fabric managed by UFM and NDT files which are used by Microsoft for description of IB clusters network topology.

Main usage cases:

  • Get confidence on the IB fabric connectivity during cluster bring-up.

  • Get confidence on the specific parts of IB fabric after component replacements.

  • Automatically detect any changes in topology.

The following are the possible ways NDT plugin can be deployed:

  1. On UFM Appliance

  2. On UFM Software

Detailed instructions on how to deploy NDT plugin could be found on page mellanox/ufm-plugin-ndt.

Following authentication types are supported:

  • basic (/ufmRest)

  • client (/ufmRestV2)

  • token (/ufmRestV3)

The following REST APIs are supported:

  • GET /help

  • GET /version

  • POST /upload_metadata

  • GET /list

  • POST /compare

  • POST /cancel

  • GET /reports

  • GET /reports/<report_id>

  • POST /delete

For detailed information on how to interact with NDT plugin, refer to the NVIDIA UFM Enterprise > Rest API > NDT Plugin REST API.

NDT is a CSV file containing data relevant to the IB fabric connectivity.

NDT plugin extracts the IB connectivity data based on the following five fields:

  1. Start device

  2. Start port

  3. End device

  4. End port

  5. Link type

Switch to Switch NDT

By default, IB links are filtered by:

  • Link Type is Data

  • Start Device and End Device end with IBn, where n is a numeric value.

For TOR switches, Start port/End port field should be in the format Port N, where N is a numeric value.

For Director switches, Start port/End port should be in the format Blade N_Port i/j, where N is a leaf number, i is an internal ASIC number and j is a port number.

Examples:

Start Device

Start Port

End Device

End Port

Link Type

DSM07-0101-0702-01IB0

Port 21

DSM07-0101-0702-01IB1

Blade 2_Port 1/1

Data

DSM07-0101-0702-01IB0

Port 22

DSM07-0101-0702-01IB1

Blade 2_Port 1/1

Data

DSM07-0101-0702-01IB0

Port 23

DSM07-0101-0702-02IB1

Blade 3_Port 1/1

Data

DSM09-0101-0617-001IB2

Port 33

DSM09-0101-0721-001IB4

Port 1

Data

DSM09-0101-0617-001IB2

Port 34

DSM09-0101-0721-001IB4

Port 2

Data

DSM09-0101-0617-001IB2

Port 35

DSM09-0101-0721-001IB4

Port 3

Data

Switch to Host NDT

NDT is a CSV file containing data not only relevant to the IB connectivity.

Extracting the IB connectivity data is based on the following five fields:

  1. Start device

  2. Start port

  3. End device

  4. End port

  5. Link type

IB links should be filtered by the following:

  • Link type is Data

  • Start device or End device end with IBN, where N is a numeric value.

    • The other Port should be based on persistent naming convention: ibpXsYfZ, where X, Y and Z are numeric values.

For TOR switches, Start port/End port field will be in the format Port n, where n is a numeric value.

For Director switches, Start port/End port will be in the format Blade N_Port i/j, where N is a leaf number, i is an internal ASIC number and j is a port number.

Examples:

Start Device

Start Port

End Device

End Port

Link Type

DSM071081704019

DSM071081704019 ibp11s0f0

DSM07-0101-0514-01IB0

Port 1

Data

DSM071081704019

DSM071081704019 ibp21s0f0

DSM07-0101-0514-01IB0

Port 2

Data

DSM071081704019

DSM071081704019 ibp75s0f0

DSM07-0101-0514-01IB0

Port 3

Data

Comparison results are forwarded to syslog as events. Example of /var/log/messages content:

  1. Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT111090310019/SAT111090310019 ibp203s0f0 - SAT11-0101-0903-19IB0/15"

  2. Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT11-0101-0903-09IB0/27 - SAT11-0101-0905-01IB1-A/Blade 12_Port 1/9"

  3. Dec 9 12:32:31 <server_ip> ad158f423225[4585]: NDT: missing in UFM "SAT11-0101-0901-13IB0/23 - SAT11-0101-0903-01IB1-A/Blade 08_Port 2/13"

For detailed information about how to check syslog, please refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > UFM Commands > UFM Logs.

Minimal interval value for periodic comparison in five minutes.

In case of an error the clarification will be provided.

For example, the request “POST /compare” without NDTs uploaded will return the following:

Configurations could be found in “ufm/conf/ndt.conf

  • Log level (default: INFO)

  • Log size (default: 10240000)

  • Log file backup count (default: 5)

  • Reports number to save (default: 10)

  • NDT format check (default: enabled)

  • Switch to switch and host to switch patterns (default: see NDT format section)

For detailed information on how to export or import the configuration, refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > UFM Commands > UFM Configuration Management.

Logs could be found in “ufm/logs/ndt.log”.

For detailed information on how to generate a debug dump, refer to the NVIDIA UFM-SDN Appliance Command Reference Guide > System Management > Configuration Management > File System.

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.