Inside the container, the directory /config contains the configuration files for the NVIDIA® UFM® Telemetry application. The file launch_ibdiagnet_config.ini is the main configuration file.

The basic configurations of launch_ibdiagnet_config.ini are listed in the following table.

| Section | Key | Type | Default Value | Description |
|---|---|---|---|---|
| ibdiagnet | ibdiagnet_enabled | bool | true | Enable/disable run ibdiagnet process |
| | data_dir | String | /data | Directory in which UFM Telemetry data is placed |
| | ibdiag_output_dir | String | /tmp/ibd | Directory in which ibdiagnet places files |
| | sample_rate | Int | - | Frequency of collecting ports counters data |
| | hca | String | mlx5_2 | Card to use. Can provide a comma-separated list of cards for local high availability |
| | app_name | String | /opt/collectx/bin/ibdiagnet | Allow user to specify full path of the ibdiagnet application if necessary |
| | topology_mode | String | discover | Topology policy |
| | topology_discovery_factor | Int | 0 | Every "n" iterations, do discovery, otherwise, use result from last run if 0 or 1 |
| Retention | retention_enabled | bool | true | Enable/disable retention service |
| | retention_interval | time | 1d | Interval to wait before running the retention process |
| | retention_age | time | 100d | Period to reserve the collected data |
| compression | compression_enable | bool | true | Enable/disable compression service |
| | compression_interval | time | 6h | Interval to wait before running the compression service |
| | compression_age | time | 12h | Period to reserve the compressed data |
| cable_info | cable_info_schedule | CSV | - | Time to collect cable info data, in the form weekday/hr:min,hr:hm |
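As an illustration of how these keys might be read programmatically, here is a minimal Python sketch. It assumes the file uses standard INI syntax with the section names shown in the table, and that time values are a whole number followed by d (days) or h (hours). The sample text and the parse_duration helper are hypothetical and are not part of UFM Telemetry.

```python
import configparser
from datetime import timedelta

# Hypothetical excerpt built from the defaults in the table above;
# the real file may contain more keys and different values.
SAMPLE = """
[ibdiagnet]
ibdiagnet_enabled = true
data_dir = /data
hca = mlx5_2
topology_mode = discover

[Retention]
retention_enabled = true
retention_interval = 1d
retention_age = 100d

[compression]
compression_enable = true
compression_interval = 6h
compression_age = 12h
"""


def parse_duration(value: str) -> timedelta:
    """Convert values such as '1d' or '6h' (assumed format) to a timedelta."""
    units = {"d": "days", "h": "hours"}
    number, unit = value[:-1], value[-1]
    if unit not in units:
        raise ValueError(f"unsupported time unit in {value!r}")
    return timedelta(**{units[unit]: int(number)})


config = configparser.ConfigParser()
config.read_string(SAMPLE)

enabled = config.getboolean("ibdiagnet", "ibdiagnet_enabled")
# hca may hold a comma-separated list of cards.
hcas = [card.strip() for card in config.get("ibdiagnet", "hca").split(",")]
retention_age = parse_duration(config.get("Retention", "retention_age"))
compression_interval = parse_duration(config.get("compression", "compression_interval"))
```

To read the real file, config.read("/config/launch_ibdiagnet_config.ini") could replace read_string, provided the file parses as standard INI.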

Enable BER Collection

To enable the BER collection, make sure the following lines appear and are not commented out. Specifically, the --enabled_regs dd_ppcnt_plsc needs to be added.

lookup_BER_counters=--get_phy_info --enabled_regs dd_ppcnt_plsc
param_4=BER_counters

Verify that the following flag is commented out or set to 0 (default is 1):

plugin_env_CLX_EXPORT_API_SKIP_PHY_STAT

Enable Temperature Collection

Comment out the following line to make sure temperature sensing will not be skipped:

# arg_13=--skip temp_sensing

Enable Grade Collection

To enable Grade collection, make sure the following lines appear and are not commented out. Specifically, --enabled_regs slrg needs to be added.

lookup_Grade_counters=--get_phy_info --enabled_regs slrg
param_6=Grade_counters

Verify that the following flag is commented out or set to 0 (default is 1):

plugin_env_CLX_EXPORT_API_SKIP_SLRG

Enable PPCC

To enable PPCC, ensure that the following line is present and not commented out:

arg_x=--congestion_counters # x should be replaced with the next available index!

Verify that the following flag is set to 0: 

plugin_env_CLX_EXPORT_API_DISABLE_PPCCINFO

The following events are created:

ppcc_algo_config, ppcc_algo_config_params, ppcc_algo_config_support, ppcc_algo_counters
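Several sections of this page ask for an arg_x line whose x is "the next available index". The Python sketch below shows one way to find that index. It assumes the next available index is one more than the highest uncommented arg_<n> already in the file, and it returns 1 when no such lines exist; both are assumptions. The function name and the sample indices are hypothetical.

```python
import re


def next_arg_index(config_text: str) -> int:
    """Return one more than the highest uncommented arg_<n> index (1 if none)."""
    # Lines starting with '#' do not match, so commented-out args are ignored.
    pattern = re.compile(r"^\s*arg_(\d+)\s*=", re.MULTILINE)
    indices = [int(match.group(1)) for match in pattern.finditer(config_text)]
    return max(indices, default=0) + 1


# Hypothetical excerpt; the indices are illustrative only.
SAMPLE = """
arg_11=--sc
arg_12=--per_slvl_cntrs
# arg_13=--skip temp_sensing
"""

index = next_arg_index(SAMPLE)
new_line = f"arg_{index}=--congestion_counters"
```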

Enable XMIT_WAIT per vl

To enable XMIT_WAIT per VL, ensure that the following line is present and not commented out:

arg_x=--per_slvl_cntrs #  x should be replaced with the next available index!

Verify that the following line either does not exist or has its value set to 0:

plugin_env_CLX_EXPORT_API_SKIP_PORT_VL=1

The following counters are created:
PortXmitWaitVLExt[0-15]

Enable MLNX_COUNTERS

To enable MLNX_COUNTERS (pages 0, 1, and 255), ensure that the following line is present and not commented out:

arg_x=--sc #  x should be replaced with the next available index!

Verify that each of the following lines either does not exist or is set to 0:

plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTER=0
plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE1=0
plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE255=0
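The "does not exist or is set to 0" check used in the sections above can be scripted. The following Python sketch is a hypothetical helper, not a UFM Telemetry tool. It assumes one key=value pair per line and # for comments, and it looks only at the first uncommented occurrence of the flag.

```python
def skip_flag_active(config_text: str, flag: str) -> bool:
    """Return True if an uncommented line sets `flag` to a value other than 0."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out line
        key, sep, value = line.partition("=")
        if sep and key.strip() == flag:
            return value.strip() != "0"
    return False  # flag absent: treated the same as 0 for this check


# Hypothetical excerpt for illustration.
SAMPLE = """
plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTER=0
# plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE1=1
plugin_env_CLX_EXPORT_API_SKIP_PORT_VL=1
"""

blocked = [
    flag
    for flag in (
        "plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTER",
        "plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE1",
        "plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE255",
        "plugin_env_CLX_EXPORT_API_SKIP_PORT_VL",
    )
    if skip_flag_active(SAMPLE, flag)
]
```

This check does not apply to flags whose default is documented as 1, where a missing line may not mean "enabled".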

Managed Switch Data Collection

Prerequisite: Access to UFM that is running the sysinfo plugin. The following configs are mandatory to enable the collection. 

To enable the feature, set:

plugin_env_CLX_EXPORT_API_DISABLE_MANAGED_SWITCHINFO=0 

UFM endpoint: 

plugin_env_MANAGED_SWITCH_DATA_EP=https://localhost/ufmRest/plugin/sysinfo/query 

UFM token:

plugin_env_CLX_UFM_TOKEN=YWRtaW46MTIzNDU2 
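The sample token above is the Base64 encoding of the string admin:123456, which suggests the value is a Base64-encoded username:password pair. Assuming that format applies to your deployment, a token could be generated as in the Python sketch below. The helper name make_ufm_token is hypothetical.

```python
import base64


def make_ufm_token(username: str, password: str) -> str:
    """Base64-encode 'username:password' (assumed token format)."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Reproduces the sample value shown in this document.
token = make_ufm_token("admin", "123456")
config_line = f"plugin_env_CLX_UFM_TOKEN={token}"
```

Base64 is an encoding, not encryption, so a configuration file containing such a token should be protected like any other credential.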

The UFM Telemetry server endpoint, which must be the same as the PROMETHEUS_ENDPOINT:

plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_CB_EP=http://localhost:1234/management/key_value 

The following configs are optional:

  • The list of managed switches to sample. The default is all managed switches on the fabric, as defined by the sysinfo plugin:

    plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_LIST=11.222.33.44,11.333.444.55 
  • The sample interval for managed switches, in seconds. It should not be set faster than the switch collection sample rate. The default is 10 minutes (600 seconds):

    plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_INTERVAL=600 


Log File Rotation

The size of the UFM Telemetry log file “ibdiagnet2_port_counters.log” is monitored by a log rotation mechanism. This is especially relevant for long execution times and/or high verbosity, where the log volume can grow excessively large.

To disable log rotation, set the following flag to 0 (default is 1):

plugin_env_CLX_LOG_ROTATE_ENABLED

To change the number of rotated files, set the following flag (default is 3): 

plugin_env_CLX_LOG_ROTATE_NUM_FILES

To change the rotation threshold, set the following flag (default is 100M), using K, M, or G as the unit:

plugin_env_CLX_LOG_ROTATE_SIZE
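To make the size format concrete, the Python sketch below converts a value such as 100M into bytes. It assumes a whole number followed by exactly one of K, M, or G, and it assumes binary multiples (1K = 1024 bytes). This page does not say whether the units are binary or decimal. The function name is hypothetical.

```python
import re

# Assumption: binary multiples; the document does not specify this.
_UNIT_BYTES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_rotate_size(value: str) -> int:
    """Convert a size such as '100M' to a byte count under the stated assumptions."""
    match = re.fullmatch(r"(\d+)([KMG])", value.strip())
    if match is None:
        raise ValueError(f"expected <number>[K|M|G], got {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit]


default_threshold = parse_rotate_size("100M")
```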

There are three optional rotation methods, used in the following order:

  1. rotatelogs - If this executable exists, it is used for log rotation, and the rotated file names differ by an index suffix.
  2. logrotate - If this executable exists, it is used for log rotation, and the rotated file names differ by a timestamp suffix.
  3. manual rotation - If neither executable is available, UFM Telemetry manually rotates 2 log files. The older log file will have “.bck

To skip one of these methods, use the following flag to set which executables may be used (default is “rotatelogs,logrotate”):

plugin_env_CLX_LOG_ROTATE_APP
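The fallback order described above can be pictured with the short Python sketch below. It is an illustration of the selection logic only, not the UFM Telemetry implementation. It assumes each listed executable is looked up on PATH in the order given and that manual rotation is used when none is found. The function name and the injectable which parameter are hypothetical.

```python
import shutil
from typing import Callable, Optional


def choose_rotation_method(
    apps: str = "rotatelogs,logrotate",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first listed executable that `which` can find, else 'manual'."""
    for name in (part.strip() for part in apps.split(",")):
        if name and which(name):
            return name
    return "manual"


# On the local machine, using the default order from this document:
method = choose_rotation_method()
```

Passing apps="logrotate" mirrors setting plugin_env_CLX_LOG_ROTATE_APP to skip rotatelogs.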