Inside the container, the /config directory contains the configuration files for the NVIDIA® UFM® Telemetry application. The main configuration file is launch_ibdiagnet_config.ini.
The basic settings of launch_ibdiagnet_config.ini are listed in the following table.
Section | Key | Type | Default Value | Description |
---|---|---|---|---|
ibdiagnet | ibdiagnet_enabled | bool | true | Enable/disable running the ibdiagnet process |
ibdiagnet | data_dir | String | /data | Directory in which UFM Telemetry data is placed |
ibdiagnet | ibdiag_output_dir | String | /tmp/ibd | Directory in which ibdiagnet places files |
ibdiagnet | sample_rate | Int | - | Frequency of collecting port counter data |
ibdiagnet | hca | String | mlx5_2 | Card to use. Can provide a comma-separated list of cards for local high availability |
ibdiagnet | app_name | String | /opt/collectx/bin/ibdiagnet | Allows the user to specify the full path of the ibdiagnet application if necessary |
ibdiagnet | topology_mode | String | discover | Topology policy |
ibdiagnet | topology_discovery_factor | Int | 0 | Every "n" iterations, do discovery, otherwise, use result from last run if 0 or 1 |
Retention | retention_enabled | bool | true | Enable/disable retention service |
Retention | retention_interval | time | 1d | Interval to wait before running the retention process |
Retention | retention_age | time | 100d | Period to reserve the collected data |
compression | compression_enable | bool | true | Enable/disable compression service |
compression | compression_interval | time | 6h | Interval to wait before running the compression service |
compression | compression_age | time | 12h | Period to reserve the compressed data |
cable_info | cable_info_schedule | CSV | - | Time to collect cable info data, in the format weekday/hr:min,hr:min |
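The settings in the table can be inspected programmatically before the container is started. The following is a minimal sketch, not the application's own loader: it assumes the file can be parsed as a standard INI file with the section names shown in the table, and that the time suffixes d and h mean days and hours. The SAMPLE text, load_settings, and parse_time are hypothetical names introduced for illustration only.

```python
import configparser
from datetime import timedelta

# Hypothetical excerpt built from the defaults in the table;
# the layout of the real file may differ.
SAMPLE = """
[ibdiagnet]
ibdiagnet_enabled = true
data_dir = /data
hca = mlx5_2

[Retention]
retention_enabled = true
retention_interval = 1d
retention_age = 100d

[compression]
compression_enable = true
compression_interval = 6h
compression_age = 12h
"""

# Assumption: only the "d" and "h" suffixes appear in the table.
_UNITS = {"d": "days", "h": "hours"}


def parse_time(value: str) -> timedelta:
    """Convert a value such as '6h' or '100d' to a timedelta."""
    value = value.strip()
    if len(value) < 2 or value[-1] not in _UNITS or not value[:-1].isdigit():
        raise ValueError(f"unsupported time value: {value!r}")
    return timedelta(**{_UNITS[value[-1]]: int(value[:-1])})


def load_settings(text: str) -> configparser.ConfigParser:
    """Parse INI text into a ConfigParser instance."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


settings = load_settings(SAMPLE)
print(settings.getboolean("ibdiagnet", "ibdiagnet_enabled"))
# hca may hold a comma-separated list of cards
print(settings.get("ibdiagnet", "hca").split(","))
print(parse_time(settings.get("Retention", "retention_age")))
```

Rejecting unknown suffixes with an error, rather than guessing a unit, keeps a typo in a time value from silently changing a retention period.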
Enable BER Collection
To enable BER collection, make sure the following lines appear and are not commented out. Specifically, --enabled_regs dd_ppcnt_plsc needs to be added.
lookup_BER_counters=--get_phy_info --enabled_regs dd_ppcnt_plsc
param_4=BER_counters
Verify that the following flag is commented out or set to 0 (default is 1):
plugin_env_CLX_EXPORT_API_SKIP_PHY_STAT
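Checking whether a skip flag is "commented out or set to 0" can be scripted. The sketch below is an illustration under stated assumptions: it treats # as the comment marker (as in the commented-out arg_13 line in this document) and follows the wording above, so a missing or commented-out flag counts as not skipping. The function name skip_flag_active is hypothetical.

```python
def skip_flag_active(config_text: str, flag: str) -> bool:
    """Return True if an uncommented line sets `flag` to a value other than 0."""
    active = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out line
        key, sep, rest = line.partition("=")
        if sep and key.strip() == flag:
            # Drop any trailing inline comment before comparing the value.
            value = rest.split("#", 1)[0].strip()
            active = value != "0"  # the last occurrence wins
    return active


FLAG = "plugin_env_CLX_EXPORT_API_SKIP_PHY_STAT"
print(skip_flag_active(f"{FLAG}=1", FLAG))    # flag set: collection is skipped
print(skip_flag_active(f"# {FLAG}=1", FLAG))  # commented out: not skipped
```

The same check applies to the other plugin_env_CLX_EXPORT_API_SKIP_* flags described in this section.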
Enable Temperature Collection
Comment out the following line to make sure temperature sensing will not be skipped:
# arg_13=--skip temp_sensing
Enable Grade Collection
To enable Grade collection, make sure the following lines appear and are not commented out. Specifically, --enabled_regs slrg needs to be added.
lookup_Grade_counters=--get_phy_info --enabled_regs slrg
param_6=Grade_counters
Verify that the following flag is commented out or set to 0 (default is 1):
plugin_env_CLX_EXPORT_API_SKIP_SLRG
Enable PPCC
To enable PPCC, ensure that the following line is added and not commented out:
arg_x=--congestion_counters # x should be replaced with the next available index!
Verify that the following flag is set to 0:
plugin_env_CLX_EXPORT_API_DISABLE_PPCCINFO
The following events are created:
ppcc_algo_config, ppcc_algo_config_params, ppcc_algo_config_support, ppcc_algo_counters
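The arg_x lines in this document require replacing x with the next available index. One way to find that index is sketched below. It assumes that arguments are numbered with the arg_<n>= pattern, that commented-out entries should still reserve their index, and that numbering starts at 1 when no entries exist; none of this is confirmed by the application. The helper next_arg_index and the sample entry arg_12 are hypothetical.

```python
import re

# Matches "arg_12=..." as well as commented-out "# arg_13=..." lines.
_ARG_RE = re.compile(r"^\s*#?\s*arg_(\d+)\s*=")


def next_arg_index(config_text: str) -> int:
    """Return one more than the highest arg_<n> index found in the text."""
    used = [
        int(match.group(1))
        for line in config_text.splitlines()
        if (match := _ARG_RE.match(line))
    ]
    return max(used, default=0) + 1


SAMPLE = """\
arg_12=--placeholder  # hypothetical existing entry
# arg_13=--skip temp_sensing
"""

index = next_arg_index(SAMPLE)
print(f"arg_{index}=--congestion_counters")
```

Counting commented-out entries avoids an index clash if one of them is re-enabled later.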
Enable XMIT_WAIT per vl
To enable XMIT_WAIT per vl, ensure that the following line is added and not commented out:
arg_x=--per_slvl_cntrs # x should be replaced with the next available index!
Verify that the following line does not exist or is set to 0:
plugin_env_CLX_EXPORT_API_SKIP_PORT_VL=1
The following counters are created: PortXmitWaitVLExt[0-15]
Enable MLNX_COUNTERS
To enable MLNX_COUNTERS (pages 0, 1, and 255), ensure that the following line is added and not commented out:
arg_x=--sc # x should be replaced with the next available index!
Verify that the following lines do not exist or are set to 0:
plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTER=0
plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE1=0
plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE255=0
Managed Switch Data Collection
Prerequisite: Access to UFM that is running the sysinfo plugin. The following configs are mandatory to enable the collection.
To enable the feature, set:
plugin_env_CLX_EXPORT_API_DISABLE_MANAGED_SWITCHINFO=0
UFM endpoint:
plugin_env_MANAGED_SWITCH_DATA_EP=https://localhost/ufmRest/plugin/sysinfo/query
UFM token:
plugin_env_CLX_UFM_TOKEN=YWRtaW46MTIzNDU2
The UFM Telemetry server endpoint, which must be the same as the PROMETHEUS_ENDPOINT:
plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_CB_EP=http://localhost:1234/management/key_value
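The example value of plugin_env_CLX_UFM_TOKEN above decodes from Base64 to a user:password pair. Assuming that this is the expected token form (the text does not state it explicitly, and a deployment may instead use a token issued by UFM), a value for other credentials could be produced as sketched below. The helper make_token is a hypothetical name.

```python
import base64


def make_token(username: str, password: str) -> str:
    """Base64-encode 'username:password', matching the form of the example value."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Decoding the example value from this section shows the underlying pair.
example = "YWRtaW46MTIzNDU2"
print(base64.b64decode(example).decode("utf-8"))

# Re-encoding the same pair reproduces the example value.
print(make_token("admin", "123456") == example)
```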
The following configs are optional:
The list of managed switches to sample. The default is all the managed switches on the fabric, as defined by the sysinfo plugin:
plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_LIST=11.222.33.44,11.333.444.55
The sample rate of the managed switches, in seconds. It should not be set faster than the switch collection sample rate. The default is 10 minutes:
plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_INTERVAL=600
Log File Rotation
The size of the UFM Telemetry log file “ibdiagnet2_port_counters.log” is monitored by a log rotation mechanism. This is especially relevant for long execution times and/or high verbosity, where the volume of logs can grow excessively large.
To disable log rotation, verify that the following flag is set to 0 (default is 1):
plugin_env_CLX_LOG_ROTATE_ENABLED
To change the number of rotated files, set the following flag (default is 3):
plugin_env_CLX_LOG_ROTATE_NUM_FILES
To change the rotation threshold, set the following flag (default is 100M). Use [K|M|G] as units:
plugin_env_CLX_LOG_ROTATE_SIZE
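To compare the rotation threshold with available disk space, a size value such as 100M can be converted to bytes. The sketch below assumes that K, M, and G are binary multiples (1024-based) and that a value without a suffix is a plain byte count; the text above does not specify either point. The helper parse_size is a hypothetical name.

```python
# Assumption: K/M/G are 1024-based multiples; this is not confirmed above.
_MULTIPLIERS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value: str) -> int:
    """Convert a size such as '100M' to a number of bytes."""
    value = value.strip()
    if value and value[-1] in _MULTIPLIERS:
        number, factor = value[:-1], _MULTIPLIERS[value[-1]]
    else:
        number, factor = value, 1
    if not number.isdigit():
        raise ValueError(f"unsupported size value: {value!r}")
    return int(number) * factor


print(parse_size("100M"))
```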
There are three optional rotation methods, used in the following order:
- rotatelogs - If this executable exists, it will be used for log rotation, and the rotated file names will differ by an index suffix.
- logrotate - If this executable exists, it will be used for log rotation, and the rotated file names will differ by a timestamp suffix.
- manual rotation - If neither executable is available, UFM Telemetry will manually rotate 2 log files. The older log file will have the “.bck” suffix.
To skip options, use the following flag to set the executables to use (default is “rotatelogs,logrotate”):
plugin_env_CLX_LOG_ROTATE_APP
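The selection order described above can be sketched as follows. This is an illustration of the described fallback logic, not the application's actual implementation: it assumes the flag value is a comma-separated list that is tried left to right, and it uses shutil.which to test whether an executable is on the PATH. The function name choose_rotation_method and the injectable which parameter are hypothetical.

```python
import shutil
from typing import Callable, Optional


def choose_rotation_method(
    flag_value: str = "rotatelogs,logrotate",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first listed executable that exists, else 'manual'."""
    for name in (part.strip() for part in flag_value.split(",")):
        if name and which(name):
            return name
    # Neither executable is available: fall back to manual rotation.
    return "manual"


# Uses the default order from plugin_env_CLX_LOG_ROTATE_APP.
print(choose_rotation_method())

# Skips rotatelogs by listing only logrotate.
print(choose_rotation_method("logrotate"))
```

Passing the lookup function as a parameter makes the order easy to check without installing either tool.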