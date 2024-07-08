NVIDIA UFM Telemetry Documentation v1.15.7

Settings and Configuration

Inside the container, the directory /config contains the configuration files for the NVIDIA® UFM® Telemetry application. The file launch_ibdiagnet_config.ini is the main configuration file.

The basic configuration options of launch_ibdiagnet_config.ini are listed in the following table.

| Section | Key | Type | Default Value | Description |
|---|---|---|---|---|
| ibdiagnet | ibdiagnet_enabled | bool | true | Enable/disable running the ibdiagnet process |
| | data_dir | String | /data | Directory in which UFM Telemetry data is placed |
| | ibdiag_output_dir | String | /tmp/ibd | Directory in which ibdiagnet places its output files |
| | sample_rate | Int | - | Frequency of collecting port counters data |
| | hca | String | mlx5_2 | HCA (card) to use. A comma-separated list of cards can be provided for local high availability |
| | force_hca | bool | false | Skip the HCA state check |
| | app_name | String | /opt/collectx/bin/ibdiagnet | Full path of the ibdiagnet application, for cases where it must be specified explicitly |
| | topology_mode | String | discover | Topology policy |
| | topology_discovery_factor | Int | 0 | Run discovery every "n" iterations and otherwise reuse the result from the last run; 0 or 1 runs discovery on every iteration |
| Retention | retention_enabled | bool | true | Enable/disable the retention service |
| | retention_interval | time | 1d | Interval to wait before running the retention process |
| | retention_age | time | 100d | Period for which collected data is retained |
| compression | compression_enable | bool | true | Enable/disable the compression service |
| | compression_interval | time | 6h | Interval to wait before running the compression service |
| | compression_age | time | 12h | Period to retain the compressed data |
| cable_info | cable_info_schedule | CSV | - | Times at which to collect cable info data, in the format weekday/hr:min,hr:min |
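The `time` values in the table (1d, 6h, 12h, 100d) combine an integer with a unit suffix. The following Python sketch converts such values to seconds, for example to sanity-check a configuration before deploying it. It assumes that `d` and `h` mean days and hours; the `m` and `s` suffixes are an assumption and do not appear in the table, and the function name is hypothetical.

```python
import re

# Seconds per unit suffix. "d" and "h" appear in the table; "m" and "s" are assumed.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> int:
    """Convert a time-typed value such as "6h" or "100d" into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    # Default values from the table above.
    defaults = {
        "retention_interval": "1d",
        "retention_age": "100d",
        "compression_interval": "6h",
        "compression_age": "12h",
    }
    for key, value in defaults.items():
        print(f"{key} = {value} -> {parse_duration(value)} s")
```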

Enable BER Collection

To enable BER collection, make sure the following lines appear and are not commented out. In particular, --enabled_regs dd_ppcnt_plsc must be included.

lookup_BER_counters=--get_phy_info --enabled_regs dd_ppcnt_plsc
param_4=BER_counters

Verify that the following flag is commented out or set to 0 (default is 1):

plugin_env_CLX_EXPORT_API_SKIP_PHY_STAT
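These checks can be automated. The following is a minimal Python sketch, not part of UFM Telemetry, that reads the uncommented key=value lines of launch_ibdiagnet_config.ini and reports whether BER collection is configured as described. It assumes that keys are unique across sections, that a commented-out (absent) skip flag behaves like 0, and that the `param_` index may differ from `param_4`; the helper names are hypothetical.

```python
def read_active_settings(text: str) -> dict:
    """Return key=value pairs from lines that are not commented out.

    Section headers are skipped, so keys are assumed to be unique across sections.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def ber_collection_enabled(text: str) -> bool:
    """Check the BER prerequisites described in this section."""
    settings = read_active_settings(text)
    # Normalize whitespace so extra spaces between options do not matter.
    lookup = " ".join(settings.get("lookup_BER_counters", "").split())
    has_registers = "--get_phy_info" in lookup and "--enabled_regs dd_ppcnt_plsc" in lookup
    # The section shows param_4; any param_<n> index is accepted here (assumption).
    has_param = any(
        key.startswith("param_") and value == "BER_counters"
        for key, value in settings.items()
    )
    # A commented-out (absent) skip flag is treated like 0.
    skip_phy_stat = settings.get("plugin_env_CLX_EXPORT_API_SKIP_PHY_STAT", "0")
    return has_registers and has_param and skip_phy_stat == "0"


if __name__ == "__main__":
    sample = (
        "lookup_BER_counters=--get_phy_info --enabled_regs dd_ppcnt_plsc\n"
        "param_4=BER_counters\n"
        "# plugin_env_CLX_EXPORT_API_SKIP_PHY_STAT=1\n"
    )
    print(ber_collection_enabled(sample))  # prints True
```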

Enable Temperature Collection

Make sure the following line is commented out so that temperature sensing is not skipped:

# arg_13=--skip temp_sensing
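To apply this change with a script instead of an editor, a minimal Python sketch can comment out every active line that contains `--skip temp_sensing`. The helper name is hypothetical, and the sketch assumes that lines starting with `#` are treated as comments by the configuration parser, as the commented example line suggests.

```python
def comment_out_matching(text: str, needle: str) -> str:
    """Prefix "# " to every active line that contains `needle`."""
    result = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped and not stripped.startswith("#") and needle in stripped:
            line = "# " + stripped
        result.append(line)
    joined = "\n".join(result)
    # Keep the trailing newline if the input had one.
    return joined + "\n" if text.endswith("\n") else joined


if __name__ == "__main__":
    before = "arg_12=--congestion_counters\narg_13=--skip temp_sensing\n"
    print(comment_out_matching(before, "--skip temp_sensing"), end="")
```

Lines that are already commented out are left unchanged, so running the sketch twice has no further effect.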

Enable Grade Collection

To enable Grade collection, make sure the following lines appear and are not commented out. In particular, --enabled_regs slrg must be included.

lookup_Grade_counters=--get_phy_info --enabled_regs slrg
param_6=Grade_counters

Verify that the following flag is commented out or set to 0 (default is 1):

plugin_env_CLX_EXPORT_API_SKIP_SLRG

Enable PPCC

To enable PPCC, ensure that the following line is added and not commented out:

arg_x=--congestion_counters # x should be replaced with the next available index!
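Because each `arg_` entry needs a unique index, the following Python sketch picks the next index as one more than the highest index already present and inserts the new line right after the last existing `arg_` line, so that it stays in the same section. Treating commented-out `arg_` lines as occupied, and starting at 1 when no `arg_` line exists, are assumptions; the helper names are hypothetical.

```python
import re

# Matches active and commented-out "arg_<n>=" lines; both count as occupied.
_ARG_LINE = re.compile(r"^\s*#?\s*arg_(\d+)\s*=")


def next_arg_index(text: str) -> int:
    """Return one more than the highest arg_<n> index (1 if none exist)."""
    indices = [int(m.group(1)) for line in text.splitlines() if (m := _ARG_LINE.match(line))]
    return max(indices, default=0) + 1


def add_arg(text: str, option: str) -> str:
    """Insert arg_<next>=<option> right after the last existing arg_ line."""
    lines = text.splitlines()
    new_line = f"arg_{next_arg_index(text)}={option}"
    arg_positions = [i for i, line in enumerate(lines) if _ARG_LINE.match(line)]
    # Fall back to appending at the end when the file has no arg_ lines.
    insert_at = arg_positions[-1] + 1 if arg_positions else len(lines)
    lines.insert(insert_at, new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config = "arg_12=--per_slvl_cntrs\n# arg_13=--skip temp_sensing\n"
    print(add_arg(config, "--congestion_counters"), end="")
```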

Verify that the following flag is set to 0:

plugin_env_CLX_EXPORT_API_DISABLE_PPCCINFO

The following events are created:

ppcc_algo_config, ppcc_algo_config_params, ppcc_algo_config_support, ppcc_algo_counters

Enable XMIT_WAIT per VL

To enable XMIT_WAIT per VL, ensure that the following line is added and not commented out:

arg_x=--per_slvl_cntrs #  x should be replaced with the next available index!

Verify that the following line either does not exist or is set to 0:

plugin_env_CLX_EXPORT_API_SKIP_PORT_VL=1

The following counters are created:
PortXmitWaitVLExt[0-15]

Enable MLNX_COUNTERS

To enable MLNX_COUNTERS (pages 0, 1, and 255), ensure that the following line is added and not commented out:

arg_x=--sc #  x should be replaced with the next available index!

Verify that the following lines either do not exist or are set to 0:

plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTER=0
plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE1=0
plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE255=0
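Several sections of this page require a set of flags to be absent or set to 0. A minimal Python sketch that forces given flags to a value, uncommenting them if necessary and appending any that are missing, might look like the following. The helper is hypothetical, and appending missing flags at the end of the file assumes they belong to the last section of the file; adjust the insertion point if they live elsewhere.

```python
import re

# Matches "KEY=" at the start of a line, optionally commented out with "#".
_KEY_LINE = re.compile(r"^\s*#?\s*([A-Za-z0-9_]+)\s*=")


def set_flags(text: str, flags: dict) -> str:
    """Set each flag in `flags` to its value, uncommenting it if needed.

    Flags that do not appear in `text` are appended at the end of the file.
    """
    seen = set()
    result = []
    for line in text.splitlines():
        match = _KEY_LINE.match(line)
        if match and match.group(1) in flags:
            key = match.group(1)
            result.append(f"{key}={flags[key]}")
            seen.add(key)
        else:
            result.append(line)
    result.extend(f"{key}={value}" for key, value in flags.items() if key not in seen)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    mlnx_flags = {
        "plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTER": "0",
        "plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE1": "0",
        "plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTERS_PAGE255": "0",
    }
    current = "plugin_env_CLX_EXPORT_API_SKIP_MLNX_COUNTER=1\n"
    print(set_flags(current, mlnx_flags), end="")
```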

Switch Power Sensors Data

To enable switch power sensors data collection, ensure that the following line is added and not commented out:

arg_x= --get_phy_info --enabled_reg mvcr #  x should be replaced with the next available index!

Verify that the following line either does not exist or is set to 0:

plugin_env_CLX_EXPORT_API_DISABLE_SWITCHINFO=0

Switch Power Supplies Data

To enable switch power supplies data collection, ensure that the following line is added and not commented out:

arg_x= --get_phy_info --enabled_reg msps #  x should be replaced with the next available index!

Verify that the following line either does not exist or is set to 0:

plugin_env_CLX_EXPORT_API_DISABLE_SWITCHINFO=0

SHARP HW Counters

To enable SHARP HW (PM) counters, ensure that the following line is added and not commented out:

arg_x=--sharp --sharp_opt dsc # x should be replaced with the next available index!

Verify that the following line either does not exist or is set to 0:

plugin_env_CLX_EXPORT_API_SKIP_SHARP_PM_COUNTERS=0

Managed Switch Data Collection

Prerequisite: access to a UFM instance that is running the sysinfo plugin. The following settings are mandatory to enable the collection.

To enable the feature, set:

plugin_env_CLX_EXPORT_API_DISABLE_MANAGED_SWITCHINFO=0 

UFM endpoint:

plugin_env_MANAGED_SWITCH_DATA_EP=https://localhost/ufmRest/plugin/sysinfo/query 

UFM token:

plugin_env_CLX_UFM_TOKEN=YWRtaW46MTIzNDU2 
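The example token value decodes to `admin:123456`, which suggests that plugin_env_CLX_UFM_TOKEN holds a Base64-encoded `username:password` pair, as used in HTTP Basic authentication. Under that assumption, a token can be generated with Python's standard `base64` module; the function name is hypothetical.

```python
import base64


def make_ufm_token(username: str, password: str) -> str:
    """Base64-encode "username:password" (assumed token format)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    token = make_ufm_token("admin", "123456")
    print(f"plugin_env_CLX_UFM_TOKEN={token}")  # prints plugin_env_CLX_UFM_TOKEN=YWRtaW46MTIzNDU2
```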

The UFM Telemetry server endpoint, which must be the same as the PROMETHEUS_ENDPOINT:

plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_CB_EP=http://localhost:1234/management/key_value 

The following settings are optional:

  • The list of managed switches to sample. By default, all managed switches on the fabric, as defined by the sysinfo plugin, are sampled:

    plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_LIST=11.222.33.44,11.333.444.55 

  • The sample rate of managed switches, in seconds. It should not be set faster than the switch collection sample rate. The default is 600 seconds (10 minutes):

    plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_INTERVAL=600 
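The optional settings above can be read and validated together. In this Python sketch, an empty or missing switch list means "all managed switches defined by the sysinfo plugin", and the interval falls back to the documented default of 600 seconds. The switch collection sample rate is passed in by the caller, because this section does not name the setting that holds it; the function name is hypothetical.

```python
DEFAULT_MANAGED_SWITCH_INTERVAL = 600  # seconds (10 minutes), the documented default


def managed_switch_settings(settings: dict, switch_sample_rate: int):
    """Return (switch_list, interval_seconds) from the optional settings.

    switch_list is None when no list is given, meaning "all managed switches
    defined by the sysinfo plugin".
    """
    raw_list = settings.get("plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_LIST", "")
    switches = [item.strip() for item in raw_list.split(",") if item.strip()] or None
    interval = int(settings.get(
        "plugin_env_CLX_EXPORT_API_MANAGED_SWITCH_INTERVAL", DEFAULT_MANAGED_SWITCH_INTERVAL))
    # A shorter interval would sample managed switches faster than the switch collection rate.
    if interval < switch_sample_rate:
        raise ValueError(
            f"managed switch interval {interval}s is shorter than the "
            f"switch collection sample rate {switch_sample_rate}s")
    return switches, interval
```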

Log File Rotation

The size of the UFM Telemetry log file, “ibdiagnet2_port_counters.log”, is controlled by a log rotation mechanism. This is especially relevant for long executions and/or high verbosity, where the log file can grow excessively large.

To disable log rotation, set the following flag to 0 (default is 1):

plugin_env_CLX_LOG_ROTATE_ENABLED

To change the number of rotated files, set the following flag (default is 3):

plugin_env_CLX_LOG_ROTATE_NUM_FILES

To change the rotation size threshold, set the following flag (default is 100M), using K, M, or G as the unit:

plugin_env_CLX_LOG_ROTATE_SIZE
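The following Python sketch converts a plugin_env_CLX_LOG_ROTATE_SIZE value such as 100M into bytes. It assumes binary multiples (1K = 1024 bytes) and that a value without a suffix is in bytes; neither is stated in this section, and the function name is hypothetical.

```python
import re

# Binary multiples are assumed (1K = 1024 bytes); a bare number is taken as bytes.
_SIZE_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_rotate_size(value: str) -> int:
    """Convert a value such as "100M" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _SIZE_MULTIPLIERS[unit]


if __name__ == "__main__":
    print(parse_rotate_size("100M"))  # prints 104857600
```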

Three rotation methods are available; they are tried in the following order:

  1. rotatelogs - If this executable exists, it is used for log rotation, and rotated file names differ by an index suffix.

  2. logrotate - If this executable exists, it is used for log rotation, and rotated file names differ by a timestamp suffix.

  3. manual rotation - If neither executable is available, UFM Telemetry rotates two log files manually. The older log file will have “.bck

To skip methods, set the following flag to the executables to use (default is “rotatelogs,logrotate”):

plugin_env_CLX_LOG_ROTATE_APP
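The selection order described above can be sketched in Python: walk the comma-separated list from plugin_env_CLX_LOG_ROTATE_APP (default “rotatelogs,logrotate”), use the first executable found on the PATH, and fall back to manual rotation otherwise. This illustrates the documented order and is not the actual UFM Telemetry implementation; the function name is hypothetical, and `which` is injectable so the logic can be tested without the executables installed.

```python
import shutil


def choose_rotation_method(rotate_app: str = "rotatelogs,logrotate", which=shutil.which) -> str:
    """Return the first executable from `rotate_app` found on PATH, else "manual"."""
    for name in rotate_app.split(","):
        name = name.strip()
        if name and which(name) is not None:
            return name
    return "manual"


if __name__ == "__main__":
    # The result depends on which executables are installed on this host.
    print(choose_rotation_method())
```

Setting the flag to “logrotate” alone, for example, skips rotatelogs even when it is installed.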

© Copyright 2024, NVIDIA. Last updated on Jul 8, 2024