NVIDIA® UFM® Telemetry uses the configuration file launch_ibdiagnet_config.ini to control data collection. It collects two types of data: cable info and port counters.

Port counters are collected periodically. The sampling period is set with the parameter sample_rate, given in seconds.

Bare Metal Mode

By default, cable info data is not collected. To enable its collection, add the following flag:

plugin_env_CLX_EXPORT_API_DISABLE_CABLEINFO=0

When enabled, cable info data is collected on every run by default. To change the collection frequency to once every num_iterations, use the following setting:

plugin_env_CLX_EXPORT_API_CABLE_RUN_ONCE=1

To work with the collected data, you may use the Telemetry CLI, which can be accessed as follows:

./bin/clxcli
CollectX: set_data_root /tmp/clx_data
CollectX: set_data_template {{year}}/{{month}}{{day}}/{{hash1023}}/{{source}}/{{tag}}{{id}}.bin

Container Mode

Cable info data is collected on a weekly schedule, set with the parameter cable_info_schedule. Each collection time is given in the format "day/hrs:mins". For daily collection, the format is "hrs:mins".

The data can be collected multiple times during the week. To do so, separate the collection times with commas. For example:

  • cable_info_schedule= 5/00:00 – collects cable info data on the 5th day of the week at midnight
  • cable_info_schedule= 12:00 – collects cable info data at 12:00 every day
  • cable_info_schedule= 5/00:00,12:00 – combines the previous two examples
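
The schedule format above can be checked before it is written to the configuration. The following is a minimal Python sketch, not part of UFM Telemetry: the names ScheduleEntry and parse_cable_info_schedule are hypothetical, and the range check on hours and minutes is an assumption about what the collector accepts.

```python
from typing import List, NamedTuple, Optional


class ScheduleEntry(NamedTuple):
    day: Optional[int]  # day of the week as written in the value; None means every day
    hour: int
    minute: int


def parse_cable_info_schedule(value: str) -> List[ScheduleEntry]:
    """Parse a comma-separated list of "day/hrs:mins" or "hrs:mins" entries."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        day = None
        if "/" in item:
            # Weekly entry: the part before the slash is the day of the week.
            day_text, item = item.split("/", 1)
            day = int(day_text)
        hour_text, minute_text = item.split(":", 1)
        hour, minute = int(hour_text), int(minute_text)
        # Assumed validity range (24-hour clock); the real tool may differ.
        if not (0 <= hour <= 23 and 0 <= minute <= 59):
            raise ValueError(f"time out of range: {item!r}")
        entries.append(ScheduleEntry(day, hour, minute))
    return entries


if __name__ == "__main__":
    for entry in parse_cable_info_schedule("5/00:00,12:00"):
        print(entry)
```

An entry with day set to None corresponds to the daily "hrs:mins" form.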

To work with the collected data, you may use the Telemetry CLI, which can be accessed as follows:

[root@r-ufm145 ~]# docker exec -it ufm-telemetry clxcli
Read configuration from:  /opt/mellanox/collectx/etc/collectx.ini
agx_data_root = /data
Loaded 2 schemas from /data/schema/schema*.json
 
CollectX:

Cable Info Data

The main commands to query and retrieve cable info data are cable_times and cable_info.

  • cable_times – dump times and file names of cable info data files; the output can be redirected to a file
  • cable_info – dump cable info for a given date or range of dates

The following presents the help menu of the cable_times command:

CollectX: help cable_times
 
Usage:
            cable_times [TIME] [out=]
 
            [TIME] is one the following:
                                        date=
                                        past=n[hours|days]
 
Description:
            Dump times and file names of cable info data files
 
Examples:
            cable_times
            cable_times date=jun04
            cable_times past=15d out=out.csv

Example of the cable_times command:

CollectX: cable_times
Opened 202 files in 0.05 seconds
 
Cable
-----
 
idx   Date Time          Filename
---   ----------------   -----------------------------------
1     2020-07-26 04:13   /…/cables_1595725983912963.bin
3     2020-07-26 04:28   /…/cables_1595726884030804.bin
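
The numeric part of the file names in this listing looks like a timestamp in microseconds since the Unix epoch. That is an assumption based on the sample values, not something the help text states. Under that assumption, a short Python sketch (the function name timestamp_from_filename is hypothetical) can recover the collection time from a file name:

```python
import re
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_from_filename(path: str) -> datetime:
    """Return the UTC time encoded in a name such as cables_<digits>.bin.

    Assumes the digits are microseconds since the Unix epoch.
    """
    match = re.search(r"_(\d+)\.bin$", path)
    if match is None:
        raise ValueError(f"no timestamp found in {path!r}")
    return _EPOCH + timedelta(microseconds=int(match.group(1)))


if __name__ == "__main__":
    print(timestamp_from_filename("cables_1595725983912963.bin").isoformat())
```

The result is in UTC, while the Date Time column in the listing may be shown in the server's local time, so the two can differ by the local UTC offset.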

The following presents the help menu of the cable_info command:

CollectX: help cable_info
 
Usage:
            cable_info  [TIME] [out=]
 
            [TIME] is one of the following:
                                                last
                                                date=
                                                past=n[hours|days]
            [out=] is to specify output file (optional)
 
Description:
            Dump cable info for a given date or range of dates.
            If "last" arg is given, dumps only the last file.
            If "out=" file name specified, data will be also dumped to that file.
 
Examples:
            cable_info  filename
            cable_info  file=filename
            cable_info  last
            cable_info  date=jun04
            cable_info  past=15d out=cable_info.csv

Example of the cable_info command:

cable_info /…/cables_1595764809124997.bin
 
time,source,timestamp,port,lid,guid,port_name,vendor,oui,pn,sn,rev,length,type,supportedspeed,temperature,powerclass,
nominalbitrate,cdrenabletxrx,inputeq,outputamp,outputemp,fw_version,attenuation_2.5_5_7_12,rx_power_type,
rx_power.1.mw,rx_power.1.dbm,rx_power.2.mw,rx_power.2.dbm,rx_power.3.mw,rx_power.3.dbm,rx_power.4.mw,
rx_power.4.dbm,tx_bias.1,tx_bias.2,tx_bias.3,tx_bias.4,tx_power.1.mw,tx_power.1.dbm,tx_power.2.mw,tx_power.2.dbm,tx_power.3.mw,tx_power.3.dbm,tx_power.4.mw,tx_power.4.dbm,cdr_tx_rx_loss_indicator,adaptive_equalization_fault,tx_rx_lol_indicator,temperature_alarm_and_warning,voltage_alarm_and_warning,rx_power_alarm_and_warning,tx_bias_alarm_and_warning,tx_power_alarm_and_warning,diag_supply_voltage,transmitter_technology,eth_com_codes_ext,datacode,lot,tx_adaptive_equalization_freeze,rx_output_disable,tx_adaptive_equalization_enable,
2020-07-26T15:00:12.742710,cable_info,1595764812742710,1,117,0x248a0703008b20ec,ufm-hercules-01/U1/P1,Mellanox,0x2c9,MC2207130-002,MT1442VS07035,A3,2 m,Copper cable- unequalized,SDR/DDR/QDR/FDR,N/A,1,0,N/A N/A,N/A,N/A,N/A,N/A,5 8 11 0,OMA,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,*,*,*,*,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0,0,0,0,0,0,0,0,0,160,0,14-11-27,8224,0,0x0,0x0,
2020-07-26T15:00:12.742710,cable_info,1595764812742710,1,104,0xe41d2d0300109610,msib-e2edmz-02/U1/P1,Mellanox,0x2c9,MC2207130-002,MT1442VS07035,A3,2 m,Copper cable- unequalized,SDR/DDR/QDR/FDR,N/A,1,0,N/A N/A,N/A,N/A,N/A,N/A,5 8 11 0,OMA,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,*,*,*,*,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0,0,0,0,0,0,0,0,0,160,0,14-11-27,8224,0,0x0,0x0,
2020-07-26T15:00:12.742710,cable_info,1595764812742710,3,104,0xe41d2d0300109610,msib-e2edmz-02/U1/P3,Mellanox,0x2c9,MC2207130-002,MT1411VS08914,A3,2 m,Copper cable- unequalized,SDR/DDR/QDR/FDR,N/A,1,0,N/A N/A,N/A,N/A,N/A,N/A,5 8 11 0,OMA,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,*,*,*,*,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0,0,0,0,0,0,0,0,0,160,0,14-03-25,8224,0,0x0,0x0,
2020-07-26T15:00:12.742710,cable_info,1595764812742710,1,187,0xe41d2d03005d2250,ip-forwarder/U1/P1,Mellanox,0x2c9,MC2207130-002,MT1411VS08914,A3,2 m,Copper cable- unequalized,SDR/DDR/QDR/FDR,N/A,1,0,N/A N/A,N/A,N/A,N/A,N/A,5 8 11 0,OMA,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,*,*,*,*,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0.0,-999.999023438,0,0,0,0,0,0,0,0,0,160,0,14-03-25,8224,0,0x0,0x0,
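
Once the output has been written to a file with out=, it can be post-processed outside the CLI. The following Python sketch groups port names by cable serial number (the sn column), which pairs the two ends of each cable. It assumes the exported file holds the header on a single line and that every line ends with a trailing comma, as in the output above. The function names read_clx_csv and ports_by_serial are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_clx_csv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read CSV rows, dropping the empty column produced by trailing commas."""
    reader = csv.reader(lines)
    header = [name.strip() for name in next(reader)]
    while header and header[-1] == "":
        header.pop()
    rows = []
    for record in reader:
        if not any(field.strip() for field in record):
            continue  # skip blank lines
        # zip() stops at the end of the header, so the trailing empty field is ignored.
        rows.append(dict(zip(header, record)))
    return rows


def ports_by_serial(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each cable serial number to the port names that report it."""
    result: Dict[str, List[str]] = {}
    for row in rows:
        result.setdefault(row["sn"], []).append(row["port_name"])
    return result


if __name__ == "__main__":
    # Column subset taken from the example output above, trimmed for brevity.
    sample = (
        "port_name,sn,\n"
        "ufm-hercules-01/U1/P1,MT1442VS07035,\n"
        "msib-e2edmz-02/U1/P1,MT1442VS07035,\n"
    )
    print(ports_by_serial(read_clx_csv(io.StringIO(sample))))
```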

Port Counters

The port_counters command is used to extract data in CSV format. It dumps counters matching a given text fragment or "counterset" for a date or range of dates.

The following presents the help menu of the port_counters command:

CollectX: help port_counters
 
Usage:
            port_counters [TAGS] [TIME] [out=]
 
            [TAGS] is a list of countersets/name fragments.
            [TIME] can be specified as:
                                        date=
                                        past=n[hours|days]
                                        from= to=
            [out=] to specify output file (optional)
 
Description:
            Dump port_counters matching a given fragment/counterset for a
            given date or range of dates.
 
Example:
            port_counters error past=10m
            port_counters error date=jul16 out=error_dump.csv

The following is an example of a port_counters command run:

CollectX: port_counters past=300s
 
idx,time,ts,node,port_num,PortMultiCastXmitPktsExtended,PortUniCastXmitPktsExtended,PortXmitConstraintErrorsExtended,
PortXmitDataExtended,PortXmitDiscardsExtended,PortXmitPktsExtended,PortXmitWaitExtended,port_xmit_constraint_errors,
port_xmit_data,port_xmit_discard,port_xmit_pkts,port_xmit_wait,
0,2020-07-27T09:57:02,1595833022873349,0xb8599f0300355d6e,1,0,0,0,0,0,0,0,0,0,0,0,0,
1,2020-07-27T09:57:02,1595833022873363,0xb8599f0300355d6e,2,0,0,0,0,0,0,26700642,0,0,0,0,26700642,
2,2020-07-27T09:57:02,1595833022873374,0xb8599f0300355d6e,3,0,9881980474,0,4540230161607,0,9881980474,0,0,0,0,0,0,
3,2020-07-27T09:57:02,1595833022873396,0xb8599f0300355d6e,4,0,0,0,1339496959094,0,0,21581722,0,0,0,0,21581722,
4,2020-07-27T09:57:02,1595833022873408,0xb8599f0300355d6e,9,24766362454,0,0,0,0,0,0,0,0,0,0,0,
5,2020-07-27T09:57:02,1595833022873419,0xb8599f0300355d6e,10,0,54725808986,0,8222892412792,0,29959446532,33957262,0,0,0,0,33957262, 
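
Most counters in such a dump are zero, so a common first step is to keep only the non-zero ones per port. The following Python sketch does this for a file written with out=. It assumes the file has a single-line header, that the identifying columns are idx, time, ts, node and port_num as in the output above, and that all other columns hold integers. The function name nonzero_counters is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Tuple

# Columns that identify a sample rather than hold a counter value (assumed).
ID_COLUMNS = ("idx", "time", "ts", "node", "port_num")


def nonzero_counters(lines: Iterable[str]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Return the non-zero counters of the latest row seen for each (node, port_num)."""
    result: Dict[Tuple[str, str], Dict[str, int]] = {}
    for row in csv.DictReader(lines):
        row.pop("", None)  # empty column created by the trailing comma
        counters = {
            name: int(value)
            for name, value in row.items()
            if name and name not in ID_COLUMNS and value and int(value) != 0
        }
        if counters:
            result[(row["node"], row["port_num"])] = counters
    return result


if __name__ == "__main__":
    # Column subset taken from the example output above, trimmed for brevity.
    sample = (
        "idx,time,ts,node,port_num,PortXmitDataExtended,PortXmitWaitExtended,\n"
        "0,2020-07-27T09:57:02,1595833022873349,0xb8599f0300355d6e,1,0,0,\n"
        "1,2020-07-27T09:57:02,1595833022873363,0xb8599f0300355d6e,2,0,26700642,\n"
    )
    print(nonzero_counters(io.StringIO(sample)))
```

Ports whose counters are all zero are left out of the result.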

Switch Temperature

The switch_temperature command is used to dump switch temperature info for a given date or range of dates into CSV files.

The following presents the help menu of the switch_temperature command: 

CollectX: help switch_temperature
    Usage:
                switch_temperature  [TIME] [out=]
                [TIME] is one of the following:
                                                   last
                                                    date=
                                                    past=n[hours|days]
                [out=] is to specify output file (optional)
    Description:
                Dump switch temperature info for a given date or range of dates.
                If "out=" file name specified, data will be also dumped to that file.
    Examples:
                switch_temperature  filename
                switch_temperature  file=filename
                switch_temperature  date=apr21
                switch_temperature  past=15d out=switch_temperature.csv

The following is an example of a switch_temperature command run:

CollectX: switch_temperature past=10m out=switch_temperature.csv
time,source,timestamp,node_guid,sensor_index,mtmp_sensor_name,temperature,max_temperature,
2022-04-12T17:05:16.332772,0xe41d2d030003e450,1649783116332772,0xe41d2d030003e450,0,,47,51,
2022-04-12T17:05:16.332772,0xe41d2d030003e450,1649783116332772,0xe41d2d030003e450,1,,30,33,
2022-04-12T17:05:16.332772,0xe41d2d030003e450,1649783116332772,0xe41d2d030003e450,2,,33,37,
2022-04-12T17:05:16.332772,0xec0d9a0300b41a50,1649783116332772,0xec0d9a0300b41a50,0,,58,66,
2022-04-12T17:05:16.332772,0xec0d9a0300b41a50,1649783116332772,0xec0d9a0300b41a50,1,,27,31,
2022-04-12T17:05:16.332772,0xec0d9a0300b41a50,1649783116332772,0xec0d9a0300b41a50,2,,33,37, 
...
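
A file written with out= can be summarized per switch. The following Python sketch finds, for each node_guid, the sensor with the highest temperature reading. It assumes the column names shown in the output above and integer readings. The function name hottest_sensor_per_node is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Tuple


def hottest_sensor_per_node(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map node_guid to (temperature, sensor_index) of its hottest reading."""
    hottest: Dict[str, Tuple[int, int]] = {}
    for row in csv.DictReader(lines):
        node = row["node_guid"]
        reading = (int(row["temperature"]), int(row["sensor_index"]))
        # Keep the first reading with the highest temperature for each node.
        if node not in hottest or reading[0] > hottest[node][0]:
            hottest[node] = reading
    return hottest


if __name__ == "__main__":
    # Rows copied from the example output above.
    sample = (
        "time,source,timestamp,node_guid,sensor_index,mtmp_sensor_name,temperature,max_temperature,\n"
        "2022-04-12T17:05:16.332772,0xec0d9a0300b41a50,1649783116332772,0xec0d9a0300b41a50,0,,58,66,\n"
        "2022-04-12T17:05:16.332772,0xec0d9a0300b41a50,1649783116332772,0xec0d9a0300b41a50,1,,27,31,\n"
        "2022-04-12T17:05:16.332772,0xec0d9a0300b41a50,1649783116332772,0xec0d9a0300b41a50,2,,33,37,\n"
    )
    print(hottest_sensor_per_node(io.StringIO(sample)))
```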

Switch Fans

The switch_fans command is used to dump switch fans info for a given date or range of dates into CSV files.

The following presents the help menu of the switch_fans command: 

CollectX: help switch_fans
Usage:
            switch_fans  [TIME] [out=]
            [TIME] is one of the following:
                                                last
                                                date=
                                                past=n[hours|days]
            [out=] is to specify output file (optional)
Description:
            Dump switch fans info for a given date or range of dates.
            If "out=" file name specified, data will be also dumped to that file.
Examples:
            switch_fans  filename
            switch_fans  file=filename
            switch_fans  date=jun04
            switch_fans  past=15d out=switch_fans.csv

The following is an example of a switch_fans command run: 

CollectX: switch_fans past=10m out=switch_fans.csv
 
time,source,timestamp,node_guid,sensor_index,fan_speed,
2020-10-04T17:36:05.287397,0xe41d2d0300169e40,1601822165287397,0xe41d2d0300169e40,1,10288,
2020-10-04T17:36:05.287402,0xe41d2d0300169e40,1601822165287402,0xe41d2d0300169e40,2,8823,
2020-10-04T17:36:05.287403,0xe41d2d0300169e40,1601822165287403,0xe41d2d0300169e40,3,10608,
2020-10-04T17:36:05.287404,0xe41d2d0300169e40,1601822165287404,0xe41d2d0300169e40,4,9118,
…

Switch General

The switch_general command is used to dump general switch info for a given date or range of dates into CSV files.

The following presents the help menu of the switch_general command:

CollectX: help switch_general     
Usage:
            switch_general  [TIME] [out=]
            [TIME] is one of the following:
                                                last
                                                date=
                                                past=n[hours|days]
            [out=] is to specify output file (optional)
Description:
            Dump switch general info for a given date or range of dates.
            If "out=" file name specified, data will be also dumped to that file.
Examples:
            switch_general  filename
            switch_general  file=filename
            switch_general  date=jun04
            switch_general  past=15d out=switch_general.csv

The following is an example of a switch_general command run:

CollectX: switch_general past=10m out=switch_general.csv
 
time,source,timestamp,node_guid,serial_number,part_number,revision,product_name,random_fdb_cap,linear_fdb_cap,linear_fdb_top,mcast_fdb_cap,optimized_slvl_mapping,port_state_change,life_time_value,def_mcast_not_pri_port,def_mcast_pri_port,def_port,part_enf_cap,lids_per_port,mcast_fdb_top,enp0,filter_raw_outb_cap,filter_raw_inb_cap,outb_enf_cap,inb_enf_cap,
2020-10-25T11:41:05.183039,0xe41d2d0300169e40,1603618865183039,0xe41d2d0300169e40,MT1510X10802,MSB7700-EB2F,A6,Scorpion IB EDR,0,49152,7936,16383,1,1,19,255,255,0,32,0,49183,1,1,1,1,1,
2020-10-25T11:42:05.559284,0xe41d2d0300169e40,1603618925559284,0xe41d2d0300169e40,MT1510X10802,MSB7700-EB2F,A6,Scorpion IB EDR,0,49152,7936,16383,1,1,19,255,255,0,32,0,49183,1,1,1,1,1,
2020-10-
…

Bringup – amBER Format

amBER is an output format designed for debugging a cluster in its bringup stage.

The following is an example of the generate amBER report command:

CollectX: generate_amber_ib_csv past=1h out=amber_ib.csv

The following shows the help menu of the command:

CollectX: help generate_amber_ib_csv
Usage:
            generate_amber_ib_csv TIME [report_type=] [out=] [show_raw_data=]
             TIME can be specified as:
                                        date=
                                        past=n[hours|days] : relative to the current time on the server.
                                        from= to=
             [out=] to specify output file
             [show_raw_data=t|f] boolean to show raw data as is. Default: f
Description:
            Dump amBER IB report for a given date or range of dates
Example:
            generate_amber_ib_csv past=10m
            generate_amber_ib_csv date=jul16 out=amber_ib.csv
            TIME:
            from='sep 23, 2021 16:05:00'
            from='2021-09-23 16:05:00'
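
When reports are generated for many time windows, the command line can be built by a script. The following Python sketch formats a from=/to= range in the second style shown in the help text ('2021-09-23 16:05:00'). It assumes that to= accepts the same format as from=, which the help text does not state explicitly. The function name amber_command is hypothetical.

```python
from datetime import datetime, timedelta

# Matches the from='2021-09-23 16:05:00' style shown in the help menu.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def amber_command(start: datetime, duration: timedelta, out: str) -> str:
    """Build a generate_amber_ib_csv command line for a fixed time window."""
    end = start + duration
    return (
        "generate_amber_ib_csv "
        f"from='{start.strftime(TIME_FORMAT)}' "
        f"to='{end.strftime(TIME_FORMAT)}' "  # assumed to use the same format as from=
        f"out={out}"
    )


if __name__ == "__main__":
    print(amber_command(datetime(2021, 9, 23, 16, 5, 0), timedelta(hours=1), "amber_ib.csv"))
```

The resulting string is meant to be entered at the CollectX: prompt.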