Controlling Telemetry Agent
The telemetry agent and the controller communicate over a TCP socket using the JRPC (JSON-RPC) protocol.
Once connectivity is established, it is restored automatically if the controller restarts or the network connection drops.
The telemetry agent currently supports two types of requests:
Configuration – configures the agent to start, stop, or restart the streaming session.
Keepalive – returns the telemetry agent capabilities to the controller.
The configuration request supports five message types: "query", "replace", "append", "remove", and "remove-all".
To initiate a query request, send "query".
To create a new telemetry session or to restart a session with new parameters, use "replace".
To stop the currently running telemetry session, send "remove".
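As a rough illustration of the message types above, the following Python sketch builds a configuration request in a JSON-RPC 2.0 style envelope. The envelope version, the method name "configuration", and the layout of "params" are assumptions made for this example; the agent's actual request schema is not shown in this document.

```python
import json
from itertools import count

# Message types named in the text above.
MESSAGE_TYPES = {"query", "replace", "append", "remove", "remove-all"}

_request_ids = count(1)


def build_config_request(message_type, session=None):
    """Build a JSON-RPC 2.0 style request string for a configuration message.

    The method name "configuration" and the params layout are hypothetical;
    the real schema used by the telemetry agent may differ.
    """
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unsupported message type: {message_type!r}")
    params = {"type": message_type}
    if session is not None:
        params["session"] = session
    request = {
        "jsonrpc": "2.0",
        "method": "configuration",  # hypothetical method name
        "params": params,
        "id": next(_request_ids),
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_config_request("query"))
    print(build_config_request("replace", {"interval": 1000}))
```

Rejecting unknown message types before serializing keeps typos such as "restart" from reaching the agent.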
To control the agent easily, use the controller script included in the Docker container.
The script is located at /opt/telemetry/session_controller.
The controller script can also be run from the switch CLI.
To do so:
Deploy the agent and create trust with the switch as described in "Deploying Docker Image on Mellanox Onyx-Based Systems".
Run the controller script from the switch:
docker exec neo-agent session-controller
Usage:
/opt/telemetry/session_controller [-h] [-destination_ip <destination_ip>] [-destination_port <destination_port>] [-interval <interval>] [-protocol {TCP/UDP}] [-format {JSON/Influx DB Line Protocol/Protocol Buffers/gRPC}] [-controller_ip <controller_ip>]
Optional Arguments | Description
--help, -h | Show help message and exit
-destination_ip <destination_ip> | The default destination IP
-destination_port <destination_port> | The default destination port (default is 5123)
-interval <interval> | The default collection interval (msec) (default is 1000)
-protocol {TCP/UDP} | The default streaming protocol (default is "TCP")
-format {JSON/Influx DB Line Protocol/Protocol Buffers/gRPC} | The default streaming format (default is "Influx DB Line Protocol")
-controller_ip <controller_ip> | The IP of the Telemetry controller (default is "0.0.0.0")
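The arguments listed above can be assembled programmatically. The Python sketch below only builds the command line; it does not run it. The "docker exec neo-agent session-controller" prefix is copied from the command shown earlier, while the function name, its parameter names, and the idea of passing all flags in a single docker exec call are assumptions made for illustration.

```python
import shlex

# Allowed values as listed in the arguments table.
FORMATS = ("JSON", "Influx DB Line Protocol", "Protocol Buffers", "gRPC")
PROTOCOLS = ("TCP", "UDP")


def build_controller_command(destination_ip, destination_port=5123,
                             interval_msec=1000, protocol="TCP",
                             stream_format="Influx DB Line Protocol",
                             controller_ip=None):
    """Return the argv list for starting the controller script with defaults.

    Hypothetical helper: it validates the enumerated options and maps each
    parameter to the flag of the same name from the usage text.
    """
    if protocol not in PROTOCOLS:
        raise ValueError(f"protocol must be one of {PROTOCOLS}")
    if stream_format not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    argv = [
        "docker", "exec", "neo-agent", "session-controller",
        "-destination_ip", destination_ip,
        "-destination_port", str(destination_port),
        "-interval", str(interval_msec),
        "-protocol", protocol,
        "-format", stream_format,
    ]
    if controller_ip is not None:
        argv += ["-controller_ip", controller_ip]
    return argv


if __name__ == "__main__":
    # shlex.join quotes values that contain spaces, such as the format name.
    print(shlex.join(build_controller_command("10.213.91.145",
                                              stream_format="JSON")))
```

Returning a list instead of a single string avoids quoting problems with format names that contain spaces, such as "Influx DB Line Protocol".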
When running the script, the following options are available: create session, delete session, delete all sessions, and status.
At any point while running the script, type "return" to go back to the main menu or "quit" to exit.
The default values listed above can be changed on the first run of the script or at any time when opening a session.
Script options:
Create session – starts a session on the agent, using the default parameters or parameters provided at run time.
Delete session – deletes the selected session if it is running.
Delete all sessions – deletes all running sessions on the agent.
Status – provides the status of the agent and its running sessions, or of a specific session if one is selected. The status includes global system errors and per-session details for each running session.
The script can also open a single session that streams to several collectors. To do so, type "yes" at the "add new collector" prompt.
Reading collector params...
Please enter destination ip: 10.213.91.145
Please enter destination port (default 5123):
Formats
1. JSON
2. Influx DB Line Protocol
3. Protocol Buffers
4. gRPC
Please select format or enter to continue (default Influx DB Line Protocol): 1
Please enter protocol (TCP/UDP) (default TCP):
Enter "yes" to add new collector or press enter to continue: yes
Reading collector params...
Please enter destination ip: 10.213.91.146
Please enter destination port (default 5123):
Formats
1. JSON
2. Influx DB Line Protocol
3. Protocol Buffers
4. gRPC
Please select format or enter to continue (default Influx DB Line Protocol): 2
Please enter protocol (TCP/UDP) (default TCP):
Enter "yes" to add new collector or press enter to continue:
An interface counter session allows the user to choose a dynamic list of counters for data sampling.
Telemetry Sessions
1. WJH - Samples the dropped packets buffer
2. Interface counters - Samples interface counters
3. Threshold events - Events generated every time a defined threshold is crossed
4. Histograms - Samples the buffer histograms
Please select session type: 2
Subscribing session Interface counters...
Filter Settings
Select port counters to be streamed
1. [X] ECN Packets
2. [X] In Broadcast Packets
3. [X] In Discards
4. [X] In Errors
5. [X] In FCS Errors
6. [X] In Multicast Packets
7. [X] In Octets
8. [X] In Oversize Packets
9. [X] In Packets
10. [X] In Packets Jumbo
11. [X] In Packets Of 1024-1518 Bytes
12. [X] In Packets Of 128-255 Bytes
13. [X] In Packets Of 256-511 Bytes
14. [X] In Packets Of 512-1023 Bytes
15. [X] In Packets Of 64 Bytes
16. [X] In Packets Of 65-127 Bytes
17. [X] In Pause Packets
18. [X] In Undersize Packets
19. [X] In Unicast Packets
20. [X] Out Broadcast Packets
21. [X] Out Discards
22. [X] Out Errors
23. [X] Out Multicast Packets
24. [X] Out Octets
25. [X] Out Packets
26. [X] Out Pause Packets
27. [X] Out Unicast Packets
28. [X] Symbol Error
29. [X] Unknown Control Opcode
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select priority counters to be streamed
1. [X] Bytes
2. [X] No Buffer Discard
3. [X] Packets
4. [X] RX Pause Duration
5. [X] RX Pause Packets
6. [X] Shared Buffer Discard
7. [X] TX Bytes
8. [X] TX No Buffer Discard
9. [X] TX Packets
10. [X] TX Pause Duration
11. [X] TX Pause Packets
12. [X] TX Wred Discard
------------------------------------
Enter number to select/unselect or press enter to continue:
For WJH sessions, there are two options for filtering:
Default – all events except Forwarding with Notice severity
Custom – custom event filtering for each WJH category
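The Default option described above can be expressed as a simple predicate. The Python sketch below is only an illustration of the rule; the field names "category" and "reason"/"severity" follow the WJH JSON samples in this document, and the function name is hypothetical. How the agent applies the filter internally is not described here.

```python
def passes_default_filter(category, severity):
    """Default WJH filter: keep everything except Forwarding events
    with Notice severity."""
    return not (category == "Forwarding" and severity == "Notice")


if __name__ == "__main__":
    events = [
        {"category": "Forwarding", "reason": {"severity": "Error"}},
        {"category": "Forwarding", "reason": {"severity": "Notice"}},
        {"category": "ACL", "reason": {"severity": "Notice"}},
    ]
    kept = [e for e in events
            if passes_default_filter(e["category"], e["reason"]["severity"])]
    print(len(kept), "of", len(events), "events kept")
```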
Telemetry Sessions
1. WJH - Samples the dropped packets buffer
2. Interface counters - Samples interface counters
3. Threshold events - Events generated every time a defined threshold is crossed
4. Histograms - Samples the buffer histograms
Please select session type: 1
Subscribing session WJH...
Filter Settings
1. Default - All events except Forwarding with Notice severity
2. Custom - Custom events filtering for each WJH category
Please select option: 2
Select WJH Categories
1. [X] ACL
2. [X] L1
3. [X] Forwarding
4. [X] Buffer
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select ACL Notice severity events to be streamed
1. [X] Ingress port ACL
2. [X] Ingress router ACL
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select L1 aggregation Error severity events to be streamed
[X] Symbol error
[X] CRC error
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select L1 aggregation Notice severity events to be streamed
[X] Port state change
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select L2 Error severity events to be streamed
1. [X] Destination MAC is reserved (DMAC=01-80-C2-00-00-0x)
2. [X] VLAN tagging mismatch
3. [X] Ingress VLAN filtering
4. [X] Unicast MAC table action discard
5. [X] Port loopback filter
6. [X] Source MAC is multicast
7. [X] Source MAC equals destination MAC
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select L2 Warning severity events to be streamed
1. [X] Multicast egress port list is empty
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select L2 Notice severity events to be streamed
1. [ ] MLAG port isolation
2. [ ] Ingress spanning tree filter
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select L3 Error severity events to be streamed
1. [X] Unicast destination IP but multicast destination MAC
2. [X] Destination IP is loopback address
3. [X] Source IP is multicast
4. [X] Source IP is in class E
5. [X] Source IP is loopback address
6. [X] Source IP is unspecified
7. [X] Checksum or IPver or IPv4 IHL too short
8. [X] Multicast MAC mismatch
9. [X] Source IP equals destination IP
10. [X] IPv4 source IP is limited broadcast
11. [X] IPv4 destination IP is local network (destination=0.0.0.0/8)
12. [X] IPv4 destination IP is link local
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select L3 Warning severity events to be streamed
1. [X] Blackhole route
2. [X] Unresolved neighbor/next-hop
3. [X] Blackhole ARP/neighbor
4. [X] Ingress router interface is disabled
5. [X] Egress router interface is disabled
6. [X] IPv4 routing table (LPM) unicast miss
7. [X] IPv6 routing table (LPM) unicast miss
8. [X] Router interface loopback
9. [X] Packet size is larger than router interface MTU
10. [X] TTL value is too small
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select L3 Notice severity events to be streamed
1. [ ] Non-routable packet
2. [ ] IPv6 destination in multicast scope FFx0:/16
3. [ ] IPv6 destination in multicast scope FFx1:/16
4. [ ] Non IP packet
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select Tunnel Error severity events to be streamed
1. [X] Overlay switch - Source MAC is multicast
2. [X] Overlay switch - Source MAC equals destination MAC
3. [X] Decapsulation error
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Select Buffer aggregation Warning severity events to be streamed
[X] Tail drop
[X] WRED
------------------------------------
0. Select/Unselect all
Enter number to select/unselect or press enter to continue:
Reading collector params...
Please enter destination ip: 10.213.91.100
Please enter destination port (default 5123): 5003
Formats
1. JSON
2. Influx DB Line Protocol
3. Protocol Buffers
4. gRPC
Please select format or enter to continue (default Influx DB Line Protocol):
Please enter protocol (TCP/UDP) (default TCP):
Enter "yes" to add new collector or press enter to continue:
Unselecting all categories is not allowed; at least one category must remain selected.
The controller and the telemetry agent exchange data over JRPC. JRPC, or JSON-RPC, is a remote procedure call protocol encoded in JSON. It is used to pass the OpenConfig telemetry data that configures the telemetry agent session.
Telemetry Data Example
Interface counters data example:
... interface counter for one port
{ "cli_counter": { "in_broadcast_pkts": 0, "in_fcs_errors": 0, "in_multicast_pkts": 260, "in_octets": 45024, "in_packets": 260, "in_packets_jumbo": 0, "in_ucast_pkts": 0, "out_broadcast_pkts": 6, "out_multicast_pkts": 4092, "out_octets": 293576, "out_packets": 4098, "out_ucast_pkts": 0 }, "port": "Eth1/17", "rfc_2819_counter": { "in_oversize_packets": 0, "in_packets_of1024to1518_bytes": 0, "in_packets_of128to255_bytes": 529, "in_packets_of256to511_bytes": 0, "in_packets_of512to1023_bytes": 0, "in_packets_of64_bytes": 0, "in_packets_of65to127_bytes": 0, "in_undersize_packets": 0 }, "rfc_2863_counter": { "in_discards": 0, "in_errors": 0, "out_discards": 0, "out_errors": 0 }, "rfc_3635_counter": { "in_pause_packets": 0, "out_pause_packets": 0, "symbol_error": 0, "unknown_control_opcode": 0 }, "speed": 12500000000, "pri_counters": [ { "priority": "0", "rx_pause_duration": 0, "rx_pause_pkts": 0, "tx_pause_duration": 0, "tx_pause_pkts": 0 }, … ], "buffer_counters": [ { "buffer_id": "0", "bytes": 45024, "no_buffer_discard": 0, "pkts": 260, "shared_buffer_discard": 0 }, … ], "tc_counters": [ { "traffic_class": "0", "tx_bytes": 489216, "tx_no_buffer_discard": 0, "tx_pkts": 7644, "tx_wred_discard": 0 }, … ], "extended_counter": { "ecn_packets": 0 }, ...
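A collector that receives this JSON may find it convenient to flatten the nested counter groups into single-level keys. The Python sketch below is one possible approach, not part of the agent; the function name and the dotted key convention are assumptions, and the sample record is a shortened copy of the example above.

```python
def flatten_port_counters(record):
    """Flatten the dict-valued counter groups of one port record.

    Keys become "<group>.<counter>". Scalar fields ("port", "speed") and
    list-valued fields (pri_counters, buffer_counters, tc_counters) are
    skipped in this sketch.
    """
    flat = {}
    for group, value in record.items():
        if isinstance(value, dict):
            for name, counter in value.items():
                flat[f"{group}.{name}"] = counter
    return flat


if __name__ == "__main__":
    # Shortened copy of the interface counters example.
    record = {
        "port": "Eth1/17",
        "cli_counter": {"in_octets": 45024, "in_packets": 260},
        "rfc_2863_counter": {"in_discards": 0, "in_errors": 0},
        "speed": 12500000000,
        "pri_counters": [{"priority": "0", "rx_pause_pkts": 0}],
    }
    for key, value in sorted(flatten_port_counters(record).items()):
        print(record["port"], key, value)
```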
Histogram data example:
{ "device_ip": "10.209.36.26", "hist_map": { "Eth1/31.0.0": 692024, "Eth1/31.0.1": 0, "Eth1/31.0.2": 0, "Eth1/31.0.3": 0, "Eth1/31.0.4": 0, "Eth1/31.0.5": 0, "Eth1/31.0.6": 0, "Eth1/31.0.7": 0, "Eth1/31.0.8": 0, "Eth1/31.0.9": 0 }, "ts_seconds": 1595160255, "ts_useconds": 106886 }
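The histogram message can be decoded on the collector side along these lines. This Python sketch assumes that each "hist_map" key has the form "<port>.<index>.<bin>"; the meaning of the middle field is not stated in this document, so it is kept as an opaque integer. The function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_histogram(message):
    """Return (timestamp, bins) for one histogram message.

    bins maps (port, middle_index) to {bin_index: value}. The key layout
    "<port>.<index>.<bin>" is an assumption based on the sample data.
    """
    timestamp = datetime.fromtimestamp(
        int(message["ts_seconds"]), tz=timezone.utc
    ).replace(microsecond=int(message["ts_useconds"]))
    bins = {}
    for key, value in message["hist_map"].items():
        # Split from the right, since the port name itself contains "/".
        port, middle, bin_index = key.rsplit(".", 2)
        bins.setdefault((port, int(middle)), {})[int(bin_index)] = value
    return timestamp, bins


if __name__ == "__main__":
    sample = {
        "device_ip": "10.209.36.26",
        "hist_map": {"Eth1/31.0.0": 692024, "Eth1/31.0.1": 0},
        "ts_seconds": 1595160255,
        "ts_useconds": 106886,
    }
    ts, bins = parse_histogram(sample)
    print(ts.isoformat(), bins)
```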
Threshold events data example:
{ "deviceIp": "10.209.37.249", "tsUseconds": 989349, "highestOccupiedBin": "0", "thresholdCrossing": "Falling", "interface": "Eth1/8", "histogram": { "1": "0", "0": "399334", "3": "0", "2": "0", "5": "0", "4": "0", "7": "0", "6": "0", "9": "0", "8": "0" }, "event": "BufferOccupancyOnyx", "tsSeconds": "1593683390" }
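In the threshold event, the histogram counts arrive as strings keyed by bin index. The Python sketch below recomputes the highest bin with a non-zero count and compares it with the "highestOccupiedBin" field. That this is how the field is defined is an assumption, although it agrees with the sample above; the function name is hypothetical.

```python
def highest_occupied_bin(histogram):
    """Return the largest bin index with a non-zero count, or None if
    every bin is empty. Keys and values are strings in the event JSON."""
    occupied = [int(bin_index) for bin_index, count in histogram.items()
                if int(count) > 0]
    return max(occupied) if occupied else None


if __name__ == "__main__":
    # Shortened copy of the threshold event example.
    event = {
        "interface": "Eth1/8",
        "highestOccupiedBin": "0",
        "thresholdCrossing": "Falling",
        "histogram": {"1": "0", "0": "399334", "3": "0", "2": "0"},
    }
    computed = highest_occupied_bin(event["histogram"])
    print(event["interface"], event["thresholdCrossing"],
          computed == int(event["highestOccupiedBin"]))
```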
Sample of WJH Events in JSON format:
Forwarding WJH events example:
{ "device_ip": "10.209.37.251", "device_name": "ufm-switch18", "drop_info": [ { "category": "Forwarding", "in_port": "Eth1/29", "packet": { "ethernet": { "dst_mac": "01:80:c2:00:00:01", "ether_type": 2048, "ether_type_name": "Internet Protocol version 4 (IPv4) (0x0800)", "src_mac": "e4:1d:2d:66:d8:6a" }, "ip": { "dst_ip": "1.1.1.253", "length": 10240, "protocol": 6, "protocol_name": "TCP (0x06)", "src_ip": "1.1.1.1", "ttl": 64, "version": 4 }, "transport": { "dst_port": 4000, "dst_port_name": "4000", "src_port": 5001, "src_port_name": "5001" } }, "packet_type": "TRANSPORT", "reason": { "description": "Destination MAC is reserved (DMAC=01-80-C2-00-00-0x)", "id": 202, "recommended_action": "Bad packet was received from the peer", "severity": "Error" }, "subcategory": "L2", "timestamp": { "nano": "992616889", "seconds": "1595246720" } }, { "category": "Forwarding", "in_port": "Eth1/29", "packet": { "ethernet": { "dst_mac": "7c:fe:90:e3:d4:88", "ether_type": 2048, "ether_type_name": "Internet Protocol version 4 (IPv4) (0x0800)", "src_mac": "e4:1d:2d:66:d8:6a", "vlan_id": 300 }, "ip": { "dst_ip": "1.1.1.253", "length": 10240, "protocol": 6, "protocol_name": "TCP (0x06)", "src_ip": "1.1.1.1", "ttl": 64, "version": 4 }, "transport": { "dst_port": 4000, "dst_port_name": "4000", "src_port": 5001, "src_port_name": "5001" } }, "packet_type": "TRANSPORT", "reason": { "description": "Ingress VLAN filtering", "id": 204, "recommended_action": "Validate the VLAN membership configuration on both ends of the link", "severity": "Error" }, "subcategory": "L2", "timestamp": { "nano": "997807206", "seconds": "1595246720" } } ], "ts_seconds": "1595246721", "ts_useconds": 27236 }
ACL WJH events example:
{ "device_ip": "10.209.37.251", "device_name": "ufm-switch18", "drop_info": [ { "acl": { "acl_name": "deny_mac_list", "acl_rule": "Priority[0];KEY[DMAC: 00:00:00:00:00:00/00:00:00:00:00:00];KEY[SMAC: 00:00:00:00:00:00/00:00:00:00:00:00];ACTION[FORWARD: FORWARD_ACTION = DISCARD];" }, "category": "ACL", "in_port": "Eth1/29", "packet": { "ethernet": { "dst_mac": "7c:fe:90:f2:8c:50", "ether_type": 2048, "ether_type_name": "Internet Protocol version 4 (IPv4) (0x0800)", "src_mac": "50:6b:4b:cc:e3:e4" }, "ip": { "dst_ip": "16.0.0.1", "length": 10240, "protocol": 6, "protocol_name": "TCP (0x06)", "src_ip": "16.0.0.1", "ttl": 64, "version": 4 }, "transport": { "dst_port": 80, "dst_port_name": "http (80)", "src_port": 20, "src_port_name": "ftp-data (20)" } }, "packet_type": "TRANSPORT", "reason": { "description": "Ingress port ACL", "id": 601, "recommended_action": "Validate ACL configuration", "severity": "Notice" }, "subcategory": "ACL", "timestamp": { "nano": "638527727", "seconds": "1595247112" } } ], "ts_seconds": "1595247113", "ts_useconds": 527251 }
Buffer WJH events example:
{ "device_ip": "10.209.37.122", "device_name": "neo-switch02", "drop_info": [ { "buffer": { "end_timestamp": { "nano": "643013147", "seconds": "1595402760" }, "event_count": "154", "start_timestamp": { "nano": "176441547", "seconds": "1595402759" } }, "category": "Buffer", "in_port": "Eth1/3", "packet": { "ip": { "dst_ip": "1.1.49.21", "protocol": 17, "protocol_name": "UDP (0x11)", "src_ip": "1.1.49.11" }, "transport": { "dst_port": 20000, "dst_port_name": "20000", "src_port": 54726, "src_port_name": "54726" } }, "reason": { "description": "WRED", "id": 504, "recommended_action": "Monitor network congestion", "severity": "Warning" }, "subcategory": "Buffer", "timestamp": { "nano": "643013147", "seconds": "1595402760" } } ], "ts_seconds": "1595402761", "ts_useconds": 169816 }
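The WJH messages above share a common "drop_info" list, so a collector can summarize them with one routine. The Python sketch below counts drops per category, severity, and reason. Treating a Buffer entry's "event_count" as the number of drops and every other entry as a single drop is an assumption made for this example; the function name is hypothetical.

```python
from collections import Counter


def summarize_drops(message):
    """Count drops per (category, severity, description) in one WJH message."""
    summary = Counter()
    for drop in message.get("drop_info", []):
        reason = drop["reason"]
        key = (drop["category"], reason["severity"], reason["description"])
        # Aggregated Buffer entries carry an event_count string; other
        # entries are counted as one drop each (assumption).
        summary[key] += int(drop.get("buffer", {}).get("event_count", 1))
    return summary


if __name__ == "__main__":
    # Shortened entries based on the examples above.
    message = {
        "device_name": "neo-switch02",
        "drop_info": [
            {"category": "Buffer", "buffer": {"event_count": "154"},
             "reason": {"description": "WRED", "severity": "Warning"}},
            {"category": "Forwarding",
             "reason": {"description": "Ingress VLAN filtering",
                        "severity": "Error"}},
        ],
    }
    for key, total in summarize_drops(message).items():
        print(key, total)
```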
The telemetry data includes all supported telemetry counters for every active switch port.
Before activating a histogram or an event threshold session, the required traffic class must be configured on the switch (via CLI).
The collector must be implemented so that a gRPC connection can be established between the telemetry agent and the collector. For gRPC, proto3 encoding is used. The protocol buffer files needed for decoding are located inside the telemetry agent container under /opt/telemetry/proto/.