Distributed Telemetry - Switch Telemetry Agent

NVIDIA UFM Telemetry Documentation v1.17.0

Distributed Telemetry (DT) is a UFM Telemetry mode in which telemetry for the whole fabric is sampled by telemetry instances running on managed switches and hosts.

  • Each managed switch samples itself and the hosts connected directly to it.

  • Each managed switch Telemetry Instance (TI) reports to one of several host TIs via MADs.

  • If a fabric GUID/port can be sampled but is not sampled by a switch, a host TI samples it.

The whole process is orchestrated by the UFM Telemetry Manager (UTM) on the top load balancer.
Switch telemetry is packaged as a Docker container (Switch Agent) with a telemetry package inside.

Figure: Distributed Telemetry components relation

Terminology

The Switch Telemetry Docker image contains:

  • Switch Agent: an HTTP server running inside the container.

  • Switch Telemetry Instance (Switch TI): a telemetry instance that can be started or stopped within the container.

Deploying or removing a Switch Agent means deploying or removing the Switch Telemetry container.

The deployment process is described in UFM Telemetry Manager (UTM).

Note

The deployment script pulls the Switch Telemetry Docker image with docker pull.

After preparing the setup, enable Distributed Telemetry in utm_config.ini, as explained in the chapter Configuration File.

Running Distributed telemetry via HTTP API

To get help for all HTTP API endpoints, use the /help endpoint:

UTM HTTP API help

curl -s http://127.0.0.1:8888/help

The UTM HTTP API allows users to:

  • get the status of Switch Agent/Telemetry instances

  • deploy/remove Switch Agent containers

  • start/stop Switch Telemetry inside deployed containers

  • set the list of switch IPs to be monitored periodically

Recommended flow to work with Distributed Telemetry:

  • Deploy:

      • Check the switch status and find the list of switch IPs to work with

      • Set the monitored switch list and deploy Switch Agents to the switches on that list

      • Start the switch TIs

  • Cleanup:

      • Stop the switch TIs

      • Remove the Switch Agents

      • Check the switch status
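
The recommended flow above can be sketched as a small shell script. This is a minimal sketch, not an official tool: UTM_URL, DRY_RUN, utm_get, dt_deploy, and dt_cleanup are illustrative names, and only the endpoints themselves come from this section. It assumes that UTM listens on 127.0.0.1:8888 as in the examples here. The source does not say whether a deploy call returns before deployment has finished, so checking the status between steps may be necessary. By default the script only prints the curl commands.

```shell
#!/bin/sh
# Sketch of the recommended Distributed Telemetry flow.
# All function and variable names here are hypothetical helpers.

UTM_URL="${UTM_URL:-http://127.0.0.1:8888}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the curl commands, 0 = send them

# Call one UTM endpoint, or print the command in dry-run mode.
utm_get() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "curl -s '$UTM_URL/$1'"
    else
        # curl -s fails on connection errors, not on HTTP error statuses.
        curl -s "$UTM_URL/$1" && echo
    fi
}

# Deploy phase: check status, set the monitoring list, deploy agents, start TIs.
# $1 is "all" or a comma-separated IP list such as IP1,IP2.
dt_deploy() {
    ip_list="$1"
    utm_get "managed_switches_status" &&
    utm_get "switch_mon_list?ip_list=$ip_list" &&
    utm_get "deploy_switch_agents?ip_list=$ip_list" &&
    # Without an ip parameter this targets all running Switch Agents.
    utm_get "start_switch_telemetry"
}

# Cleanup phase: stop TIs, remove agents, check status again.
dt_cleanup() {
    ip_list="$1"
    utm_get "stop_switch_telemetry" &&
    utm_get "remove_switch_agents?ip_list=$ip_list" &&
    utm_get "managed_switches_status"
}

case "${1:-}" in
    deploy)  dt_deploy  "${2:-all}" ;;
    cleanup) dt_cleanup "${2:-all}" ;;
    *)       echo "usage: $0 deploy|cleanup [ip_list]" >&2 ;;
esac
```

Saved under a hypothetical name such as dt_flow.sh, running "sh dt_flow.sh deploy all" prints the four deploy calls in order, and "DRY_RUN=0 sh dt_flow.sh cleanup IP1,IP2" would send the cleanup calls. Each phase stops at the first call that curl reports as failed.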

Detailed instructions for Switch Agent and Switch Telemetry are listed in the following subsections.

API for Switch Agents

  • Get the status of the managed switches in JSON format.

    get managed switches status

    # all managed switches:
    curl -s http://127.0.0.1:8888/managed_switches_status

    # only managed switches set to periodic monitoring:
    curl -s 'http://127.0.0.1:8888/managed_switches_status?monitored_only=1'

    The /managed_switches_status endpoint returns one JSON object per managed switch, with basic information about the switch and the status of its Switch Agent and Switch Telemetry (if a Switch Agent is installed on the switch).

  • Set the switch IP list for periodic monitoring. Monitoring updates the switch information reported by the /managed_switches_status endpoint.

    set monitoring IP list

    # monitor only IP1 and IP2:
    curl -s 'http://127.0.0.1:8888/switch_mon_list?ip_list=IP1,IP2'

    # monitor all managed switches:
    curl -s 'http://127.0.0.1:8888/switch_mon_list?ip_list=all'

  • Deploy Switch Agents to a list of managed switches.

    deploy Switch Agents

    # deploy to all the managed switches:
    curl -s 'http://127.0.0.1:8888/deploy_switch_agents?ip_list=all'

    # deploy to switches with IPs IP1 and IP2:
    curl -s 'http://127.0.0.1:8888/deploy_switch_agents?ip_list=IP1,IP2'

  • Remove Switch Agents from a list of managed switches.

    remove Switch Agents

    # remove from all the managed switches:
    curl -s 'http://127.0.0.1:8888/remove_switch_agents?ip_list=all'

    # remove from switches with IPs IP1 and IP2:
    curl -s 'http://127.0.0.1:8888/remove_switch_agents?ip_list=IP1,IP2'
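
The ip_list parameter used by the deploy and remove endpoints is a comma-separated string, so a list of switch addresses has to be joined before the call. A minimal sketch of doing that in POSIX shell follows. join_ips and agents_url are hypothetical helper names, and the 192.0.2.x addresses are placeholders from the documentation address range, not real switches.

```shell
#!/bin/sh
# Build deploy/remove URLs from a list of switch IPs.
# join_ips and agents_url are illustrative helpers, not part of UTM.

UTM_URL="${UTM_URL:-http://127.0.0.1:8888}"

# Join all arguments with commas: a b c -> a,b,c
# The subshell keeps the IFS change from leaking into the caller.
join_ips() {
    ( IFS=,; printf '%s' "$*" )
}

# Print the URL for deploying or removing Switch Agents.
# $1 is "deploy" or "remove"; the remaining arguments are switch IPs,
# or the single documented keyword "all".
agents_url() {
    case "${1:-}" in
        deploy|remove) action="$1"; shift ;;
        *) echo "agents_url: action must be deploy or remove" >&2; return 1 ;;
    esac
    if [ "$#" -eq 0 ]; then
        echo "agents_url: give at least one IP, or all" >&2
        return 1
    fi
    printf '%s/%s_switch_agents?ip_list=%s\n' "$UTM_URL" "$action" "$(join_ips "$@")"
}

# Print the URLs only; nothing is sent to UTM here.
agents_url deploy 192.0.2.11 192.0.2.12
agents_url remove all
```

To send a request, pass the printed URL to curl in quotes, for example: curl -s "$(agents_url deploy 192.0.2.11 192.0.2.12)". Requiring an explicit target, instead of defaulting to all, is a deliberate choice so that a remove call is never issued fabric-wide by accident.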

API for Switch Telemetry

  • Start switch telemetry. Note that at least one managed host TI should be running.

    start Switch Telemetry

    # for all running Switch Agents:
    curl -s http://127.0.0.1:8888/start_switch_telemetry

    # at a specific switch IP:
    curl -s 'http://127.0.0.1:8888/start_switch_telemetry?ip=IP1'

  • Stop switch telemetry.

    stop Switch Telemetry

    # for all running Switch Agents:
    curl -s http://127.0.0.1:8888/stop_switch_telemetry

    # at a specific switch IP:
    curl -s 'http://127.0.0.1:8888/stop_switch_telemetry?ip=IP1'
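
The examples above show the ip parameter with a single address. Whether it accepts several addresses is not stated here, so the sketch below assumes one request per switch and loops over the addresses. telemetry_urls and send_urls are hypothetical helper names, and the 192.0.2.x addresses are placeholders.

```shell
#!/bin/sh
# Start or stop Switch Telemetry on a chosen subset of switches,
# one request per switch. Helper names are illustrative, not part of UTM.

UTM_URL="${UTM_URL:-http://127.0.0.1:8888}"

# Print one start/stop URL per switch IP.
# $1 is "start" or "stop"; the remaining arguments are switch IPs.
telemetry_urls() {
    case "${1:-}" in
        start|stop) action="$1"; shift ;;
        *) echo "telemetry_urls: action must be start or stop" >&2; return 1 ;;
    esac
    for ip in "$@"; do
        printf '%s/%s_switch_telemetry?ip=%s\n' "$UTM_URL" "$action" "$ip"
    done
}

# Send each URL read from stdin with curl, stopping at the first failure.
send_urls() {
    while IFS= read -r url; do
        curl -s "$url" || return 1
        echo
    done
}

# Print the URLs only; nothing is sent to UTM here.
telemetry_urls start 192.0.2.11 192.0.2.12
```

To send the requests, pipe the first helper into the second: telemetry_urls stop 192.0.2.11 192.0.2.12 | send_urls. Keeping URL construction separate from sending makes it easy to review the exact calls before they reach the switches.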

© Copyright 2024, NVIDIA. Last updated on May 6, 2024.