NVIDIA UFM Enterprise User Manual v6.17.1

SNMP Plugin

The SNMP plugin is a self-contained Docker container that includes REST API support and is managed by UFM. Its primary function is to receive SNMP traps from switches and forward them to UFM as external events. This gives UFM additional information about the switches in the InfiniBand fabric, which it surfaces as UFM events and alarms.

There are two potential deployment options for the SNMP plugin:

  • On UFM Appliance

  • On UFM Software

For detailed instructions on how to deploy the SNMP plugin, refer to this page.

The following authentication types are supported:

  • basic (/ufmRest)

  • client (/ufmRestV2)

  • token (/ufmRestV3)

The following REST API endpoints are supported:

  • GET /switch_list

  • GET /trap_list

  • POST /register

  • POST /unregister

  • POST /enable_trap

  • POST /disable_trap

  • GET /version

For more information, please refer to UFM Enterprise Documentation → UFM REST API → SNMP Plugin REST API.
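As a rough illustration of how one of these endpoints might be called with basic authentication, the following Python sketch builds (but does not send) a request for GET /switch_list. The host name, the credentials, and in particular the "/plugin/snmp" path segment are assumptions that do not come from this page; check the SNMP Plugin REST API reference for the actual URL layout.

```python
import base64
import urllib.request

# Basic authentication uses the /ufmRest prefix (see the list above).
# ASSUMPTION: the "/plugin/snmp" segment is a guess and must be verified
# against the SNMP Plugin REST API reference.
ASSUMED_PLUGIN_PREFIX = "/ufmRest/plugin/snmp"


def build_request(host, endpoint, user, password, method="GET", body=None):
    """Build (without sending) a basic-auth request for a plugin endpoint."""
    url = f"https://{host}{ASSUMED_PLUGIN_PREFIX}/{endpoint.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, headers=headers, method=method)


# Hypothetical host and credentials, for illustration only.
req = build_request("ufm.example.com", "/switch_list", "admin", "secret")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the request. Whether extra TLS settings are needed depends on how the UFM server's certificate is set up.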

By default, the SNMP plugin captures traps from all switches in the fabric as soon as it starts. This behavior is controlled by the "snmp_mode" configuration option, which accepts the values "auto" or "manual".

For traps to be received from a switch, the switch must be visible to UFM and have a valid IP address. In the following example, traps are received only from "r-ufm-sw61".

[Image: snmp1.png]

The following is an instance of a trap received by the SNMP plugin and displayed as a UFM event:

[Image: snmp2.png]

Additionally, there is an option to verify events/alarms for a particular switch:

[Image: snmp3.png]

The SNMP plugin checks the fabric every 180 seconds, so traps from new switches, or from existing switches whose IP address has changed, start arriving within 180 seconds. This interval can be adjusted with the "ufm_switches_update_interval" option. To manually register or unregister a switch, refer to UFM Enterprise Documentation → UFM REST API → SNMP Plugin REST API.
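The effect of the periodic check can be pictured as a comparison between the switches known from the previous check and the switches currently in the fabric. The sketch below is a conceptual illustration only and does not reproduce the plugin's implementation; the function name, the extra switch names, and all IP addresses are hypothetical.

```python
def diff_switches(known, current):
    """Compare two {switch name: IP} mappings.

    Returns (to_register, to_unregister) as sorted lists of IP addresses.
    A switch whose IP changed contributes its new IP to the first list
    and its old IP to the second. Switches without an IP are skipped.
    """
    to_register = sorted(
        ip for name, ip in current.items() if ip and known.get(name) != ip
    )
    to_unregister = sorted(
        ip for name, ip in known.items() if ip and current.get(name) != ip
    )
    return to_register, to_unregister


# Hypothetical snapshots taken one update interval apart.
known = {"r-ufm-sw61": "10.0.0.61", "r-ufm-sw62": "10.0.0.62"}
current = {
    "r-ufm-sw61": "10.0.0.61",  # unchanged
    "r-ufm-sw62": "10.0.0.99",  # IP address updated
    "r-ufm-sw63": "10.0.0.63",  # new switch
}
print(diff_switches(known, current))
# → (['10.0.0.63', '10.0.0.99'], ['10.0.0.62'])
```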

The SNMP plugin uses the SNMP v3 protocol, which adds security measures such as authentication and encryption. The "snmp_version" option selects SNMP version "1" or "3". Note that only traps exposed by the switch are forwarded to UFM as events.

| OID                                     | Name                      | Description                                      | Status  | Severity |
|-----------------------------------------|---------------------------|--------------------------------------------------|---------|----------|
| MELLANOX-EFM-MIB::testTrap              | send-test                 | A test trap ordered by the system administrator  | Enabled | Warning  |
| MELLANOX-EFM-MIB::asicChipDown          | asic-chip-down            | ASIC (Chip) Down                                 | Enabled | Critical |
| MELLANOX-EFM-MIB::cpuUtilHigh           | cpu-util-high             | CPU utilization has risen too high               | Enabled | Warning  |
| MELLANOX-EFM-MIB::diskSpaceLow          | disk-space-low            | Filesystem free space has fallen too low         | Enabled | Warning  |
| MELLANOX-EFM-MIB::expectedShutdown      | expected-shutdown         | Expected system shutdown                         | Enabled | Info     |
| MELLANOX-EFM-MIB::systemHealthStatus    | health-module-status      | Health module Status                             | Enabled | Critical |
| MELLANOX-EFM-MIB::insufficientFans      | insufficient-fans         | Insufficient amount of fans in system            | Enabled | Warning  |
| MELLANOX-EFM-MIB::insufficientFansRecover | insufficient-fans-recover | Insufficient amount of fans in system recovered | Enabled | Info     |
| MELLANOX-EFM-MIB::insufficientPower     | insufficient-power        | Insufficient power supply                        | Enabled | Warning  |
| RFC1213::linkdown                       | interface-down            | An interface's link state has changed to down    | Enabled | Minor    |
| RFC1213::linkup                         | interface-up              | An interface's link state has changed to up      | Enabled | Info     |
| MELLANOX-EFM-MIB::unexpectedShutdown    | unexpected-shutdown       | Unexpected system shutdown                       | Enabled | Minor    |
| SNMPv2-MIB::coldStart                   | cold-start                | SNMP entity reinitialized                        | Enabled | Info     |

To learn more about how to enable or disable a specific trap, please refer to the UFM Enterprise Documentation → UFM REST API → SNMP Plugin REST API.

Traps that are not in the default list can be added with the "snmp_additional_traps" option. The SNMP plugin treats these traps as "enabled" and forwards them to UFM as events with an "Info" severity level.
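The severity rules described here can be summarized in a small lookup: default traps keep the severity listed for them, and traps added through "snmp_additional_traps" are reported as "Info". The Python sketch below models only that rule. The function name, the example trap name, the use of trap names (rather than OIDs) as keys, and the `None` result for unlisted traps are assumptions made for illustration.

```python
# Severity of each default trap, keyed by trap name (from the default list).
DEFAULT_TRAP_SEVERITY = {
    "send-test": "Warning",
    "asic-chip-down": "Critical",
    "cpu-util-high": "Warning",
    "disk-space-low": "Warning",
    "expected-shutdown": "Info",
    "health-module-status": "Critical",
    "insufficient-fans": "Warning",
    "insufficient-fans-recover": "Info",
    "insufficient-power": "Warning",
    "interface-down": "Minor",
    "interface-up": "Info",
    "unexpected-shutdown": "Minor",
    "cold-start": "Info",
}


def severity_for(trap_name, additional_traps=()):
    """Return the event severity for a trap, or None if it is not listed."""
    if trap_name in DEFAULT_TRAP_SEVERITY:
        return DEFAULT_TRAP_SEVERITY[trap_name]
    if trap_name in additional_traps:
        return "Info"  # additional traps are always reported as Info
    return None  # ASSUMPTION: unlisted traps produce no event


print(severity_for("asic-chip-down"))                        # → Critical
print(severity_for("my-custom-trap", {"my-custom-trap"}))    # → Info
```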

To ensure uninterrupted reception of traps from switches in a large fabric, update the [Events] section of the UFM configuration file "/opt/ufm/conf/gv.cfg": raise the "max_events" option from 100 to 1000, and set both "medium_rate_threshold" and "high_rate_threshold" to 500. To apply the configuration changes, disable and then re-enable the plugin.
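The change to the [Events] section can be sketched with Python's standard `configparser` module. This is a minimal example operating on an in-memory sample, under the assumption that gv.cfg parses as a regular INI file; the function name is hypothetical. Note that `configparser` drops comments when it writes a file back, so on a real system it may be safer to edit the three options by hand or to work on a copy.

```python
import configparser
import io

# Values recommended for large fabrics.
RECOMMENDED = {
    "max_events": "1000",
    "medium_rate_threshold": "500",
    "high_rate_threshold": "500",
}


def apply_event_limits(cfg_text):
    """Return cfg_text with the recommended [Events] options applied."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(cfg_text)
    if not parser.has_section("Events"):
        parser.add_section("Events")
    for option, value in RECOMMENDED.items():
        parser.set("Events", option, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical excerpt standing in for /opt/ufm/conf/gv.cfg.
sample = "[Events]\nmax_events = 100\n"
print(apply_event_limits(sample))
```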

In case of an event storm, it is necessary to adjust the Event Policy settings such that General Events are non-alarmable and the TTL is set to zero, as illustrated in the following screenshot:

[Image: snapshot.png]

Additional configuration options are located in "/opt/ufm/conf/plugins/snmp/snmp.conf". To apply configuration changes, disable and then re-enable the plugin. For instructions on modifying the appliance, please refer to the UFM-SDN App CLI Guide.
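Before re-enabling the plugin, it can help to sanity-check the option values mentioned on this page. The Python sketch below validates a plain dictionary of option names and string values; it deliberately makes no claim about the file format or section names of snmp.conf. The function name and the requirement that the update interval be a positive integer are assumptions.

```python
# Allowed values as documented on this page.
ALLOWED_VALUES = {
    "snmp_mode": {"auto", "manual"},
    "snmp_version": {"1", "3"},
}


def validate_options(options):
    """Return a list of problems found in an {option: value} mapping."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in options and options[key] not in allowed:
            problems.append(
                f"{key}={options[key]!r} is not one of {sorted(allowed)}"
            )
    interval = options.get("ufm_switches_update_interval")
    # ASSUMPTION: the interval is a positive whole number of seconds.
    if interval is not None and not (interval.isdigit() and int(interval) > 0):
        problems.append(
            f"ufm_switches_update_interval={interval!r} is not a positive integer"
        )
    return problems


print(validate_options({"snmp_mode": "manual", "snmp_version": "3",
                        "ufm_switches_update_interval": "180"}))
# → []
```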

Logs for the SNMP plugin are stored in "/opt/ufm/logs/snmptrap.log". For guidance on accessing logs on the appliance, please refer to the UFM-SDN App CLI Guide.

© Copyright 2024, NVIDIA. Last updated on Aug 27, 2024.