DOCA Documentation v3.0.0

MAD Congestion Control

Subnet Administration (SA) Management Datagrams (MADs) are General Management Packets (GMPs) used to communicate with the SA entity within an InfiniBand subnet. The SA is typically part of the subnet manager and exists as a single active instance. Because all SA traffic in the subnet converges on that one instance, congestion at the SA communication level can occur.

Congestion control works by limiting the number of outstanding MADs, that is, MADs that have been sent but have not yet received a response. SA MADs that would exceed the max_outstanding threshold are not sent immediately; they are held in a FIFO queue until they can be transmitted.

The length of this queue is controlled by queue_size, which helps prevent excessive memory use. If the FIFO queue becomes full, new SA MADs are dropped, and the drops counter is incremented.

Each queued MAD also has an associated timeout (time_sa_mad). When this timeout expires, the MAD is removed from the queue, and the user is notified of the expiration.
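The admission rules above can be illustrated with a small toy model. This is only a sketch of the described behavior, not the driver's code: the function name classify_burst is hypothetical, and it assumes a burst of SA MADs arrives at once, with no responses arriving and no time_sa_mad timeouts expiring in between.

```shell
#!/bin/sh
# Toy model of the admission rules described above (not the driver's code).
# Assumption: the whole burst arrives before any response or timeout occurs.

# classify_burst BURST MAX_OUTSTANDING QUEUE_SIZE
# Prints how many MADs of the burst are sent, queued, and dropped.
classify_burst() (
    burst=$1 max_outstanding=$2 queue_size=$3
    sent=0 queued=0 drops=0 i=0

    while [ "$i" -lt "$burst" ]; do
        if [ "$sent" -lt "$max_outstanding" ]; then
            sent=$((sent + 1))        # below the threshold: transmit immediately
        elif [ "$queued" -lt "$queue_size" ]; then
            queued=$((queued + 1))    # threshold reached: hold in the FIFO queue
        else
            drops=$((drops + 1))      # queue full: drop and count the drop
        fi
        i=$((i + 1))
    done

    echo "sent=$sent queued=$queued drops=$drops"
)

# 40 MADs against the default limits (max_outstanding=16, queue_size=16)
classify_burst 40 16 16   # prints: sent=16 queued=16 drops=8
```

With the default limits, a burst of 40 MADs leaves 16 outstanding, 16 queued, and 8 dropped. Raising queue_size to 24 or more would absorb the same burst without drops, at the cost of more memory.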

This feature is implemented per CA port, and configuration is available via sysfs:

/sys/class/infiniband/mlx5_0/mad_sa_cc/
├── 1/
│   ├── drops
│   ├── max_outstanding
│   ├── queue_size
│   └── time_sa_mad
└── 2/
    ├── drops
    ├── max_outstanding
    ├── queue_size
    └── time_sa_mad

  • To print the current value:

    cat /sys/class/infiniband/mlx5_0/mad_sa_cc/1/max_outstanding # Output: 16

  • To change the current value:

    echo 32 > /sys/class/infiniband/mlx5_0/mad_sa_cc/1/max_outstanding
    cat /sys/class/infiniband/mlx5_0/mad_sa_cc/1/max_outstanding # Output: 32

  • To reset the drops counter:

    echo 0 > /sys/class/infiniband/mlx5_0/mad_sa_cc/1/drops
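To inspect every port at once rather than one file at a time, the per-file cat commands above can be wrapped in a loop. The following is a minimal sketch: the function name show_sa_cc is hypothetical, and the default path assumes the mlx5_0 device used in the examples above.

```shell
#!/bin/sh
# show_sa_cc [BASE_DIR]
# Prints every readable congestion-control attribute for each port directory.
# BASE_DIR defaults to the mlx5_0 path used in the examples above (an assumption;
# substitute the device name present on your system).
show_sa_cc() (
    base=${1:-/sys/class/infiniband/mlx5_0/mad_sa_cc}

    for port_dir in "$base"/*/; do
        [ -d "$port_dir" ] || continue          # skip if the glob matched nothing
        port=$(basename "$port_dir")
        for name in drops max_outstanding queue_size time_sa_mad; do
            file=$port_dir$name
            if [ -r "$file" ]; then
                printf 'port %s %s=%s\n' "$port" "$name" "$(cat "$file")"
            fi
        done
    done
)

show_sa_cc
```

The function only reads, so it is safe to run repeatedly, for example before and after changing a value or to watch whether the drops counter grows under load.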

Valid parameter ranges:

Parameter         Range          Default Value
max_outstanding   1–2^20         16
queue_size        16–2^20        16
time_sa_mad       1–10000 ms     20 ms
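When values are set from a script, the ranges above can be checked before writing. The sketch below makes several assumptions: the helper names in_range and set_sa_cc are hypothetical, time_sa_mad is assumed to be written as a plain number of milliseconds, and the driver may well reject out-of-range writes on its own, in which case this check only produces an earlier, clearer error.

```shell
#!/bin/sh
# in_range NAME VALUE
# Succeeds if VALUE is a whole number inside the documented range for NAME.
in_range() (
    name=$1 value=$2
    case $value in
        ''|*[!0-9]*) exit 1 ;;                     # empty or not a whole number
    esac
    case $name in
        max_outstanding) min=1;  max=1048576 ;;    # 1 to 2^20
        queue_size)      min=16; max=1048576 ;;    # 16 to 2^20
        time_sa_mad)     min=1;  max=10000 ;;      # assumed to be milliseconds
        *) exit 1 ;;                               # unknown parameter
    esac
    [ "$value" -ge "$min" ] && [ "$value" -le "$max" ]
)

# set_sa_cc PORT_DIR NAME VALUE
# Writes VALUE to PORT_DIR/NAME only if it passes the range check.
set_sa_cc() {
    if in_range "$2" "$3"; then
        echo "$3" > "$1/$2"
    else
        echo "refusing $2=$3: outside documented range" >&2
        return 1
    fi
}

# Demonstration of the check only; nothing is written here.
in_range queue_size 8 || echo "queue_size=8 is out of range"
```

A call such as set_sa_cc /sys/class/infiniband/mlx5_0/mad_sa_cc/1 max_outstanding 32 would then perform the same write as the echo example earlier, typically with root privileges.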

© Copyright 2025, NVIDIA. Last updated on May 5, 2025.