Adaptive Retransmission: Parameters Control

Algorithm & Profile Description and Formulas

The following is the Adaptive Retransmission (ADP) timeout algorithm flow.

  • Initial timeout value = rand(profile.timeout_init_low_bound, profile.timeout_init_low_bound + profile.timeout_init_range_size - 1)

  • HW timer = profile.time_base * (2 ^ granularity)

    • Granularity = Initial timeout

  • On the first timeout event:

    • If current_timeout falls within any defined range (range_low_bound to range_low_bound + range_size), the algorithm continues within that range.

    • Otherwise, it starts from profile.timeout_range[profile.start_range_index].range_low_bound.
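As a rough illustration of the steps above, the following Python sketch draws the initial timeout, derives the HW timer value, and picks the starting point on the first timeout event. It is not firmware code: the TimeoutRange class and all function names are hypothetical, rand is assumed to be inclusive at both ends, and a timeout is assumed to fall within a range when it lies between range_low_bound and range_low_bound + range_size inclusive.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TimeoutRange:
    # Hypothetical container mirroring profile.timeout_range[i].
    range_low_bound: int
    range_size: int


def initial_timeout(low_bound: int, range_size: int) -> int:
    # rand(low, low + size - 1); both ends assumed inclusive.
    return random.randint(low_bound, low_bound + range_size - 1)


def hw_timer(time_base: int, granularity: int) -> int:
    # HW timer = time_base * (2 ^ granularity), in the units of time_base.
    return time_base * (2 ** granularity)


def find_range(ranges: List[TimeoutRange], timeout: int) -> Optional[int]:
    # Index of the first range containing the timeout, or None.
    for i, r in enumerate(ranges):
        if r.range_low_bound <= timeout <= r.range_low_bound + r.range_size:
            return i
    return None


def first_timeout_start(
    ranges: List[TimeoutRange], start_range_index: int, current_timeout: int
) -> Tuple[int, int]:
    # Returns (range index, timeout) to continue with after the first timeout event.
    idx = find_range(ranges, current_timeout)
    if idx is not None:
        return idx, current_timeout
    return start_range_index, ranges[start_range_index].range_low_bound
```

For example, with ranges starting at 4 (size 4) and 16 (size 16) and start_range_index = 1, a current timeout of 6 stays in range 0, while a current timeout of 12 falls outside both ranges and restarts at 16.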

Notes:

  • Initial timeout is randomized to prevent multiple QPs from starting with the same value.

  • A good configuration ensures the initial timeout falls within a defined range.

  • The initial timeout value is retried only once.

  • time_base must be a power of 2 and no smaller than the minimum value allowed by the firmware capability (4 µs).

  • current_timeout is used profile.timeout_range[current_range].timeout_retry_num times.

  • Once exhausted, current_timeout is doubled until it reaches (profile.timeout_range[current_range].range_low_bound + profile.timeout_range[current_range].range_size).

  • The algorithm then continues with the next defined range.
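The retry-and-double behavior described above can be sketched as a generator that lists the timeout values in the order they would be used. This is a minimal sketch under several assumptions that the text does not confirm: every doubled value is also used timeout_retry_num times, values up to and including range_low_bound + range_size belong to the range, the "next defined range" is the next index, and the next range starts at its own range_low_bound. The TimeoutRange class and escalation_schedule name are hypothetical, and the cap by QP.ack_timeout is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class TimeoutRange:
    # Hypothetical container mirroring profile.timeout_range[i].
    range_low_bound: int
    range_size: int
    timeout_retry_num: int


def escalation_schedule(
    ranges: List[TimeoutRange], start_index: int, start_timeout: int
) -> Iterator[int]:
    # Yields each timeout value once per use, walking the ranges upward.
    timeout = start_timeout
    for idx in range(start_index, len(ranges)):
        rng = ranges[idx]
        upper = rng.range_low_bound + rng.range_size
        if idx != start_index:
            # Assumption: a newly entered range starts at its low bound.
            timeout = rng.range_low_bound
        if timeout <= 0:
            # Doubling a non-positive value would never leave the range.
            raise ValueError("timeout values must be positive")
        while timeout <= upper:
            for _ in range(rng.timeout_retry_num):
                yield timeout
            timeout *= 2
```

With two ranges, (low 4, size 12, 2 retries) and (low 32, size 32, 1 retry), starting at 4 in range 0, the sketch yields 4, 4, 8, 8, 16, 16, 32, 64.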

Notes:

  • Each range defines one or more timeout values.

  • Timeout ranges must be sorted in ascending order of range_low_bound (range_low_bound[i] < range_low_bound[j] for i < j).

  • Maximum timeout is capped by QP.ack_timeout, regardless of configured ranges.

  • The overall timeout is determined by:

    • profile.qp_total_timeout ? (QP.ack_timeout * QP.retry_num) : profile.retx_total_timeout

  • If no forward progress is made within the total timeout, the QP fails and transitions to the ERR state with error code IBV_WC_RETRY_EXC_ERR (Transport Retry Counter Exceeded).

  • current_timeout is decremented based on profile.timeout_range[current_range].dec_mode:

    • Decrement by 2

    • Decrement by 4

    • Reset to profile.timeout_range[current_range].range_low_bound

  • If current_timeout reaches the lowest timeout in the range, move to the previous range defined by profile.timeout_range[current_range].prev_range_index.
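The cap, the overall timeout selection, and the decrement rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: all values are treated as being in the same time units, "decrement by 2" and "decrement by 4" are read literally as subtraction (the text does not say whether the step is subtractive or divisive), and a decrement never goes below range_low_bound. The DecMode enum, the TimeoutRange class, and the function names are hypothetical. The sketch models only the arithmetic, not the event that triggers a decrement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DecMode(Enum):
    # Hypothetical names for the three dec_mode behaviors; the value is the step.
    BY_2 = 2
    BY_4 = 4
    RESET = 0


@dataclass
class TimeoutRange:
    # Hypothetical container mirroring profile.timeout_range[i].
    range_low_bound: int
    range_size: int
    dec_mode: DecMode
    prev_range_index: int


def capped_timeout(current_timeout: int, qp_ack_timeout: int) -> int:
    # The maximum timeout is capped by QP.ack_timeout.
    return min(current_timeout, qp_ack_timeout)


def total_timeout(
    qp_total_timeout: bool, qp_ack_timeout: int, qp_retry_num: int,
    retx_total_timeout: int,
) -> int:
    # profile.qp_total_timeout ? (QP.ack_timeout * QP.retry_num) : profile.retx_total_timeout
    return qp_ack_timeout * qp_retry_num if qp_total_timeout else retx_total_timeout


def decrement_timeout(current_timeout: int, rng: TimeoutRange) -> int:
    # Applies the range's dec_mode; subtraction is an assumption.
    if rng.dec_mode is DecMode.RESET:
        return rng.range_low_bound
    return max(rng.range_low_bound, current_timeout - rng.dec_mode.value)


def range_after_decrement(
    current_range: int, ranges: List[TimeoutRange], current_timeout: int
) -> int:
    # At the lowest timeout of the range, move to prev_range_index.
    rng = ranges[current_range]
    if current_timeout <= rng.range_low_bound:
        return rng.prev_range_index
    return current_range
```

Keeping the three pieces as separate functions reflects that the text describes them as independent rules; how the firmware combines them is not specified here.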

Notes:

  • The minimum enforced timeout is adp_retx_base_timeout_min (4 µs).

  • prev_range_index must always be less than current_range.

  • If no profile is selected, the ADP algorithm defaults to firmware-defined timeouts.

  • When a profile is selected, it applies to all QPs of the PF and its associated VFs.

  • Reverting to firmware-defined timeouts requires a fwreset or system reboot.

  • To check which mode is active:

    • adp_retx_profile_id = 0 → firmware-defined timeouts

    • adp_retx_profile_id > 0 → profile-based timeouts
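The configuration constraints stated in the notes of this section, together with the mode check above, can be collected into a small Python sketch. It is an offline sanity check only and does not query a device: the TimeoutRange class and function names are hypothetical, time_base is assumed to be expressed in microseconds, and the first range (index 0) is assumed to be exempt from the prev_range_index rule since no lower index exists.

```python
from dataclasses import dataclass
from typing import List

ADP_RETX_BASE_TIMEOUT_MIN_US = 4  # minimum stated in the text (4 µs)


@dataclass
class TimeoutRange:
    # Hypothetical container mirroring profile.timeout_range[i].
    range_low_bound: int
    range_size: int
    prev_range_index: int


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def validate_profile(time_base_us: int, ranges: List[TimeoutRange]) -> List[str]:
    # Returns a list of human-readable problems; empty means no issue found.
    errors = []
    if not is_power_of_two(time_base_us) or time_base_us < ADP_RETX_BASE_TIMEOUT_MIN_US:
        errors.append("time_base must be a power of 2 and at least 4 us")
    for i in range(1, len(ranges)):
        if ranges[i - 1].range_low_bound >= ranges[i].range_low_bound:
            errors.append(f"range {i} is not sorted by range_low_bound")
        # Assumption: index 0 is exempt, so the check starts at index 1.
        if ranges[i].prev_range_index >= i:
            errors.append(f"range {i}: prev_range_index must be less than {i}")
    return errors


def active_mode(adp_retx_profile_id: int) -> str:
    # 0 means firmware-defined timeouts; any positive ID means profile-based.
    if adp_retx_profile_id < 0:
        raise ValueError("adp_retx_profile_id must not be negative")
    return "firmware-defined" if adp_retx_profile_id == 0 else "profile-based"
```

Returning a list of problems rather than raising on the first one makes it easier to review a whole profile in a single pass.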

© Copyright 2025, NVIDIA. Last updated on Aug 19, 2025.