ZTR-RTT Congestion Control Algorithm Overview v1.0

Congestion Control

Congestion control provides performance isolation when multiple applications run on the same cluster. It also prevents congestion from spreading when there is a slow receiver, reduces latency in the cluster, improves fairness, and prevents parking-lot effects and packet drops in lossy networks.

The diagram below shows an example of a head-of-line blocking scenario.

Head of the Line Blocking Scenario
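
As a rough illustration of head-of-line blocking, the toy model below forwards packets from a single shared FIFO queue in strict arrival order. It is a minimal sketch and not a model of any NVIDIA device: the function name `finish_times`, the receiver labels, and the per-packet service times are all hypothetical.

```python
def finish_times(arrival_order, service_time_us):
    """Return (destination, finish time in µs) for each packet in a shared FIFO.

    arrival_order   -- destination labels in the order packets entered the queue
    service_time_us -- assumed time each destination needs to accept one packet
    """
    clock = 0
    finished = []
    for dst in arrival_order:
        # Strict FIFO: a packet cannot leave before every packet ahead of it,
        # even when its own receiver is idle.
        clock += service_time_us[dst]
        finished.append((dst, clock))
    return finished


# Assumed service times: the slow receiver takes 10 µs per packet, the fast one 1 µs.
times = {"slow": 10, "fast": 1}

blocked = finish_times(["slow", "slow", "fast"], times)
alone = finish_times(["fast"], times)
print(blocked)  # the "fast" packet finishes last, behind both "slow" packets
print(alone)
```

In this example the packet for the fast receiver finishes at 21 µs instead of 1 µs, purely because it is queued behind traffic destined for the slow receiver.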

Datacenter Congestion Control Challenges

Developing a congestion control algorithm for datacenters presents the following challenges:

  • Latency of several microseconds combined with hundreds of Gbps of bandwidth

    • Congestion builds up quickly, so the congestion control loop must be short

  • A wide variety of traffic types, topologies, and applications

    • It is hard to develop a single algorithm that suits all of them

    • New congestion control algorithms, with new congestion indications, are constantly being introduced

  • Hardware implementations are not robust enough

  • Software implementations react too slowly
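
To give a rough sense of scale for the first challenge, the sketch below estimates how much data is in flight during one round trip and how quickly a queue grows when senders oversubscribe a port. The 400 Gbps link speed, the 5 µs round-trip time, and the function names are illustrative assumptions, not values taken from the ZTR-RTT documentation.

```python
def bytes_in_flight(link_gbps, rtt_us):
    """Bandwidth-delay product in bytes.

    Gbps x µs = kilobits (1e9 bit/s x 1e-6 s = 1e3 bits), so multiply by 1000
    to get bits and divide by 8 to get bytes.
    """
    return link_gbps * rtt_us * 1000 / 8


def queue_growth_bytes(ingress_gbps, egress_gbps, interval_us):
    """Bytes added to a queue over an interval when ingress exceeds egress."""
    excess_gbps = max(ingress_gbps - egress_gbps, 0)
    return excess_gbps * interval_us * 1000 / 8


# Assumed example: a 400 Gbps link with a 5 µs round-trip time.
print(bytes_in_flight(400, 5))

# Assumed example: two 400 Gbps senders converge on one 400 Gbps port
# for a single 5 µs round trip.
print(queue_growth_bytes(800, 400, 5))
```

With these assumed numbers, about 250 KB is in flight per round trip, and a 2:1 oversubscription adds the same amount to the queue every 5 µs. This is why a feedback loop that takes many round trips to react lets the queue grow by megabytes before the senders slow down.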

ZTR-RTT CC Infrastructure

To address the challenges above, the NVIDIA CC algorithm is developed on top of an infrastructure with the following characteristics:

ZTR RTTCC Infrastructure

RTT Measurement Flow

RTT Measurement Flow

ZTR RTTCC Algorithm

ZTR RTTCC Algorithm

© Copyright 2024, NVIDIA. Last updated on Sep 29, 2024