RoCEv2

RoCE has two addressing modes: MAC-based GIDs and IP address-based GIDs. In IP-based mode, if the IP address changes while the system is running, the GID for the port is automatically updated with the new IP address, whether IPv4 or IPv6.
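
As a rough illustration of the two addressing modes, the sketch below derives a 128-bit GID from either an IP address or a MAC address. The encodings are assumptions that this page does not state: an IPv4 address is shown as an IPv4-mapped IPv6 address, and a MAC address is shown as a link-local address with a modified EUI-64 interface ID. The function names `ip_to_gid` and `mac_to_gid` are hypothetical.

```python
import ipaddress


def ip_to_gid(ip: str) -> ipaddress.IPv6Address:
    """IP-based GID (assumed encoding): IPv6 as-is, IPv4 as ::ffff:a.b.c.d."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
        return ipaddress.IPv6Address(b"\x00" * 10 + b"\xff\xff" + addr.packed)
    return addr


def mac_to_gid(mac: str) -> ipaddress.IPv6Address:
    """MAC-based GID (assumed encoding): fe80::/64 plus modified EUI-64."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    # Flip the universal/local bit and insert ff:fe in the middle.
    eui64 = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]
    return ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + eui64)
```

Under these assumptions, changing the interface's IP address changes the output of `ip_to_gid`, which mirrors the automatic GID update described above, while the MAC-based GID stays fixed.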

The IP-based mode allows RoCE traffic between Windows systems and Linux systems, the latter of which use IP-based GIDs by default.

A straightforward extension of the RoCE protocol enables traffic to operate in layer 3 environments. This capability is obtained via a simple modification of the RoCE packet format. Instead of the GRH used in RoCE, routable RoCE packets carry an IP header, which allows traversal of IP L3 routers, and a UDP header, which serves as a stateless encapsulation layer for the RDMA transport protocol packets over IP.

RoCE and RoCE v2 Frame Format Differences

[Figure: RoCE and RoCE v2 frame formats]

RoCE v2 packets use a well-known UDP destination port value (4791) that unequivocally identifies the datagram as RoCE v2. As in other protocols that use UDP encapsulation, the UDP source port field carries an opaque flow identifier that allows network devices to implement packet forwarding optimizations (e.g. ECMP) while staying agnostic to the specifics of the protocol header format.
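
The role of the two UDP ports can be sketched as follows. This is a minimal illustration rather than a wire-accurate implementation: it assumes the IANA-assigned RoCE v2 destination port 4791 (a value this page does not state), leaves the UDP checksum at zero as a placeholder, and uses the hypothetical helper names `build_udp_header` and `is_rocev2`.

```python
import struct

# Assumption: 4791 is the IANA-assigned UDP destination port for RoCE v2.
ROCEV2_UDP_DPORT = 4791
UDP_HEADER_LEN = 8


def build_udp_header(src_port: int, payload_len: int) -> bytes:
    """Pack a UDP header whose destination port marks the datagram as RoCE v2.

    src_port is the opaque flow identifier; the checksum is left as 0 here.
    """
    length = UDP_HEADER_LEN + payload_len
    return struct.pack("!HHHH", src_port, ROCEV2_UDP_DPORT, length, 0)


def is_rocev2(udp_header: bytes) -> bool:
    """Classify a datagram by looking only at the UDP destination port."""
    _src, dst, _length, _checksum = struct.unpack(
        "!HHHH", udp_header[:UDP_HEADER_LEN]
    )
    return dst == ROCEV2_UDP_DPORT
```

A switch doing ECMP only needs the source and destination ports from this header to spread flows across paths; it never has to parse the RDMA transport headers that follow.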

The UDP source port is calculated as follows: UDP.SrcPort = (SrcPort XOR DstPort) OR 0xC000, where SrcPort and DstPort are the ports used to establish the connection.
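
The calculation above can be sketched in a few lines of Python. This is an illustration of the formula only, not the driver's implementation; the function name `rocev2_udp_src_port` is hypothetical, and the 16-bit range check is an added assumption.

```python
def rocev2_udp_src_port(src_port: int, dst_port: int) -> int:
    """Return (src_port XOR dst_port) OR 0xC000, per the formula above."""
    for port in (src_port, dst_port):
        # Assumption: both inputs are ordinary 16-bit port numbers.
        if not 0 <= port <= 0xFFFF:
            raise ValueError(f"port out of 16-bit range: {port}")
    return (src_port ^ dst_port) | 0xC000


if __name__ == "__main__":
    print(hex(rocev2_udp_src_port(1000, 2000)))  # 0xc438
```

Because the OR with 0xC000 always sets the top two bits, the result falls in the range 0xC000 to 0xFFFF (49152 to 65535) regardless of the connection ports.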

For example, in a Network Direct application, when connecting to a remote peer, the destination IP address and the destination port must be provided, as they are used in the calculation above. Providing the source port is optional.

Furthermore, since this change exclusively affects the packet format on the wire, and since with RDMA semantics packets are generated and consumed below the API, applications can operate over any form of RDMA service (including the routable version of RoCE shown in the figure above, "RoCE and RoCE v2 Frame Format Differences") in a completely transparent way(1).

Note (1): Standard RDMA APIs are IP based already for all existing RDMA technologies.

RoCE Protocol Stack

[Figure: RoCE protocol stack]

Warning
  • The fabric must use the same protocol stack in order for nodes to communicate.

  • The default RoCE mode in Windows is MAC based.

  • The default RoCE mode in Linux is IP based.

  • To communicate between Windows and Linux over RoCE, change the RoCE mode in Windows to IP based.

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.