NVIDIA NVDA-OS XC User Manual for NVIDIA MetroX-3 XC Appliance v18.02.1000

General MetroX Deployment Guidelines

MetroX-3 is the software data-path application for encapsulating encrypted InfiniBand traffic over long-haul links.

image2022-11-27_16-51-56.png

Simply put, each pair of MetroX appliances can be viewed as a component that is independent of the other MetroX pairs in the network.

image2022-11-27_16-24-3.png

For all intents and purposes, the MetroX appliances and all the devices between them are invisible to the InfiniBand devices on each side. As far as the Subnet Manager (SM) is aware, site A and site B are on the same InfiniBand fabric.

The MetroX does not respond to any InfiniBand packet, including management datagrams (MAD) from the SM. It simply encapsulates the packet and sends it to the second MetroX.

Warning

The SM is not aware of any of the MetroX appliances.

Ethernet Guidelines

Each of the MetroX Ethernet ports (called long-haul/LH ports 1/2 and 2/2) has a matching port on the other MetroX, with which it opens a RoCE (RDMA over Converged Ethernet) session over the long-haul link.

Warning
  • In the following guidelines, LH1/2 of one appliance is paired with LH1/2 of the other appliance and, similarly, LH2/2 of both appliances are paired.

  • There is no restriction on this choice of pairs. LH2/2 can be matched with LH1/2 on the other side, if needed.

Important
  • Every port should have its address on the same subnet as the port it is paired with.

  • Every port pair should be on a different subnet than the other pair.

  • Ensure the Ethernet medium allows connectivity between the ports of each of the two long-haul port pairs.

    An example of a valid configuration (where LH1/2 is matched with LH1/2 on the second appliance, and LH2/2 with LH2/2 on the other side):

    image2022-11-27_15-50-50.png

    Invalid configuration due to the first port pair being in a different subnet (traffic will pass only on first LH pair):

    image2022-11-27_15-51-8.png

    Invalid configurations due to both pairs being on the same subnet:

    image2022-11-27_15-51-40.png
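The two subnet rules above can be checked before any appliance is touched. The following is a minimal sketch in Python, not a MetroX tool: the `check_long_haul_plan` function and the dictionary layout are hypothetical conveniences, and the `3.3.3.10` and `1.1.1.2`/`1.1.1.20` addresses are made up to trigger the two invalid cases. It relies only on the standard `ipaddress` module.

```python
import ipaddress


def check_long_haul_plan(plan):
    """Return a list of problems found in a long-haul address plan.

    `plan` maps a pair name to (local_cidr, remote_cidr). This layout is a
    hypothetical convenience for the sketch, not a MetroX data format.
    """
    problems = []
    networks = {}
    for pair, (local, remote) in plan.items():
        local_net = ipaddress.ip_interface(local).network
        remote_net = ipaddress.ip_interface(remote).network
        # Rule 1: both ports of a pair must be on the same subnet.
        if local_net != remote_net:
            problems.append(
                f"{pair}: ports are on different subnets ({local_net} vs {remote_net})"
            )
        networks[pair] = (local_net, remote_net)

    # Rule 2: each pair must use a subnet that no other pair uses.
    pairs = list(networks)
    for i, first in enumerate(pairs):
        for second in pairs[i + 1:]:
            if any(a.overlaps(b) for a in networks[first] for b in networks[second]):
                problems.append(f"{first} and {second}: pairs share a subnet")
    return problems


valid = {
    "LH1/2": ("1.1.1.1/24", "1.1.1.10/24"),
    "LH2/2": ("2.2.2.2/24", "2.2.2.20/24"),
}
split_pair = {  # first pair straddles two subnets (illustrative addresses)
    "LH1/2": ("1.1.1.1/24", "3.3.3.10/24"),
    "LH2/2": ("2.2.2.2/24", "2.2.2.20/24"),
}
same_subnet = {  # both pairs on 1.1.1.0/24 (illustrative addresses)
    "LH1/2": ("1.1.1.1/24", "1.1.1.10/24"),
    "LH2/2": ("1.1.1.2/24", "1.1.1.20/24"),
}

for name, plan in [("valid", valid), ("split_pair", split_pair), ("same_subnet", same_subnet)]:
    print(name, check_long_haul_plan(plan) or "OK")
```

The check covers addressing only. It cannot tell whether the Ethernet medium actually provides connectivity between the paired ports.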


InfiniBand Guidelines

Assume site A and site B from the figure above are directly connected by wire without any equipment in the middle. As such, all devices on both sites should be considered to be on the same InfiniBand fabric.

Warning

Configure a single VL on the ports of the InfiniBand switches connected directly to the MetroX appliance (e.g., use the command "interface ib 1/1 op-vls 1").

Warning
  • Ensure the Subnet Manager is running in the InfiniBand cluster with IPoIB enabled.

  • The IPs and subnet masks in the diagram below are provided as an example. Actual IPs and subnet masks may differ in customer deployments.

This section shows the commands used to configure the following setup:

image2022-11-27_15-52-4.png

On MetroX3-A:

MetroX3-A > enable
MetroX3-A # configure terminal
MetroX3-A (config) # interface long-haul 1/2 ip address 1.1.1.1/24
MetroX3-A (config) # interface long-haul 2/2 ip address 2.2.2.2/24
MetroX3-A (config) # interface long-haul 1/2 remote ip address 1.1.1.10
MetroX3-A (config) # interface long-haul 2/2 remote ip address 2.2.2.20

On MetroX3-B:

MetroX3-B > enable
MetroX3-B # configure terminal
MetroX3-B (config) # interface long-haul 1/2 ip address 1.1.1.10/24
MetroX3-B (config) # interface long-haul 2/2 ip address 2.2.2.20/24
MetroX3-B (config) # interface long-haul 1/2 remote ip address 1.1.1.1
MetroX3-B (config) # interface long-haul 2/2 remote ip address 2.2.2.2
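The two listings mirror each other: the local address on one appliance is the remote address on the other. As a rough sketch of that symmetry, the Python below generates the config-mode command strings for both appliances from a single list. The helper names `long_haul_commands` and `mirror` are hypothetical, and only the two command forms shown in the listings above are emitted. The sketch groups the two commands per port, whereas the listings set both local addresses first.

```python
import ipaddress


def long_haul_commands(port, local_cidr, remote_ip):
    """Build the two config-mode commands for one long-haul port."""
    local = ipaddress.ip_interface(local_cidr)
    remote = ipaddress.ip_address(remote_ip)
    # A port must be on the same subnet as the port it is paired with.
    if remote not in local.network:
        raise ValueError(f"{remote} is not in {local.network}")
    return [
        f"interface long-haul {port} ip address {local.with_prefixlen}",
        f"interface long-haul {port} remote ip address {remote}",
    ]


def mirror(ports):
    """Derive the peer appliance's port list by swapping local and remote."""
    mirrored = []
    for port, local_cidr, remote_ip in ports:
        local = ipaddress.ip_interface(local_cidr)
        prefix = local.network.prefixlen
        mirrored.append((port, f"{remote_ip}/{prefix}", str(local.ip)))
    return mirrored


# Addresses taken from the example setup in this section.
site_a = [
    ("1/2", "1.1.1.1/24", "1.1.1.10"),
    ("2/2", "2.2.2.2/24", "2.2.2.20"),
]

for name, ports in [("MetroX3-A", site_a), ("MetroX3-B", mirror(site_a))]:
    print(f"# {name}")
    for port, local_cidr, remote_ip in ports:
        for command in long_haul_commands(port, local_cidr, remote_ip):
            print(command)
```

Deriving the second appliance from the first keeps the two sides from drifting apart when addresses change. The generated lines are meant to be reviewed and entered in config mode by an operator.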

To verify the results, use the commands "show interfaces long-haul 1/2" and "show interfaces long-haul 2/2":

MetroX3-A (config) # show interface long-haul 1/2

LH1/2:
  Admin state            : Enabled
  Operational state      : Up
  Description            :
  MAC address            : B8:CE:F6:72:B0:25
  MTU                    : 9216
  FEC                    : auto
  Supported speeds       : 100G
  Actual speed           : 100G
  Auto-negotiation       : Enabled
  Actual latency (ms)    : 35
  Remote KA status       : Success
  Last success KA message: 2022/07/17 12:28:31

  IPv4 address:
    1.1.1.1/24

  Remote IPv4 address:
    1.1.1.10

  Rx:
    packets      : 19414
    bytes        : 10054538
    error packets: 0

  Tx:
    packets      : 19402
    bytes        : 10733466
    error packets: 0

Warning
  • It is important to verify that "Admin state" and "Operational state" are OK and that the "Remote KA status" field is "Success".

  • It is possible to ping the other port pair by using the command

  • If the "Remote KA status" is "Failure" but the remote IP address responds to ping, it is recommended to wait 60 seconds to let the internal mechanism sync with the other MetroX.
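The checks in the notes above can be scripted against captured command output. The following is a minimal sketch in Python, assuming the output has already been saved as text in the format shown earlier. The `parse_fields` and `link_status` helpers are hypothetical, and treating "Enabled" and "Up" as the OK values is an assumption based on the sample output rather than a documented rule.

```python
import re

# Abbreviated copy of the sample output shown earlier in this section.
SAMPLE = """\
LH1/2:
  Admin state            : Enabled
  Operational state      : Up
  Remote KA status       : Success
  Last success KA message: 2022/07/17 12:28:31
"""


def parse_fields(text):
    """Collect 'name : value' lines into a dict (first occurrence wins)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.*)$", line)
        # Skip section headers such as "LH1/2:" that carry no value.
        if match and match.group(2):
            fields.setdefault(match.group(1), match.group(2).strip())
    return fields


def link_status(fields, ping_ok):
    """Map parsed fields to a recommendation string.

    Assumption: "Enabled" and "Up" are the OK values, as in the sample.
    `ping_ok` says whether the remote IP answered a ping.
    """
    if fields.get("Admin state") != "Enabled" or fields.get("Operational state") != "Up":
        return "check port state"
    keepalive = fields.get("Remote KA status")
    if keepalive == "Success":
        return "ok"
    if keepalive == "Failure" and ping_ok:
        return "wait 60 seconds and re-check"
    return "investigate"


fields = parse_fields(SAMPLE)
print(link_status(fields, ping_ok=True))
```

Because the first occurrence of a field name wins, the repeated "packets" and "bytes" names under Rx and Tx would need section-aware parsing if the counters were of interest. The sketch only targets the three status fields.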

© Copyright 2023, NVIDIA. Last updated on Sep 8, 2023.