NVIDIA MLNX-GW User Manual for NVIDIA Skyway Appliance v8.1.2000
NVIDIA MLNX-GW User Manual for NVIDIA Skyway Appliance v8.2.2200

MLNX-GW—Getting Started

Gateway Initialization

To initialize the gateway, follow the steps below.

  1. Enable remote access to serial console via IPMI.

    Warning

    Sub-steps 1 through 4 below describe how to find the MAC address of the IPMI port from inside the BIOS. The IPMI port MAC address is also printed on a label on the outside of the chassis.

    1. Connect a VGA monitor and USB keyboard directly to the NVIDIA Skyway appliance.

    2. To enter the BIOS, reboot the NVIDIA Skyway appliance and press <DEL> during bootup until the BIOS window pops up.

    3. Go to the “Server Mgmt.” tab → “BMC network configuration”.

    4. The “Station IP address” is the address of the IPMI controller. DHCP may need to be configured in order to provide a lease for the MAC address. The NVIDIA Skyway appliance has 2 LAN ports on the back panel of the appliance that can be used for IPMI (in the figure below IPMI LAN2 is used).

      image2020-10-15_16-7-33.png

      image2021-3-8_9-37-36.png

    5. Use the following IPMI command to remotely access the serial console (the user and password are both “admin” by default).

      ipmitool -I lanplus -H <IPMI_CONTROLLER_IP> -U <user> -P <password> sol activate

      Example:

      ipmitool -I lanplus -H 10.7.113.60 -U admin -P admin sol activate

    Warning

    Make sure to connect to the SOL console port of the gateway and not to the management port. Of the four ports, either of the outer ports (the 1st or the 4th port, labeled IPMI LAN1 and IPMI LAN4 in the image above) can be selected as the SOL port.

    Warning

    Once the operating system boots, iKVM over HTML5 no longer shows any output. However, iKVM over HTML5 can be used for BIOS configuration at the very beginning of the system boot sequence, right before the operating system boots.

  2. Configure Console Redirection. This configuration makes it possible to use remote IPMI to view all serial output that comes after the initial boot, which is useful for monitoring the OS initialization flow.

    1. Go to the “Advanced” tab → “Serial Port Console Redirection” → “Serial Communication via IPMI COM”.

    2. Set “Console Redirection” to “Enabled”.

      Important

      At this point, make sure to disconnect the VGA monitor and USB keyboard, or else the following error may appear:
      TSC_DEADLINE disabled due to Errata; Please update microcode to version : 0xffffffff or later

  3. Using IPMI, log in with the username “admin” and the password “admin”.

    ipmitool -I lanplus -H <IP Address> -U admin -P admin sol activate

  4. Connect the management Ethernet cable to LAN3 (second port from the left) on the back panel of the appliance.

    image2021-3-8_9-40-4.png

  5. Go through the Gateway Management configuration wizard.

    IP Configuration by DHCP

    Wizard Session Display (Example)

    Comments

    Do you want to use the wizard for initial configuration? yes

    This configuration must be performed the first time the gateway is operated or after resetting the gateway to the factory defaults.
    Type “y” and then press <Enter>.

    Step 1: Hostname? [gateway-1]

    To accept the default hostname, press <Enter>.
    Otherwise, type a different hostname and press <Enter>.

    Step 2: Use DHCP on mgmt0 interface? [yes]

    Perform this step to obtain an IP address for the gateway (mgmt0 is the management port of the gateway).

    • Typing “yes” will have the DHCP server assign the IP address

    • Typing “no” (no DHCP) prompts for whether to use the Zeroconf configuration. To use Zeroconf, type “yes” and the session continues. If “no” (no Zeroconf) is typed, enter a static IP address and the session continues.

    Step 3: Enable IPv6 [yes]

    Perform this step to enable IPv6 on management ports.

    • Type “yes” to enable IPv6.

    • Type “no” to not enable IPv6 (Step 4 will be skipped)

    Step 4: Enable IPv6 autoconfig (SLAAC) on mgmt0 interface

    Perform this step to enable stateless address autoconfiguration (SLAAC) on the external management port.

    • Type "yes" to enable

    • Type "no" to disable

    Step 5: Use DHCPv6 on mgmt0 interface? [yes]

    Perform this step to enable DHCPv6 on the MGMT0 interface.

    Step 6: Enable password hardening? 

    Perform this step to enable or disable password hardening on the machine. If enabled, new passwords are checked against the configured restrictions. To enable it, type “yes” and press <Enter>. To disable it, type “no” and press <Enter>.

    Step 7: Admin password (Must be typed)? <new_password>

    To avoid illegal access to the machine, type a password and press <Enter>.

    An admin password must be entered upon initial configuration. Due to California Senate Bill No. 327, this stage is required and cannot be skipped.

    Step 8: Confirm admin password? <new_password>

    Confirm the password by re-entering it. Note that password characters are not printed.

    Step 9: Monitor password (Must be typed)? <new_password>

    To avoid illegal access to the machine, please type a password and then press <Enter>.

    A monitor password must be entered upon initial configuration. Due to California Senate Bill No. 327, this stage is required and cannot be skipped.

    Step 10: Confirm monitor password? <new_password>

    Confirm the password by re-entering it. Note that password characters are not printed.

    You have entered the following information:
    Hostname: <gateway name>
    Use DHCP on mgmt0 interface: yes
    Enable IPv6: yes
    Enable IPv6 autoconfig (SLAAC) on mgmt0 interface: yes
    Enable DHCPv6 on mgmt0 interface: no
    Enable password hardening: yes
    Admin password (Enter to leave unchanged): (CHANGED)
    To change an answer, enter the step number to return to.
    Otherwise hit <enter> to save changes and exit.
    Choice: <Enter>
    Configuration changes saved.
    To return to the wizard from the CLI, enter the “configuration jump-start” command
    from configuration mode. Launching CLI...
    <gateway name> [standalone: master] >

    The wizard displays a summary of choices and then asks to confirm the choices or to re-edit them.

    • Press <Enter>, to save changes and exit

    • Enter the relevant configuration step number, to edit any of the choices

    To run the command “configuration jump-start”, Config mode must be used.

  6. Check the mgmt0 interface configuration before attempting a remote connection (e.g., SSH) to the gateway. Specifically, verify the existence of an IP address.

    gateway # show interfaces mgmt0

    Interface mgmt0 status:
      Comment         :
      Admin up        : yes
      Link up         : yes
      DHCP running    : yes
      IP address      : 10.7.148.61
      Netmask         : 255.255.0.0
      IPv6 enabled    : yes
      Autoconf enabled: no
      Autoconf route  : yes
      Autoconf privacy: no
      DHCPv6 running  : no
      IPv6 addresses  : 1

      IPv6 address:
        fe80::268a:7ff:fe53:3d8e/64

      Speed           : 1000Mb/s (auto)
      Duplex          : full (auto)
      Interface type  : ethernet
      Interface source: physical
      MTU             : 1500
      HW address      : 00:02:c9:11:a1:b2

      Rx:
        11700449 bytes
        55753 packets
        0 mcast packets
        0 discards
        0 errors
        0 overruns
        0 frame

      Tx:
        5139846 bytes
        28452 packets
        0 discards
        0 errors
        0 overruns
        0 carrier
        0 collisions
        1000 queue len
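
The check in step 6 can also be scripted. The following is a minimal sketch, assuming the output of "show interfaces mgmt0" has already been captured as text (for example, from the SOL session). The function name mgmt0_ipv4_address and the shortened sample text are illustrative and are not part of MLNX-GW.

```python
import ipaddress
import re


def mgmt0_ipv4_address(show_output):
    """Return the IPv4 address reported for mgmt0, or None if none is set."""
    # "IP address" never matches the IPv6 fields, because those read "IPv6 address".
    match = re.search(r"\bIP address\s*:\s*(\S+)", show_output)
    if match is None:
        return None
    try:
        return ipaddress.IPv4Address(match.group(1))
    except ValueError:
        # The field was empty, so the regex captured the next label instead.
        return None


# Shortened sample, based on the example output in step 6.
SAMPLE_OUTPUT = """\
Interface mgmt0 status:
  Admin up        : yes
  Link up         : yes
  DHCP running    : yes
  IP address      : 10.7.148.61
  Netmask         : 255.255.0.0
  IPv6 address:
    fe80::268a:7ff:fe53:3d8e/64
"""

address = mgmt0_ipv4_address(SAMPLE_OUTPUT)
print("mgmt0 IPv4 address:", address if address else "not assigned")
```

If the function returns None, the interface has no IPv4 address yet and a remote connection such as SSH is not expected to work.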

Rerunning the Wizard

To rerun the wizard, do the following:

  1. Enter config mode.

    gateway > enable
    gateway # config terminal

  2. Rerun the wizard.

    gateway (config) # configuration jump-start

Starting the Command Line Interface (CLI)

  1. Set up an Ethernet connection between the gateway and a local network machine using a standard SOL connector.

  2. Start a remote secured shell (SSH) to the gateway using the command “ssh -l <username> <gateway ip address>”.

    rem_mach1 > ssh -l <username> <ip address>

  3. Log in to the gateway (default username and password are both "admin").

  4. Read and accept the EULA, when prompted.

  5. Once the following prompt appears, the system is ready to use.

    Mellanox Gateway

    Password:
    Last login: <time> from <ip-address>

    gateway >

Warning

If the firmware was upgraded, a firmware boot bar appears and the CLI is blocked until the firmware upgrade is complete.

Warning

The CLI is blocked until the InfiniBand virtual interfaces are created. While they are being created, the following message appears: "Creating VFs".

image2021-3-7_17-47-31.png

Skyway GA100 is an appliance-based InfiniBand-to-Ethernet gateway that enables Ethernet-based communications to access the InfiniBand datacenter and vice versa. The following section describes networkwide guidelines and provides a specific example using an NVIDIA Ethernet switch running the NVIDIA Onyx™ operating system.

Warning

Ensure the Subnet Manager is running in the InfiniBand cluster with IPoIB enabled.

Warning

The IP addresses and subnet masks shown in the diagram below are provided as an example. Actual IP addresses and subnet masks may differ in customer deployments.

image2022-3-22_15-19-42.png

General Networkwide Guidelines

Ethernet Guidelines

The connection between the Skyway and the Ethernet router requires configuring a LAG with active LACP on the Ethernet router (see step 3 in the "Configuring IP Addresses and Routes" section below).
For increased resiliency, it is recommended to configure Ethernet routers in an MLAG configuration.

Warning

Make sure the MTU on the Ethernet router connected to Skyway is at least 2 bytes smaller than the InfiniBand IPoIB MTU (e.g., set the InfiniBand MTU to 4092 and the Ethernet MTU to 4090).
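
The MTU guideline above amounts to a simple arithmetic check. The following is a minimal sketch of that check; the function name ethernet_mtu_is_valid and its parameters are illustrative only.

```python
def ethernet_mtu_is_valid(ipoib_mtu, ethernet_mtu, margin=2):
    """True if the Ethernet MTU is at least `margin` bytes below the IPoIB MTU."""
    return ipoib_mtu - ethernet_mtu >= margin


# Values from the example in the guideline above.
print(ethernet_mtu_is_valid(ipoib_mtu=4092, ethernet_mtu=4090))  # True

# An Ethernet MTU equal to the IPoIB MTU violates the guideline.
print(ethernet_mtu_is_valid(ipoib_mtu=4092, ethernet_mtu=4092))  # False
```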

InfiniBand Guidelines

Warning

All InfiniBand ports must be connected to the same InfiniBand fabric.

Subnet Manager Configuration

Warning

Ensure the Subnet Manager is running in the InfiniBand cluster with IPoIB enabled.

Virtualization must be enabled by the Subnet Manager (SM). It is recommended to remove the limit on the maximum number of ports that are processed simultaneously.

If opensm runs on an InfiniBand switch, configure the following:

switch (config) # ib sm virt enable
switch (config) # ib sm virt-max-ports-in-process 0

If opensm runs on a host, add the following lines to the opensm.conf (by default at /etc/opensm/opensm.conf):

# Virtualization support
# 0: Ignore Virtualization - No virtualization support
# 1: Disable Virtualization - Disable virtualization on all
#    Virtualization supporting ports
# 2: Enable Virtualization - Enable virtualization on all
#    Virtualization supporting ports
virt_enabled 2

# Maximum number of ports to be processed simultaneously
# by Virtualization Manager (0 - process all pending ports)
virt_max_ports_in_process 0
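
Before restarting opensm on a host, the two settings above can be checked with a short script. This is a minimal sketch, assuming the file uses whitespace-separated "key value" lines with "#" comments, as in the fragment above. The function names are illustrative, and reading the real file from /etc/opensm/opensm.conf is left to the caller.

```python
def parse_opensm_conf(text):
    """Parse whitespace-separated "key value" lines, ignoring # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Assumption: if a key is repeated, the last occurrence wins.
            settings[parts[0]] = parts[1]
    return settings


def virtualization_configured(text):
    """True if both virtualization settings have the values recommended above."""
    settings = parse_opensm_conf(text)
    return (settings.get("virt_enabled") == "2"
            and settings.get("virt_max_ports_in_process") == "0")


SAMPLE_CONF = """\
# Virtualization support
virt_enabled 2

# Maximum number of ports to be processed simultaneously
virt_max_ports_in_process 0
"""

print(virtualization_configured(SAMPLE_CONF))
```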

Configuring IP Addresses and Routes

Warning

The first port (port #1) of each HCA is an InfiniBand port and the second port (port #2) is an Ethernet port; therefore, the configuration of an InfiniBand "device/port" value should be "x/1" and the configuration of an Ethernet "device/port" value should be "x/2".

For example, for HCA #7, the configuration of the InfiniBand port is 7/1 and of the Ethernet port is 7/2.
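
The "device/port" convention above can be expressed as a small helper. This is an illustrative sketch only; the function name hca_port_ids is hypothetical and is not an MLNX-GW command.

```python
def hca_port_ids(hca_number):
    """Return the "device/port" identifiers for one HCA."""
    return {
        "infiniband": f"{hca_number}/1",  # port #1 of each HCA is InfiniBand
        "ethernet": f"{hca_number}/2",    # port #2 of each HCA is Ethernet
    }


print(hca_port_ids(7))
```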

Warning

IP addresses, subnet masks, port numbers, and interface names are used as an example and may vary according to the actual connectivity of the customer's deployment.

  1. On the relevant InfiniBand nodes, configure an IP address on each InfiniBand port designated for the Skyway deployment (e.g., ib0). In addition, configure a default route with the Skyway IP as next hop (in this example, 1.1.1.3).

    # ifconfig ib0 1.1.1.2/24
    # ip route add 0/0 via 1.1.1.3

    Warning

    The NVIDIA Skyway IP which is configured as next hop should match the virtual IP of the InfiniBand port channel of the NVIDIA Skyway appliance.

  2. Access and configure an IP address on the gateway's Ethernet and InfiniBand ports and configure a virtual IP address on the InfiniBand port. In addition, configure a route to the customer's Ethernet networks via the IP assigned on the Ethernet router's port (in this example, 2.2.2.1).

    gateway > enable
    gateway # configure terminal
    gateway (config) # interface ib port-channel 1 ip address 1.1.1.1/24
    gateway (config) # interface ib port-channel 1 virtual ip address 1.1.1.3/24
    gateway (config) # interface ethernet port-channel 1 ip address 2.2.2.2/24
    gateway (config) # ip route 0 /0 2.2.2.1

  3. Detect the ports on the Ethernet router that are connected to the gateway, assign the LAG to a VLAN, and configure the IP address on the VLAN interface. In addition, configure a route to the IPoIB network via the IP assigned to the gateway's Ethernet port-channel (in this example, 2.2.2.2).
    Below is an example using an NVIDIA Onyx-based switch.
    In this example, ports 1/1-1/8 on the router (see the “interface ethernet 1/1-1/8” command) are connected to the 8 Ethernet ports on Skyway.

    eth_router > enable
    eth_router # configure terminal
    eth_router (config) # ip routing
    eth_router (config) # lacp
    eth_router (config) # interface port-channel 1
    eth_router (config interface port-channel 1) # exit
    eth_router (config) # interface ethernet 1/1-1/8 channel-group 1 mode active
    eth_router (config) # vlan 2
    eth_router (config vlan 2) # exit
    eth_router (config) # interface port-channel 1 switchport access vlan 2
    eth_router (config) # interface vlan 2 ip address 2.2.2.1 /24
    eth_router (config) # ip route 1.1.1.0 /24 2.2.2.2

    Warning

    Note that the above connection describes a connection between Skyway and a single Ethernet router with LAG. It is possible to also connect to two Ethernet routers in an MLAG configuration. For more information, see the following community post for MLAG configuration on NVIDIA Onyx-based switches: support.mellanox.com/s/article/how-to-configure-mlag-on-mellanox-switches.

    image2022-3-22_15-25-26.png

  4. To verify the deployment, ping between a host in the 192.168.1.0/24 subnet and the InfiniBand host with IP 1.1.1.2. The ping should succeed.
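
Each route configured in the steps above uses a next hop that must sit in the same subnet as the interface it leaves from. The following is a minimal sketch of that sanity check, using the example addresses from this section and Python's standard ipaddress module. The function name next_hop_is_on_link is illustrative.

```python
import ipaddress


def next_hop_is_on_link(interface_cidr, next_hop):
    """True if next_hop is inside the interface's subnet and is not the interface itself."""
    interface = ipaddress.ip_interface(interface_cidr)
    hop = ipaddress.ip_address(next_hop)
    return hop in interface.network and hop != interface.ip


# (description, local interface address, next hop), all taken from the example above.
CHECKS = [
    ("InfiniBand host ib0 -> Skyway virtual IP", "1.1.1.2/24", "1.1.1.3"),
    ("Skyway Ethernet port-channel -> Ethernet router", "2.2.2.2/24", "2.2.2.1"),
    ("Ethernet router VLAN 2 -> Skyway Ethernet port-channel", "2.2.2.1/24", "2.2.2.2"),
]

for description, interface_cidr, next_hop in CHECKS:
    status = "ok" if next_hop_is_on_link(interface_cidr, next_hop) else "MISMATCH"
    print(f"{description}: {status}")
```

With the example addresses, all three checks report "ok". A mismatch would indicate that a next hop was entered for the wrong subnet.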

Deployment Scenarios

The gateway can be used in various deployment topologies, each with its own strengths.

Skyway Connectivity to the InfiniBand Using Fat Tree Topology

Option #1:

image2020-10-25_2-7-39.png

The Skyway appliances are connected to the Spine switches of the Fat Tree. This topology requires fewer hops to reach the Skyway, but it occupies Spine ports that could otherwise be kept available for future expansion of the cluster.

Option #2:

image2020-10-25_2-10-16.png

The Skyway appliances are connected to the Leaf switches. This topology keeps Spine ports available for future expansion, but the hop count from cluster nodes to the Skyway is not uniform: some nodes may have a lower hop count than others.

Skyway Connectivity to the InfiniBand Using Dragonfly+ Topology

Option #1:

image2020-10-25_2-10-52.png

The Skyway appliances are connected to a dedicated cell for services (e.g., storage and login services). This topology provides fairness and symmetry among all nodes on all other cells, but it requires an additional cell (or "services island").

Option #2:

image2020-10-25_2-12-27.png

The Skyway appliances are connected directly to leaf switches on the compute islands. An additional cell (or "service island") is not required.

Skyway Connectivity to the Ethernet Using LAG/MLAG

image2022-3-22_15-32-52.png

The clearest advantage of LAG/MLAG is that it is a simple and standard topology. While the topology provides a good load distribution and good resiliency, it is limited in scale. For more information on configuring MLAG, see the following community post.

Warning

Please consider the following while implementing LAG/MLAG using Skyway appliances:

  • There is no Inter Peer Link (IPL) across Skyway appliances.

  • Skyway-based MLAG cannot be VLAN enabled. Ports are always access ports.

  • All Skyway appliances in an MLAG domain will share the same Virtual IP.

Multiple IP Subnets

image2022-4-3_10-46-2.png

Multiple IP subnets can be configured over the InfiniBand network. In such cases, every IPoIB subnet is served by dedicated Skyway appliances that are configured in a High Availability (HA) domain.

The specific IP configuration (e.g., Default Gateway, Next Hop Router, and so forth) will have to be configured separately per HA domain of the Skyway appliances.

Warning
  • The InfiniBand network is assumed to be a single InfiniBand subnet encompassing several IPoIB subnets.

Configuring High Availability (HA)

This section explains how to configure an HA cluster with multiple appliances.

Before Configuring HA

Warning
  • For all appliances in the HA cluster, the MLNX-GW version must be the same.

  • For all appliances in the HA cluster, the Ethernet management interfaces must be in the same L2 subnet.

  • The Skyway appliances configured in HA mode must be connected either to a dedicated Ethernet L3 switch or to an Ethernet L2 switch where all ports connected to Skyway are configured as router ports.

  • Before configuring HA, each appliance should be configured according to the "Configuring IP Addresses and Routes" section above.

  • Virtual IP configuration and Ethernet port channel configuration must be identical for all appliances in the HA cluster.
    Example of configuration that needs to be identical for all appliances:
    Skyway A:

    gateway(config) # interface ib port-channel 1 virtual ip address 1.1.1.3/24
    gateway(config) # interface ethernet port-channel 1 ip address 2.2.2.2/24

    Skyway B:

    gateway(config) # interface ib port-channel 1 virtual ip address 1.1.1.3/24
    gateway(config) # interface ethernet port-channel 1 ip address 2.2.2.2/24

  • The InfiniBand port-channel IP address may differ between the appliances in the HA cluster:
    Skyway A:

    gateway(config) # interface ib port-channel 1 ip address 1.1.1.1/24

    Skyway B:

    gateway(config) # interface ib port-channel 1 ip address 1.1.1.4/24

  • Make sure that all Ethernet interfaces that are connected to Skyway appliances in the same HA cluster are connected through an Ethernet MLAG or LAG configuration.
    Below is an example of MLAG and MAGP configuration on Ethernet switches connected to Skyway appliances.

    eth_router > enable
    eth_router # configure terminal
    eth_router (config) # protocol mlag
    eth_router (config) # lacp
    eth_router (config) # vlan 999
    eth_router (config vlan 999) # exit
    eth_router (config) # interface vlan 999 ip address 192.17.10.3/24 primary
    eth_router (config) # interface port-channel 1
    eth_router (config interface port-channel 1) # exit
    eth_router (config) # interface ethernet 1/1-1/4 channel-group 1 mode active
    eth_router (config) # interface port-channel 1 ipl 1
    eth_router (config) # interface vlan 999 ipl 1 peer-address 192.17.10.2
    eth_router (config) # mlag-vip GW-HA ip 10.10.252.10 /16 force
    eth_router (config) # no mlag shutdown
    eth_router (config) # interface mlag-port-channel 101
    eth_router (config interface mlag-port-channel 101) # exit
    eth_router (config) # interface ethernet 1/19-1/26 mlag-channel-group 101 mode active
    eth_router (config) # interface mlag-port-channel 101 no shutdown

    eth_router (config) # ip routing
    eth_router (config) # vlan 101
    eth_router (config vlan 101) # exit
    eth_router (config) # interface vlan 101 ip address 2.2.2.252/24 primary
    eth_router (config) # interface mlag-port-channel 101 switchport access vlan 101
    eth_router (config) # protocol magp
    eth_router (config) # interface vlan 101 magp 101
    eth_router (config interface vlan 101 magp 101) # ip virtual-router address 2.2.2.254
    eth_router (config interface vlan 101 magp 101) # ip virtual-router mac-address AA:BB:CC:00:01:01
    eth_router (config) # ip route vrf default 172.0.0.0/8 2.2.2.2

    Below is an example of LAG configuration on an Ethernet switch connected to Skyway appliances. Ports 1-8 on the router are connected to the 8 Ethernet ports on the first Skyway appliance, and ports 11-18 on the router are connected to the 8 Ethernet ports on the second Skyway appliance.

    eth_router > enable
    eth_router # configure terminal
    eth_router (config) # ip routing
    eth_router (config) # lacp
    eth_router (config) # interface port-channel 1
    eth_router (config interface port-channel 1) # exit
    eth_router (config) # interface ethernet 1/1-1/8 channel-group 1 mode active
    eth_router (config) # interface ethernet 1/11-1/18 channel-group 1 mode active
    eth_router (config) # vlan 2
    eth_router (config vlan 2) # exit
    eth_router (config) # interface port-channel 1 switchport access vlan 2
    eth_router (config) # interface vlan 2 ip address 2.2.2.1 /24
    eth_router (config) # ip route 1.1.1.0 /24 2.2.2.2
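
The requirement above that the virtual IP and the Ethernet port-channel configuration be identical on all appliances can be checked before enabling HA. The following is a minimal sketch, assuming the planned addresses are kept in a plain dictionary. The key names and the function name ha_config_mismatches are illustrative, and the values are the Skyway A and Skyway B examples from this section.

```python
# Settings that must be identical on every appliance in the HA cluster.
MUST_MATCH = ("ib_virtual_ip", "eth_port_channel_ip")


def ha_config_mismatches(appliances):
    """Return the names of settings that are not identical on all appliances."""
    mismatches = []
    for key in MUST_MATCH:
        values = {config[key] for config in appliances.values()}
        if len(values) > 1:
            mismatches.append(key)
    return mismatches


APPLIANCES = {
    "Skyway A": {
        "ib_virtual_ip": "1.1.1.3/24",
        "eth_port_channel_ip": "2.2.2.2/24",
        "ib_port_channel_ip": "1.1.1.1/24",  # may differ per appliance, not checked
    },
    "Skyway B": {
        "ib_virtual_ip": "1.1.1.3/24",
        "eth_port_channel_ip": "2.2.2.2/24",
        "ib_port_channel_ip": "1.1.1.4/24",  # may differ per appliance, not checked
    },
}

print(ha_config_mismatches(APPLIANCES))  # [] means the shared settings match
```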

Warning

Even when working with a single Skyway appliance, it is recommended to configure High Availability on it. This makes it easy to scale the topology in the future without changing the configuration of the existing Skyway appliance. See section "Configuring HA on Skyway Appliance" below for configuration details.

Configuring HA on Skyway Appliance

  1. Configure HA on each Skyway appliance that is going to be part of the HA cluster.
    All Skyway appliances must share the same HA domain.

    Skyway A:

    gateway (config) # gw ha 1

    Warning! Configuration is about to be saved and the system will be reloaded.
    Type 'YES' to confirm the HA domain id change: YES

    Skyway B:

    gateway (config) # gw ha 1

    Warning! Configuration is about to be saved and the system will be reloaded.
    Type 'YES' to confirm the HA domain id change: YES

    Warning

    After this step, the Skyway appliances will be rebooted.

  2. Once all systems complete the initialization, verify that all Skyway appliances were added properly to the HA cluster by running "show gw ha" from one of the Skyway appliances.
    Verify domain ID appears as configured and all Skyway appliances appear in the output of the command.

    gateway (config) # show gw ha

    Global HA state:
      GW domain ID   : 3
      Active HA nodes: 3
      Master name    : skyway-7

    HA domain nodes information:
      Name                : skyway-8
      GW Operational state: active
      System guid         : b8ce:f603:0075:6eda
      Priority            : 100

      Name                : skyway-64
      GW Operational state: active
      System guid         : b8ce:f603:0068:7e8a
      Priority            : 100

      Name                : skyway-7 <--- (local node)
      GW Operational state: active
      System guid         : b8ce:f603:0075:6efa
      Priority            : 100
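
The verification in step 2 can be scripted once the "show gw ha" output has been captured as text. The following is a minimal sketch, assuming the field labels shown in the example output above. The function names and the shortened sample are illustrative and are not part of MLNX-GW.

```python
import re


def parse_gw_ha(output):
    """Return (domain_id, set_of_node_names) parsed from "show gw ha" output."""
    domain = re.search(r"GW domain ID\s*:\s*(\d+)", output)
    # "Name" must start the line, so "Master name" is not counted as a node.
    names = re.findall(r"^\s*Name\s*:\s*(\S+)", output, flags=re.MULTILINE)
    domain_id = int(domain.group(1)) if domain else None
    return domain_id, set(names)


def ha_cluster_matches(output, expected_domain_id, expected_nodes):
    """True if the domain ID and the node list match what was configured."""
    domain_id, nodes = parse_gw_ha(output)
    return domain_id == expected_domain_id and nodes == set(expected_nodes)


# Shortened sample, based on the example output in step 2.
SAMPLE_OUTPUT = """\
Global HA state:
  GW domain ID   : 3
  Active HA nodes: 3
  Master name    : skyway-7

HA domain nodes information:
  Name                : skyway-8
  GW Operational state: active

  Name                : skyway-64
  GW Operational state: active

  Name                : skyway-7 <--- (local node)
  GW Operational state: active
"""

print(ha_cluster_matches(SAMPLE_OUTPUT, 3, ["skyway-7", "skyway-8", "skyway-64"]))
```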

High Availability LAG/MLAG Setup

image2022-3-23_16-8-27.png

Skyway Connectivity to the Ethernet Using L2 Ethernet Switches

image2022-3-23_16-6-49.png

In the use case above, every Skyway-facing port on the L2 Ethernet switches should be configured as a router port. In addition, a private network should be established (in the example above, 3.3.0.0/16) between these router ports and the Skyway Ethernet port channel.

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.