Network Configuration

This chapter describes key network considerations and instructions for the DGX A100 System.

Configuring Network Proxies

If your network requires use of a proxy server, you will need to set up configuration files to ensure the DGX A100 System communicates through the proxy.

For the OS and Most Applications

Edit the /etc/environment file and add the following proxy addresses to the file, below the PATH line.

http_proxy="http://<username>:<password>@<host>:<port>/"
ftp_proxy="ftp://<username>:<password>@<host>:<port>/"
https_proxy="https://<username>:<password>@<host>:<port>/"
no_proxy="localhost,127.0.0.1,localaddress,.localdomain.com"
HTTP_PROXY="http://<username>:<password>@<host>:<port>/"
FTP_PROXY="ftp://<username>:<password>@<host>:<port>/"
HTTPS_PROXY="https://<username>:<password>@<host>:<port>/"
NO_PROXY="localhost,127.0.0.1,localaddress,.localdomain.com"

The username and password are optional. For example:

http_proxy="http://myproxy.server.com:8080/"
ftp_proxy="ftp://myproxy.server.com:8080/"
https_proxy="https://myproxy.server.com:8080/"
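The following is a minimal sketch of generating these entries from a host and port, which can help avoid typing errors across the six variables. The function name proxy_env_lines is hypothetical, and the sketch assumes a proxy that needs no username or password.

```shell
# proxy_env_lines HOST PORT
# Hypothetical helper: prints the lowercase and uppercase proxy variables
# in the form shown above. It only prints text; it does not edit any file.
proxy_env_lines() {
    host="$1"
    port="$2"
    for scheme in http ftp https; do
        # Lowercase form, for example http_proxy="http://host:port/"
        printf '%s_proxy="%s://%s:%s/"\n' "$scheme" "$scheme" "$host" "$port"
    done
    for scheme in http ftp https; do
        # Uppercase form, for example HTTP_PROXY="http://host:port/"
        upper=$(printf '%s' "$scheme" | tr '[:lower:]' '[:upper:]')
        printf '%s_PROXY="%s://%s:%s/"\n' "$upper" "$scheme" "$host" "$port"
    done
}
```

For example, proxy_env_lines myproxy.server.com 8080 prints six lines that you can review and then paste into /etc/environment below the PATH line. Add the no_proxy and NO_PROXY entries separately.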

For apt

Edit (or create) the /etc/apt/apt.conf.d/myproxy proxy file and include the following lines:

Acquire::http::proxy "http://<username>:<password>@<host>:<port>/";
Acquire::ftp::proxy "ftp://<username>:<password>@<host>:<port>/";
Acquire::https::proxy "https://<username>:<password>@<host>:<port>/";

The username and password are optional. For example:

Acquire::http::proxy "http://myproxy.server.com:8080/";
Acquire::ftp::proxy "ftp://myproxy.server.com:8080/";
Acquire::https::proxy "https://myproxy.server.com:8080/";
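As a sketch, the three lines can be written to a scratch file first and installed only after review. The function name write_apt_proxy is hypothetical, and the sketch assumes a proxy without credentials.

```shell
# write_apt_proxy FILE HOST PORT
# Hypothetical helper: writes the three Acquire lines to FILE.
# Point FILE at a scratch location first, not directly at /etc/apt.
write_apt_proxy() {
    file="$1"
    host="$2"
    port="$3"
    {
        printf 'Acquire::http::proxy "http://%s:%s/";\n' "$host" "$port"
        printf 'Acquire::ftp::proxy "ftp://%s:%s/";\n' "$host" "$port"
        printf 'Acquire::https::proxy "https://%s:%s/";\n' "$host" "$port"
    } > "$file"
}
```

After checking the scratch file, copy it into place with sudo cp <scratch-file> /etc/apt/apt.conf.d/myproxy. Running apt-config dump | grep -i proxy afterwards should list the settings that apt picked up.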

For Docker

Docker reads its proxy settings from environment variables, so these must be set before Docker can access the NGC container registry through a proxy. For best practice recommendations on configuring proxy environment variables for Docker, see https://docs.docker.com/.
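One common approach in the Docker documentation is a systemd drop-in file that sets the variables for the Docker service. The sketch below only prints such a file. The function name docker_proxy_dropin and the file name http-proxy.conf are assumptions, as is the use of an https:// URL for HTTPS_PROXY; many proxies expect http:// for both variables, so confirm with your network administrator.

```shell
# docker_proxy_dropin HOST PORT [BYPASS_LIST]
# Hypothetical helper: prints a systemd drop-in that sets proxy variables
# for the Docker service. It does not write or install anything.
docker_proxy_dropin() {
    host="$1"
    port="$2"
    bypass="${3:-localhost,127.0.0.1}"
    printf '[Service]\n'
    printf 'Environment="HTTP_PROXY=http://%s:%s/"\n' "$host" "$port"
    printf 'Environment="HTTPS_PROXY=https://%s:%s/"\n' "$host" "$port"
    printf 'Environment="NO_PROXY=%s"\n' "$bypass"
}
```

To use the output, save it as /etc/systemd/system/docker.service.d/http-proxy.conf, then run sudo systemctl daemon-reload followed by sudo systemctl restart docker.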

Configuring Docker IP Addresses

To ensure that the DGX A100 system can access the network interfaces for Docker containers, Docker should be configured to use a subnet distinct from other network resources used by the DGX A100 System.

By default, Docker uses the 172.17.0.0/16 subnet. Consult your network administrator to find out which IP addresses are used by your network. If your network does not conflict with the default Docker IP address range, no changes are needed, and you can skip this section.

However, if your network uses the addresses within this range for the DGX A100 system, you should change the default Docker network addresses.
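To check whether a given address falls inside the default Docker range, a small script can do the subnet arithmetic. This is a sketch; the function names ip_to_int and in_cidr are hypothetical, and the addresses in the example are illustrative.

```shell
# ip_to_int A.B.C.D
# Hypothetical helper: converts a dotted-quad IPv4 address to an integer.
ip_to_int() {
    # Split the dotted quad on "." and pack the four octets into one integer.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP NETWORK PREFIX
# Succeeds (exit status 0) when IP lies inside NETWORK/PREFIX.
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    # Build the netmask: PREFIX one-bits followed by zero-bits (32 bits total).
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

For example, in_cidr 172.17.5.10 172.17.0.0 16 succeeds, which indicates a conflict with the default Docker subnet, while in_cidr 10.10.10.2 172.17.0.0 16 fails.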

You can change the default Docker network addresses by modifying either the /etc/docker/daemon.json file or the /etc/systemd/system/docker.service.d/docker-override.conf file. These instructions provide an example of modifying the /etc/systemd/system/docker.service.d/docker-override.conf file to override the default Docker network addresses.

  1. Edit the docker-override.conf file and make the following changes:

    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd -H fd:// -s overlay2
    LimitMEMLOCK=infinity
    LimitSTACK=67108864
    
  2. Add the --bip and --fixed-cidr options to the ExecStart line, setting the correct bridge IP address and IP address ranges for your network.

    Consult your IT administrator for the correct addresses.

    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd -H fd:// -s overlay2 --bip=192.168.127.1/24 \
           --fixed-cidr=192.168.127.128/25
    LimitMEMLOCK=infinity
    LimitSTACK=67108864
    
  3. When you are finished, save and close the /etc/systemd/system/docker.service.d/docker-override.conf file.

  4. Reload the systemctl daemon.

    $ sudo systemctl daemon-reload
    
  5. Restart Docker.

    $ sudo systemctl restart docker
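The alternative mentioned above, /etc/docker/daemon.json, accepts the same settings as the bip and fixed-cidr keys. The sketch below prints a standalone JSON fragment; the function name docker_daemon_json is hypothetical. If daemon.json already exists on your system, merge these keys into it rather than replacing the file. Set the addresses in only one place: Docker is expected to refuse to start when the same option appears both in daemon.json and on the dockerd command line.

```shell
# docker_daemon_json BIP FIXED_CIDR
# Hypothetical helper: prints a daemon.json fragment with the bridge
# address and the container address range. It does not write any file.
docker_daemon_json() {
    printf '{\n  "bip": "%s",\n  "fixed-cidr": "%s"\n}\n' "$1" "$2"
}
```

For example, docker_daemon_json 192.168.127.1/24 192.168.127.128/25 prints the equivalent of the override shown in step 2.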
    

Open Ports

Make sure that the ports listed in the following table are open and available on your firewall to the DGX A100 System.

Port (Protocol)   Direction   Use
22 (TCP)          Inbound     SSH
53 (UDP)          Outbound    DNS
80 (TCP)          Outbound    HTTP, package updates
443 (TCP)         Outbound    Internet (HTTP/HTTPS) connection to NVIDIA GPU Cloud.
                              If port 443 is proxied through a corporate firewall, WebSocket protocol traffic must be supported.
443 (TCP)         Inbound     BMC web services, remote console services, cd-media service, and Redfish.
                              If port 443 is proxied through a corporate firewall, WebSocket protocol traffic must be supported.
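A quick way to test the outbound TCP ports from the system is to attempt a connection and report the result. This sketch relies on the /dev/tcp feature of bash and on the timeout utility from coreutils. The function name check_port is hypothetical. The sketch covers TCP only, so it does not test DNS on port 53 (UDP).

```shell
# check_port HOST PORT
# Hypothetical helper: prints "open" if a TCP connection to HOST:PORT
# succeeds within 3 seconds, and "closed" otherwise.
check_port() {
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo open
    else
        echo closed
    fi
}
```

For example, check_port nvcr.io 443 tests the outbound HTTPS path. A "closed" result can mean a firewall block, a DNS failure, or a host that is down, so treat it as a prompt for further investigation rather than a diagnosis.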

Connectivity Requirements for NGC Containers

To run NVIDIA NGC containers from the NGC container registry, your network must be able to access the following URLs:

To verify connection to nvcr.io, run the following command:

$ wget https://nvcr.io/v2

The output should show a successful connection followed by a 401 Unauthorized response, which is expected because the request carries no credentials.

--2018-08-01 19:42:58-- https://nvcr.io/v2
Resolving nvcr.io (nvcr.io)... 52.8.131.152, 52.9.8.8
Connecting to nvcr.io (nvcr.io)|52.8.131.152|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
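The same check can be scripted with curl by reading only the HTTP status code. This is a sketch; the function names registry_status and interpret_registry_status are hypothetical. curl reports a status of 000 when it receives no HTTP response at all.

```shell
# registry_status
# Prints the HTTP status code returned by the registry endpoint,
# or 000 if no HTTP response arrives within 10 seconds.
registry_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "https://nvcr.io/v2"
}

# interpret_registry_status CODE
# Hypothetical helper: turns the status code into a short message.
interpret_registry_status() {
    case "$1" in
        401) echo "reachable: 401 Unauthorized is expected without credentials" ;;
        000) echo "unreachable: no HTTP response received" ;;
        *)   echo "unexpected HTTP status: $1" ;;
    esac
}
```

For example, interpret_registry_status "$(registry_status)" prints one line that summarizes the result.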

Configuring a Static IP Address for the BMC

This section explains how to set a static IP address for the BMC. You will need to do this if your network does not support DHCP.

Use one of the methods described in the following sections:

Configuring a BMC Static Address by Using ipmitool

This section describes how to set a static IP address for the BMC from the Ubuntu command line.

Note

If you cannot access the DGX A100 System remotely, then connect a display (1440x900 or lower resolution) and keyboard directly to the DGX A100 system.

To view the current settings, enter the following command.

$ sudo ipmitool lan print 1

  1. Set the IP address source to static.

    $ sudo ipmitool lan set 1 ipsrc static
    
  2. Set the appropriate address information.

    • To set the IP address (“Station IP address” in the BIOS settings), enter the following command, replacing the placeholder in angle brackets with your information.

      $ sudo ipmitool lan set 1 ipaddr <my-ip-address>

    • To set the subnet mask, enter the following command, replacing the placeholder in angle brackets with your information.

      $ sudo ipmitool lan set 1 netmask <my-netmask-address>

    • To set the default gateway IP (“Router IP address” in the BIOS settings), enter the following command, replacing the placeholder in angle brackets with your information.

      $ sudo ipmitool lan set 1 defgw ipaddr <my-default-gateway-ip-address>
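The steps above can be sketched as a small script that validates the addresses and prints the commands for review before you run them. The function names valid_ipv4 and bmc_static_commands are hypothetical, and the addresses in the example are placeholders.

```shell
# valid_ipv4 ADDRESS
# Succeeds when ADDRESS is four dot-separated decimal fields, each 0-255.
valid_ipv4() {
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

# bmc_static_commands IP NETMASK GATEWAY
# Hypothetical helper: prints the ipmitool commands; it does not run them.
bmc_static_commands() {
    for addr in "$1" "$2" "$3"; do
        valid_ipv4 "$addr" || { echo "invalid IPv4 address: $addr" >&2; return 1; }
    done
    echo "sudo ipmitool lan set 1 ipsrc static"
    echo "sudo ipmitool lan set 1 ipaddr $1"
    echo "sudo ipmitool lan set 1 netmask $2"
    echo "sudo ipmitool lan set 1 defgw ipaddr $3"
}
```

For example, bmc_static_commands 192.0.2.10 255.255.255.0 192.0.2.1 prints four commands. After running them, sudo ipmitool lan print 1 shows the resulting settings.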
      

Configuring a BMC Static IP Address by Using the System BIOS

This section describes how to set a static IP address for the BMC when you cannot access the DGX A100 System remotely. The process involves setting the BMC IP address during system boot.

  1. Connect a keyboard and display (1440 x 900 maximum resolution) to the DGX A100 System and turn on the DGX A100 System.

  2. When you see the SBIOS version screen, press Del or F2 to enter the BIOS Setup Utility screen.

  3. At the BIOS Setup Utility screen, navigate to the Server Mgmt tab on the top menu, then scroll to BMC network configuration and press Enter.

  4. Scroll to Configuration Address Source and press Enter, then at the Configuration Address source pop-up, select Static and then press Enter.

  5. Set the addresses for the Station IP address, Subnet mask, and Router IP address as needed by performing the following for each:

    1. Scroll to the specific item and press Enter.

    2. Enter the appropriate information at the pop-up, then press Enter.

  6. When finished making all your changes, press F10 to save and exit.

Configuring Static IP Addresses for the Network Ports

During the initial boot setup process for the DGX A100 System, you had an opportunity to configure static IP addresses for a single network interface. If you did not set this up at that time, you can configure the static IP addresses from the Ubuntu command line using the following instructions.

Note

If you are connecting to the DGX A100 console remotely, connect using the BMC remote console. If you connect using SSH, your connection will be lost when performing the final step. Also, if you encounter issues with the config file, the BMC connection will facilitate troubleshooting.

If you cannot access the DGX A100 System remotely, then connect a display (1440x900 or lower resolution) and keyboard directly to the DGX A100 System.

  1. Determine the port designation that you want to configure, based on the physical Ethernet port that you have connected to your network.

    See Configuring Network Proxies for the port designation of the connection you want to configure.

  2. Edit the network configuration yaml file.

    Note

    Ensure that the spacing in your file matches the following sample exactly. Indent with spaces only; YAML does not allow tabs.

    $ sudo vi /etc/netplan/01-netcfg.yaml
    
    network:
      version: 2
      renderer: networkd
      ethernets:
        <port-designation>:
          dhcp4: no
          dhcp6: no
          addresses: [10.10.10.2/24]
          gateway4: 10.10.10.1
          nameservers:
            search: [<mydomain>, <other-domain>]
            addresses: [10.10.10.1, 1.1.1.1]
    

    Consult your network administrator for the values to use in place of the sample network, gateway, nameserver, and search domain entries, and use the port designation that you determined in step 1.

  3. After you complete your edits, press ESC to switch to command mode, then save the file to the disk and exit the editor.

  4. Apply the changes.

    $ sudo netplan apply
    

Note

If you are not returned to the command line prompt after a minute, reboot the system.

For additional information, see https://help.ubuntu.com/lts/serverguide/network-configuration.html.en.
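Because tab characters are a common cause of netplan parsing failures, a short check before applying the changes can save a reboot. This is a sketch; the function name yaml_has_tabs is hypothetical.

```shell
# yaml_has_tabs FILE
# Hypothetical helper: succeeds (exit status 0) when FILE contains at
# least one tab character, which YAML does not accept for indentation.
yaml_has_tabs() {
    grep -q "$(printf '\t')" "$1"
}
```

For example, if yaml_has_tabs /etc/netplan/01-netcfg.yaml succeeds, replace the tabs with spaces before running sudo netplan apply. Where your netplan version provides it, sudo netplan try applies the configuration and rolls it back unless you confirm it, which lowers the risk of losing a remote connection.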

Switching Between InfiniBand and Ethernet

The NVIDIA DGX A100 System is equipped with up to eight NVIDIA ConnectX-6 or ConnectX-7 single-port network cards on the I/O board, typically used for cluster communications. By default, these are configured as InfiniBand ports, but you have the option to convert these to Ethernet ports.

For these changes to work properly, the configured port must connect to a networking switch that matches the port configuration. In other words, if the port configuration is set to InfiniBand, then the external switch should be an InfiniBand switch with the corresponding InfiniBand cables. If the port configuration is set to Ethernet, the switch should also be Ethernet.

The DGX A100 is also equipped with one (and optionally two) dual-port connections typically used for network storage and configured by default for Ethernet. These can also be configured for InfiniBand.

Note

On the dual-port cards, if one of the ports is configured for Ethernet and the other port is configured for InfiniBand, the following limitations apply.

  • FDR is not supported on the InfiniBand port (port 1 or 2).

  • If port 1 is InfiniBand, then port 2 (Ethernet) does not support 40GbE/10GbE.

  • If port 1 is Ethernet, then port 2 (InfiniBand) does not support EDR.

Starting the Mellanox Software Tools and Determining the Current Port Configuration

This section describes how to start the Mellanox Software Tools (MST) services and determine the current port configuration.

Start the Mellanox Software Tools services.

$ sudo mst start

To determine the current port configuration, enter the following:

$ sudo mlxconfig -e query | egrep -e Device\|LINK_TYPE

The following example shows the output for one of the port devices, showing the device path and the default, current, and next boot configuration.

Device #2:
Device type: ConnectX6
Device: /dev/mst/mt4123_pciconf8
Configurations:      Default   Current   Next Boot
* LINK_TYPE_P1       IB(1)     IB(1)     IB(1)

  • IB(1) indicates the port is configured for InfiniBand.

  • ETH(2) indicates the port is configured for Ethernet.

Determine the Device path bus numbers for the slot number of the port you want to configure. Refer to the table in Open Ports for the mapping.
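With several cards installed, the query output can be condensed to one line per port. The sketch below assumes the output has the layout shown in the example above, with the device path on a line starting with "Device:" and the default, current, and next boot values as the last three fields of each LINK_TYPE line. The function name summarize_link_types is hypothetical.

```shell
# summarize_link_types
# Hypothetical helper: reads mlxconfig query output on standard input and
# prints "<device-path> <setting> current=<value> next_boot=<value>".
summarize_link_types() {
    awk '
        $1 == "Device:" { dev = $2 }          # remember the device path
        /LINK_TYPE_P[0-9]+/ {
            for (i = 1; i <= NF; i++)         # find the setting name
                if ($i ~ /^LINK_TYPE_P/) name = $i
            print dev, name, "current=" $(NF - 1), "next_boot=" $NF
        }'
}
```

For example, sudo mlxconfig -e query | summarize_link_types prints one summary line for each port.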

Switching the Port Configuration

Make sure that you have started the Mellanox Software Tools (MST) services as described in Starting the Mellanox Software Tools and Determining the Current Port Configuration and have identified the correct ports to change.

Issue mlxconfig for each port you want to configure.

$ sudo mlxconfig -y -d <device-path> set LINK_TYPE_P1=<config-number>

where:

  • <device-path> corresponds to the port you want to configure.

  • <config-number> is 1 for InfiniBand and 2 for Ethernet.

Here is an example to set slot 0 to Ethernet:

$ sudo mlxconfig -y -d /dev/mst/mt4123_pciconf2 set LINK_TYPE_P1=2

Here is an example that sets slot 1 to InfiniBand:

$ sudo mlxconfig -y -d /dev/mst/mt4123_pciconf3 set LINK_TYPE_P1=1

For these changes to take effect, reboot the system.
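When several ports need the same change, the command can be generated for each device path and reviewed before it is run. This is a sketch; the function names link_type_number and mlxconfig_set_commands are hypothetical, and the device paths are the ones from the examples above.

```shell
# link_type_number TYPE
# Maps a link type name to the value used by LINK_TYPE_P1:
# 1 for InfiniBand and 2 for Ethernet.
link_type_number() {
    case "$1" in
        ib|IB|infiniband) echo 1 ;;
        eth|ETH|ethernet) echo 2 ;;
        *) echo "unknown link type: $1" >&2; return 1 ;;
    esac
}

# mlxconfig_set_commands TYPE DEVICE...
# Hypothetical helper: prints one mlxconfig command per device path.
# It does not run the commands.
mlxconfig_set_commands() {
    num=$(link_type_number "$1") || return 1
    shift
    for dev in "$@"; do
        echo "sudo mlxconfig -y -d $dev set LINK_TYPE_P1=$num"
    done
}
```

For example, mlxconfig_set_commands eth /dev/mst/mt4123_pciconf2 /dev/mst/mt4123_pciconf3 prints two commands that set both devices to Ethernet. A reboot is still required after running them.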