Rivermax SDK
The Clara AGX Developer Kit can be used with the NVIDIA Rivermax SDK to provide an extremely efficient network connection through the onboard ConnectX-6 network adapter, which is further optimized for GPU workloads by GPUDirect. This technology avoids unnecessary memory copies and CPU overhead by copying data directly to or from pinned GPU memory, and supports both the integrated GPU and the RTX6000 add-in dGPU.
The instructions below describe the steps required to install and test the
Rivermax SDK with the Clara AGX Developer Kit. The test applications used by
these instructions, generic_sender and generic_receiver, can
then be used as samples in order to develop custom applications that use the
Rivermax SDK to optimize data transfers using GPUDirect.
The Rivermax SDK may also be installed onto the Clara AGX Developer Kit via SDK Manager by selecting it as an additional SDK during the JetPack installation. If Rivermax SDK was previously installed by SDK Manager, many of these instructions can be skipped (see additional notes in the steps below).
Access to the Rivermax SDK Developer Program as well as a valid Rivermax software license is required to use the Rivermax SDK.
The Mellanox OpenFabrics Enterprise Distribution Drivers for Linux (OFED) must be installed in order to use the ConnectX-6 network adapter that is onboard the Clara AGX Developer Kit.
If Rivermax SDK was previously installed via SDK Manager, OFED will already be installed and these steps can be skipped.
- Download OFED version 5.4-1.0.3.0:
  - MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu18.04-aarch64.tgz
  - If the above link does not work, navigate to the Downloads section on the main OFED page, select either Current Versions or Archive Versions to find version 5.4-1.0.3.0, select Ubuntu, Ubuntu 18.04, aarch64, then download the tgz file.
  Note: Newer versions of OFED have not been tested and may not work.
- Install OFED:
    $ sudo apt install -y apt-utils
    $ tar -xvf MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu18.04-aarch64.tgz
    $ cd MLNX_OFED_LINUX-5.4-1.0.3.0-ubuntu18.04-aarch64
    $ sudo ./mlnxofedinstall --force --force-fw-update --vma --add-kernel-support
    $ sudo /etc/init.d/openibd restart
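Because newer OFED releases are untested, it can help to confirm which release ended up on the system. The sketch below is a minimal helper, not part of the OFED package: the function name `is_tested_ofed` is hypothetical, and it assumes the version string comes from `ofed_info -s`, which OFED installations normally provide.

```shell
# Release of OFED that this guide was tested with.
TESTED_OFED_VERSION="5.4-1.0.3.0"

# Succeed if the version string passed as $1 names the tested release.
# (Hypothetical helper; the string is expected to look like
# "MLNX_OFED_LINUX-5.4-1.0.3.0:".)
is_tested_ofed() {
    case "$1" in
        *"$TESTED_OFED_VERSION"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on the target, assuming ofed_info is on the PATH:
#   is_tested_ofed "$(ofed_info -s)" && echo "tested OFED release installed"
```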
The GPUDirect drivers must be installed to enable the use of GPUDirect when using an RTX6000 add-in dGPU. With the iGPU, the CPU and GPU share unified memory and the GPUDirect drivers are not required, so this step may be skipped.
The GPUDirect drivers are not installed by SDK Manager, even when Rivermax SDK is installed, so these steps must always be followed to enable GPUDirect support when using the dGPU.
- Download GPUDirect Drivers for OFED (the install step below uses the nvidia-peer-memory_1.1.tar.gz archive):
  - If the download link does not work, navigate to the Downloads section on the GPUDirect page.
- Install GPUDirect:
    $ mv nvidia-peer-memory_1.1.tar.gz nvidia-peer-memory_1.1.orig.tar.gz
    $ tar -xvf nvidia-peer-memory_1.1.orig.tar.gz
    $ cd nvidia-peer-memory-1.1
    $ dpkg-buildpackage -us -uc
    $ sudo dpkg -i ../nvidia-peer-memory_1.1-0_all.deb
    $ sudo dpkg -i ../nvidia-peer-memory-dkms_1.1-0_all.deb
    $ sudo service nv_peer_mem start
  Verify that the nv_peer_mem service is running:
    $ sudo service nv_peer_mem status
  Enable the nv_peer_mem service at boot time:
    $ sudo systemctl enable nv_peer_mem
    $ sudo /lib/systemd/systemd-sysv-install enable nv_peer_mem
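In addition to the service status, the kernel module itself can be checked. The following is a small sketch that scans `lsmod` output for the `nv_peer_mem` module; the function name `nv_peer_mem_loaded` is hypothetical and is not provided by the driver package.

```shell
# Read `lsmod` output on stdin; succeed if the nv_peer_mem module is listed.
# (Hypothetical helper for illustration.)
nv_peer_mem_loaded() {
    awk '$1 == "nv_peer_mem" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Typical use on the target:
#   lsmod | nv_peer_mem_loaded && echo "nv_peer_mem is loaded"
```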
If Rivermax SDK was previously installed via SDK Manager, the download and install steps (1 and 2) can be skipped. The Rivermax license must still be installed, however, so step 3 must still be followed.
1. Download version 1.8.21 or newer of the Rivermax SDK from the NVIDIA Rivermax SDK developer page.
   - Click Get Started and log in using your NVIDIA developer account.
   - Scroll down to Downloads and click I Agree To the Terms of the NVIDIA Rivermax Software Licence Agreement.
   - Select Rivermax SDK 1.8.21, Linux, then download rivermax_ubuntu1804_1.8.21.tar.gz. If a newer version is available, replace 1.8.21 in this and all following steps with the newer version.
2. Install Rivermax SDK:
     $ tar -xvf rivermax_ubuntu1804_1.8.21.tar.gz
     $ sudo dpkg -i 1.8.21/Ubuntu.18.04/deb-dist/aarch64/rivermax_11.3.9.21_arm64.deb
3. Install Rivermax License:
   Using Rivermax requires a valid license, which can be purchased from the Rivermax Licenses page. Once the license file has been obtained, it must be placed onto the system using the following path:
     /opt/mellanox/rivermax/rivermax.lic
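Before running the samples, it may be worth confirming that the license file is in place. The sketch below only checks that a non-empty file exists at the path given above; it does not validate the license contents, and the function name `check_rivermax_license` is hypothetical.

```shell
# Succeed if a non-empty license file exists at the expected location.
# $1 optionally overrides the default path from the step above.
# (Hypothetical helper; it does not verify that the license is valid.)
check_rivermax_license() {
    license_path="${1:-/opt/mellanox/rivermax/rivermax.lic}"
    if [ -s "$license_path" ]; then
        echo "license found: $license_path"
    else
        echo "license missing or empty: $license_path" >&2
        return 1
    fi
}

# Typical use on the target:
#   check_rivermax_license
```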
Running the Rivermax sample applications requires two systems, a sender and a receiver, connected via ConnectX network adapters. If two Clara AGX Developer Kits are used, then the onboard ConnectX-6 can be used on each system. If only one Clara AGX is available, another system with an add-in ConnectX network adapter is needed. Rivermax supports a wide array of platforms, including both Linux and Windows, but these instructions assume that another Linux-based platform will be used as the sender device while the Clara AGX is used as the receiver.
- Determine the logical name for the ConnectX devices that are used by each system. This can be done by using the lshw -class network command, finding the product: entry for the ConnectX device, and making note of the logical name: that corresponds to that device. For example, this output on a Clara AGX shows the onboard ConnectX-6 device using the enp9s0f0 logical name (lshw output shortened for demonstration purposes).
    $ sudo lshw -class network
      *-network:0
           description: Ethernet interface
           product: MT28908 Family [ConnectX-6]
           vendor: Mellanox Technologies
           physical id: 0
           bus info: pci@0000:09:00.0
           logical name: enp9s0f0
           version: 00
           serial: 48:b0:2d:13:9b:6b
           capacity: 10Gbit/s
           width: 64 bits
           clock: 33MHz
           capabilities: pciexpress vpd msix pm bus_master cap_list ethernet physical 1000bt-fd 10000bt-fd autonegotiation
           configuration: autonegotiation=on broadcast=yes driver=mlx5_core driverversion=5.4-1.0.3 duplex=full firmware=20.27.4006 (NVD0000000001) ip=10.0.0.2 latency=0 link=yes multicast=yes
           resources: iomemory:180-17f irq:33 memory:1818000000-1819ffffff
  The instructions that follow will use the enp9s0f0 logical name for ifconfig commands, but these names should be replaced with the corresponding logical names as determined by this step.
- Run the generic_sender application on the sending system.
  - Bring up the network:
      $ sudo ifconfig enp9s0f0 up 10.0.0.1
  - Build the sample apps:
      $ cd 1.8.21/apps
      $ make
    Note: The 1.8.21 path above corresponds to the path where the Rivermax SDK package was extracted in step 2 of the Installing Rivermax SDK section, above. If the Rivermax SDK was installed via SDK Manager, this path will be $HOME/Documents/Rivermax/1.8.21.
  - Launch the generic_sender application:
      $ sudo ./generic_sender -l 10.0.0.1 -d 10.0.0.2 -p 5001 -y 1462 -k 8192 -z 500 -v
      ...
      +#############################################
      | Sender index: 0
      | Thread ID: 0x7fa1ffb1c0
      | CPU core affinity: -1
      | Number of streams in this thread: 1
      | Memory address: 0x7f986e3010
      | Memory length: 59883520[B]
      | Memory key: 40308
      +#############################################
      | Stream index: 0
      | Source IP: 10.0.0.1
      | Destination IP: 10.0.0.2
      | Destination port: 5001
      | Number of flows: 1
      | Rate limit bps: 0
      | Rate limit max burst in packets: 0
      | Memory address: 0x7f986e3010
      | Memory length: 59883520[B]
      | Memory key: 40308
      | Number of user requested chunks: 1
      | Number of application chunks: 5
      | Number of packets in chunk: 8192
      | Packet's payload size: 1462
      +**********************************************
 
- Run the generic_receiver application on the receiving system.
  - Bring up the network:
      $ sudo ifconfig enp9s0f0 up 10.0.0.2
  - Build the sample apps with GPUDirect support (CUDA=y):
      $ cd 1.8.21/apps
      $ make CUDA=y
    Note: The 1.8.21 path above corresponds to the path where the Rivermax SDK package was extracted in step 2 of the Installing Rivermax SDK section, above. If the Rivermax SDK was installed via SDK Manager, this path will be $HOME/Documents/Rivermax/1.8.21.
  - Launch the generic_receiver application:
      $ sudo ./generic_receiver -i 10.0.0.2 -m 10.0.0.2 -s 10.0.0.1 -p 5001 -g 0
      ...
      Attached flow 1 to stream.
      Running main receive loop...
      Got 5877704 GPU packets | 68.75 Gbps during 1.00 sec
      Got 5878240 GPU packets | 68.75 Gbps during 1.00 sec
      Got 5878240 GPU packets | 68.75 Gbps during 1.00 sec
      Got 5877704 GPU packets | 68.75 Gbps during 1.00 sec
      Got 5878240 GPU packets | 68.75 Gbps during 1.00 sec
      ...
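The logical-name lookup in the first step above can be scripted. The following sketch filters `lshw -class network` output and prints the logical name of every device whose product: line mentions ConnectX. The function name `connectx_logical_names` is hypothetical, and the sketch assumes that product: appears before logical name: within each device block, as it does in the sample output.

```shell
# Read `lshw -class network` output on stdin and print the logical name of
# each device whose product line mentions ConnectX.
# (Hypothetical helper; assumes "product:" precedes "logical name:".)
connectx_logical_names() {
    awk '
        /\*-network/             { is_cx = 0 }   # start of a new device block
        /product:.*ConnectX/     { is_cx = 1 }   # this block is a ConnectX device
        /logical name:/ && is_cx { print $NF }   # last field is the interface name
    '
}

# Typical use on the target:
#   sudo lshw -class network | connectx_logical_names
```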
 
With both the generic_sender and generic_receiver processes
active, the receiver will continue to print out received packet statistics
every second. Both processes can then be terminated with <ctrl-c>.
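The figures printed by the two applications can be cross-checked with simple arithmetic. The sketch below is an observation about the sample output, not a statement from the Rivermax documentation: the sender's reported memory length matches application chunks × packets per chunk × payload size, and the receiver's reported rate matches packets × payload bytes × 8 bits per second. The function names are hypothetical.

```shell
# Bytes of sender memory implied by the chunk layout printed by generic_sender.
# $1 = application chunks, $2 = packets per chunk, $3 = payload bytes
sender_buffer_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# Payload throughput in Gbps for one statistics line of generic_receiver.
# $1 = packets received, $2 = payload bytes per packet, $3 = seconds
payload_gbps() {
    awk -v p="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", p * b * 8 / s / 1e9 }'
}

sender_buffer_bytes 5 8192 1462     # prints 59883520
payload_gbps 5878240 1462 1.00      # prints 68.75
```

Both results agree with the sample output above (Memory length: 59883520[B] and 68.75 Gbps), which suggests the receiver's rate counts payload bytes only.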
GPUDirect is ideal for applications which receive data from the network adapter
and then use the GPU to process the received data directly in GPU memory. The
generic_sender and generic_receiver demo applications include
a simple demonstration of the use of CUDA with received packets by using a CUDA
kernel to compute and then compare a checksum of the packet against an expected
checksum as provided by the sender. The additional checksum packet sent by
the sender also carries a packet sequence number, which the receiver uses
to detect packets that are lost during transmission.
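The idea behind sequence-number loss detection can be illustrated independently of the SDK. The sketch below is not taken from generic_receiver; it is a hypothetical helper that assumes sequence numbers arrive in order, increase by one per packet, and never wrap, and it counts how many numbers are skipped.

```shell
# Read one sequence number per line (in arrival order) and print how many
# packets are missing. Assumes numbers increase by 1 and never wrap.
# (Illustrative only; not the receiver's actual implementation.)
count_missing_packets() {
    awk '
        NR > 1 && $1 > prev + 1 { missing += $1 - prev - 1 }  # gap in sequence
        { prev = $1 }
        END { print missing + 0 }
    '
}

# Packets 4, 5, 8 and 9 never arrive in this example.
printf '%s\n' 1 2 3 6 7 10 | count_missing_packets   # prints 4
```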
In order to enable the CUDA checksum sample, append the -x parameter to
the generic_sender and generic_receiver commands that are run
above.
Due to the increased workload by the receiver when the checksum calculation is
enabled, you will begin to see dropped packets and/or checksum errors if
you try to maintain the same data rate from the sender as you did when the
checksum was disabled (i.e., when all received packet data was simply discarded).
Because of this, the sleep parameter used by the sender, -z, should be
increased until there are no more dropped packets or checksum errors. In this
example, the sleep parameter was increased from 500 to 40000 in
order to ensure the receiver can receive and process the sent packets without
any errors or loss:
            
[Sender]
$ sudo ./generic_sender -l 10.0.0.1 -d 10.0.0.2 -p 5001 -y 1462 -k 8192 -z 40000 -v -x
[Receiver]
$ sudo ./generic_receiver -i 10.0.0.2 -m 10.0.0.2 -s 10.0.0.1 -p 5001 -g 0 -x
...
Got  203968 GPU packets | 2.40 Gbps during 1.02 sec | 0 dropped packets | 0 checksum errors
Got  200632 GPU packets | 2.36 Gbps during 1.00 sec | 0 dropped packets | 0 checksum errors
Got  203968 GPU packets | 2.40 Gbps during 1.01 sec | 0 dropped packets | 0 checksum errors
Got  201608 GPU packets | 2.37 Gbps during 1.01 sec | 0 dropped packets | 0 checksum errors
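When tuning -z, it can be convenient to check a captured receiver log automatically instead of reading it by eye. The sketch below parses statistics lines in the format shown above and succeeds only if every line reports 0 dropped packets and 0 checksum errors. The function name `receiver_output_clean` and the log file name in the usage comment are hypothetical.

```shell
# Read generic_receiver statistics lines on stdin; succeed only if every
# line reports 0 dropped packets and 0 checksum errors.
# (Hypothetical helper; relies on the "|"-separated format shown above.)
receiver_output_clean() {
    awk -F'[|]' '
        /dropped packets/ {
            split($3, d, " ")   # d[1] = dropped packet count
            split($4, c, " ")   # c[1] = checksum error count
            if (d[1] + 0 != 0 || c[1] + 0 != 0) bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Typical use, after saving a few seconds of receiver output to a file:
#   receiver_output_clean < receiver.log || echo "increase -z and retry"
```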
    
If you would like to write an application that uses Rivermax and GPUDirect for
CUDA data processing, refer to the source code for the generic_sender
and generic_receiver applications included with the Rivermax SDK in
generic_sender.cpp and generic_receiver.cpp, respectively.
The CUDA checksum calculation in the generic_receiver is included
only to show how the data received through GPUDirect can be processed through
CUDA. This example is not optimized in any way, and should not be used as
an example of how to write a high-performance CUDA application. Please refer
to the CUDA Best Practices Guide for an introduction to optimizing CUDA
applications.
If the driver installation or the sample applications do not work, check the following.
- The ConnectX network adapter is recognized by the system. For example, on a Linux system using a ConnectX-6 Dx add-in PCI card:
    $ lspci
    ...
    0000:05:00.0 Ethernet controller: Mellanox Technologies MT28841
    0000:05:00.1 Ethernet controller: Mellanox Technologies MT28841
    ...
  If the network adapter is not recognized, try rebooting the system and/or reseating the card in the PCI slot.
- The ConnectX network adapter is recognized by the OFED driver. For example, on a Linux system using a ConnectX-6 Dx add-in PCI card:
    $ sudo mlxfwmanager
    ...
    Device Type:      ConnectX6DX
    Part Number:      MCX623106AC-CDA_Ax
    Description:      ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0 x16; Crypto and Secure Boot
    PSID:             MT_0000000436
    PCI Device Name:  /dev/mst/mt4125_pciconf0
    Base GUID:        0c42a1030024053a
    Base MAC:         0c42a124053a
    Versions:         Current        Available
       FW             22.31.1014     N/A
       FW (Running)   22.30.1004     N/A
       PXE            3.6.0301       N/A
       UEFI           14.23.0017     N/A
  If the device does not appear, first try rebooting and then reinstalling OFED as described above.
- The sender and receiver systems can ping each other:
    $ ping 10.0.0.1
    PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
    64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.205 ms
    64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.206 ms
    ...
  If the systems cannot ping each other, try bringing up the network interfaces again using the ifconfig commands.
- The nv_peer_mem service is running:
    $ sudo service nv_peer_mem status
    * nv_peer_mem.service - LSB: Activates/Deactivates nv_peer_mem to \ start at boot time.
       Loaded: loaded (/etc/init.d/nv_peer_mem; generated)
       Active: active (exited) since Mon 2021-01-25 16:45:08 MST; 9min ago
         Docs: man:systemd-sysv-generator(8)
      Process: 6847 ExecStart=/etc/init.d/nv_peer_mem start (code=exited, status=0/SUCCESS)
    Jan 25 16:45:08 mccoy systemd[1]: Starting LSB: Activates/Deactivates nv_peer_mem to \ start at boot time....
    Jan 25 16:45:08 mccoy nv_peer_mem[6847]: starting... OK
    Jan 25 16:45:08 mccoy systemd[1]: Started LSB: Activates/Deactivates nv_peer_mem to \ start at boot time..
  If the service is not running, try starting it again using sudo service nv_peer_mem start.
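The first check in the list above can also be scripted. This is a minimal sketch that counts the Mellanox devices in `lspci` output; the function name `count_mellanox_devices` is hypothetical, and a count of 0 corresponds to the "adapter is not recognized" case.

```shell
# Read `lspci` output on stdin and print how many lines mention Mellanox.
# (Hypothetical helper; grep -c still prints 0 when nothing matches.)
count_mellanox_devices() {
    grep -ci 'mellanox' || true
}

# Typical use on the target:
#   lspci | count_mellanox_devices
```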