Aerial SDK 23-1

Aerial System Scripts

This page describes scripts to retrieve and configure settings for the cuBB SDK.

This section describes how to create an initialization script that configures the system settings for Aerial.

The network optimization settings described below do not persist across a reboot, so they must be re-applied each time the system boots. Saving the steps in a bash script makes them easy to re-run.

On a system where everything runs natively, run the script directly on the host. On a system that uses a Docker container for the network software tools and drivers, run these steps from inside the container.

Creating the Script

Create a bash shell script called aerial-init.sh in your home directory; running it after each boot re-applies the settings.

$ nano aerial-init.sh
$ chmod 755 aerial-init.sh

These are the contents of aerial-init.sh:

#!/bin/bash
#=====================================================================
# Enable GPU Persistence Mode on the GPU
#=====================================================================
sudo nvidia-smi -pm 1
# Lock the GPU clock to the highest supported graphics clock
sudo nvidia-smi -i 0 -lgc $(sudo nvidia-smi -i 0 --query-supported-clocks=graphics --format=csv,noheader,nounits | sort -h | tail -n 1)
# Disable MIG mode
sudo nvidia-smi -mig 0

# Load nvidia-peermem
sudo modprobe nvidia-peermem

# Bring up both NIC ports
sudo ifconfig ens6f0 up
sudo ifconfig ens6f1 up

# Improving FH and PTP ports TX timestamping accuracy
sudo ethtool --set-priv-flags ens6f0 tx_port_ts on
sudo ethtool --set-priv-flags ens6f1 tx_port_ts on

# Disable flow control (pause frames) for both ports of CX6-DX NIC
sudo ethtool -A ens6f0 rx off tx off
sudo ethtool -A ens6f1 rx off tx off
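The per-port commands in aerial-init.sh can also be written once and applied to each interface. The sketch below is one possible way to do that and is not part of the SDK: `run`, `max_clock`, `configure_port`, and the `DRY_RUN` variable are hypothetical names introduced here, and only the commands themselves come from the script. `sort -n` is used in place of `sort -h`; for plain integer MHz values both give the same order.

```bash
#!/bin/bash
# Sketch only: helper functions for a parameterized aerial-init.sh.
# DRY_RUN, run, max_clock and configure_port are hypothetical names.

# Print the command when DRY_RUN=1; otherwise execute it with sudo.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo $*"
    else
        sudo "$@"
    fi
}

# Read one clock value (MHz) per line on stdin and print the largest.
max_clock() {
    sort -n | tail -n 1
}

# Apply the per-port settings from aerial-init.sh to interface $1.
configure_port() {
    run ifconfig "$1" up
    run ethtool --set-priv-flags "$1" tx_port_ts on
    run ethtool -A "$1" rx off tx off
}

# Example: preview the per-port commands without changing anything.
#   DRY_RUN=1
#   for port in ens6f0 ens6f1; do configure_port "$port"; done
```

With `DRY_RUN=1` the commands are only printed, which allows a preview on a machine that does not have the NIC installed.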


Included in the SDK is a script that checks and displays key system configuration settings that are important for running the Aerial cuBB SDK.

$ pip3 install psutil
$ cd $cuBB_SDK/cuPHY/util/cuBB_system_checks
$ sudo -E python3 ./cuBB_system_checks.py

The output of cuBB_system_checks.py may differ slightly between bare-metal and container versions of the environment. The script helps to retrieve the software-component versions and hardware configuration. Refer to the Release Manifest in the cuBB Release Notes to ensure the correct software-component versions are installed. Below is an example output on a bare-metal platform:

# In order to get the system or ptp info, the command has to run on the host.
$ sudo python3 cuBB_system_checks.py --sys
-----General--------------------------------------
Hostname : devkit-1
IP address : 192.168.1.100
Linux distro : "Ubuntu 20.04.3 LTS"
Linux kernel version : 5.4.0-65-lowlatency
-----System---------------------------------------
Manufacturer : GIGABYTE
Product Name : E251-U70-00
Base Board Manufacturer : GIGABYTE
Base Board Product Name : MU71-SU0-00
Chassis Manufacturer : GIGABYTE
Chassis Type : Rack Mount Chassis
Chassis Height : Unspecified
Processor : Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
Max Speed : 4000 MHz
Current Speed : 2400 MHz

$ sudo python3 cuBB_system_checks.py
-----General--------------------------------------
Hostname : devkit-1
IP address : 192.168.1.100
Linux distro : "Ubuntu 20.04.3 LTS"
Linux kernel version : 5.4.0-65-lowlatency
-----Kernel Command Line--------------------------
Audit subsystem : audit=0
Clock source : clocksource=tsc
HugePage count : hugepages=16
HugePage size : hugepagesz=1G
CPU idle time management : idle=poll
Max Intel C-state : intel_idle.max_cstate=0
Intel IOMMU : intel_iommu=off
IOMMU : iommu=off
Isolated CPUs : isolcpus=2-21
Corrected errors : mce=ignore_ce
Adaptive-tick CPUs : nohz_full=2-21
Soft-lockup detector disable : nosoftlockup
Max processor C-state : processor.max_cstate=0
RCU callback polling : rcu_nocb_poll
No-RCU-callback CPUs : rcu_nocbs=2-21
TSC stability checks : tsc=reliable
-----CPU------------------------------------------
CPU cores : 24
Thread(s) per CPU core : 1
CPU MHz: : 3200.000
CPU sockets : 1
-----Environment variables------------------------
CUDA_DEVICE_MAX_CONNECTIONS : N/A
cuBB_SDK : N/A
-----Memory---------------------------------------
HugePage count : 16
Free HugePages : 16
HugePage size : 1048576 kB
Shared memory size : 47G
-----Nvidia GPUs----------------------------------
GPU driver version : 520.61.05
CUDA version : 11.8
GPU0
  GPU product name : NVIDIA A100-PCIE-40GB
  GPU persistence mode : Enabled
  Current GPU temperature : 29 C
  GPU clock frequency : 1410 MHz
  Max GPU clock frequency : 1410 MHz
  GPU PCIe bus id : 00000000:B6:00.0
-----GPUDirect topology---------------------------
        GPU0    mlx5_0  mlx5_1  CPU Affinity    NUMA Affinity
GPU0     X      PIX     PIX     0-23            N/A
mlx5_0  PIX      X      PIX
mlx5_1  PIX     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
-----Mellanox NICs--------------------------------
NIC0
  NIC product name : ConnectX6DX
  NIC part number : MCX623106AE-CDA_Ax
  NIC PCIe bus id : 0000:b5:00.0
  NIC FW version : 22.35.1012
  FLEX_PARSER_PROFILE_ENABLE : 4
  PROG_PARSE_GRAPH : True(1)
  ACCURATE_TX_SCHEDULER : True(1)
  CQE_COMPRESSION : AGGRESSIVE(1)
  REAL_TIME_CLOCK_ENABLE : True(1)
-----Mellanox NIC Interfaces----------------------
Interface0
  Name : ens6f0
  Network adapter : mlx5_0
  PCIe bus id : 0000:b5:00.0
  Ethernet address : b8:ce:f6:33:fd:ee
  Operstate : up
  MTU : 1514
  RX flow control : off
  TX flow control : off
  PTP hardware clock : 2
  QoS Priority trust state : pcp
  PCIe MRRS : 4096 bytes
Interface1
  Name : ens6f1
  Network adapter : mlx5_1
  PCIe bus id : 0000:b5:00.1
  Ethernet address : b8:ce:f6:33:fd:ef
  Operstate : up
  MTU : 1500
  RX flow control : off
  TX flow control : off
  PTP hardware clock : 3
  QoS Priority trust state : pcp
  PCIe MRRS : 512 bytes
-----Linux PTP------------------------------------
● ptp4l.service - Precision Time Protocol (PTP) service
     Loaded: loaded (/lib/systemd/system/ptp4l.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-09-27 00:05:26 UTC; 1 day 7h ago
       Docs: man:ptp4l
   Main PID: 1594 (ptp4l)
      Tasks: 1 (limit: 94581)
     Memory: 840.0K
     CGroup: /system.slice/ptp4l.service
             └─1594 /usr/sbin/ptp4l -f /etc/ptp.conf

Sep 27 00:05:26 dc6-devkit-18 systemd[1]: Started Precision Time Protocol (PTP) service.
Sep 27 00:05:26 dc6-devkit-18 taskset[1594]: ptp4l[127.145]: selected /dev/ptp2 as PTP clock
Sep 27 00:05:27 dc6-devkit-18 taskset[1594]: ptp4l[127.162]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
Sep 27 00:05:27 dc6-devkit-18 taskset[1594]: ptp4l[127.162]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
Sep 27 00:05:27 dc6-devkit-18 taskset[1594]: ptp4l[127.186]: port 1: new foreign master b8cef6.fffe.33fe16-1
Sep 27 00:05:27 dc6-devkit-18 taskset[1594]: ptp4l[127.436]: selected best master clock b8cef6.fffe.33fe16
Sep 27 00:05:27 dc6-devkit-18 taskset[1594]: ptp4l[127.436]: assuming the grand master role
Sep 27 00:05:27 dc6-devkit-18 taskset[1594]: ptp4l[127.436]: port 1: LISTENING to GRAND_MASTER on RS_GRAND_MASTER

● phc2sys.service - Synchronize system clock or PTP hardware clock (PHC)
     Loaded: loaded (/lib/systemd/system/phc2sys.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-09-27 00:05:26 UTC; 1 day 7h ago
       Docs: man:phc2sys
   Main PID: 1598 (sh)
      Tasks: 2 (limit: 94581)
     Memory: 5.4M
     CGroup: /system.slice/phc2sys.service
             ├─1598 /bin/sh -c /usr/sbin/phc2sys -s /dev/ptp$(ethtool -T $(lshw -c network -businfo | grep b5:00.0 | awk '{print $2}') | grep PTP | awk '{print $4}') -c CLOCK_REALTIME -n 24 -O 0 -R 256 -u 256
             └─1897 /usr/sbin/phc2sys -s /dev/ptp2 -c CLOCK_REALTIME -n 24 -O 0 -R 256 -u 256

Sep 28 07:16:46 dc6-devkit-18 phc2sys[1897]: [112407.124] CLOCK_REALTIME rms 10 max 34 freq +7048 +/- 25 delay 1765 +/- 8
Sep 28 07:16:47 dc6-devkit-18 phc2sys[1897]: [112408.140] CLOCK_REALTIME rms 10 max 27 freq +7031 +/- 39 delay 1765 +/- 8
Sep 28 07:16:49 dc6-devkit-18 phc2sys[1897]: [112409.155] CLOCK_REALTIME rms 9 max 27 freq +7044 +/- 30 delay 1764 +/- 7
Sep 28 07:16:50 dc6-devkit-18 phc2sys[1897]: [112410.171] CLOCK_REALTIME rms 9 max 24 freq +7041 +/- 17 delay 1765 +/- 8
Sep 28 07:16:51 dc6-devkit-18 phc2sys[1897]: [112411.188] CLOCK_REALTIME rms 9 max 28 freq +7036 +/- 21 delay 1766 +/- 7
Sep 28 07:16:52 dc6-devkit-18 phc2sys[1897]: [112412.203] CLOCK_REALTIME rms 9 max 22 freq +7055 +/- 21 delay 1766 +/- 7
Sep 28 07:16:53 dc6-devkit-18 phc2sys[1897]: [112413.219] CLOCK_REALTIME rms 9 max 24 freq +7038 +/- 20 delay 1764 +/- 8
Sep 28 07:16:54 dc6-devkit-18 phc2sys[1897]: [112414.235] CLOCK_REALTIME rms 9 max 23 freq +7041 +/- 19 delay 1763 +/- 7
Sep 28 07:16:55 dc6-devkit-18 phc2sys[1897]: [112415.251] CLOCK_REALTIME rms 9 max 22 freq +7043 +/- 11 delay 1763 +/- 8
Sep 28 07:16:56 dc6-devkit-18 phc2sys[1897]: [112416.267] CLOCK_REALTIME rms 10 max 24 freq +7052 +/- 20 delay 1762 +/- 7
Sep 28 07:16:57 dc6-devkit-18 phc2sys[1897]: [112417.283] CLOCK_REALTIME rms 10 max 30 freq +7035 +/- 39 delay 1765 +/- 8
-----Software Packages----------------------------
cmake : N/A
docker /usr/bin : 19.03.13
gcc /usr/bin : 9.4.0
git-lfs : N/A
MOFED : 5.8-1.0.1.1
meson : N/A
ninja : N/A
ptp4l /usr/sbin : 1.9.2-1
-----Loaded Kernel Modules------------------------
GDRCopy : gdrdrv
GPUDirect RDMA : nvidia_peermem
Nvidia : nvidia
-----Non-persistent settings----------------------
VM swappiness : vm.swappiness = 0
VM zone reclaim mode : vm.zone_reclaim_mode = 0
-----Docker images--------------------------------
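
The report prints most settings as one `name : value` pair per line, so a single value can be pulled out for comparison against the Release Manifest. The following is a minimal sketch that assumes this line format; `get_field` is a hypothetical helper and is not part of the SDK. It returns only the first match, so labels that occur in more than one section (for example `HugePage count`) need extra care.

```bash
#!/bin/bash
# Sketch only: get_field is a hypothetical helper, not shipped with the SDK.
# Reads a cuBB_system_checks.py report on stdin and prints the value of the
# first line whose label (the text before " : ") equals $1.
get_field() {
    awk -v key="$1" '
        {
            i = index($0, " : ")            # position of the first separator
            if (i == 0) next                # not a "name : value" line
            k = substr($0, 1, i - 1)
            v = substr($0, i + 3)
            gsub(/^[ \t]+|[ \t]+$/, "", k)  # trim padding around the label
            gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim padding around the value
            if (k == key) { print v; exit }
        }'
}

# Usage on a live system:
#   sudo -E python3 ./cuBB_system_checks.py | get_field "GPU driver version"
```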

This section lists the system settings that do not persist across a power cycle or reboot, and the steps required to re-apply them each time the system is powered on or rebooted.

Applying the Optimization Settings

Apply the Aerial initialization settings:

$ ~/aerial-init.sh
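
To confirm that the script took effect, the pause (flow control) state of a port can be read back with `ethtool -a`. The following is a minimal sketch that assumes the usual `ethtool -a` layout with one `RX:` line and one `TX:` line; `pause_is_off` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch only: pause_is_off is a hypothetical helper.
# Reads `ethtool -a <ifname>` output on stdin and succeeds (exit status 0)
# only if both the RX and TX pause settings are reported as "off".
pause_is_off() {
    awk '
        $1 == "RX:" { rx = $2 }
        $1 == "TX:" { tx = $2 }
        END { exit !(rx == "off" && tx == "off") }'
}

# Usage on a live system:
#   ethtool -a ens6f0 | pause_is_off && echo "ens6f0: flow control is off"
```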


Checking the NIC Status

To read back the Mellanox NIC firmware settings, use the command below; the commented lines show example output:

$ sudo mlxconfig -d $MLX0PCIEADDR q | grep "CQE_COMPRESSION\|PROG_PARSE_GRAPH\ \|FLEX_PARSER_PROFILE_ENABLE\|REAL_TIME_CLOCK_ENABLE\|ACCURATE_TX_SCHEDULER"
# FLEX_PARSER_PROFILE_ENABLE 4
# PROG_PARSE_GRAPH True(1)
# ACCURATE_TX_SCHEDULER True(1)
# CQE_COMPRESSION AGGRESSIVE(1)
# REAL_TIME_CLOCK_ENABLE True(1)
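
The comparison against the expected values can also be scripted. The following is a minimal sketch that assumes each queried setting appears on its own line as a name followed by its value, as in the example output above; `mlx_setting_is` is a hypothetical helper name, and the expected values are the ones shown in this section.

```bash
#!/bin/bash
# Sketch only: mlx_setting_is is a hypothetical helper.
# Reads `mlxconfig q` output on stdin and succeeds (exit status 0) only if
# the setting named $1 is present and its value equals $2.
mlx_setting_is() {
    awk -v name="$1" -v want="$2" '
        $1 == name { found = 1; ok = ($2 == want) }
        END { exit !(found && ok) }'
}

# Usage on a live system ($MLX0PCIEADDR as in the command above):
#   sudo mlxconfig -d "$MLX0PCIEADDR" q \
#       | mlx_setting_is REAL_TIME_CLOCK_ENABLE 'True(1)' \
#       || echo "REAL_TIME_CLOCK_ENABLE differs from the expected value"
```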

To check the current status of a NIC port, use this command:

$ sudo mlxlink -d $MLX0PCIEADDR

Alternatively, you can use the System Configuration Validation Script (cuBB_system_checks.py, described above) to obtain a full list of configuration settings.

© Copyright 2022-2023, NVIDIA. Last updated on Apr 20, 2024.