Troubleshooting

This page documents solutions to common issues that you might encounter.

Normally the hugepages settings are updated through the /etc/default/grub configuration file, as described earlier. However, depending on the operating system version, these changes may be overwritten by settings in another configuration file: /etc/grub. If the hugepages settings do not take effect, check that file as well.
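To see which file wins, it can help to list every GRUB_CMDLINE_LINUX assignment in the order the files are read. The sketch below assumes a Debian/Ubuntu-style layout in which grub-mkconfig reads /etc/default/grub and then any /etc/default/grub.d/*.cfg files; the files are passed as arguments, so /etc/grub or any other file your release uses can be added. The function name find_cmdline_overrides is hypothetical.

```shell
# Hypothetical helper: list every line that sets the kernel command line,
# in the order the files are given on the command line.
# Assumption: pass files in the order your GRUB tooling reads them,
# e.g. /etc/default/grub first, then /etc/default/grub.d/*.cfg.
find_cmdline_overrides() {
  for f in "$@"; do
    # Skip arguments that are not regular files (e.g. an unmatched glob)
    [ -f "$f" ] || continue
    # -H prints the file name, -n the line number
    grep -Hn '^GRUB_CMDLINE_LINUX' "$f"
  done
  return 0
}

# Usage:
#   find_cmdline_overrides /etc/default/grub /etc/default/grub.d/*.cfg /etc/grub
```

The last assignment printed usually takes effect, unless it appends to the earlier value. After running update-grub and rebooting, cat /proc/cmdline shows the kernel command line actually in use, and grep -i huge /proc/meminfo shows the hugepages that were allocated.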

If an older version of the CUDA Toolkit and driver is already installed, remove it with the following commands:


sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*" "*nvidia*"
sudo apt-get autoremove
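To confirm that the purge removed everything, the installed package names can be filtered against the same patterns. This is a sketch: leftover_cuda_pkgs is a hypothetical name, and its regular expression is a hand translation of the package-name patterns above (cuda* and nsight* become matches anchored at the start of the name).

```shell
# Hypothetical helper: read package names (one per line) on stdin and print
# the ones that match the patterns purged above. Empty output means none remain.
leftover_cuda_pkgs() {
  # Unanchored terms mirror the "*name*" patterns; ^cuda and ^nsight mirror "cuda*" and "nsight*"
  grep -E 'cublas|cufft|curand|cusolver|cusparse|npp|nvjpeg|nvidia|^cuda|^nsight' || true
}

# Usage:
#   dpkg-query -W -f='${Package}\n' | leftover_cuda_pkgs
```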

If the system time is incorrect, apt update may fail with an error like this:


E: Release file for https://download.docker.com/linux/ubuntu/dists/focal/InRelease is not valid yet (invalid for another 2d 10h 51min 11s). Updates for this repository will not be applied.
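The "not valid yet" message means the local clock is behind the date recorded in the repository's Release file. As a rough check before changing anything, the local clock can be compared with the Date header of any reachable HTTPS server. The sketch below assumes GNU date (for -d parsing) and curl; clock_skew is a hypothetical helper name.

```shell
# Hypothetical helper: print local clock skew in seconds relative to an
# HTTP Date header value. Negative means the local clock is behind.
# Assumes GNU date. The optional second argument overrides "now" (epoch seconds).
clock_skew() {
  local remote now
  remote=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( now - remote ))
}

# Usage (server choice is only an example):
#   clock_skew "$(curl -sI https://download.docker.com/ | sed -n 's/^[Dd]ate: //p' | tr -d '\r')"
```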

Run the following commands to set the date and time once from an NTP server (this does not enable the NTP service):


sudo apt-get install ntpdate
sudo ntpdate -s pool.ntp.org

The Ubuntu 22.04 server installer partitions the whole disk but creates only a 200GB logical volume for the root filesystem. A newly installed devkit shows the following:


# Devkit has 1TB SSD but default lv uses only 200GB
lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                       7:0    0  55.5M  1 loop /snap/core18/2246
loop1                       7:1    0  55.5M  1 loop /snap/core18/2253
loop2                       7:2    0  67.3M  1 loop /snap/lxd/21545
loop3                       7:3    0  67.2M  1 loop /snap/lxd/21835
loop4                       7:4    0  61.9M  1 loop /snap/core20/1242
loop5                       7:5    0  61.9M  1 loop /snap/core20/1169
loop6                       7:6    0  32.5M  1 loop /snap/snapd/13640
loop7                       7:7    0  42.2M  1 loop /snap/snapd/14066
sda                         8:0    0 894.3G  0 disk
├─sda1                      8:1    0   512M  0 part /boot/efi
├─sda2                      8:2    0     1G  0 part /boot
└─sda3                      8:3    0 892.8G  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0   200G  0 lvm  /

The following commands resize the logical volume to use the entire disk, then resize the filesystem to use the entire logical volume.


# Test mode first
sudo lvresize -t -v -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
# Remove -t if test mode succeeds
sudo lvresize -v -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                       7:0    0  55.5M  1 loop /snap/core18/2246
loop1                       7:1    0  55.5M  1 loop /snap/core18/2253
loop2                       7:2    0  67.3M  1 loop /snap/lxd/21545
loop3                       7:3    0  67.2M  1 loop /snap/lxd/21835
loop4                       7:4    0  61.9M  1 loop /snap/core20/1242
loop5                       7:5    0  61.9M  1 loop /snap/core20/1169
loop6                       7:6    0  32.5M  1 loop /snap/snapd/13640
loop7                       7:7    0  42.2M  1 loop /snap/snapd/14066
sda                         8:0    0 894.3G  0 disk
├─sda1                      8:1    0   512M  0 part /boot/efi
├─sda2                      8:2    0     1G  0 part /boot
└─sda3                      8:3    0 892.8G  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0 892.8G  0 lvm  /
# Resize filesystem
sudo resize2fs -p /dev/mapper/ubuntu--vg-ubuntu--lv
df -h -T
Filesystem                        Type      Size  Used Avail Use% Mounted on
udev                              devtmpfs   39G     0   39G   0% /dev
tmpfs                             tmpfs     9.4G  2.0M  9.4G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv ext4      878G   77G  764G  10% /
tmpfs                             tmpfs      47G     0   47G   0% /dev/shm
tmpfs                             tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs                             tmpfs      47G     0   47G   0% /sys/fs/cgroup
/dev/sda2                         ext4      976M  460M  450M  51% /boot
/dev/loop0                        squashfs   56M   56M     0 100% /snap/core18/2246
/dev/sda1                         vfat      511M  5.3M  506M   2% /boot/efi
/dev/loop1                        squashfs   56M   56M     0 100% /snap/core18/2253
/dev/loop5                        squashfs   62M   62M     0 100% /snap/core20/1169
/dev/loop2                        squashfs   68M   68M     0 100% /snap/lxd/21545
/dev/loop4                        squashfs   62M   62M     0 100% /snap/core20/1242
/dev/loop6                        squashfs   33M   33M     0 100% /snap/snapd/13640
/dev/loop3                        squashfs   68M   68M     0 100% /snap/lxd/21835
/dev/loop7                        squashfs   43M   43M     0 100% /snap/snapd/14066
overlay                           overlay   878G   77G  764G  10% /var/lib/docker/overlay2/851cbfd83b022a24f61fb0f87a007c56da8065a7528f6b661bf45d3d65ccc787/merged
tmpfs                             tmpfs     9.4G  4.0K  9.4G   1% /run/user/1000
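The same sequence can be wrapped in a small function so that the real resize runs only when the test-mode run succeeds. This is a sketch: grow_root_lv is a hypothetical name, and its default device path is the one used in the commands above; pass a different path if your volume group or logical volume is named differently.

```shell
# Hypothetical helper: grow an LVM logical volume into all free space in its
# volume group, then grow the ext4 filesystem on it.
# Assumption: the default path matches the Ubuntu installer's LV shown above.
grow_root_lv() {
  local lv=${1:-/dev/mapper/ubuntu--vg-ubuntu--lv}
  # Test mode: -t reports what lvresize would do without changing anything
  sudo lvresize -t -v -l +100%FREE "$lv" || return 1
  # Real resize, reached only if the test run succeeded
  sudo lvresize -v -l +100%FREE "$lv" || return 1
  # Grow the filesystem to fill the enlarged volume (-p as in the commands above)
  sudo resize2fs -p "$lv"
}

# Usage:
#   grow_root_lv
#   grow_root_lv /dev/mapper/<vg>-<lv>
```

lvresize also accepts -r (--resizefs) to resize the filesystem in the same step; the separate resize2fs call is kept here to mirror the commands above.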

To find the MAC address of a NIC, use the lspci command to find the NIC's PCI bus address, then use the lshw command to map the bus address to an interface name, and finally use the ip a command to read the MAC address of that interface. For example:


$ lspci | grep -i ether
04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
31:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
31:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
4b:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
4b:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
98:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
98:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]


$ lshw -c network -businfo
Bus info          Device       Class      Description
============================================================
pci@0000:04:00.0  eno8303      network    NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
pci@0000:04:00.1  eno8403      network    NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
pci@0000:31:00.0  eno12399np0  network    MT27800 Family [ConnectX-5]
pci@0000:31:00.1  eno12409np1  network    MT27800 Family [ConnectX-5]
pci@0000:4b:00.0  ens3f0np0    network    MT2892 Family [ConnectX-6 Dx]
pci@0000:4b:00.1  ens3f1np1    network    MT2892 Family [ConnectX-6 Dx]
pci@0000:98:00.0  ens6f0np0    network    MT2892 Family [ConnectX-6 Dx]
pci@0000:98:00.1  ens6f1np1    network    MT2892 Family [ConnectX-6 Dx]

$ ip a
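On Linux the same lookup can also be done directly in sysfs: each PCI network device lists its interfaces under /sys/bus/pci/devices/<address>/net/, and each interface exposes its MAC address in an address file. The sketch below walks that layout in one pass; pci_nic_macs and the SYSFS_ROOT override are hypothetical names, the override existing only so the function can be pointed at a copy of the tree.

```shell
# Hypothetical helper: print "<pci-address> <interface> <mac>" for every
# network interface that belongs to a PCI device.
# SYSFS_ROOT (hypothetical) defaults to /sys.
pci_nic_macs() {
  local root=${SYSFS_ROOT:-/sys}
  local netdir ifname pci
  for netdir in "$root"/bus/pci/devices/*/net/*; do
    # Skip unmatched globs and entries without a MAC address file
    [ -e "$netdir/address" ] || continue
    ifname=${netdir##*/}       # last path component: interface name
    pci=${netdir%/net/*}       # strip "/net/<ifname>"
    pci=${pci##*/}             # keep the PCI address, e.g. 0000:4b:00.0
    printf '%s %s %s\n' "$pci" "$ifname" "$(cat "$netdir/address")"
  done
}

# Usage:
#   pci_nic_macs | grep 4b:00
```

On the system above, filtering for 4b:00 would print one line per port of that ConnectX-6 Dx, pairing each PCI address with its interface name (for example ens3f0np0) and MAC address.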

© Copyright 2022-2023, NVIDIA. Last updated on Apr 20, 2024.