E2E gNodeB on MIG#
This page covers how to set up an end-to-end (E2E) gNodeB on a Multi-Instance GPU (MIG) partition.
Setting up MIG for Aerial#
Check GPU Device availability#
To check the available GPUs on the system and get the GPU ID, run the nvidia-smi -L command.
$ nvidia-smi -L
GPU 0: NVIDIA GH200 480GB (UUID: GPU-51c12aab-5ee1-2f10-a4b7-6baacfec5e31)
Partition GPUs#
Run the nvidia-smi -i <GPU_ID> -mig 1 command to enable MIG mode on the GPU(s).

Note

If -i <GPU_ID> is not specified, then MIG mode is applied to all the GPUs on the system.

$ sudo nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000009:01:00.0
All done.
Check the available partition options using the nvidia-smi mig -lgip command. The following example displays the results from a GH200 system.

$ sudo nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC |
|                              Free/Total   GiB              CE    JPEG  OFA |
|=============================================================================|
|   0  MIG 1g.12gb       19     7/7        11.00      No     16     1     0  |
|                                                             1     1     0  |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.12gb+me    20     1/1        11.00      No     16     1     0  |
|                                                             1     1     1  |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.24gb       15     4/4        23.00      No     26     1     0  |
|                                                             1     1     0  |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.24gb       14     3/3        23.00      No     32     2     0  |
|                                                             2     2     0  |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.48gb        9     2/2        46.50      No     60     3     0  |
|                                                             3     3     0  |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.48gb        5     1/1        46.50      No     64     4     0  |
|                                                             4     4     0  |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.96gb        0     1/1        93.00      No    132     7     0  |
|                                                             8     7     1  |
+-----------------------------------------------------------------------------+
Slice the GPU using the nvidia-smi mig -cgi <PROFILE> -C command. The following example creates one 3g profile and one 4g profile.

$ sudo nvidia-smi mig -cgi 9,5 -C
Successfully created GPU instance ID  2 on GPU  0 using profile MIG 3g.48gb (ID  9)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  2 using profile MIG 3g.48gb (ID  2)
Successfully created GPU instance ID  1 on GPU  0 using profile MIG 4g.48gb (ID  5)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  1 using profile MIG 4g.48gb (ID  3)
Check the GPU partitions using the nvidia-smi -L command. The following example displays the results from a GH200 system.

$ nvidia-smi -L
GPU 0: NVIDIA GH200 480GB (UUID: GPU-51c12aab-5ee1-2f10-a4b7-6baacfec5e31)
  MIG 4g.48gb     Device  0: (UUID: MIG-e9f0fa8c-548f-5fc5-aa58-51ef34c2816a)
  MIG 3g.48gb     Device  1: (UUID: MIG-fcc563dc-5c8d-5de2-a448-439bde80400c)
Note
MIG mode is not persistent over reboots, so you may need to run the above commands after each reboot.
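If you want to re-create the partitions automatically, the following is a minimal sketch of a boot-time script. It assumes GPU 0 and the 3g.48gb (ID 9) and 4g.48gb (ID 5) profiles used in the examples above; adjust the GPU index and the profile IDs to match your setup.

#!/bin/bash
# Sketch: re-apply the MIG layout from this page after a reboot (GPU 0 assumed).
set -e
GPU_ID=0
sudo nvidia-smi -i $GPU_ID -mig 1            # re-enable MIG mode
sudo nvidia-smi mig -i $GPU_ID -cgi 9,5 -C   # re-create the 3g.48gb and 4g.48gb instances
nvidia-smi -L                                # verify the resulting MIG devices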
Disabling MIG#
To disable MIG, use the nvidia-smi -i <GPU_ID> -mig 0 command.
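If MIG instances have already been created, they typically must be destroyed before MIG mode can be disabled. A minimal sketch, assuming GPU 0:

# Destroy compute instances, then GPU instances, then disable MIG mode on GPU 0.
sudo nvidia-smi mig -dci -i 0
sudo nvidia-smi mig -dgi -i 0
sudo nvidia-smi -i 0 -mig 0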
Bringing up cuBB with a MIG Instance#
Start the cuBB Container#
To start the L1 container with a specific MIG instance, pass the CUDA_VISIBLE_DEVICES environment variable, set to the MIG instance UUID, to docker run.
Tip
Run nvidia-smi -L (as described in the section above) to get the UUIDs of the MIG instances.
The following example command launches the cuBB container with the MIG-3 instance.
$ sudo docker run --gpus all --restart unless-stopped -dP --network host --shm-size=4096m --privileged -it --device=/dev/gdrdrv:/dev/gdrdrv -v /lib/modules:/lib/modules -v /dev/hugepages:/dev/hugepages --userns=host --ipc=host -v /usr/src:/usr/src -v /home/aerial/nfs:/root -v /home/aerial/nfs:/cuBBSrc -v /home/aerial/nfs:/home/aerial/nfs -e CUDA_VISIBLE_DEVICES=<UUID of MIG-3> --name 25-1-mig nvidia:Aerial-cuBB-container-ubuntu22.04-25.01.0-Rel-25-1.284-aarch64 bash
If successful, the above command creates a container with the name “25-1-mig”.
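Rather than copying the UUID by hand, you can capture it from nvidia-smi -L. The sketch below reuses the grep/sed pattern from the bringup.sh script later on this page and assumes the 3g.48gb instance is the one intended for the cuBB container.

# Sketch: capture the UUID of the 3g.48gb MIG instance.
MIG3_UUID=$(nvidia-smi -L | grep 'MIG 3g\.' | sed -n 's/.*(UUID: \(.*\))/\1/p')
echo $MIG3_UUID

You can then pass -e CUDA_VISIBLE_DEVICES=$MIG3_UUID in place of the <UUID of MIG-3> placeholder in the docker run command above.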
Start L1 Binaries#
Enter the cuBB container by running the following command.
docker exec -it 25-1-mig /bin/bash
Create the bringup.sh script as shown below.

#!/bin/bash
# This script to be used after getting into the docker image
export cuBB_SDK=$(pwd)
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=cuPHY/cmake/toolchains/native
# Compile the code
make -j $(nproc --all)
export CUDA_VISIBLE_DEVICES=$(nvidia-smi -L|grep 'MIG 3g\.'| sed -n 's/.*(UUID: \(.*\))/\1/p')
echo $CUDA_VISIBLE_DEVICES
export CUDA_DEVICE_MAX_CONNECTIONS=8
export CUDA_MPS_PIPE_DIRECTORY=/tmp/$CUDA_VISIBLE_DEVICES
mkdir -p $CUDA_MPS_PIPE_DIRECTORY
export CUDA_MPS_LOG_DIRECTORY=/var
# Stop existing MPS
echo "Stop existing mps"
sudo -E echo quit | sudo -E nvidia-cuda-mps-control
# Start MPS
echo "Start mps"
sudo -E nvidia-cuda-mps-control -d
sudo -E echo start_server -uid 0 | sudo -E nvidia-cuda-mps-control
exit 0
Run the bringup.sh script. Use nvidia-smi to confirm that bringup.sh has started the MPS server process:

$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GH200 480GB             On  | 00000009:01:00.0   Off |                   On |
| N/A   38C    P0            118W /  900W |                    N/A |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                             |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    1   0   0  |              58MiB / 47616MiB    | 64      0 |  4   0   4   0   4    |
|                  |                 0MiB /     0MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  0    2   0   1  |             172MiB / 47616MiB    | 60      0 |  3   0   3   0   3    |
|                  |                 0MiB /     0MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0    2    0           493989      C   nvidia-cuda-mps-server                  120MiB |
+-----------------------------------------------------------------------------------------+
Start the L1 binaries using the command below.

Note

The CUDA_VISIBLE_DEVICES=<MIG-UUID> value can be obtained from the nvidia-smi -L command (as described in the section above).

export CUDA_VISIBLE_DEVICES=<UUID of MIG-3> && export CUDA_MPS_PIPE_DIRECTORY=/tmp/$CUDA_VISIBLE_DEVICES && export CUDA_MPS_LOG_DIRECTORY=/var && export CUDA_DEVICE_MAX_CONNECTIONS=8 && sudo -E stdbuf -i0 -o0 -e0 /opt/nvidia/cuBB/build/cuPHY-CP/cuphycontroller/examples/cuphycontroller_scf P5G_FXN_GH
Both L1 and MPS server processes should now be running on GPU instance 2, which corresponds to MIG-3.
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GH200 480GB             On  | 00000009:01:00.0   Off |                   On |
| N/A   42C    P0            117W /  900W |  31209MiB /  97871MiB  |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                             |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    1   0   0  |              58MiB / 47616MiB    | 64      0 |  4   0   4   0   4    |
|                  |                 0MiB /     0MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  0    2   0   1  |           31151MiB / 47616MiB    | 60      0 |  3   0   3   0   3    |
|                  |                 0MiB /     0MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0    2    0           493989      C   nvidia-cuda-mps-server                  120MiB |
|    0    2    0           494009    M+C   .../examples/cuphycontroller_scf      30958MiB |
+-----------------------------------------------------------------------------------------+
Starting LLM on MIG#
Execute the following Docker command to start the LLM on MIG:
sudo docker run --cpuset-cpus="52-70" --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=<UUID of MIG 4 instance> --rm -it -p 8000:8000 nvcr.io/miicz8azigqf/fix_mid_ans_tmo_async_rag_gh200_llama3-70b-int4_with_engine:0.10.0
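As with the cuBB container, the UUID can be captured from nvidia-smi -L instead of being copied manually. A minimal sketch, assuming the 4g.48gb instance created earlier is the one used for the LLM:

# Sketch: capture the UUID of the 4g.48gb MIG instance.
MIG4_UUID=$(nvidia-smi -L | grep 'MIG 4g\.' | sed -n 's/.*(UUID: \(.*\))/\1/p')
echo $MIG4_UUID

Substitute $MIG4_UUID for the <UUID of MIG 4 instance> placeholder in the docker run command above.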
Adding Routes on CN and PDN#
Adding a PDN Route on CN#
Navigate to the /sbin folder on the CN machine and create a script named add-route.sh.

Add the following contents to the add-route.sh script. The PDN server IP is given as 169.254.200.1; modify this value as needed based on your PDN IP setup.

#!/bin/bash
container_id=`docker ps | grep dataplane | awk '{print$1}'`
echo "*************** Adding route to PDN inside VPP ***************"
echo -e "\n"
docker exec -it $container_id bash -c "vppctl ip route add 0.0.0.0/0 via 169.254.200.1 net1"
echo -e "\n"
echo "*************** Checking added route ***************"
echo -e "\n"
docker exec -it $container_id bash -c "vppctl show ip fib"
Provide full permissions for the script:
chmod 777 add-route.sh
Run the script:

./add-route.sh

Note

This route may get deleted at some point, in which case you will need to run the add-route.sh script again. If the CUE cannot connect to the Internet, this is an indication that the route was deleted on the CN.
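If you want to automate this check, the following is a minimal sketch that re-runs add-route.sh whenever the default route is missing from the VPP FIB. The assumption that the route appears as 0.0.0.0/0 in the vppctl show ip fib output is illustrative; adjust the pattern to your deployment and run the script periodically (for example, from cron).

#!/bin/bash
# Sketch: re-add the PDN route if the default route is no longer in the VPP FIB.
container_id=$(docker ps | grep dataplane | awk '{print $1}')
if ! docker exec $container_id vppctl show ip fib | grep -q '0.0.0.0/0'; then
    echo "Default route missing, re-adding..."
    /sbin/add-route.sh
fi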
Adding Routes on PDN to enable Internet#
The PDN server has 2 IP addresses:

PDN VM interface
  IP: 192.168.122.11
  Interface name: enp6s0

PDN server interface: the IP of this interface is configured on the CN machine.
  IP: 169.254.200.1
  Interface name: enp1s0
Add the first route for the UE IP range 21.21.21.*:

iptables -t nat -A POSTROUTING -s 21.21.21.0/24 -p all -j SNAT --to-source 192.168.122.11
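To confirm that the rule was added, you can list the NAT POSTROUTING chain (standard iptables listing; the exact output format may vary):

iptables -t nat -L POSTROUTING -n -v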
Create a script named internet_enable.sh with the content below.

Note

Ensure the WANIF and LANIF variables are set properly.

#! /bin/bash
IPTABLES=/sbin/iptables
WANIF='enp6s0'
LANIF='enp1s0'

# enable ip forwarding in the kernel
echo 'Enabling Kernel IP forwarding...'
/bin/echo 1 > /proc/sys/net/ipv4/ip_forward

# flush rules and delete chains
echo 'Flushing rules and deleting existing chains...'
$IPTABLES -F
$IPTABLES -X

# enable masquerading to allow LAN internet access
echo 'Enabling IP Masquerading and other rules...'
$IPTABLES -t nat -A POSTROUTING -o $LANIF -j MASQUERADE
$IPTABLES -A FORWARD -i $LANIF -o $WANIF -m state --state RELATED,ESTABLISHED -j ACCEPT
$IPTABLES -A FORWARD -i $WANIF -o $LANIF -j ACCEPT
$IPTABLES -t nat -A POSTROUTING -o $WANIF -j MASQUERADE
$IPTABLES -A FORWARD -i $WANIF -o $LANIF -m state --state RELATED,ESTABLISHED -j ACCEPT
$IPTABLES -A FORWARD -i $LANIF -o $WANIF -j ACCEPT

echo 'Done.'
Provide full permissions for the script:
chmod 777 internet_enable.sh
Run the script:
./internet_enable.sh
Note
You may need to add a proper nameserver entry in /etc/netplan/00-installer-config.yaml to ping the outside Internet. To get the DNS server address, use the following command:
aerial@iperf-cn-vm:~$ systemd-resolve --status | grep "DNS Servers"
DNS Servers: 10.110.8.18
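For reference, a nameserver entry under the relevant interface in /etc/netplan/00-installer-config.yaml might look like the sketch below. The interface name is a placeholder and the address is the one returned in the example above; adapt both to your VM and then run sudo netplan apply.

# Sketch only: adapt the interface name and DNS address to your setup.
network:
  version: 2
  ethernets:
    <interface-name>:
      nameservers:
        addresses: [10.110.8.18]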