E2E gNodeB on MIG#

This page covers how to set up E2E gNodeB on MIG.

Setting up MIG for Aerial#

Check GPU Device availability#

To check the available GPUs on the system and get the GPU-ID, run the nvidia-smi -L command.

$ nvidia-smi -L
GPU 0: NVIDIA GH200 480GB (UUID: GPU-51c12aab-5ee1-2f10-a4b7-6baacfec5e31)

Partition GPUs#

  1. Run the nvidia-smi -i <GPU_ID> -mig 1 command to enable MIG mode on the GPU(s).

    Note

    If -i <GPU_ID> is not specified, then MIG mode is applied to all the GPUs on the system.

    $ sudo nvidia-smi -i 0 -mig 1
    Enabled MIG Mode for GPU 00000009:01:00.0
    All done.
    
  2. Check the available partition options using the nvidia-smi mig -lgip command.

    The following example displays the results from a GH200 system.

    $ sudo nvidia-smi mig -lgip
    +-----------------------------------------------------------------------------+
    | GPU instance profiles:                                                      |
    | GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
    |                              Free/Total   GiB              CE    JPEG  OFA  |
    |=============================================================================|
    |   0  MIG 1g.12gb       19     7/7        11.00      No     16     1     0   |
    |                                                             1     1     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 1g.12gb+me    20     1/1        11.00      No     16     1     0   |
    |                                                             1     1     1   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 1g.24gb       15     4/4        23.00      No     26     1     0   |
    |                                                             1     1     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 2g.24gb       14     3/3        23.00      No     32     2     0   |
    |                                                             2     2     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 3g.48gb        9     2/2        46.50      No     60     3     0   |
    |                                                             3     3     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 4g.48gb        5     1/1        46.50      No     64     4     0   |
    |                                                             4     4     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 7g.96gb        0     1/1        93.00      No     132    7     0   |
    |                                                             8     7     1   |
    +-----------------------------------------------------------------------------+
    
  3. Slice the GPU using the nvidia-smi mig -cgi <PROFILE> -C command.

    The following example creates one 3g.48gb instance (profile ID 9) and one 4g.48gb instance (profile ID 5).

    $ sudo nvidia-smi mig -cgi 9,5 -C
    Successfully created GPU instance ID  2 on GPU  0 using profile MIG 3g.48gb (ID  9)
    Successfully created compute instance ID  0 on GPU  0 GPU instance ID  2 using profile MIG 3g.48gb (ID  2)
    Successfully created GPU instance ID  1 on GPU  0 using profile MIG 4g.48gb (ID  5)
    Successfully created compute instance ID  0 on GPU  0 GPU instance ID  1 using profile MIG 4g.48gb (ID  3)
    
  4. Check the GPU partitions using the nvidia-smi -L command.

    The following example displays the results from a GH200 system.

    $ nvidia-smi -L
    GPU 0: NVIDIA GH200 480GB (UUID: GPU-51c12aab-5ee1-2f10-a4b7-6baacfec5e31)
    MIG 4g.48gb     Device  0: (UUID: MIG-e9f0fa8c-548f-5fc5-aa58-51ef34c2816a)
    MIG 3g.48gb     Device  1: (UUID: MIG-fcc563dc-5c8d-5de2-a448-439bde80400c)
    

Note

The MIG configuration is not persistent across reboots, so you may need to re-run the above commands after each reboot.
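
If you need to restore the same layout after a reboot, one option is a small helper script; the following is a minimal sketch, assuming GPU 0 and the 3g.48gb/4g.48gb split used above:

#!/bin/bash
# Hypothetical helper: re-apply the MIG layout after a reboot (GPU 0, 3g.48gb + 4g.48gb split).
sudo nvidia-smi -i 0 -mig 1
sudo nvidia-smi mig -cgi 9,5 -C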

Disabling MIG#

To disable MIG, use the nvidia-smi -i <GPU_ID> -mig 0 command.
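
For example, a minimal sequence for GPU 0 (assuming nothing is running on the MIG instances) first destroys the compute and GPU instances and then turns MIG mode off:

$ sudo nvidia-smi mig -dci
$ sudo nvidia-smi mig -dgi
$ sudo nvidia-smi -i 0 -mig 0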

Bringing up cuBB with a MIG Instance#

Start the cuBB Container#

To start the L1 container with a specific MIG instance, pass the CUDA_VISIBLE_DEVICES environment variable, set to the MIG instance UUID, to the docker run command.

Tip

Run nvidia-smi -L (as described in the section above) to get the UUIDs of the MIG instances.
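
If you prefer not to paste the UUID by hand, you can capture it in a shell variable; a minimal sketch (the variable name MIG3_UUID is arbitrary, and the grep/sed filter matches the 3g instance, as in the bringup script later on this page):

$ export MIG3_UUID=$(nvidia-smi -L | grep 'MIG 3g\.' | sed -n 's/.*(UUID: \(.*\))/\1/p')
$ echo $MIG3_UUID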

The following example command launches the cuBB container with the MIG-3 instance.

$ sudo docker run --gpus all --restart unless-stopped -dP --network host --shm-size=4096m --privileged -it --device=/dev/gdrdrv:/dev/gdrdrv -v /lib/modules:/lib/modules -v /dev/hugepages:/dev/hugepages --userns=host --ipc=host -v /usr/src:/usr/src -v /home/aerial/nfs:/root  -v /home/aerial/nfs:/cuBBSrc -v /home/aerial/nfs:/home/aerial/nfs  -e CUDA_VISIBLE_DEVICES=<UUID of MIG-3> --name 25-1-mig nvidia:Aerial-cuBB-container-ubuntu22.04-25.01.0-Rel-25-1.284-aarch64 bash

If successful, the above command creates a container with the name “25-1-mig”.
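
To verify that the container is running and that CUDA_VISIBLE_DEVICES points at the intended MIG instance, you can, for example, run:

$ sudo docker exec 25-1-mig bash -c 'echo $CUDA_VISIBLE_DEVICES'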

Start L1 Binaries#

  1. Enter the cuBB container by running the following command.

    docker exec -it 25-1-mig /bin/bash
    
  2. Create the bringup.sh script as shown below.

    #!/bin/bash
    # This script is to be run after entering the docker container.

    export cuBB_SDK=$(pwd)

    mkdir -p build
    cd build
    cmake .. -DCMAKE_TOOLCHAIN_FILE=cuPHY/cmake/toolchains/native

    # Compile the code
    make -j $(nproc --all)

    export CUDA_VISIBLE_DEVICES=$(nvidia-smi -L | grep 'MIG 3g\.' | sed -n 's/.*(UUID: \(.*\))/\1/p')
    echo $CUDA_VISIBLE_DEVICES
    export CUDA_DEVICE_MAX_CONNECTIONS=8

    export CUDA_MPS_PIPE_DIRECTORY=/tmp/$CUDA_VISIBLE_DEVICES
    mkdir -p $CUDA_MPS_PIPE_DIRECTORY
    export CUDA_MPS_LOG_DIRECTORY=/var

    # Stop existing MPS
    echo "Stop existing mps"
    echo quit | sudo -E nvidia-cuda-mps-control

    # Start MPS
    echo "Start mps"
    sudo -E nvidia-cuda-mps-control -d
    echo start_server -uid 0 | sudo -E nvidia-cuda-mps-control

    exit 0
    
  3. Run the bringup.sh script.

  4. Use nvidia-smi to confirm that bringup.sh has started the MPS server process:

    $ nvidia-smi
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GH200 480GB             On  |   00000009:01:00.0 Off |                   On |
    | N/A   38C    P0            118W /  900W |                  N/A   |     N/A      Default |
    |                                         |                        |              Enabled |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | MIG devices:                                                                            |
    +------------------+----------------------------------+-----------+-----------------------+
    | GPU  GI  CI  MIG |                     Memory-Usage |        Vol|        Shared         |
    |      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC  DEC  OFA  JPG |
    |                  |                                  |        ECC|                       |
    |==================+==================================+===========+=======================|
    |  0    1   0   0  |              58MiB / 47616MiB    | 64      0 |  4   0    4    0    4 |
    |                  |                 0MiB /     0MiB  |           |                       |
    +------------------+----------------------------------+-----------+-----------------------+
    |  0    2   0   1  |             172MiB / 47616MiB    | 60      0 |  3   0    3    0    3 |
    |                  |                 0MiB /     0MiB  |           |                       |
    +------------------+----------------------------------+-----------+-----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0    2    0           493989      C   nvidia-cuda-mps-server                  120MiB |
    +-----------------------------------------------------------------------------------------+
    
  5. Start the L1 binaries using the command below.

    Note

    The CUDA_VISIBLE_DEVICES=<MIG-UUID> value can be obtained from the nvidia-smi -L command (as described in the section above).

    export CUDA_VISIBLE_DEVICES=<UUID of MIG-3> && export CUDA_MPS_PIPE_DIRECTORY=/tmp/$CUDA_VISIBLE_DEVICES && export CUDA_MPS_LOG_DIRECTORY=/var && export CUDA_DEVICE_MAX_CONNECTIONS=8 && sudo -E stdbuf -i0 -o0 -e0 /opt/nvidia/cuBB/build/cuPHY-CP/cuphycontroller/examples/cuphycontroller_scf P5G_FXN_GH
    

    Both L1 and MPS server processes should now be running on GPU instance 2, which corresponds to MIG-3.

    $ nvidia-smi
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GH200 480GB             On  |   00000009:01:00.0 Off |                   On |
    | N/A   42C    P0            117W /  900W |   31209MiB /  97871MiB |     N/A      Default |
    |                                         |                        |              Enabled |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | MIG devices:                                                                            |
    +------------------+----------------------------------+-----------+-----------------------+
    | GPU  GI  CI  MIG |                     Memory-Usage |        Vol|        Shared         |
    |      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC  DEC  OFA  JPG |
    |                  |                                  |        ECC|                       |
    |==================+==================================+===========+=======================|
    |  0    1   0   0  |              58MiB / 47616MiB    | 64      0 |  4   0    4    0    4 |
    |                  |                 0MiB /     0MiB  |           |                       |
    +------------------+----------------------------------+-----------+-----------------------+
    |  0    2   0   1  |           31151MiB / 47616MiB    | 60      0 |  3   0    3    0    3 |
    |                  |                 0MiB /     0MiB  |           |                       |
    +------------------+----------------------------------+-----------+-----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0    2    0           493989      C   nvidia-cuda-mps-server                  120MiB |
    |    0    2    0           494009    M+C   .../examples/cuphycontroller_scf      30958MiB |
    +-----------------------------------------------------------------------------------------+
    

Starting LLM on MIG#

Execute the following Docker command to start the LLM on MIG:

sudo docker run --cpuset-cpus="52-70" --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=<UUID of MIG 4 instance> --rm -it -p 8000:8000 nvcr.io/miicz8azigqf/fix_mid_ans_tmo_async_rag_gh200_llama3-70b-int4_with_engine:0.10.0
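
The <UUID of MIG 4 instance> value can be obtained from nvidia-smi -L, or extracted the same way as for the 3g instance; for example:

$ nvidia-smi -L | grep 'MIG 4g\.' | sed -n 's/.*(UUID: \(.*\))/\1/p'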

Adding Routes on CN and PDN#

Adding a PDN Route on CN#

  1. Navigate to the /sbin folder on the CN machine and create a script named add-route.sh.

  2. Add the following contents to the add-route.sh script. The PDN server IP is given as 169.254.200.1; modify this value as needed based on your PDN IP setup.

    #!/bin/bash
    
    container_id=$(docker ps | grep dataplane | awk '{print $1}')
    
    echo "*************** Adding route to PDN inside VPP ***************"
    echo -e "\n"
    docker exec -it $container_id bash -c "vppctl ip route add 0.0.0.0/0 via 169.254.200.1 net1"
    
    echo -e "\n"
    echo "*************** Checking added route ***************"
    echo -e "\n"
    docker exec -it $container_id bash -c "vppctl show ip fib"
    
  3. Provide full permissions for the script: chmod 777 add-route.sh

  4. Run the script: ./add-route.sh.

    Note

    This route may get deleted at some point, in which case you must run the add-route.sh script again. If the CUE cannot connect to the Internet, this is an indication that the route was deleted on the CN; a quick check is shown below.
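
To check whether the route is still present without re-adding it, you can, for example, query the VPP FIB directly (this reuses the vppctl command from add-route.sh; the grep is only a convenience filter):

$ docker exec $(docker ps | grep dataplane | awk '{print $1}') bash -c "vppctl show ip fib" | grep 169.254.200.1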

Adding Routes on PDN to Enable Internet Access#

The PDN server has 2 IP addresses:

  • PDN VM Interface

    • IP: 192.168.122.11

    • Interface name: enp6s0

  • PDN server Interface: The IP of this interface is configured on the CN machine.

    • IP: 169.254.200.1

    • Interface name: enp1s0
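
To confirm these interface names and IP addresses on the PDN server before adding any rules, you can, for example, run:

$ ip -br addr show enp6s0
$ ip -br addr show enp1s0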

  1. Add the first rule, an SNAT entry for the UE IP range 21.21.21.* (21.21.21.0/24).

    iptables -t nat -A POSTROUTING -s 21.21.21.0/24 -p all -j SNAT --to-source 192.168.122.11
    
  2. Create a script named internet_enable.sh with the content below.

    Note

    Ensure the WANIF and LANIF are set properly.

    #!/bin/bash

    IPTABLES=/sbin/iptables
    WANIF='enp6s0'
    LANIF='enp1s0'

    # enable ip forwarding in the kernel
    echo 'Enabling Kernel IP forwarding...'
    /bin/echo 1 > /proc/sys/net/ipv4/ip_forward

    # flush rules and delete chains
    echo 'Flushing rules and deleting existing chains...'
    $IPTABLES -F
    $IPTABLES -X

    # enable masquerading to allow LAN internet access
    echo 'Enabling IP Masquerading and other rules...'
    $IPTABLES -t nat -A POSTROUTING -o $LANIF -j MASQUERADE
    $IPTABLES -A FORWARD -i $LANIF -o $WANIF -m state --state RELATED,ESTABLISHED -j ACCEPT
    $IPTABLES -A FORWARD -i $WANIF -o $LANIF -j ACCEPT
    $IPTABLES -t nat -A POSTROUTING -o $WANIF -j MASQUERADE
    $IPTABLES -A FORWARD -i $WANIF -o $LANIF -m state --state RELATED,ESTABLISHED -j ACCEPT
    $IPTABLES -A FORWARD -i $LANIF -o $WANIF -j ACCEPT

    echo 'Done.'
    
  3. Provide full permissions for the script: chmod 777 internet_enable.sh

  4. Run the script: ./internet_enable.sh

Note

You may need to add a proper nameserver entry in /etc/netplan/00-installer-config.yaml to reach the outside Internet. To get the DNS server address, use the following command:

aerial@iperf-cn-vm:~$ systemd-resolve --status | grep "DNS Servers"
DNS Servers: 10.110.8.18
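
A minimal netplan sketch for the nameserver entry, using the DNS server found above; the interface name below is a placeholder that you should replace with the actual interface on your VM, and the change is applied with sudo netplan apply:

network:
  version: 2
  ethernets:
    enp1s0:   # placeholder; use your VM's interface name
      nameservers:
        addresses: [10.110.8.18]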