E2E gNodeB on MIG#

This page covers how to set up E2E gNodeB on MIG.

Setting up MIG for Aerial#

Check GPU Device availability#

To check the available GPUs on the system and get the GPU-ID, run the nvidia-smi -L command.

$ nvidia-smi -L
GPU 0: NVIDIA GH200 480GB (UUID: GPU-51c12aab-5ee1-2f10-a4b7-6baacfec5e31)

Partition GPUs#

  1. Run the nvidia-smi -i <GPU_ID> -mig 1 command to enable MIG mode on the GPU(s).

    Note

    If -i <GPU_ID> is not specified, then MIG mode is applied to all the GPUs on the system.

    $ sudo nvidia-smi -i 0 -mig 1
    Enabled MIG Mode for GPU 00000009:01:00.0
    All done.
    
  2. Check the available partition options using the nvidia-smi mig -lgip command.

    The following example displays the results from a GH200 system.

    $ sudo nvidia-smi mig -lgip
    +-----------------------------------------------------------------------------+
    | GPU instance profiles:                                                      |
    | GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
    |                              Free/Total   GiB              CE    JPEG  OFA  |
    |=============================================================================|
    |   0  MIG 1g.12gb       19     7/7        11.00      No     16     1     0   |
    |                                                             1     1     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 1g.12gb+me    20     1/1        11.00      No     16     1     0   |
    |                                                             1     1     1   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 1g.24gb       15     4/4        23.00      No     26     1     0   |
    |                                                             1     1     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 2g.24gb       14     3/3        23.00      No     32     2     0   |
    |                                                             2     2     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 3g.48gb        9     2/2        46.50      No     60     3     0   |
    |                                                             3     3     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 4g.48gb        5     1/1        46.50      No     64     4     0   |
    |                                                             4     4     0   |
    +-----------------------------------------------------------------------------+
    |   0  MIG 7g.96gb        0     1/1        93.00      No     132    7     0   |
    |                                                             8     7     1   |
    +-----------------------------------------------------------------------------+
    
  3. Slice the GPU using the nvidia-smi mig -cgi <PROFILE> -C command.

    The following example creates one 3g.48gb instance (profile ID 9) and one 4g.48gb instance (profile ID 5).

    $ sudo nvidia-smi mig -cgi 9,5 -C
    Successfully created GPU instance ID  2 on GPU  0 using profile MIG 3g.48gb (ID  9)
    Successfully created compute instance ID  0 on GPU  0 GPU instance ID  2 using profile MIG 3g.48gb (ID  2)
    Successfully created GPU instance ID  1 on GPU  0 using profile MIG 4g.48gb (ID  5)
    Successfully created compute instance ID  0 on GPU  0 GPU instance ID  1 using profile MIG 4g.48gb (ID  3)
    
  4. Check the GPU partitions using the nvidia-smi -L command.

    The following example displays the results from a GH200 system.

    $ nvidia-smi -L
    GPU 0: NVIDIA GH200 480GB (UUID: GPU-51c12aab-5ee1-2f10-a4b7-6baacfec5e31)
    MIG 4g.48gb     Device  0: (UUID: MIG-e9f0fa8c-548f-5fc5-aa58-51ef34c2816a)
    MIG 3g.48gb     Device  1: (UUID: MIG-fcc563dc-5c8d-5de2-a448-439bde80400c)
    

Note

The MIG configuration is not persistent across reboots, so you may need to re-run the above commands after each reboot.
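
If you need to restore the same layout after a reboot, one option is a small helper script; the following is a minimal sketch, assuming GPU 0 and the 3g.48gb/4g.48gb split used above:

#!/bin/bash
# Hypothetical helper: re-apply the MIG layout after a reboot (GPU 0, 3g.48gb + 4g.48gb split).
sudo nvidia-smi -i 0 -mig 1
sudo nvidia-smi mig -cgi 9,5 -C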

Disabling MIG#

To disable MIG, use the nvidia-smi -i <GPU_ID> -mig 0 command.
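
For example, a minimal sequence for GPU 0 (assuming nothing is running on the MIG instances) first destroys the compute and GPU instances and then turns MIG mode off:

$ sudo nvidia-smi mig -dci
$ sudo nvidia-smi mig -dgi
$ sudo nvidia-smi -i 0 -mig 0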

Bringing up cuBB with a MIG Instance#

Start the cuBB Container#

To start the L1 container with a specific MIG instance, pass the CUDA_VISIBLE_DEVICES environment variable, set to the MIG instance UUID, to the docker run command.

Tip

Run nvidia-smi -L (as described in the section above) to get the UUIDs of the MIG instances.
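
If you prefer not to paste the UUID by hand, you can capture it in a shell variable; a minimal sketch (the variable name MIG3_UUID is arbitrary, and the grep/sed filter matches the 3g instance, as in the bringup script later on this page):

$ export MIG3_UUID=$(nvidia-smi -L | grep 'MIG 3g\.' | sed -n 's/.*(UUID: \(.*\))/\1/p')
$ echo $MIG3_UUID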

The following example command launches the cuBB container with the MIG-3 instance.

$ sudo docker run --gpus all --restart unless-stopped -dP --network host --shm-size=4096m --privileged -it --device=/dev/gdrdrv:/dev/gdrdrv -v /lib/modules:/lib/modules -v /dev/hugepages:/dev/hugepages --userns=host --ipc=host -v /usr/src:/usr/src -v /home/aerial/nfs:/root  -v /home/aerial/nfs:/cuBBSrc -v /home/aerial/nfs:/home/aerial/nfs  -e CUDA_VISIBLE_DEVICES=<UUID of MIG-3> --name 25-1-mig nvidia:Aerial-cuBB-container-ubuntu22.04-25.01.0-Rel-25-1.284-aarch64 bash

If successful, the above command creates a container with the name “25-1-mig”.
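
To verify that the container is running and that CUDA_VISIBLE_DEVICES points at the intended MIG instance, you can, for example, run:

$ sudo docker exec 25-1-mig bash -c 'echo $CUDA_VISIBLE_DEVICES'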

Start L1 Binaries#

  1. Enter the cuBB container by running the following command.

    docker exec -it 25-1-mig /bin/bash
    
  2. Create the bringup.sh script as shown below.

    #!/bin/bash
    # This script is to be run after entering the docker container.

    export cuBB_SDK=$(pwd)

    mkdir -p build
    cd build
    cmake .. -DCMAKE_TOOLCHAIN_FILE=cuPHY/cmake/toolchains/native

    # Compile the code
    make -j $(nproc --all)

    export CUDA_VISIBLE_DEVICES=$(nvidia-smi -L | grep 'MIG 3g\.' | sed -n 's/.*(UUID: \(.*\))/\1/p')
    echo $CUDA_VISIBLE_DEVICES
    export CUDA_DEVICE_MAX_CONNECTIONS=8

    export CUDA_MPS_PIPE_DIRECTORY=/tmp/$CUDA_VISIBLE_DEVICES
    mkdir -p $CUDA_MPS_PIPE_DIRECTORY
    export CUDA_MPS_LOG_DIRECTORY=/var

    # Stop existing MPS
    echo "Stop existing mps"
    echo quit | sudo -E nvidia-cuda-mps-control

    # Start MPS
    echo "Start mps"
    sudo -E nvidia-cuda-mps-control -d
    echo start_server -uid 0 | sudo -E nvidia-cuda-mps-control

    exit 0
    
  3. Run the bringup.sh script.

  4. Use nvidia-smi to confirm that bringup.sh has started the MPS server process:

    $ nvidia-smi
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GH200 480GB             On  |   00000009:01:00.0 Off |                   On |
    | N/A   38C    P0            118W /  900W |                  N/A   |     N/A      Default |
    |                                         |                        |              Enabled |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | MIG devices:                                                                            |
    +------------------+----------------------------------+-----------+-----------------------+
    | GPU  GI  CI  MIG |                     Memory-Usage |        Vol|        Shared         |
    |      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC  DEC  OFA  JPG |
    |                  |                                  |        ECC|                       |
    |==================+==================================+===========+=======================|
    |  0    1   0   0  |              58MiB / 47616MiB    | 64      0 |  4   0    4    0    4 |
    |                  |                 0MiB /     0MiB  |           |                       |
    +------------------+----------------------------------+-----------+-----------------------+
    |  0    2   0   1  |             172MiB / 47616MiB    | 60      0 |  3   0    3    0    3 |
    |                  |                 0MiB /     0MiB  |           |                       |
    +------------------+----------------------------------+-----------+-----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0    2    0           493989      C   nvidia-cuda-mps-server                  120MiB |
    +-----------------------------------------------------------------------------------------+
    
  5. Start the L1 binaries using the command below.

    Note

    The CUDA_VISIBLE_DEVICES=<MIG-UUID> value can be obtained from the nvidia-smi -L command (as described in the section above).

    export CUDA_VISIBLE_DEVICES=<UUID of MIG-3> && export CUDA_MPS_PIPE_DIRECTORY=/tmp/$CUDA_VISIBLE_DEVICES && export CUDA_MPS_LOG_DIRECTORY=/var && export CUDA_DEVICE_MAX_CONNECTIONS=8 && sudo -E stdbuf -i0 -o0 -e0 /opt/nvidia/cuBB/build/cuPHY-CP/cuphycontroller/examples/cuphycontroller_scf P5G_FXN_GH
    

    Both L1 and MPS server processes should now be running on GPU instance 2, which corresponds to MIG-3.

    $ nvidia-smi
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GH200 480GB             On  |   00000009:01:00.0 Off |                   On |
    | N/A   42C    P0            117W /  900W |   31209MiB /  97871MiB |     N/A      Default |
    |                                         |                        |              Enabled |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | MIG devices:                                                                            |
    +------------------+----------------------------------+-----------+-----------------------+
    | GPU  GI  CI  MIG |                     Memory-Usage |        Vol|        Shared         |
    |      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC  DEC  OFA  JPG |
    |                  |                                  |        ECC|                       |
    |==================+==================================+===========+=======================|
    |  0    1   0   0  |              58MiB / 47616MiB    | 64      0 |  4   0    4    0    4 |
    |                  |                 0MiB /     0MiB  |           |                       |
    +------------------+----------------------------------+-----------+-----------------------+
    |  0    2   0   1  |           31151MiB / 47616MiB    | 60      0 |  3   0    3    0    3 |
    |                  |                 0MiB /     0MiB  |           |                       |
    +------------------+----------------------------------+-----------+-----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0    2    0           493989      C   nvidia-cuda-mps-server                  120MiB |
    |    0    2    0           494009    M+C   .../examples/cuphycontroller_scf      30958MiB |
    +-----------------------------------------------------------------------------------------+
    

Starting LLM on MIG#

Execute the following Docker command to start the LLM on MIG:

sudo docker run --cpuset-cpus="52-70" --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=<UUID of MIG 4 instance> --rm -it -p 8000:8000 nvcr.io/miicz8azigqf/fix_mid_ans_tmo_async_rag_gh200_llama3-70b-int4_with_engine:0.10.0
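
The <UUID of MIG 4 instance> value can be obtained from nvidia-smi -L, or extracted the same way as for the 3g instance; for example:

$ nvidia-smi -L | grep 'MIG 4g\.' | sed -n 's/.*(UUID: \(.*\))/\1/p'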

Adding Routes on CN and PDN#

Adding a PDN Route on CN#

  1. Navigate to the /sbin folder on the CN machine and create a script named add-route.sh.

  2. Add the following contents to the add-route.sh script. The PDN server IP is given as 169.254.200.1; modify this value as needed based on your PDN IP setup.

    #!/bin/bash
    
    container_id=$(docker ps | grep dataplane | awk '{print $1}')
    
    echo "*************** Adding route to PDN inside VPP ***************"
    echo -e "\n"
    docker exec -it $container_id bash -c "vppctl ip route add 0.0.0.0/0 via 169.254.200.1 net1"
    
    echo -e "\n"
    echo "*************** Checking added route ***************"
    echo -e "\n"
    docker exec -it $container_id bash -c "vppctl show ip fib"
    
  3. Provide full permissions for the script: chmod 777 add-route.sh

  4. Run the script: ./add-route.sh.

    Note

    This route may get deleted at some point, in which case you must run the add-route.sh script again. If the CUE cannot connect to the Internet, this is an indication that the route was deleted on the CN; a quick check is shown below.
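
To check whether the route is still present without re-adding it, you can, for example, query the VPP FIB directly (this reuses the vppctl command from add-route.sh; the grep is only a convenience filter):

$ docker exec $(docker ps | grep dataplane | awk '{print $1}') bash -c "vppctl show ip fib" | grep 169.254.200.1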

Adding Routes on PDN to Enable Internet Access#

The PDN server has 2 IP addresses:

  • PDN VM Interface

    • IP: 192.168.122.11

    • Interface name: enp6s0

  • PDN server Interface: The IP of this interface is configured on the CN machine.

    • IP: 169.254.200.1

    • Interface name: enp1s0
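
To confirm these interface names and IP addresses on the PDN server before adding any rules, you can, for example, run:

$ ip -br addr show enp6s0
$ ip -br addr show enp1s0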

  1. Add the first rule, an SNAT entry for the UE IP range 21.21.21.* (21.21.21.0/24).

    iptables -t nat -A POSTROUTING -s 21.21.21.0/24 -p all -j SNAT --to-source 192.168.122.11
    
  2. Create a script named internet_enable.sh with the content below.

    Note

    Ensure the WANIF and LANIF are set properly.

    #!/bin/bash

    IPTABLES=/sbin/iptables
    WANIF='enp6s0'
    LANIF='enp1s0'

    # enable ip forwarding in the kernel
    echo 'Enabling Kernel IP forwarding...'
    /bin/echo 1 > /proc/sys/net/ipv4/ip_forward

    # flush rules and delete chains
    echo 'Flushing rules and deleting existing chains...'
    $IPTABLES -F
    $IPTABLES -X

    # enable masquerading to allow LAN internet access
    echo 'Enabling IP Masquerading and other rules...'
    $IPTABLES -t nat -A POSTROUTING -o $LANIF -j MASQUERADE
    $IPTABLES -A FORWARD -i $LANIF -o $WANIF -m state --state RELATED,ESTABLISHED -j ACCEPT
    $IPTABLES -A FORWARD -i $WANIF -o $LANIF -j ACCEPT
    $IPTABLES -t nat -A POSTROUTING -o $WANIF -j MASQUERADE
    $IPTABLES -A FORWARD -i $WANIF -o $LANIF -m state --state RELATED,ESTABLISHED -j ACCEPT
    $IPTABLES -A FORWARD -i $LANIF -o $WANIF -j ACCEPT

    echo 'Done.'
    
  3. Provide full permissions for the script: chmod 777 internet_enable.sh

  4. Run the script: ./internet_enable.sh

Note

You may need to add a proper nameserver entry in /etc/netplan/00-installer-config.yaml to reach the outside Internet. To get the DNS server address, use the following command:

aerial@iperf-cn-vm:~$ systemd-resolve --status | grep "DNS Servers"
DNS Servers: 10.110.8.18
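
A minimal netplan sketch for the nameserver entry, using the DNS server found above; the interface name below is a placeholder that you should replace with the actual interface on your VM, and the change is applied with sudo netplan apply:

network:
  version: 2
  ethernets:
    enp1s0:   # placeholder; use your VM's interface name
      nameservers:
        addresses: [10.110.8.18]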