LLRNet: Dataset generation#

The wireless ML design flow using Aerial is depicted in the figure below.

ML design flow

In this notebook, we take data generated in the Using pyAerial for data generation by simulation example and generate a dataset for training LLRNet using pyAerial. Note that the data is assumed to have been generated prior to running this notebook.

LLRNet, published in

  1. Shental, J. Hoydis, “’Machine LLRning’: Learning to Softly Demodulate”, https://arxiv.org/abs/1907.01512

is a simple neural network model that takes equalizer outputs, i.e. the complex-valued equalized symbols, as its input and outputs the corresponding log-likelihood ratios (LLRs) for each bit. This model is used to demonstrate the whole ML design flow using Aerial, from capturing the data to deploying the model into 5G NR PUSCH receiver, replacing the conventional soft demapper in cuPHY. In this notebook a dataset is generated. We use pyAerial to call cuPHY functionality to get equalized symbols out for pre-captured/-simulated Rx data, as well as the corresponding log-likelihood ratios from a conventional soft demapper.

Imports#

[1]:
%matplotlib widget
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import cupy as cp
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from tqdm.notebook import tqdm
from IPython.display import Markdown
from IPython.display import display
import warnings

from aerial.phy5g.algorithms import ChannelEstimator
from aerial.phy5g.algorithms import NoiseIntfEstimator
from aerial.phy5g.algorithms import ChannelEqualizer
from aerial.phy5g.algorithms import Demapper
from aerial.phy5g.ldpc import LdpcDeRateMatch
from aerial.phy5g.ldpc import LdpcDecoder
from aerial.phy5g.ldpc import CrcChecker
from aerial.phy5g.config import PuschConfig
from aerial.phy5g.config import PuschUeConfig
from aerial.util.data import PuschRecord
from aerial.util.data import load_pickle
from aerial.util.data import save_pickle
from aerial.util.cuda import CudaStream
from aerial.util.fapi import dmrs_fapi_to_bit_array

Load the source data#

The source data can be either real data collected from an over the air setup, or synthetic data generated by simulation.

Note: This notebook uses data generated using this notebook: Using pyAerial for data generation by simulation, which needs to be run before this notebook.

[2]:
# This is the source data directory which is assumed to contain the source data.
DATA_DIR = "data/"
source_dataset_dir = DATA_DIR + "example_simulated_dataset/QPSK/"
# This is the target dataset directory. It gets created if it does not exist.
target_dataset_dir = DATA_DIR + "example_llrnet_dataset/QPSK/"
os.makedirs(target_dataset_dir, exist_ok=True)

# Load the main data file.
try:
    df = pd.read_parquet(source_dataset_dir + "l2_metadata.parquet", engine="pyarrow")
except FileNotFoundError:
    display(Markdown("**Data not found - has example_simulated_dataset.ipynb been run?**"))

print(f"Loaded {df.shape[0]} PUSCH records.")
Loaded 20000 PUSCH records.

Dataset generation#

Here, pyAerial is used to run channel estimation, noise/interference estimation and channel equalization to get the equalized symbols, corresponding to the LLRNet input, as well as the log-likelihood ratios, corresponding to the LLRNet target output.

[3]:
# Treat warnings as errors to catch any numerical issues during processing.
warnings.filterwarnings("error")

cuda_stream = CudaStream()

# Take modulation order from the first record. The assumption is that all
# entries have the same modulation order here.
mod_order = df.loc[0].qamModOrder
# These hard-coded too.
num_rx_ant = 2
enable_pusch_tdi = 1
eq_coeff_algo = 1

# Create the PUSCH Rx components for extracting the equalized symbols and log-likelihood ratios.
channel_estimator = ChannelEstimator(
    num_rx_ant=num_rx_ant,
    cuda_stream=cuda_stream
)
noise_intf_estimator = NoiseIntfEstimator(
    num_rx_ant=num_rx_ant,
    eq_coeff_algo=eq_coeff_algo,
    cuda_stream=cuda_stream
)
channel_equalizer = ChannelEqualizer(
    num_rx_ant=num_rx_ant,
    eq_coeff_algo=eq_coeff_algo,
    enable_pusch_tdi=enable_pusch_tdi,
    cuda_stream=cuda_stream)
derate_match = LdpcDeRateMatch(enable_scrambling=True, cuda_stream=cuda_stream)
demapper = Demapper(mod_order=mod_order)
decoder = LdpcDecoder(cuda_stream=cuda_stream)
crc_checker = CrcChecker(cuda_stream=cuda_stream)

# Loop through the PUSCH records and create new ones.
pusch_records = []
tb_errors = []
snrs = []
for pusch_record in (pbar := tqdm(df.itertuples(index=False), total=df.shape[0])):

    pbar.set_description("Running cuPHY to get equalized symbols and log-likelihood ratios...")

    tbs = len(pusch_record.macPdu)
    ref_tb = pusch_record.macPdu
    slot = pusch_record.Slot

    # Just making sure the hard-coded value is correct.
    assert mod_order == pusch_record.qamModOrder

    # Wrap the parameters in a PuschConfig structure.
    pusch_ue_config = PuschUeConfig(
        scid=pusch_record.SCID,
        layers=pusch_record.nrOfLayers,
        dmrs_ports=pusch_record.dmrsPorts,
        rnti=pusch_record.RNTI,
        data_scid=pusch_record.dataScramblingId,
        mcs_table=pusch_record.mcsTable,
        mcs_index=pusch_record.mcsIndex,
        code_rate=pusch_record.targetCodeRate,
        mod_order=pusch_record.qamModOrder,
        tb_size=len(pusch_record.macPdu)
    )
    # Note that this is a list. One UE group only in this case.
    pusch_configs = [PuschConfig(
        ue_configs=[pusch_ue_config],
        num_dmrs_cdm_grps_no_data=pusch_record.numDmrsCdmGrpsNoData,
        dmrs_scrm_id=pusch_record.ulDmrsScramblingId,
        start_prb=pusch_record.rbStart,
        num_prbs=pusch_record.rbSize,
        dmrs_syms=dmrs_fapi_to_bit_array(pusch_record.ulDmrsSymbPos),
        dmrs_max_len=1,
        dmrs_add_ln_pos=1,
        start_sym=pusch_record.StartSymbolIndex,
        num_symbols=pusch_record.NrOfSymbols
    )]

    # Load received IQ samples and move to GPU for processing.
    rx_iq_data_filename = source_dataset_dir + pusch_record.rx_iq_data_filename
    rx_slot_np = load_pickle(rx_iq_data_filename)
    num_rx_ant = rx_slot_np.shape[2]
    rx_slot = cp.array(rx_slot_np)

    # Load user data.
    user_data_filename = source_dataset_dir + pusch_record.user_data_filename
    user_data = load_pickle(user_data_filename)

    # Run the channel estimation (cuPHY).
    ch_est = channel_estimator.estimate(
        rx_slot=rx_slot,
        slot=slot,
        pusch_configs=pusch_configs
    )

    # Run noise/interference estimation (cuPHY), needed for equalization.
    lw_inv, noise_var_pre_eq = noise_intf_estimator.estimate(
        rx_slot=rx_slot,
        channel_est=ch_est,
        slot=slot,
        pusch_configs=pusch_configs
    )

    # Run equalization and mapping to log-likelihood ratios.
    llrs, equalized_sym = channel_equalizer.equalize(
        rx_slot=rx_slot,
        channel_est=ch_est,
        lw_inv=lw_inv,
        noise_var_pre_eq=noise_var_pre_eq,
        pusch_configs=pusch_configs
    )
    ree_diag_inv = channel_equalizer.ree_diag_inv[0]
    ree_diag_inv = cp.transpose(ree_diag_inv[..., 0], (1, 2, 0)).reshape(ree_diag_inv.shape[1], -1)

    # Run the demapper on all data symbols (CuPy arrays stay on GPU).
    map_llrs = demapper.demap(equalized_sym[0][0, :, :], ree_diag_inv[0, :, None])

    # Pick data from one data symbol for the LLRNet dataset and move to CPU.
    user_data["llrs"] = llrs[0][:mod_order, 0, :, 0].get()
    user_data["eq_syms"] = equalized_sym[0][0, :, 0].get()
    user_data["map_llrs"] = map_llrs[:, :, 0].get()

    # Save pickle files for the target dataset (using original NumPy arrays).
    rx_iq_data_fullpath = target_dataset_dir + pusch_record.rx_iq_data_filename
    user_data_fullpath = target_dataset_dir + pusch_record.user_data_filename
    save_pickle(data=rx_slot_np, filename=rx_iq_data_fullpath)
    save_pickle(data=user_data, filename=user_data_fullpath)

    pusch_records.append(pusch_record)

    #######################################################################################
    # Run through the rest of the receiver pipeline to verify that this was legit LLR data.

    # De-rate matching and descrambling.
    coded_blocks = derate_match.derate_match(
        input_llrs=[map_llrs[:, None, :, :]],
        pusch_configs=pusch_configs
    )

    # LDPC decoding of the derate matched blocks.
    code_blocks = decoder.decode(
        input_llrs=coded_blocks,
        pusch_configs=pusch_configs
    )

    # Combine the code blocks into a transport block.
    tb, _ = crc_checker.check_crc(
        input_bits=code_blocks,
        pusch_configs=pusch_configs
    )

    tb_errors.append(not np.array_equal(tb[0][:tbs].get(), ref_tb[:tbs]))
    snrs.append(user_data["snr"])

[4]:
print("Saving...")
df_filename = os.path.join(target_dataset_dir, "l2_metadata.parquet")
df = pd.DataFrame.from_records(pusch_records, columns=PuschRecord._fields)
df.to_parquet(df_filename, engine="pyarrow")
print("All done!")
Saving...
All done!

BLER verification#

Plot the block error rate (BLER) as a function of Es/No to verify that the generated dataset produces reasonable receiver performance.

[5]:
# Suppress traitlets/ipympl DeprecationWarning triggered by plt.figure().
warnings.filterwarnings("ignore", category=DeprecationWarning, module="traitlets")

# Compute BLER per unique Es/No value.
snrs = np.array(snrs)
tb_errors = np.array(tb_errors)
unique_snrs = np.unique(snrs)
bler = np.array([tb_errors[snrs == s].mean() for s in unique_snrs])

fig = plt.figure()
plt.scatter(unique_snrs, bler, marker="d", color="blue")
plt.yscale("log")
plt.ylim(0.001, 1)
plt.xlim(unique_snrs.min(), unique_snrs.max())
plt.title("Receiver BLER Performance vs. Es/No")
plt.ylabel("BLER")
plt.xlabel("Es/No [dB]")
plt.grid()
plt.show()