NVIDIA Sionna Neural Radio Framework - Aerial Data Lake

6G will be artificial intelligence (AI) native. AI and machine learning (ML) will extend through all aspects of next-generation networks, from the radio and baseband processing to the network core, including system management, orchestration, and dynamic optimization processes. GPU hardware, together with programming frameworks, will be essential to realizing this vision of a software-defined, AI-native communication infrastructure.

The application of AI/ML in the physical layer has been a particularly active research topic.

There is no AI without data. While the synthetic data generation capabilities of Aerial Omniverse Digital Twin (AODT) and Sionna/SionnaRT are essential to a research project, the availability of over-the-air (OTA) waveform data from real-time systems is equally important. This is the role of Aerial Data Lake, a data capture platform that captures OTA radio frequency (RF) data from virtual radio access networks (vRANs) built on the Aerial CUDA-Accelerated RAN. Aerial Data Lake consists of a data capture application (app) running on the base station (BS) distributed unit (DU), a database of samples collected by the app, and an application programming interface (API) for accessing the database.

Figure 1 shows a gNB instrumented with the Aerial Data Lake platform. Uplink I/Q data from one or more O-RAN radio units (O-RUs) is delivered to GPU memory where it is both processed by the Aerial L1 PUSCH baseband pipeline and delivered to host CPU memory. The Aerial Data Lake collector process writes the I/Q samples to the Aerial Data Lake database. The collector app also filters the rx_data.indication and ul_tti.request FAPI messages from the layer-2 function and writes them to the database. The fields in the data structures associated with these messages form the basis for the database access APIs.
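As a rough illustration of how message fields can serve as lookup keys for I/Q captures, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names (`iq_samples`, `ul_tti_request`), the column names, the sample values, and the choice of SQLite are all hypothetical stand-ins chosen for this example; they are not the Aerial Data Lake schema or API.

```python
import sqlite3

# Hypothetical schema for illustration only; not the Aerial Data Lake schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE iq_samples (ts_ns INTEGER PRIMARY KEY, cell_id INTEGER, iq BLOB);
    CREATE TABLE ul_tti_request (ts_ns INTEGER, rnti INTEGER, mcs INTEGER, n_prb INTEGER);
""")

# Placeholder I/Q payloads (8 zero bytes each) keyed by capture timestamp.
conn.executemany("INSERT INTO iq_samples VALUES (?, ?, ?)", [
    (1000, 0, bytes(8)),
    (1500, 0, bytes(8)),
    (2000, 0, bytes(8)),
])

# Placeholder scheduling metadata, one row per scheduled uplink transmission.
conn.executemany("INSERT INTO ul_tti_request VALUES (?, ?, ?, ?)", [
    (1000, 0x4601, 10, 24),
    (1500, 0x4602, 27, 273),
    (2000, 0x4601, 12, 48),
])

def captures_for_rnti(conn, rnti):
    """Return (ts_ns, mcs, n_prb, iq) rows for one UE, joined on the timestamp."""
    rows = conn.execute(
        """SELECT s.ts_ns, m.mcs, m.n_prb, s.iq
           FROM ul_tti_request AS m
           JOIN iq_samples AS s ON s.ts_ns = m.ts_ns
           WHERE m.rnti = ?
           ORDER BY s.ts_ns""",
        (rnti,),
    )
    return rows.fetchall()

matches = captures_for_rnti(conn, 0x4601)
timestamps = [row[0] for row in matches]
```

The point of the sketch is the join: metadata taken from the layer-2 messages selects which raw captures to load, so a training job can filter by user, modulation and coding scheme, or allocation size without scanning the I/Q data itself.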

Each gNB in a network testbed collects data from all O-RUs associated with it. That is, data collection across a network is distributed: each gNB builds its own local database. Training can be performed locally at each gNB, which enables site-specific optimizations. Because the data in each database is time-stamped, the local databases can also be consolidated at a centralized compute resource and training performed on the time-aligned, aggregated data.
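The consolidation step can be pictured as a merge of time-sorted record streams followed by grouping of records that are close in time. The following is a minimal sketch using only the Python standard library; the record layout `(timestamp_ns, gnb_id, payload)`, the gNB names, and the alignment tolerance are hypothetical and chosen for illustration.

```python
import heapq
from operator import itemgetter

# Hypothetical per-gNB record streams, each already sorted by timestamp.
gnb_a = [(100, "gnb-a", "slot0"), (300, "gnb-a", "slot1"), (500, "gnb-a", "slot2")]
gnb_b = [(100, "gnb-b", "slot0"), (310, "gnb-b", "slot1"), (500, "gnb-b", "slot2")]

def consolidate(*streams):
    """Merge already time-sorted streams into one time-ordered list."""
    return list(heapq.merge(*streams, key=itemgetter(0)))

def align(merged, tolerance_ns):
    """Group records whose timestamps lie within tolerance_ns of the group's first record."""
    groups = []
    for rec in merged:
        if groups and rec[0] - groups[-1][0][0] <= tolerance_ns:
            groups[-1].append(rec)
        else:
            groups.append([rec])
    return groups

merged = consolidate(gnb_a, gnb_b)
groups = align(merged, tolerance_ns=20)
```

Each resulting group holds the records from different gNBs that refer to roughly the same instant, which is the form a multi-site training flow would typically consume. A real deployment would need to choose the tolerance based on how tightly the gNB clocks are synchronized.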

The Aerial Data Lake database storage requirements depend on the number of O-RUs, the antenna configuration of each O-RU, the carrier bandwidth, the TDD pattern, and the number of samples to be collected. For example, collecting 1E6 I/Q samples from a single 4T4R O-RU employing a single 100 MHz carrier requires 0.66 TB of storage.
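The text does not state how the 0.66 TB figure is derived. One set of assumptions that lands close to it, offered only as a back-of-the-envelope sketch: each captured sample is one full 14-symbol slot, a 100 MHz carrier at 30 kHz subcarrier spacing carries 273 PRBs of 12 subcarriers each, and each complex I/Q value occupies 4 bytes (16-bit I and 16-bit Q). None of these parameters is confirmed by the source, and the function name and arguments are hypothetical.

```python
def capture_bytes(n_captures, n_rx_ant, n_prb=273, sc_per_prb=12,
                  symbols_per_slot=14, bytes_per_iq=4):
    """Estimate raw I/Q storage in bytes under the assumptions stated above."""
    # Bytes for one slot: every resource element on every receive antenna.
    per_slot = n_prb * sc_per_prb * symbols_per_slot * n_rx_ant * bytes_per_iq
    return n_captures * per_slot

# Assumed scenario: 1E6 captured slots from one O-RU with 4 receive antennas.
total = capture_bytes(n_captures=1_000_000, n_rx_ant=4)
total_tb = total / 1e12     # decimal terabytes
total_tib = total / 2**40   # binary tebibytes
```

Under these assumptions the total is about 0.73 TB in decimal units, or about 0.67 TiB in binary units, which is in the neighborhood of the quoted figure. Database metadata, compression, and the TDD pattern (which affects how long the capture takes rather than the size of each uplink slot) are not modeled.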

The Aerial Data Lake database contains the fronthaul RF data. However, many training applications require access to data at other nodes in the receive pipeline. A pyAerial pipeline, together with the Data Lake database APIs, can read samples from an Aerial Data Lake database and transform them into training data for any function in the pipeline. Figure 2 illustrates data ingress from a Data Lake database into a pyAerial pipeline, with standard Python file I/O used to generate training data for a soft de-mapper.

Key Features

Aerial Data Lake has the following key features:

Feature 1: Real-time capture of RF data from OTA testbed

  • Aerial Data Lake is designed to operate with gNBs that are built on the Aerial CUDA-Accelerated RAN and that employ the Small Cell Forum FAPI interface between L2 and L1. One example system is the NVIDIA ARC-OTA network testbed. I/Q samples from O-RUs connected to the GPU platform via an O-RAN 7.2x split fronthaul interface are delivered to the host CPU and exported to the Aerial Data Lake database.

Feature 2: Aerial Data Lake APIs to access the RF database

  • The layer-2 messages rx_data.indication and ul_tti.request are filtered from the L2/L1 FAPI message stream and exported to the database. The fields in these data structures form the basis of the database access APIs.

Feature 3: Scalable and time-coherent over an arbitrary number of BSs

  • The data collection app runs on the same CPU that supports the DU and consumes only a small number of CPU cores. Because each BS is responsible for collecting its own uplink data, the collection process scales as more BSs are added to the network testbed. Databases are time-stamped, so data collected across multiple BSs can be used in a training flow in a time-coherent manner.

Feature 4: Use in conjunction with pyAerial to generate training data for neural network physical layer designs

  • Aerial Data Lake can be used in conjunction with the NVIDIA pyAerial CUDA-Accelerated Python L1 library. Using the Data Lake database APIs, pyAerial can access RF samples in a Data Lake database and transform those samples into training data for all the signal processing functions in an uplink or downlink pipeline.

Target Audience

Industry and university researchers and developers looking to bring ML to the physical layer with the end goal of benchmarking on OTA testbeds like NVIDIA ARC-OTA or other GPU-based BSs.

Value Proposition

Capture real-world data from the gNBs built on Aerial layer-1, such as ARC-OTA, and enable training in the PHY. Transform OTA RF sample captures into data for training layer-1 functions or compositions of multiple functions.

Figure 1: The Aerial Data Lake data capture platform. Uplink I/Q data from one or more O-RUs is delivered to GPU memory where it is both processed by the Aerial L1 PUSCH baseband pipeline and delivered to host CPU memory. The Aerial Data Lake Collector process writes the I/Q samples to the Aerial Data Lake Database. The collector app also filters the rx_data.indication and ul_tti.request FAPI messages from the layer-2 function and writes them to the database. The fields in the data structures associated with these messages form the basis of the database access APIs.

Figure 2: pyAerial is used in conjunction with Aerial Data Lake, the NVIDIA data collection platform, to build training datasets for any node in the layer-1 downlink or uplink signal processing pipeline. The example shows a Data Lake database of over-the-air samples transformed into training data for a neural network soft de-mapper.

© Copyright 2024, NVIDIA. Last updated on Apr 19, 2024.