Aerial CUDA-Accelerated RAN

Aerial CUDA-Accelerated RAN brings together the Aerial software for 5G and AI frameworks and the NVIDIA accelerated computing platform, enabling TCO reduction and unlocking infrastructure monetization for telcos.

Aerial CUDA-Accelerated RAN has the following key features:

  • Software-defined, scalable, modular, highly programmable and cloud-native, without any fixed function accelerators. Enables the ecosystem to flexibly adopt necessary modules for their commercial products.

  • Full-stack acceleration of DU L1, DU L2+, CU, UPF and other network functions, enabling workload consolidation for maximum performance and spectral efficiency, leading to best-in-class system TCO.

  • General purpose infrastructure, with multi-tenancy that can power both traditional workloads and cutting-edge AI applications for best-in-class RoA.

What’s New in 25-2

  • Aerial cuPHY: CUDA accelerated inline PHY

    • cuPHY 4TR

      • Extension of reserved beam Id for static beamforming

      • Option for L1 to send static BFW to RU always

      • cuBB functionality for 64 UEs/TTI without timing closure

      • cuBB functionality for expanded number of SRS UEs without timing closure

      • cuBB functionality for 40MHz BW support without timing closure

      • cuBB functionality for CA (100 + 40 + 40 MHz) without timing closure

    • SRS Configuration

      • 192 SRS resources for mMIMO

    • Operation / Redundancy / Resiliency

      • PTP status monitoring

      • Logging improvement

  • Aerial cuMAC: CUDA accelerated MAC scheduler

    • cuMAC-CP

      • CUDA Optimizations

        • Memory copies for data re-arrangement from per-cell buffers to cell-group buffers

        • Optimized concurrent CUDA memory copy and CUDA kernel execution for different slots

      • Testing Framework

        • A new testbench for L2 stack integration of CUDA-based proportional-fairness metric computation and sorting module with latency measurements

        • Helps demonstrate benefits of L2 scheduler offloading to GPU

      • Performance Improvements

        • cuMAC-CP CPU-GPU memory data movement time decreased by 20%

        • cuMAC-CP standalone test peak cell capacity increased from 8 cells to 12 cells of 100MHz 4T4R

    • cuMAC-Sch 64TR

      • UE grouping algorithms

        • Dynamic per-TTI UE grouping

        • Semi-persistent UE grouping

      • Beamforming

        • CUDA-based multi-cell regularized-zero-forcing beamforming module

      • SRS Optimization

        • SRS uplink transmit power control that is compliant with 3GPP 5G specs

      • Control Plane Enhancements

        • Expanded API parameters and data buffers to support advanced features of 64T64R MU-MIMO UE grouping, scheduling, and beamforming

      • A new testbench for the validation and testing of 64T64R MU-MIMO scheduler

  • Aerial E2E: System level / End-to-End validation

    • 4T4R 100MHz (w/ CapGemini) – 20x cells

      • UL only: 213Mbps / DL only: 1.25Gbps

      • UL + DL: DL = 1.1Gbps and UL = 145Mbps

    • 64T64R 100MHz (w/ CapGemini) – 1x cell

      • 2 UEs (2 layers each), achieving a peak throughput of 1.44 Gbps

    • 4T4R ARC (w/ OAI)

      • WNC O-RU Integrated

      • 4T4R SRS Validated in TestMAC

      • Supports 55 attached UEs

      • Multi-L2 (to a single L1) achieves 2 cells with traffic

        • DL = 790Mbps and UL = 90Mbps per Cell

    • Data Lakes

      • Real Time capability

      • Multi-cell (up to 4) IQ capture

  • Performance

    • 20x100MHz 4T4R Peak cells, 4DL/2UL

    • 6x100MHz 64T64R peak cells

      • 16DL/4UL layers with early-HARQ

      • 16DL/8UL layers without early-HARQ

  • Robustness and Resiliency

    • E2E Resiliency

      • FH Network supports EVPN-MH

      • Dual PTP capability

      • GM holdover validated with SN5400 switch

    • L1

      • Error handling on non-recoverable DOCA and NIC initialization

      • Error handling when L1 and L2 have mismatched cell_group_num and L2 schedules more cells than expected

      • Checks that FAPI PDUs are within L1 limits and checks for corrupted eCPRI headers

      • Fixes error codes sent in ERROR indication to L2 when a slot is dropped

      • Sends the appropriate ERROR indication for various RX failure cases

      • Improves error handling for invalid reconfiguration
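
The cuMAC-CP testbench listed above measures a proportional-fairness (PF) metric computation and sorting module. As a rough illustration of what such a step computes, here is a minimal plain-Python sketch using the textbook PF metric (achievable instantaneous rate divided by long-term average throughput). The function name `pf_sort`, the sample rates, and the per-UE (rather than per-PRB) granularity are assumptions made for illustration; this is not the cuMAC API or its CUDA implementation.

```python
def pf_sort(inst_rate, avg_tput, eps=1e-9):
    """Return (order, metrics): UE indices sorted by descending PF metric.

    inst_rate: achievable rate per UE in the current TTI (any consistent unit)
    avg_tput:  long-term average throughput per UE (same unit)
    eps:       floor on the average to avoid division by zero (assumed value)
    """
    if len(inst_rate) != len(avg_tput):
        raise ValueError("inst_rate and avg_tput must have the same length")
    # Textbook PF metric: favor UEs whose current rate is high
    # relative to what they have been served so far.
    metrics = [r / max(t, eps) for r, t in zip(inst_rate, avg_tput)]
    # Sort UE indices by metric, largest first (ties keep index order).
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    return order, metrics


# Hypothetical values for three UEs.
inst_rate = [100.0, 80.0, 60.0]
avg_tput = [50.0, 10.0, 60.0]
order, metrics = pf_sort(inst_rate, avg_tput)
print(order)    # [1, 0, 2]
print(metrics)  # [2.0, 8.0, 1.0]
```

UE 1 is ranked first even though UE 0 has the higher instantaneous rate, because UE 1 has received much less throughput on average.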

What’s New in 25-1

The following new features are available in release 25-1 for Aerial CUDA-Accelerated RAN:

  • Aerial cuPHY: CUDA accelerated inline PHY

    • cuPHY 4TR

      • 20x100MHz 4TR Peak Cells on GH200

      • NN PUSCH Channel Estimate

    • cuPHY 64TR

      • 3x100MHz 64TR Ave Cells w/ Mod Comp, on GH200

      • Reconfiguration of static beam weights to RU

    • SRS Configuration

      • Extended number of SRS UEs for mMIMO

      • BFW calculation for SRS unallocated RBs

      • Support 4 SRS symbols on S-slot

    • Operation/Redundancy/Resiliency

      • Dynamic OAM (Out-Of-Service) - configuration or modification of dl/ulBandwidth and eAxCID.

      • Logging on cloud platform

      • Version check for YAML configuration files

      • Enhanced cuBB_system_checks script to check versions and configurations required for cuBB test

      • Cooperative cancellation of GPU workload for PUSCH

      • Support for FH UL I/Q sample capture in case of CRC errors

  • Aerial cuMAC: CUDA accelerated MAC scheduler

    • cuMAC-CP

      • Functional Interface for 4T4R L2

    • cuMAC-Sch 4TR

      • 40x100MHz 4TR Ave Cells on GH200

      • Type 0 & 1, PF Parallel Riding Peaks.

      • UE Down Selection / TTI

      • PRB Allocation & Layer Selection

      • Link Adaptation (MCS OLLA) & AI - DRL-MCS

    • cuMAC-Sch 64TR

      • 3x100MHz 64TR Ave Cells on GH200

      • UE Sorting & Down Selection

      • MU-MIMO user grouping (PRB allocation & Layer Selection) – Type 1 & flexible layers per UE.

      • Link Adaptation (MCS OLLA).

    • SRS Configuration

      • Wideband SRS, Aperiodic, non-inter cell

      • 40x100MHz 4TR Ave Cells

    • Baseline Scheduler – on CPU

      • MU-MIMO Type 1 SU-MIMO PRB allocation for Anchor UE and PF-Based greedy MU-MIMO user grouping.

  • Aerial E2E: System level / End-to-End validation

    • 4T4R 100MHz

      • 8 Peak Cells in E2E configuration (CN + RAN + UE-EM) validated in eCPRI setup.

      • Achieving aggregate DL throughput of 11.2Gbps and aggregate UL throughput of 1.68Gbps

      • AI-RAN: Validated 8 peak cell performance with MIG enabled

  • pyAerial: Python interface to Aerial cuPHY

    • CuPy-based API, in addition to the existing NumPy-based API

      • Significantly reduces copies between GPU and host memory

      • Improves interoperability with other frameworks supporting the CUDA array interface (PyTorch, Numba, etc.)

    • New configuration API for configuring pyAerial pipelines and components

    • SRS transmitter and receiver pipelines

    • SRS example notebook

    • CRC encoding

  • Performance

    • 1x100MHz 64T64R Peak cell / 3x100MHz 64T64R average cells

    • 20x100MHz 4T4R Peak cells
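
Several cuMAC-Sch items above mention link adaptation with MCS OLLA (outer-loop link adaptation). A commonly described form of OLLA keeps an SINR offset that moves up slightly on each HARQ ACK and down by a larger step on each NACK, with the step ratio chosen so that the offset is stationary at the target BLER. The sketch below shows only that generic rule; the function name, the 10% target, and the 1 dB step are illustrative assumptions, not cuMAC parameters.

```python
def olla_update(offset_db, ack, target_bler=0.1, step_down_db=1.0):
    """Return the updated SINR offset (dB) after one HARQ feedback.

    The up-step is derived from the down-step so that, at the target BLER,
    the expected drift is zero:
        (1 - target_bler) * step_up == target_bler * step_down
    """
    if not 0.0 < target_bler < 1.0:
        raise ValueError("target_bler must be strictly between 0 and 1")
    step_up_db = step_down_db * target_bler / (1.0 - target_bler)
    if ack:
        return offset_db + step_up_db   # success: be slightly more aggressive
    return offset_db - step_down_db     # failure: back off by the larger step


# Hypothetical feedback sequence: 9 ACKs and 1 NACK, i.e. exactly 10% errors.
offset = 0.0
for ack in [True] * 9 + [False]:
    offset = olla_update(offset, ack)
print(abs(offset) < 1e-9)  # the offset ends up back at (approximately) zero
```

In a scheduler, an offset like this would typically be added to the reported SINR before the MCS lookup, so that persistent NACKs push the selection toward more robust MCS values.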

What’s New in 24-3

The following new features are available in release 24-3 for Aerial CUDA-Accelerated RAN:

  • Aerial cuPHY: CUDA accelerated inline PHY

    • Multi-cell support for mMIMO (up to 3 cells)

    • Scheduling DL in special slots

    • Increase SRS slots in 4T4R and mMIMO

    • SRS CS multiplexing for different UEs

    • UL PUSCH channel estimation at PRG level

    • RKHS channel estimation

  • Aerial E2E: System level / End-to-End validation

    • Fronthaul Port Failover Validation (Active-Standby) of C/U/S-Planes

    • Concluded Ch.8 Conformance testing with PRACH

    • MIG validation of AI + RAN

  • Aerial Redundancy/Resiliency: CUDA accelerated RAN Redundancy/Resiliency features

    • RU Health Monitor - actively detects FH connectivity issues with the O-RU and takes corrective action

    • L1 recovery period - if L1 is running late, FAPI messages are dropped for some time to allow L1 to recover

    • nvIPC pcap acquisition improvements - adds the capability to apply filters (cell-id, msg-id level) to nvIPC pcap acquisition

    • Backtrace output on console - Aerial prints a backtrace on the console in case of a crash

  • Aerial cuMAC: CUDA accelerated MAC scheduler

    • DRL MCS selection module

      • Pre-trained neural networks available under aerial_sdk/cuMAC/testVectors

      • Inference based on TensorRT

    • 64TR MU-MIMO scheduler

      • UE sorting algorithm based on SRS SNR estimates

      • UE grouping algorithm based on SRS channel coefficient estimates

    • Aperiodic SRS resource manager

      • Combined with MU-MIMO UE sorting algorithm

    • 4T4R system simulation with GPU-based TDL channel model

    • Improved algorithms & CUDA implementation for type-0 and type-1 4T4R schedulers

  • pyAerial: Python interface to Aerial cuPHY

    • CSI-RS transmission pipeline

    • RSRP and pre- and post-equalizer SINR estimation

    • Carrier frequency offset and timing advance estimation

    • CRC checking

    • OFDM fading channel simulation

    • Support of multiple UE groups for PUSCH receiver pipeline and its components

    • An improved API to PUSCH receiver pipeline and its components
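
The pyAerial list above includes CRC checking (and release 25-1 adds CRC encoding). For readers unfamiliar with the operation, the following pure-Python sketch computes and checks a 24-bit CRC by bitwise long division, using the CRC24A generator polynomial from 3GPP TS 38.212. It does not use pyAerial; the function names are hypothetical, and which CRC polynomials pyAerial exposes is not stated here.

```python
# CRC24A generator from 3GPP TS 38.212, including the leading x^24 term:
# x^24 + x^23 + x^18 + x^17 + x^14 + x^11 + x^10 + x^7 + x^6 + x^5 + x^4 + x^3 + x + 1
CRC24A_POLY = 0x1864CFB


def crc24a(bits):
    """Return the 24 parity bits (MSB first) for a list of 0/1 payload bits."""
    reg = 0
    # Long division of payload * x^24 by the generator (zero initial state).
    for b in list(bits) + [0] * 24:
        reg = (reg << 1) | (b & 1)
        if reg & (1 << 24):
            reg ^= CRC24A_POLY  # subtract the generator; clears bit 24
    return [(reg >> i) & 1 for i in range(23, -1, -1)]


def crc24a_check(codeword):
    """Return True if the last 24 bits match the CRC of the preceding bits."""
    if len(codeword) < 24:
        raise ValueError("codeword must contain at least 24 parity bits")
    payload, parity = codeword[:-24], codeword[-24:]
    return crc24a(payload) == parity


# Hypothetical 8-bit payload.
payload = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = payload + crc24a(payload)
print(crc24a_check(codeword))   # True

corrupted = codeword.copy()
corrupted[3] ^= 1               # flip one payload bit
print(crc24a_check(corrupted))  # False
```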

What’s New in 24-2.1

The following new features are available in release 24-2.1 for Aerial CUDA-Accelerated RAN:

  • Aerial cuPHY: CUDA accelerated inline PHY

    • 64T64R Massive MIMO:

      • 100 MHz DL max combined 16 layers + UL max combined 8 layers + SRS

      • 64T64R SRS + Dynamic + Static Beamforming Weights

      • Support multiple dynamic UE groups

      • Support flexible PRG size and PRB number

      • Support SRS buffer indexing from L2

      • Support non 2^n layers

      • Use different section IDs when splitting the C-Plane section

      • FH messaging for CSIRS + PDSCH and other channel combinations

    • Support GH200+BF3 as RU emulator platform

What’s New in 24-2

The following new features are available in release 24-2 for Aerial CUDA-Accelerated RAN:

  • Aerial cuPHY: CUDA accelerated inline PHY

    • MGX Grace Hopper multicell capacity w/ telco-grade traffic model

      • 20 peak loaded 4T4R @ 100MHz

      • Capacity also validated with more challenging traffic model

        • PUSCH and PDCCH symbols in the S-slot

    • L1-L2 interface enhancements

      • Separate FAPI request timelines for PDSCH and PDCCH

  • Aerial cuMAC: CUDA accelerated MAC scheduler

    • cuMAC-Sch

      • 4T4R CUDA implementation complete

    • cuMAC-CP

      • 4T4R implementation (Functional – early access)

  • Aerial cuBB/E2E: System level / End-to-End validation

    • Over-The-Air (OTA) validation:

      • CBRS O-RU

      • 8 UE OTA w/ 6 UE/TTI for > 8 hours

    • RedHat-OCP:

      • Multicell capacity validated on MGX (GH200+BF3)

    • O-RAN Fronthaul:

      • 16-bit fixed point IQ sample validated E2E (Keysight eLSU)

      • Simultaneous dual-port FH capability (8 peak cells; 4 per port)

    • L2 integration:

      • Multi-L2 container instances per L1 validated E2E

  • pyAerial: Python interface to Aerial cuPHY

    • TensorRT inference engine

      • Jupyter notebook example using pyAerial to validate a neural PUSCH receiver

    • LDPC API improvements

      • Added soft outputs to LDPC decoder

    • LS channel estimation

    • Limited support for Grace Hopper

      • Run pyAerial together with Aerial Data Lakes