Aerial CUDA-Accelerated RAN

Aerial CUDA-Accelerated RAN brings together the Aerial software for 5G and AI frameworks and the NVIDIA accelerated computing platform, reducing total cost of ownership (TCO) and unlocking infrastructure monetization for telcos.

Aerial CUDA-Accelerated RAN has the following key features:

  • Software-defined, scalable, modular, highly programmable, and cloud-native, without any fixed-function accelerators. Enables the ecosystem to flexibly adopt the modules needed for their commercial products.

  • Full-stack acceleration of DU L1, DU L2+, CU, UPF and other network functions, enabling workload consolidation for maximum performance and spectral efficiency, leading to best-in-class system TCO.

  • General-purpose infrastructure with multi-tenancy that can power both traditional workloads and cutting-edge AI applications for best-in-class RoA.

What’s New in 25-1

The following new features are available in release 25-1 for Aerial CUDA-Accelerated RAN:

  • Aerial cuPHY: CUDA accelerated inline PHY

    • cuPHY 4TR

      • 20x100MHz 4TR Peak Cells on GH200

      • NN PUSCH Channel Estimate

    • cuPHY 64TR

      • 3x100MHz 64TR Ave Cells w/ Mod Comp, on GH200

      • Reconfiguration of static beam weights to RU

    • SRS Configuration

      • Extended number of SRS UEs for mMIMO

      • BFW calculation for SRS unallocated RBs

      • Support 4 SRS symbols on S-slot

    • Operation/Redundancy/Resiliency

      • Dynamic OAM (Out-Of-Service) - configuration or modification of dl/ulBandwidth and eAxCID.

      • Logging on cloud platform

      • Version check for YAML configuration files

      • Enhanced cuBB_system_checks script to check versions and configurations required for cuBB test

      • Cooperative cancellation of GPU workload for PUSCH

      • Support for FH UL I/Q sample capture in case of CRC errors

  • Aerial cuMAC: CUDA accelerated MAC scheduler

    • cuMAC-CP

      • Functional Interface for 4T4R L2

    • cuMAC-Sch 4TR

      • 40x100MHz 4TR Ave Cells on GH200

      • Type 0 & 1, PF Parallel Riding Peaks.

      • UE Down Selection / TTI

      • PRB Allocation & Layer Selection

      • Link Adaptation (MCS OLLA) & AI - DRL-MCS

    • cuMAC-Sch 64TR

      • 3x100MHz 64TR Ave Cells on GH200

      • UE Sorting & Down Selection

      • MU-MIMO user grouping (PRB allocation & Layer Selection) – Type 1 & flexible layers per UE.

      • Link Adaptation (MCS OLLA).

    • SRS Configuration

      • Wideband SRS, Aperiodic, non-inter cell

      • 40x100MHz 4TR Ave Cells

    • Baseline Scheduler – on CPU

      • MU-MIMO Type 1 SU-MIMO PRB allocation for Anchor UE and PF-Based greedy MU-MIMO user grouping.

  • Aerial E2E: System level / End-to-End validation

    • 4T4R 100MHz

      • 8 Peak Cells in E2E configuration (CN + RAN + UE-EM) validated in eCPRI setup.

      • Achieved aggregate DL throughput of 11.2 Gbps and aggregate UL throughput of 1.68 Gbps

      • AI-RAN: Validated 8 peak cell performance with MIG enabled

  • pyAerial: Python interface to Aerial cuPHY

    • CuPy-based API, in addition to the existing Numpy-based API

      • Significantly reduces copies between GPU and host memory

      • Improves interoperability with other frameworks supporting the CUDA array interface (PyTorch, Numba, etc.)

    • New configuration API for configuring pyAerial pipelines and components

    • SRS transmitter and receiver pipelines

    • SRS example notebook

    • CRC encoding

  • Performance

    • 1x100MHz 64T64R Peak cell / 3x100MHz 64T64R average cells

    • 20x100MHz 4T4R Peak cells
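
The Aerial E2E figures listed above for 4T4R 100MHz (8 peak cells, 11.2 Gbps aggregate DL, 1.68 Gbps aggregate UL) can be turned into per-cell averages with simple arithmetic. The sketch below assumes the load is split evenly across the 8 cells, which the release notes do not state; the function and constant names are illustrative only.

```python
# Aggregate figures quoted in the release notes, expressed in Mbps.
NUM_CELLS = 8
AGG_DL_MBPS = 11_200  # 11.2 Gbps aggregate downlink
AGG_UL_MBPS = 1_680   # 1.68 Gbps aggregate uplink


def per_cell_mbps(aggregate_mbps: int, num_cells: int) -> float:
    """Average per-cell throughput, ASSUMING an even split across cells."""
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")
    return aggregate_mbps / num_cells


dl_per_cell = per_cell_mbps(AGG_DL_MBPS, NUM_CELLS)
ul_per_cell = per_cell_mbps(AGG_UL_MBPS, NUM_CELLS)

print(f"DL per cell: {dl_per_cell:.0f} Mbps")  # 1400 Mbps
print(f"UL per cell: {ul_per_cell:.0f} Mbps")  # 210 Mbps
```

Under that even-split assumption, each cell averages about 1.4 Gbps DL and 0.21 Gbps UL.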

What’s New in 24-3

The following new features are available in release 24-3 for Aerial CUDA-Accelerated RAN:

  • Aerial cuPHY: CUDA accelerated inline PHY

    • Multi-cell support for mMIMO (up to 3 cells)

    • Scheduling DL in special slots

    • Increase SRS slots in 4T4R and mMIMO

    • SRS CS multiplexing for different UEs

    • UL PUSCH channel estimation at PRG level

    • RKHS channel estimation

  • Aerial E2E: System level / End-to-End validation

    • Fronthaul Port Failover Validation (Active-Standby) of C/U/S-Planes

    • Concluded Ch.8 Conformance testing with PRACH

    • MIG validation of AI + RAN

  • Aerial Redundancy/Resiliency: CUDA accelerated RAN Redundancy/Resiliency features

    • RU Health Monitor - actively detect FH connectivity issues with ORU and take corrective action

    • Introduce L1 recovery period - If L1 is running late, drop FAPI messages for some time to allow L1 to recover

    • nvIPC pcap acquisition improvements - Introduced capability to add filters (cell-id, msg-id level) to nvIPC pcap acquisition

    • Backtrace output on console - Aerial prints backtrace on console in case of crash

  • Aerial cuMAC: CUDA accelerated MAC scheduler

    • DRL MCS selection module

      • Pre-trained neural networks available under aerial_sdk/cuMAC/testVectors

      • Inference based on TensorRT

    • 64TR MU-MIMO scheduler

      • UE sorting algorithm based on SRS SNR estimates

      • UE grouping algorithm based on SRS channel coefficient estimates

    • Aperiodic SRS resource manager

      • Combined with MU-MIMO UE sorting algorithm

    • 4T4R system simulation with GPU-based TDL channel model

    • Improved algorithms & CUDA implementation for type-0 and type-1 4T4R schedulers

  • pyAerial: Python interface to Aerial cuPHY

    • CSI-RS transmission pipeline

    • RSRP and pre- and post-equalizer SINR estimation

    • Carrier frequency offset and timing advance estimation

    • CRC checking

    • OFDM fading channel simulation

    • Support of multiple UE groups for PUSCH receiver pipeline and its components

    • An improved API to PUSCH receiver pipeline and its components
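
The L1 recovery period listed above (dropping FAPI messages for some time when L1 is running late) can be illustrated with a small gate that suppresses message processing for a fixed number of slots. This is a conceptual sketch only, not Aerial's implementation: the class name `L1RecoveryGate`, the `recovery_slots` parameter, and the slot-based window are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class L1RecoveryGate:
    """Hypothetical gate that drops FAPI messages while L1 catches up."""

    recovery_slots: int  # ASSUMED length of the recovery period, in slots
    # Last slot (inclusive) in which messages are dropped; -1 means "none".
    _drop_until_slot: int = field(default=-1, init=False)

    def on_slot(self, slot: int, l1_is_late: bool) -> bool:
        """Return True if FAPI messages for `slot` should be processed."""
        if l1_is_late:
            # (Re)start the recovery period from the current slot.
            self._drop_until_slot = slot + self.recovery_slots - 1
        return slot > self._drop_until_slot


# Example: L1 is detected as late in slot 2, with a 3-slot recovery period.
gate = L1RecoveryGate(recovery_slots=3)
late_slots = {2}
decisions = [gate.on_slot(s, s in late_slots) for s in range(7)]
print(decisions)  # slots 2, 3 and 4 are dropped; processing resumes at slot 5
```

A fixed window keeps the sketch simple; a real system might instead resume as soon as L1 reports that it is back on schedule.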

What’s New in 24-2.1

The following new features are available in release 24-2.1 for Aerial CUDA-Accelerated RAN:

  • Aerial cuPHY: CUDA accelerated inline PHY

    • 64T64R Massive MIMO:

      • 100 MHz DL max combined 16 layers + UL max combined 8 layers + SRS

      • 64T64R SRS + Dynamic + Static Beamforming Weights

      • Support multiple dynamic UE groups

      • Support flexible PRG size and PRB number

      • Support SRS buffer indexing from L2

      • Support non 2^n layers

      • Use different section IDs when splitting the C-Plane section

      • FH messaging for CSIRS + PDSCH and other channel combinations

    • Support GH200+BF3 as RU emulator platform

What’s New in 24-2

The following new features are available in release 24-2 for Aerial CUDA-Accelerated RAN:

  • Aerial cuPHY: CUDA accelerated inline PHY

    • MGX Grace Hopper multicell capacity w/ telco-grade traffic model

      • 20 peak loaded 4T4R @ 100MHz

      • Capacity also validated with more challenging traffic model

        • PUSCH and PDCCH symbols in the S-slot

    • L1-L2 interface enhancements

      • Separate FAPI request timelines for PDSCH and PDCCH

  • Aerial cuMAC: CUDA accelerated MAC scheduler

    • cuMAC-Sch

      • 4T4R CUDA implementation complete

    • cuMAC-CP

      • 4T4R implementation (Functional – early access)

  • Aerial cuBB/E2E: System level / End-to-End validation

    • Over-The-Air (OTA) validation:

      • CBRS O-RU

      • 8 UE OTA w/ 6 UE/TTI for > 8 hours

    • RedHat-OCP:

      • Multicell capacity validated on MGX (GH200+BF3)

    • O-RAN Fronthaul:

      • 16-bit fixed point IQ sample validated E2E (Keysight eLSU)

      • Simultaneous dual-port FH capability (8 peak cells; 4 per port)

    • L2 integration:

      • Multi-L2 container instances per L1 validated E2E

  • pyAerial: Python interface to Aerial cuPHY

    • TensorRT inference engine

      • Jupyter notebook example using pyAerial to validate a neural PUSCH receiver

    • LDPC API improvements

      • Added soft outputs to LDPC decoder

    • LS channel estimation

    • Limited support for Grace Hopper

      • Run pyAerial together with Aerial Data Lakes
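
For readers unfamiliar with the LS channel estimation item above, the textbook least-squares estimate divides each received pilot sample by the known transmitted pilot. The sketch below uses plain Python complex numbers and does not use the pyAerial API; the function name and the example pilot and channel values are made up for illustration.

```python
import math


def ls_channel_estimate(rx_pilots, tx_pilots):
    """Per-subcarrier least-squares estimate: h[k] = y[k] * conj(x[k]) / |x[k]|^2."""
    if len(rx_pilots) != len(tx_pilots):
        raise ValueError("rx_pilots and tx_pilots must have the same length")
    return [y * x.conjugate() / abs(x) ** 2 for y, x in zip(rx_pilots, tx_pilots)]


# Illustrative QPSK pilots with unit magnitude (values are assumptions).
a = 1 / math.sqrt(2)
tx = [complex(a, a), complex(-a, a), complex(a, -a), complex(-a, -a)]

# Made-up flat-in-time channel, one coefficient per pilot subcarrier.
h_true = [0.5 + 0.5j, 1.0 - 0.25j, -0.75j, 0.2 + 0.9j]

# Noiseless received pilots: y[k] = h[k] * x[k].
rx = [h * x for h, x in zip(h_true, tx)]

h_est = ls_channel_estimate(rx, tx)
max_err = max(abs(he - ht) for he, ht in zip(h_est, h_true))
print(f"max estimation error without noise: {max_err:.2e}")
```

Without noise the estimate recovers the channel up to floating-point rounding; with noise, the LS estimate is unbiased but typically benefits from further filtering or interpolation across subcarriers.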