Aerial CUDA-Accelerated RAN

Aerial CUDA-Accelerated RAN is an SDK (Software Development Kit) for building commercial-grade, AI-native, 3GPP- and O-RAN-compliant 5G/6G gNB software on NVIDIA-accelerated computing platforms.

Aerial CUDA-Accelerated RAN has the following key features:

- Software-defined, scalable, modular, highly programmable, and AI-native, without any fixed-function accelerators. Enables the ecosystem to flexibly adopt the modules needed for their commercial products.
- General-purpose infrastructure, enabling multi-tenancy that can power both traditional RAN workloads and cutting-edge AI applications for best-in-class return on assets (RoA).

What’s New in 26-1

L1
- Support cuBB loopback test to run RU Emulator on the same host
- Support mixed ModComp and BFP on different cells
- Support flexible PRG size for SRS channel estimation report to L2
- Enhance C/U-plane header validation in RU Emulator
- Improve error handling of unsupported combinations of SRS with other channels
- Improve error handling in recoverable conditions
cuMAC
- Support for SRS-based 64T64R MU-MIMO UE pairing with GPU memory sharing between cuPHY and cuMAC
- A multi-process testbench for SRS-based 64T64R MU-MIMO UE pairing, supporting two operating modes:
  - Standard message passing between L2 and cuMAC-CP to transfer SRS channel estimates
  - SRS GPU memory sharing between cuPHY and cuMAC
E2E (GH200)
- Achieved 20 Peak 4T4R cells orchestrated via K8s
- 1 Peak throughput MU-MIMO cell with 12 layers (using BFP9 & ModComp)
- Multi-UE capability (up to 16 UEs) tested on peak cell
E2E (DGX Spark)
- Achieved 1 Peak cell (DL = 1.5Gbps, UL = 210Mbps)
- Validated CX7 performance and timing capabilities

What’s New in 25-3

Aerial cuPHY: CUDA accelerated inline PHY
- Features performance optimized
  - AAL - Weighted average CFO
  - AAL - UL HARQ buffer management
  - AAL - DL transmission failure notification
  - TTI bundling enabling VoNR
- Features functionally optimized
  - Wireless 32 DL layers functionality without timing closure
  - 32-port CSI-RS without timing closure
Performance
- 6x100MHz 64T64R peak cells
  - 16DL/8UL layers with early-HARQ
Aerial cuMAC: CUDA accelerated MAC scheduler
- Per-UE/logical channel proportional-fairness metric (PFM) compute and sorting (for standard and massive MIMO)
- MU-MIMO UE grouping: channel orthogonality metric compute and UE grouping (without branching)
- 3rd party stack integration with cuMAC
- Scaling for mMIMO - up to 6 cells
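
The proportional-fairness metric (PFM) compute-and-sort step listed above can be illustrated with a minimal CPU-side sketch. It assumes the textbook PF metric (instantaneous achievable rate divided by long-term average throughput) and is not the cuMAC implementation or API. The function names `pf_metrics` and `rank_ues`, the `eps` guard, and the sample values are hypothetical and used for illustration only.

```python
def pf_metrics(inst_rates, avg_throughputs, eps=1e-9):
    """Return one PF metric per UE (or logical channel).

    Assumption: metric = instantaneous rate / long-term average throughput.
    `eps` guards against division by zero for UEs with no history yet.
    """
    return [r / max(t, eps) for r, t in zip(inst_rates, avg_throughputs)]


def rank_ues(inst_rates, avg_throughputs):
    """Compute PF metrics and return (order, metrics).

    `order` lists UE indices from highest to lowest metric, which is the
    order in which a PF scheduler would typically consider them.
    """
    metrics = pf_metrics(inst_rates, avg_throughputs)
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    return order, metrics


# Hypothetical example: three UEs with rates and averages in Mbps.
order, metrics = rank_ues([10.0, 20.0, 5.0], [5.0, 40.0, 1.0])
```

In this example the third UE has the lowest instantaneous rate but also the lowest average throughput, so it ranks first. A GPU implementation would compute the metrics and the sort in parallel across UEs and cells; the sketch only shows the arithmetic.
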
Aerial Data Collection: E3 Agent
- Component of dApp Framework (to be released later)
E2E
- Massive MIMO (64T64R) OTA Field Trial Validation
  - Single Cell, BW = 100MHz
  - Simultaneous MU-MIMO up to 16 UEs (1 layer each) and 8 UEs (2 layers each)
- 20 cells (4T4R) validated
  - BW = 100MHz, 4DL layers and 2 UL layers
- FH Port Resiliency
  - Open Virtual Switch (OVS)
  - Dual Port PTP
  - EVPN-MH FH Transport Network
- FH Compression validated for Massive MIMO (64T64R)
  - BFP9 and ModComp for 6 UEs (1 layer each) and 4 UEs (2 layers each)
- Enabled Multi-UE (up to 100 UEs) traffic sessions
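
As background for the BFP9 fronthaul compression mentioned above, the following is a minimal sketch of the block floating point idea: all I/Q values in a block share one exponent, and each value keeps a fixed-width signed mantissa (9 bits for BFP9). It is a simplified illustration, not the Aerial implementation and not a bit-exact O-RAN packing routine. The function names `bfp_compress` and `bfp_decompress` and the sample values are hypothetical.

```python
def bfp_compress(samples, mantissa_bits=9):
    """Compress a block of signed integer I/Q values.

    Returns (exponent, mantissas). The exponent is the smallest right
    shift that makes every value fit in a signed `mantissa_bits` field.
    """
    max_limit = (1 << (mantissa_bits - 1)) - 1   # 255 for 9 bits
    min_limit = -(1 << (mantissa_bits - 1))      # -256 for 9 bits
    exponent = 0
    # Increase the shared exponent until every shifted sample fits.
    while any(not (min_limit <= (s >> exponent) <= max_limit) for s in samples):
        exponent += 1
    # Python's >> on negative ints is an arithmetic (floor) shift.
    mantissas = [s >> exponent for s in samples]
    return exponent, mantissas


def bfp_decompress(exponent, mantissas):
    """Rebuild approximate samples by shifting the mantissas back."""
    return [m << exponent for m in mantissas]


# Hypothetical block of 16-bit values (a real PRB carries 12 I/Q pairs).
block = [1000, -2000, 3, -4, 255, -256, 32767, -32768]
exponent, mantissas = bfp_compress(block)
restored = bfp_decompress(exponent, mantissas)
```

The quantization error per value is bounded by `2**exponent - 1`, so blocks with small peak amplitude are reproduced more precisely than blocks containing large samples.
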

What’s New in 25-2

Aerial cuPHY: CUDA accelerated inline PHY
- cuPHY 64TR
  - Extension of reserved beam Id for static beamforming
  - Option for L1 to send static BFW to RU always
  - cuBB functionality for 64 UEs/TTI without timing closure
  - cuBB functionality for expanded number of SRS UEs without timing closure
  - cuBB functionality for 40MHz BW support without timing closure
  - cuBB functionality for CA (100 + 40 + 40 MHz) without timing closure
- SRS Configuration
  - 192 SRS resources / 40 ms for mMIMO
- Operation / Redundancy / Resiliency
  - PTP status monitoring
  - Logging improvement
Aerial cuMAC: CUDA accelerated MAC scheduler
- cuMAC-CP
  - CUDA Optimizations
    - Memory copies for data re-arrangement from per-cell buffers to cell-group buffers
    - Optimized concurrent CUDA memory copy and CUDA kernel execution for different slots
  - Testing Framework
    - A new testbench for L2 stack integration of CUDA-based proportional-fairness metric computation and sorting module with latency measurements
    - Helps demonstrate benefits of L2 scheduler offloading to GPU
  - Performance Improvements
    - cuMAC-CP CPU-GPU memory data movement time decreased by 20%
    - cuMAC-CP standalone test peak cell capacity increased from 8 cells to 12 cells of 100MHz 4T4R
- cuMAC-Sch 64TR
  - UE grouping algorithms
    - Dynamic per-TTI UE grouping
    - Semi-persistent UE grouping
  - Beamforming
    - CUDA-based multi-cell regularized-zero-forcing beamforming module
  - SRS Optimization
    - SRS uplink transmit power control that is compliant with 3GPP 5G specs
  - Control Plane Enhancements
    - Expanded API parameters and data buffers to support advanced features of 64T64R MU-MIMO UE grouping, scheduling, and beamforming
  - A new testbench for the validation and testing of 64T64R MU-MIMO scheduler
Aerial E2E: System level / End-to-End validation
- 4T4R 100MHz (w/ CapGemini) – 20x cells
  - UL only: 213Mbps / DL only: 1.25Gbps
  - UL + DL: DL = 1.1Gbps and UL = 145Mbps
- 64T64R 100MHz (w/ CapGemini) – 1x cell
  - 2 UEs (2 layers each), achieving peak throughput (1.44 Gbps)
- 4T4R ARC (w/ OAI)
  - WNC O-RU Integrated
  - 4T4R SRS Validated in TestMAC
  - Support for 55 attached UEs
  - Multi-L2 (to a single L1) achieves 2 cells with traffic
    - DL = 790Mbps and UL = 90Mbps per Cell
- Data Lakes
  - Real Time capability
  - Multi-cell (up to 4) IQ capture
Performance
- 20x100MHz 4T4R Peak cells, 4DL/2UL
- 6x100MHz 64T64R peak cells
  - 16DL/4UL layers with early-HARQ
  - 16DL/8UL layers without early-HARQ
Robustness and Resiliency
- E2E Resiliency
  - FH Network supports EVPN-MH
  - Dual PTP capability
  - GM holdover validated with SN5400 switch
- L1
  - Error handling on non-recoverable DOCA and NIC initialization
  - Error handling when L1 and L2 have mismatched cell_group_num and L2 scheduled more cells than expected
  - Checks whether FAPI PDUs are within L1 limits and whether the eCPRI header is corrupted
  - Fix error codes sent in ERROR indication to L2 when slot is dropped
  - Sends the appropriate Error indication for various RX failure cases
  - Improve error handling for invalid reconfigure

What’s New in 25-1

The following new features are available in release 25-1 for Aerial CUDA-Accelerated RAN:

Aerial cuPHY: CUDA accelerated inline PHY
- cuPHY 4TR
  - 20x100MHz 4TR Peak Cells on GH200
  - NN PUSCH Channel Estimate
- cuPHY 64TR
  - 3x100MHz 64TR Ave Cells w/ Mod Comp, on GH200
  - Reconfiguration of static beam weights to RU
- SRS Configuration
  - Extended number of SRS UEs for mMIMO
  - BFW calculation for SRS unallocated RBs
  - Support 4 SRS symbols on S-slot
- Operation/Redundancy/Resiliency
  - Dynamic OAM (Out-Of-Service) - configuration or modification of dl/ulBandwidth and eAxCID.
  - Logging on cloud platform
  - Version check for YAML configuration files
  - Enhanced cuBB_system_checks script to check versions and configurations required for cuBB test
  - Cooperative cancellation of GPU workload for PUSCH
  - Support for FH UL I/Q sample capture in case of CRC errors
Aerial cuMAC: CUDA accelerated MAC scheduler
- cuMAC-CP
  - Functional Interface for 4T4R L2
- cuMAC-Sch 4TR
  - 40x100MHz 4TR Ave Cells on GH200
  - Type 0 & 1, PF Parallel Riding Peaks.
  - UE Down Selection / TTI
  - PRB Allocation & Layer Selection
  - Link Adaptation (MCS OLLA) & AI - DRL-MCS
- cuMAC-Sch 64TR
  - 3x100MHz 64TR Ave Cells on GH200
  - UE Sorting & down Selection
  - MU-MIMO user grouping (PRB allocation & Layer Selection) – Type 1 & flexible layers per UE.
  - Link Adaptation (MCS OLLA).
- SRS Configuration
  - Wideband SRS, Aperiodic, non-inter cell
  - 40x100MHz 4TR Ave Cells
- Baseline Scheduler – on CPU
  - MU-MIMO Type 1 SU-MIMO PRB allocation for Anchor UE and PF-Based greedy MU-MIMO user grouping.
Aerial E2E: System level / End-to-End validation
- 4T4R 100MHz
  - 8 Peak Cells in E2E configuration (CN + RAN + UE-EM) validated in eCPRI setup.
  - Achieving aggregate DL throughput of 11.2Gbps and aggregate UL throughput of 1.68Gbps
  - AI-RAN: Validated 8 peak cell performance with MIG enabled
pyAerial: Python interface to Aerial cuPHY
- CuPy-based API, in addition to the existing Numpy-based API
  - Significantly reduces copies between GPU and host memory
  - Improves interoperability with other frameworks supporting the CUDA array interface (PyTorch, Numba, etc.)
- New configuration API for configuring pyAerial pipelines and components
- SRS transmitter and receiver pipelines
- SRS example notebook
- CRC encoding
Performance
- 1x100MHz 64T64R Peak cell / 3x100MHz 64T64R average cells
- 20x100MHz 4T4R Peak cells

What’s New in 24-3

The following new features are available in release 24-3 for Aerial CUDA-Accelerated RAN:

Aerial cuPHY: CUDA accelerated inline PHY
- Multi-cell support for mMIMO (up to 3 cells)
- Scheduling DL in special slots
- Increase SRS slots in 4T4R and mMIMO
- SRS CS multiplexing for different UEs
- UL PUSCH channel estimation at PRG level
- RKHS channel estimation
Aerial E2E: System level / End-to-End validation
- Fronthaul Port Failover Validation (Active-Standby) of C/U/S-Planes
- Concluded Ch.8 Conformance testing with PRACH
- MIG validation of AI + RAN
Aerial Redundancy/Resiliency: CUDA accelerated RAN Redundancy/Resiliency features
- RU Health Monitor - actively detect FH connectivity issues with ORU and take corrective action
- Introduce L1 recovery period - If L1 is running late, drop FAPI messages for some time to allow L1 to recover
- nvIPC pcap acquisition improvements - Introduced capability to add filters (cell-id, msg-id level) to nvIPC pcap acquisition
- Backtrace output on console - Aerial prints backtrace on console in case of crash
Aerial cuMAC: CUDA accelerated MAC scheduler
- DRL MCS selection module
  - Pre-trained neural networks available under aerial_sdk/cuMAC/testVectors
  - Inference based on TensorRT
- 64TR MU-MIMO scheduler
  - UE sorting algorithm based on SRS SNR estimates
  - UE grouping algorithm based on SRS channel coefficient estimates
- Aperiodic SRS resource manager
  - Combined with MU-MIMO UE sorting algorithm
- 4T4R system simulation with GPU-based TDL channel model
- Improved algorithms & CUDA implementation for type-0 and type-1 4T4R schedulers
pyAerial: Python interface to Aerial cuPHY
- CSI-RS transmission pipeline
- RSRP and pre- and post-equalizer SINR estimation
- Carrier frequency offset and timing advance estimation
- CRC checking
- OFDM fading channel simulation
- Support of multiple UE groups for PUSCH receiver pipeline and its components
- An improved API to PUSCH receiver pipeline and its components

What’s New in 24-2.1

The following new features are available in release 24-2.1 for Aerial CUDA-Accelerated RAN:

Aerial cuPHY: CUDA accelerated inline PHY
- 64T64R Massive MIMO:
  - 100 MHz DL max combined 16 layers + UL max combined 8 layers + SRS
  - 64T64R SRS + Dynamic + Static Beamforming Weights
  - Support multiple dynamic UE groups
  - Support flexible PRG size and PRB number
  - Support SRS buffer indexing from L2
  - Support non 2^n layers
  - Use different section IDs when splitting the C-Plane section
  - FH messaging for CSIRS + PDSCH and other channel combinations
- Support GH200+BF3 as RU emulator platform

What’s New in 24-2

The following new features are available in release 24-2 for Aerial CUDA-Accelerated RAN:

Aerial cuPHY: CUDA accelerated inline PHY
- MGX Grace Hopper multicell capacity w/ telco-grade traffic model
  - 20 peak loaded 4T4R @ 100MHz
  - Capacity also validated with more challenging traffic model
    - PUSCH and PDCCH symbols in the S-slot
- L1-L2 interface enhancements
  - Separate FAPI request timelines for PDSCH and PDCCH
Aerial cuMAC: CUDA accelerated MAC scheduler
- cuMAC-Sch
  - 4T4R CUDA implementation complete
- cuMAC-CP
  - 4T4R implementation (Functional – early access)
cuBB/E2E: System level / End-to-End validation
- Over-The-Air (OTA) validation:
  - CBRS O-RU
  - 8 UE OTA w/ 6 UE/TTI for > 8 hours
- RedHat-OCP:
  - Multicell capacity validated on MGX (GH200+BF3)
- O-RAN Fronthaul:
  - 16-bit fixed point IQ sample validated E2E (Keysight eLSU)
  - Simultaneous dual-port FH capability (8 peak cells; 4 per port)
- L2 integration:
  - Multi-L2 container instances per L1 validated E2E
pyAerial: Python interface to Aerial cuPHY
- TensorRT inference engine
  - Jupyter notebook example using pyAerial to validate a neural PUSCH receiver
- LDPC API improvements
  - Added soft outputs to LDPC decoder
- LS channel estimation
- Limited support for Grace Hopper
  - Run pyAerial together with Data Lakes