Aerial CUDA-Accelerated RAN 24-2.1

Multicell Capacity

CPU core usage for the multicell benchmark:

On Grace Hopper:

  • 1 isolated physical Grace core for core-locked PTP applications (phc2sys+ptp4l)

  • 10 additional isolated Grace cores for the other core-locked cuphycontroller threads

On x86-based targets:

Without hyperthreading, the benchmark uses “1+6+fractional” x86 cores, defined as follows (a core-pinning sketch follows this list):

  • 1 isolated physical x86 core for core-locked PTP applications (phc2sys+ptp4l) and the core-locked cuphycontroller L2A H2D prepone thread

  • 6 additional isolated physical x86 cores for the other core-locked cuphycontroller threads

  • A fraction of a shared floating x86 core for non-core-locked cuphycontroller threads
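In both layouts, core locking amounts to pinning each real-time thread to a dedicated isolated core. The following is a minimal sketch of how such pinning is commonly done on Linux; the helper name pin_to_core and the core ID are illustrative only and are not taken from the cuphycontroller configuration:

    #include <pthread.h>
    #include <sched.h>
    #include <cstdio>

    // Pin the calling thread to a single core (returns 0 on success).
    // Hypothetical helper; real deployments take core IDs from the
    // platform's isolated-core layout rather than hard-coding them.
    static int pin_to_core(int core_id) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core_id, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main() {
        const int kIsolatedCore = 1;  // illustrative isolated physical core
        if (int rc = pin_to_core(kIsolatedCore); rc != 0) {
            std::fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
            return 1;
        }
        std::printf("thread pinned to core %d\n", kIsolatedCore);
        return 0;
    }

Pinning is only effective when the target cores are also isolated from the kernel scheduler (for example via the isolcpus boot parameter), which is why the core counts above distinguish isolated cores from the shared floating core.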

Additionally, the tested L2 timeline is as follows (a timing-budget sketch follows the list):

  • FAPI SLOT.indication for Slot N is sent from L1 to L2 at the wall-clock time of Slot N-3 (i.e., a 3-slot advance).

  • For the DDDSUUDDDD TDD pattern with 0-based slot numbering, L2 has up to 500us from SLOT.indication to deliver all FAPI PDUs for Slot N when slot%10 is in {2,3,4,5,6}.

  • For the same pattern, L2 has up to 250us from SLOT.indication to deliver all FAPI PDUs for Slot N when slot%10 is in {0,1,7,8,9}.
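The two bullets above reduce to a simple per-slot budget rule. The sketch below restates it in C++, with the 0-based DDDSUUDDDD pattern included for reference; the function name l2_budget_us is illustrative:

    #include <cstdint>
    #include <cstdio>

    // 0-based DDDSUUDDDD TDD pattern, repeating every 10 slots.
    static const char kTddPattern[] = "DDDSUUDDDD";

    // L2 budget (us) from SLOT.indication to delivery of all FAPI PDUs
    // for slot N, per the tested timeline above.
    std::uint32_t l2_budget_us(std::uint64_t slot) {
        switch (slot % 10) {
            case 2: case 3: case 4: case 5: case 6:
                return 500;  // long-budget slots
            default:
                return 250;  // slot%10 in {0,1,7,8,9}
        }
    }

    int main() {
        for (std::uint64_t n = 0; n < 10; ++n) {
            std::printf("slot %llu (%c): %u us\n",
                        static_cast<unsigned long long>(n),
                        kTddPattern[n % 10], l2_budget_us(n));
        }
        return 0;
    }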

As of 24-2:

Supports a 500us L2 processing budget and the 7-beam peak and average traffic patterns defined below, using 100MHz cells:

On Grace Hopper:

  • BFP9: 20 4T4R peak cells / 20 4T4R average cells

while respecting the following configuration for 7-beam traffic patterns (an aggregate-bandwidth check follows the table):

TDD 4T4R - 80 Slot Traffic Models, 7-beam config
Configuration (4 UL streams RU->DU)

Parameter                             Peak                          Average
------------------------------------  ----------------------------  ----------------------------
Compression                           BFP9 and BFP14                BFP9 and BFP14
Max PxSCH PRB                         270                           132
DL Throughput/cell                    1469.14 Mbps                  523.10 Mbps
UL Throughput/cell                    212.64 Mbps                   79.91 Mbps
Peak DL Fronthaul Bandwidth/cell      11.06 Gbps BFP14              5.46 Gbps BFP14
                                      7.14 Gbps BFP9                3.58 Gbps BFP9
Peak UL Fronthaul Bandwidth/cell      11.88 Gbps BFP14              6.34 Gbps BFP14
                                      8.03 Gbps BFP9                4.57 Gbps BFP9
SSB slots                             Frame 0 & 2: 0,1,2,3          Frame 0 & 2: 0,1,2,3
#SSB per slot                         Frame 0 & 2: 2,2,2,1          Frame 0 & 2: 2,2,2,1
TRS slots                             Frame 0-3: 6,7,8,9,10,11      Frame 0-3: 6,7,8,9,10,11
                                      Frame 0 & 2: 16,17            Frame 0 & 2: 16,17
TRS Symbols                           Even cells: 6,10              Even cells: 6,10
                                      Odd cells: 5,9                Odd cells: 5,9
CSI-RS slots                          Frame 0: 8,10,16              Frame 0: 8,10,16
                                      Frame 1: 6,8,10               Frame 1: 6,8,10
                                      Frame 2: 6                    Frame 2: 6
CSI-RS Symbols                        Even cells: 12                Even cells: 12
                                      Odd cells: 13                 Odd cells: 13
PDCCH #DCI                            12 (6 DL + 6 UL per slot)     12 (6 DL + 6 UL per slot)
UE/TTI/Cell                           6 per DL slot, 6 per UL slot  6 per DL slot, 6 per UL slot
UCI on PUSCH HARQ+CSIP1+CSIP2 (bits)  4+37+5                        4+37+5
PUCCH format                          1                             1
PUCCH payload (bits)                  18                            18
PRACH format                          B4                            B4
PRACH slots                           Frame 0-3: 5, 15              Frame 0-3: 5, 15
PRACH occasions                       Slot 5: 4, Slot 15: 3         Slot 5: 4, Slot 15: 3
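As a quick arithmetic check on the table, the per-cell fronthaul figures can be scaled to the 20-cell BFP9 Grace Hopper configuration listed above. This is an illustration only; actual link provisioning also depends on O-RAN control-plane and protocol overheads not captured in the table:

    #include <cstdio>

    int main() {
        const int    kCells    = 20;    // 20 4T4R peak cells (BFP9, Grace Hopper)
        const double kDlFhGbps = 7.14;  // peak DL fronthaul bandwidth/cell, BFP9
        const double kUlFhGbps = 8.03;  // peak UL fronthaul bandwidth/cell, BFP9
        std::printf("aggregate DL fronthaul: %.1f Gbps\n", kCells * kDlFhGbps);
        std::printf("aggregate UL fronthaul: %.1f Gbps\n", kCells * kUlFhGbps);
        return 0;  // prints 142.8 Gbps DL and 160.6 Gbps UL
    }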

Notes:

  • Stated performance and CPU core usage are for the L1 workload only; additional non-L1 workloads in an E2E setting may affect the achieved performance and/or CPU core usage.

  • Performance is measured by running L1 in steady-state traffic mode (e.g., the impact of workloads such as cell reconfiguration on other cells is not captured).
