Performance

Aerial SDK 23-1

The following performance is achievable without hyperthreading using “1+6+fractional” x86 cores. The shorthand “1+6+fractional” is defined as follows:

  • 1 isolated physical x86 core for core-locked PTP applications (phc2sys+ptp4l) and the core-locked cuphycontroller L2A H2D prepone thread

  • 6 additional isolated physical x86 cores for the other core-locked cuphycontroller threads

  • A fraction of a shared floating x86 core for non-core-locked cuphycontroller threads
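As a rough illustration of the shorthand above, the core budget can be written down as a small record. This is only a sketch: the class and field names (CoreBudget, ptp_and_prepone_cores, and so on) are hypothetical and are not part of the Aerial SDK or its configuration files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreBudget:
    """Hypothetical record of the "1+6+fractional" x86 core budget."""

    # 1 isolated core: phc2sys + ptp4l and the L2A H2D prepone thread
    ptp_and_prepone_cores: int = 1
    # 6 isolated cores: the other core-locked cuphycontroller threads
    cuphycontroller_locked_cores: int = 6
    # Part of a shared floating core: non-core-locked cuphycontroller threads
    uses_shared_floating_core: bool = True

    @property
    def isolated_physical_cores(self) -> int:
        """Number of fully isolated physical cores the budget requires."""
        return self.ptp_and_prepone_cores + self.cuphycontroller_locked_cores

    def shorthand(self) -> str:
        """Render the budget in the document's shorthand notation."""
        tail = "+fractional" if self.uses_shared_floating_core else ""
        return f"{self.ptp_and_prepone_cores}+{self.cuphycontroller_locked_cores}{tail}"


budget = CoreBudget()
print(budget.shorthand(), "->", budget.isolated_physical_cores, "isolated cores")
```

In other words, the shorthand amounts to 7 fully isolated physical cores plus part of one shared core.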

Additionally, the tested L2 timeline is as follows:

  • FAPI SLOT.indication for Slot N is sent from L1 to L2 at the wall-clock time for Slot N-3 (i.e., a 3-slot advance).

  • For the DDDSUUDDDD TDD pattern with 0-based slot numbering, when slot%10 is in {2,3,4,5,6}, L2 has up to 500 us from SLOT.indication to deliver all FAPI PDUs for Slot N.

  • For the DDDSUUDDDD TDD pattern with 0-based slot numbering, when slot%10 is in {0,1,7,8,9}, L2 has up to 250 us from SLOT.indication to deliver all FAPI PDUs for Slot N.
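The timeline rules above can be sketched as a small lookup. This is a minimal illustration, assuming only the values stated in the list; the function and constant names are hypothetical and do not correspond to any Aerial SDK or FAPI API.

```python
TDD_PATTERN = "DDDSUUDDDD"   # 0-based slot numbering, repeats every 10 slots
SLOT_ADVANCE = 3             # SLOT.indication for Slot N is sent at Slot N-3
LONG_BUDGET_SLOTS = {2, 3, 4, 5, 6}   # slot%10 values with the 500 us budget


def slot_type(slot: int) -> str:
    """Return 'D', 'S', or 'U' for the given slot in the TDD pattern."""
    return TDD_PATTERN[slot % len(TDD_PATTERN)]


def indication_slot(slot: int) -> int:
    """Slot whose wall-clock time L1 uses to send SLOT.indication for `slot`."""
    return slot - SLOT_ADVANCE


def l2_budget_us(slot: int) -> int:
    """Maximum time (us) L2 has, from SLOT.indication, to deliver all FAPI PDUs."""
    return 500 if slot % 10 in LONG_BUDGET_SLOTS else 250


for n in range(10, 20):
    print(f"slot {n} ({slot_type(n)}): indication at slot {indication_slot(n)}, "
          f"L2 budget {l2_budget_us(n)} us")
```

An L2 scheduler could use such a lookup to decide how much work it can afford before the FAPI PDUs for Slot N must be handed to L1.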

Measured performance is:

  • On discrete A100/CX6 cards: 4 4T4R peak / 8 4T4R avg cells

  • On A100X: 6 4T4R peak / 12 4T4R avg cells

  • On A100X-Next: 8 4T4R peak / 16 4T4R avg cells

while respecting the following configuration:

E2E Summary is provided below:

  • 2 Peak Cells in E2E configuration (CN + RAN + UE-EM) via eCPRI connection to test equipment (achieving aggregate DL throughput of 2.86Gbps and UL throughput of 420Mbps)

  • 1 Peak Cell in E2E configuration (CN + RAN + UE-EM) via RF cabled connection to O-RU (achieving DL throughput of 1.3Gbps and UL throughput of 100Mbps)
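To compare the two E2E results on a per-cell basis, the aggregate figures can be divided by the cell count. The sketch below assumes the aggregate throughput is split evenly across cells, which the summary above does not state; the dictionary keys and function name are hypothetical.

```python
# Figures copied from the E2E summary, converted to Mbps.
E2E_RESULTS = {
    "eCPRI to test equipment": {"cells": 2, "dl_mbps": 2860, "ul_mbps": 420},
    "RF cabled to O-RU": {"cells": 1, "dl_mbps": 1300, "ul_mbps": 100},
}


def per_cell_mbps(name: str) -> tuple[float, float]:
    """Return (DL, UL) throughput per cell, assuming an even split across cells."""
    result = E2E_RESULTS[name]
    return (result["dl_mbps"] / result["cells"],
            result["ul_mbps"] / result["cells"])


for name in E2E_RESULTS:
    dl, ul = per_cell_mbps(name)
    print(f"{name}: {dl:.0f} Mbps DL / {ul:.0f} Mbps UL per cell")
```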

© Copyright 2022-2023, NVIDIA. Last updated on Apr 20, 2024.