Performance

The performance figures below were achieved without hyperthreading, using “1+6+fractional” x86 cores. This shorthand is defined as follows:

1 isolated physical x86 core for core-locked PTP applications (phc2sys+ptp4l) and the core-locked cuphycontroller L2A H2D prepone thread
6 additional isolated physical x86 cores for the other core-locked cuphycontroller threads
A fraction of a shared floating x86 core for non-core-locked cuphycontroller threads

Additionally, the tested L2 timeline is as follows:

FAPI SLOT.indication for Slot N is sent from L1 to L2 at the wall-clock time for Slot N-3 (i.e., a 3-slot advance).
For the DDDSUUDDDD TDD pattern with 0-based slot numbering, L2 has up to 500 µs from SLOT.indication to deliver all FAPI PDUs for Slot N when slot%10 is in {2,3,4,5,6}.
For the DDDSUUDDDD TDD pattern with 0-based slot numbering, L2 has up to 250 µs from SLOT.indication to deliver all FAPI PDUs for Slot N when slot%10 is in {0,1,7,8,9}.
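
The timeline rules above can be sketched as a small lookup. This is only an illustration of the stated rules, not part of the Aerial SDK: the function and constant names (`l2_budget_us`, `indication_slot`, `slot_type`, `LONG_BUDGET_SLOTS`) are hypothetical, and frame/slot-number wrap-around is deliberately not modeled.

```python
# Illustrative sketch only; names are hypothetical, not Aerial SDK APIs.

TDD_PATTERN = "DDDSUUDDDD"          # 10-slot TDD pattern, 0-based slot numbering
SLOT_ADVANCE = 3                    # SLOT.indication for Slot N is sent at Slot N-3
LONG_BUDGET_SLOTS = {2, 3, 4, 5, 6} # slot % 10 values with the 500 us budget


def slot_type(slot: int) -> str:
    """Return 'D', 'S', or 'U' for the given slot in the DDDSUUDDDD pattern."""
    return TDD_PATTERN[slot % 10]


def indication_slot(slot: int) -> int:
    """Slot whose wall-clock time coincides with SLOT.indication for `slot`.

    Wrap-around of the slot counter is not modeled, so the result can be
    negative for slot < SLOT_ADVANCE.
    """
    return slot - SLOT_ADVANCE


def l2_budget_us(slot: int) -> int:
    """Time (in microseconds) L2 has, counted from SLOT.indication,
    to deliver all FAPI PDUs for `slot`."""
    return 500 if slot % 10 in LONG_BUDGET_SLOTS else 250


if __name__ == "__main__":
    for n in range(10):
        print(f"slot {n} ({slot_type(n)}): indication at slot {indication_slot(n)}, "
              f"L2 budget {l2_budget_us(n)} us")
```

An L2 integrator could use a table like this to check, per slot, whether its FAPI PDU delivery times stay within the tested budget.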

Achievement

Measured performance is:

On discrete A100/CX6 cards: 4 4T4R peak / 8 4T4R avg cells
On A100X: 6 4T4R peak / 12 4T4R avg cells
On A100X-Next: 8 4T4R peak / 16 4T4R avg cells

while respecting the following configuration:

The E2E summary is provided below:

2 Peak Cells in E2E configuration (CN + RAN + UE-EM) via eCPRI connection to test equipment (achieving aggregate DL throughput of 2.86 Gbps and UL throughput of 420 Mbps)
1 Peak Cell in E2E configuration (CN + RAN + UE-EM) via RF cabled connection to O-RU (achieving DL throughput of 1.3 Gbps and UL throughput of 100 Mbps)
