RadarDopplerFFT

Overview

Doppler FFT (Fast Fourier Transform) is a critical signal processing step in radar systems that follows Range FFT in the radar processing pipeline. It transforms range-processed data from the slow time domain into the frequency domain to extract target velocity (Doppler) information. By performing a 1D FFT along the slow time (chirp) dimension for each antenna channel and range bin, the module converts range profiles into range-Doppler maps, where each frequency bin corresponds to a specific radial velocity. The output of Doppler FFT enables velocity estimation and is used as input for subsequent steps such as peak detection and angle estimation.
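The mapping from a Doppler bin to a radial velocity can be sketched with the standard relation v = f_d · λ / 2. This is a minimal illustration, not part of the module: the function name, the wavelength and chirp-period parameters, and the sign convention for the upper half of the spectrum are all assumptions.

```python
def doppler_bin_to_velocity(bin_index: int, num_chirps: int,
                            wavelength_m: float, chirp_period_s: float) -> float:
    """Map an FFT bin index to a radial velocity in m/s (hypothetical helper).

    Bins in the upper half of the spectrum are treated as negative
    frequencies; the actual sign convention depends on the radar front end.
    """
    if not 0 <= bin_index < num_chirps:
        raise ValueError("bin_index out of range")
    # Wrap the unsigned FFT bin to a signed bin.
    signed_bin = bin_index - num_chirps if bin_index >= num_chirps / 2 else bin_index
    # Bin spacing in Hz is 1 / (N * chirp period).
    doppler_hz = signed_bin / (num_chirps * chirp_period_s)
    # Two-way propagation: v = f_d * wavelength / 2.
    return doppler_hz * wavelength_m / 2.0


# Example values (assumed): ~77 GHz radar, 512 chirps, 50 us chirp period.
v_bin1 = doppler_bin_to_velocity(1, 512, 0.0039, 50e-6)
v_last = doppler_bin_to_velocity(511, 512, 0.0039, 50e-6)
```

With these assumed values, one bin corresponds to roughly 0.076 m/s, and the last bin maps to the same magnitude with the opposite sign.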

Windowing is essential for minimizing spectral leakage in Doppler FFT processing, as it is in Range FFT. This module supports multiple window types (e.g., Hanning, Hamming), each defined by its own set of coefficients. Users can select or configure the window function according to their application requirements to balance velocity resolution against sidelobe suppression.
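As an illustration of windowing along the slow-time axis, the sketch below builds a symmetric Hann window in floating point and applies it to one (channel, range bin) column. The function names are hypothetical, and the module's actual coefficient definition and int32 quantization are not specified here, so this only shows the principle.

```python
import math


def hann_window(num_chirps: int) -> list[float]:
    """Symmetric Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / (N - 1))."""
    if num_chirps < 2:
        raise ValueError("need at least 2 chirps")
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (num_chirps - 1))
            for n in range(num_chirps)]


def apply_window(slow_time: list[complex], window: list[float]) -> list[complex]:
    """Scale each chirp sample of one column by its window coefficient."""
    if len(slow_time) != len(window):
        raise ValueError("window length must match the number of chirps")
    return [x * w for x, w in zip(slow_time, window)]


window = hann_window(8)
tapered = apply_window([1 + 1j] * 8, window)
```

The taper drives the first and last chirp samples toward zero, which is what reduces leakage into neighboring Doppler bins.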

Algorithm Description

The Doppler FFT operator performs a 1D FFT along the slow time (chirp) dimension for each antenna channel and range bin. The input is a 3D tensor of shape (chirps x channels x range bins) from the Range FFT output. If transpose output mode is disabled, the output keeps the same 3D tensor structure but transforms the chirp dimension into Doppler (velocity) bins, resulting in a range-Doppler map of shape (Doppler bins x channels x range bins). If transpose output mode is enabled, the output swaps the range bins dimension with the Doppler bins dimension, resulting in a range-Doppler map of shape (range bins x channels x Doppler bins). To achieve optimal performance, the implementation uses a mixed-radix FFT algorithm built from radix-2, radix-3, radix-4, and radix-5 stages. The number of chirps must be in the range [2, 1024) and must be factorizable using only the factors 2, 3, 4, and 5.
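The shape handling and the chirp-count constraint can be sketched in plain Python. This is a readable floating-point reference under stated assumptions, not the fixed-point PVA implementation: it uses a naive O(N²) DFT instead of a mixed-radix FFT, all function names are hypothetical, and the factor check tests only the primes 2, 3, and 5 because 4 = 2 x 2.

```python
import cmath


def is_supported_chirp_count(n: int) -> bool:
    """True if n is in [2, 1024) and has no prime factor other than 2, 3, 5."""
    if not 2 <= n < 1024:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def dft(x: list[complex]) -> list[complex]:
    """Naive DFT, used only as a readable stand-in for the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def doppler_fft(cube, transpose_output: bool = False):
    """cube[chirp][channel][range_bin] -> range-Doppler map.

    Returns out[doppler][channel][range_bin], or
    out[range_bin][channel][doppler] when transpose_output is True.
    """
    chirps, channels, bins = len(cube), len(cube[0]), len(cube[0][0])
    if not is_supported_chirp_count(chirps):
        raise ValueError("unsupported chirp count")
    out = [[[0j] * bins for _ in range(channels)] for _ in range(chirps)]
    for c in range(channels):
        for r in range(bins):
            # Transform the slow-time column of one (channel, range bin) pair.
            spectrum = dft([cube[t][c][r] for t in range(chirps)])
            for d in range(chirps):
                out[d][c][r] = spectrum[d]
    if transpose_output:
        return [[[out[d][c][r] for d in range(chirps)]
                 for c in range(channels)] for r in range(bins)]
    return out


# 8 chirps x 2 channels x 3 range bins, with a tone at Doppler bin 2.
cube = [[[cmath.exp(2j * cmath.pi * 2 * t / 8)] * 3 for _ in range(2)]
        for t in range(8)]
rd = doppler_fft(cube)
rd_t = doppler_fft(cube, transpose_output=True)
```

In this example the energy of every column lands in Doppler bin 2, and the transposed output holds the same values with the first and last axes exchanged.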

Implementation Details

Parameters

  • Input tensor with shape (chirps x channels x range bins) and data type complex int32.

  • Output tensor with data type complex int32 and shape (Doppler bins x channels x range bins) if transpose output mode is disabled, or (range bins x channels x Doppler bins) if transpose output mode is enabled.

Dataflow Configuration

  • Use 1 SQDF (SequenceDataflow) to transfer the twiddle factors and the coefficients of the window function from DRAM to VMEM.

  • Use 1 input and 1 output SQDFs to split the whole input/output tensor into tiles and transfer them between DRAM and VMEM one by one.

    • The width of the input tile equals 8 batches per tile, and the height equals the number of chirps.

    • If transpose output mode is disabled, the width of the output tile equals 8 batches per tile, and the height equals the number of Doppler bins.

    • If transpose output mode is enabled, the width of the output tile equals the number of Doppler bins, and the height equals 8 batches per tile.
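The tile geometry described above can be written down as a small helper. This is a sketch under several assumptions that the text does not confirm: a "batch" is taken to be one (channel, range bin) column, the number of Doppler bins is taken to equal the number of chirps, and a complex int32 sample is taken to occupy 8 bytes. All names are hypothetical.

```python
import math

BATCHES_PER_TILE = 8   # from the dataflow description
BYTES_PER_SAMPLE = 8   # assumption: complex int32 stored as two 4-byte integers


def tile_shapes(num_chirps: int, transpose_output: bool):
    """Return ((in_height, in_width), (out_height, out_width)) in samples."""
    input_tile = (num_chirps, BATCHES_PER_TILE)
    if transpose_output:
        # Doppler bins run along the width, batches along the height.
        output_tile = (BATCHES_PER_TILE, num_chirps)
    else:
        output_tile = (num_chirps, BATCHES_PER_TILE)
    return input_tile, output_tile


def tile_count(channels: int, range_bins: int) -> int:
    """Number of tiles, assuming one batch per (channel, range bin) pair."""
    return math.ceil(channels * range_bins / BATCHES_PER_TILE)


def double_buffer_bytes(tile: tuple[int, int]) -> int:
    """VMEM bytes for one double-buffered tile of complex samples."""
    height, width = tile
    return 2 * height * width * BYTES_PER_SAMPLE


in_tile, out_tile = tile_shapes(512, transpose_output=True)
```

Under these assumptions, a 512-chirp input tile needs 64 KiB of VMEM once double buffering is counted.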

VMEM Buffer Allocation

  • 1 buffer with data type int32 to store the twiddle factors and 1 buffer with data type int32 to store the coefficients of the window function.

  • 1 input buffer with data type int32 and double buffering to store each tile read from the input tensor.

  • 1 output buffer with data type int32 and double buffering to store each tile written to the output tensor.

  • 1 temporary buffer with data type int32 to store the data for transpose operation.

Workflow Implementation

The kernels of the Doppler FFT operator are implemented with high-performance fixed-point vectorized instructions on PVA and run in the following steps:

  1. The windowing function.

  2. \(\lceil \log_4 N \rceil - 1\) stages of the fft_batched_radix4 function, each performing a radix-4 operation.

  3. The digit_reverse_transpose function, which performs a radix-4 operation. If \(N\) is not a power of 4, the digit_reverse_transpose function executes a radix-2 operation.

    • If transpose output mode is enabled, the digit_reverse_transpose function will transpose the data layout in the output tile.
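The stage count in the steps above can be checked with integer arithmetic. The sketch below assumes N is a power of two, which is the only case the steps spell out; the function name is hypothetical and the radix-3 and radix-5 paths are not modeled.

```python
def radix4_stage_plan(n: int) -> tuple[int, int]:
    """Return (number of fft_batched_radix4 stages, radix of the final stage).

    Assumes n is a power of two. The stage count is ceil(log4(n)) - 1, and
    the final stage is radix-4 when n is a power of 4, radix-2 otherwise.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    log2_n = n.bit_length() - 1
    ceil_log4_n = (log2_n + 1) // 2          # ceil(log2(n) / 2)
    final_radix = 4 if log2_n % 2 == 0 else 2
    return ceil_log4_n - 1, final_radix


plan_512 = radix4_stage_plan(512)   # four radix-4 stages, radix-2 final stage
plan_256 = radix4_stage_plan(256)   # three radix-4 stages, radix-4 final stage
```

In both cases the product of the stage radices recovers N: 4⁴ x 2 = 512 and 4³ x 4 = 256.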

Agen Configuration

The agen configurations are set according to the radix-4 or radix-2 operation in each stage. In the last radix stage, the agen configurations implement digit-reversal addressing with zero overhead. Because the accumulation in the butterfly calculation of a radix-4 or radix-2 operation can overflow, the rounding parameter of the store agen is set in each stage to 2 bits for radix-4 and 1 bit for radix-2. When twiddle factors are used in the current stage, the rounding parameter must also account for their quantization bits (qbits). Applying this rounding when the output data is stored at the end of each stage prevents accumulation overflow in subsequent stages.
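The effect of the per-stage rounding can be illustrated with ordinary integers. This sketch assumes round-half-up behavior for the rounding shift, which may differ from the hardware rounding mode, and the helper names and the 15-bit twiddle quantization in the example are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def round_shift(value: int, bits: int) -> int:
    """Arithmetic right shift by `bits` with round-half-up (assumed mode)."""
    if bits == 0:
        return value
    return (value + (1 << (bits - 1))) >> bits


def store_stage_output(accumulator: int, radix: int, twiddle_qbits: int = 0) -> int:
    """Scale a butterfly accumulator before it is stored back as int32.

    A radix-4 butterfly sums four terms (2 bits of growth) and a radix-2
    butterfly sums two (1 bit). Twiddle multiplication adds `twiddle_qbits`
    fractional bits that are removed in the same shift.
    """
    growth_bits = {4: 2, 2: 1}[radix]
    return round_shift(accumulator, growth_bits + twiddle_qbits)


# Worst case for a radix-4 sum: four inputs at the int32 maximum.
acc = 4 * INT32_MAX
stored = store_stage_output(acc, radix=4)

# Four inputs of 1000, each multiplied by a unity twiddle quantized to 15 bits.
stored_tw = store_stage_output(4 * 1000 * (1 << 15), radix=4, twiddle_qbits=15)
```

The raw accumulator no longer fits in 32 bits, but the stored value does, so the next stage starts from in-range data.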

VPU Function Implementation

The fft_batched_radix4 function uses the following instructions to implement the radix-4 operation:

  • vaddsub4x2_0 and vaddsub4x2_1 are used to accelerate the butterfly calculation in the radix-4 operation.

  • dvcmulw_t16 is used to accelerate the complex multiplication with the twiddle factors.

To achieve the best performance, we manually allocate the vector registers and unroll the loop body in the fft_batched_radix4 function.
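For reference, the arithmetic of a single radix-4 butterfly can be written as two layers of add/subtract operations. This floating-point sketch shows only the math and does not model the semantics of vaddsub4x2_0, vaddsub4x2_1, or dvcmulw_t16; twiddle multiplication is omitted and the function names are hypothetical.

```python
import cmath


def radix4_butterfly(a: complex, b: complex, c: complex, d: complex):
    """4-point DFT of (a, b, c, d) using two add/subtract layers."""
    # First layer: pair inputs that are two positions apart.
    s0, d0 = a + c, a - c
    s1, d1 = b + d, b - d
    # Second layer: combine, with a multiplication by -j or +j on d1,
    # which is just a swap of real and imaginary parts with a sign change.
    y0 = s0 + s1
    y1 = d0 - 1j * d1
    y2 = s0 - s1
    y3 = d0 + 1j * d1
    return y0, y1, y2, y3


def dft4(x):
    """Direct 4-point DFT used to check the butterfly."""
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / 4) for t in range(4))
            for k in range(4)]


inputs = (1 + 2j, -3 + 0.5j, 4 - 1j, 0.25 + 7j)
butterfly_out = radix4_butterfly(*inputs)
reference_out = dft4(inputs)
```

The butterfly needs no general complex multiplications, which is why the add/subtract instructions carry most of the work inside each radix-4 stage.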

Performance

Execution Time is the average time required to execute the operator on a single VPU core. Note that each PVA contains two VPU cores, which can operate in parallel to process two streams simultaneously, or reduce execution time by approximately half by splitting the workload between the two cores.

Total Power represents the average total power consumed by the module when the operator is executed concurrently on both VPU cores. Idle power is approximately 7W when the PVA is not processing data.

For detailed information on interpreting the performance table below and understanding the benchmarking setup, see Performance Benchmark.

| RangeBinCount | RxAntennaCount | ChirpCount | OutputLayout | InputOutputDataType | Execution Time | Submit Latency | Total Power |
|---|---|---|---|---|---|---|---|
| 8 | 4 | 512 | DopplerRxRange | 2S32 | 0.036ms | 0.020ms | 12.539W |
| 24 | 4 | 512 | DopplerRxRange | 2S32 | 0.059ms | 0.022ms | 14.542W |
| 25 | 4 | 512 | DopplerRxRange | 2S32 | 0.102ms | 0.022ms | 15.766W |
| 256 | 4 | 512 | DopplerRxRange | 2S32 | 0.611ms | 0.031ms | 12.648W |
| 257 | 1 | 512 | DopplerRxRange | 2S32 | 0.221ms | 0.024ms | 14.662W |
| 257 | 1 | 512 | RangeRxDoppler | 2S32 | 0.134ms | 0.025ms | 15.849W |
| 257 | 1 | 729 | DopplerRxRange | 2S32 | 0.574ms | 0.028ms | 16.146W |
| 257 | 1 | 729 | RangeRxDoppler | 2S32 | 0.571ms | 0.027ms | 14.531W |
| 257 | 1 | 400 | DopplerRxRange | 2S32 | 0.222ms | 0.024ms | 16.153W |
| 257 | 1 | 640 | DopplerRxRange | 2S32 | 0.279ms | 0.026ms | 15.459W |
| 257 | 1 | 600 | DopplerRxRange | 2S32 | 0.381ms | 0.026ms | 16.95W |
| 257 | 1 | 625 | DopplerRxRange | 2S32 | 0.533ms | 0.026ms | 15.943W |
| 257 | 1 | 1000 | DopplerRxRange | 2S32 | 0.675ms | 0.029ms | 17.248W |
| 257 | 1 | 400 | RangeRxDoppler | 2S32 | 0.220ms | 0.023ms | 15.436W |
| 257 | 1 | 640 | RangeRxDoppler | 2S32 | 0.242ms | 0.027ms | 16.842W |
| 257 | 1 | 600 | RangeRxDoppler | 2S32 | 0.378ms | 0.026ms | 14.831W |
| 257 | 1 | 625 | RangeRxDoppler | 2S32 | 0.531ms | 0.027ms | 14.031W |
| 257 | 1 | 1000 | RangeRxDoppler | 2S32 | 0.671ms | 0.029ms | 15.133W |
| 257 | 3 | 512 | DopplerRxRange | 2S32 | 0.603ms | 0.031ms | 14.763W |
| 257 | 3 | 512 | RangeRxDoppler | 2S32 | 0.377ms | 0.032ms | 15.451W |
| 257 | 4 | 512 | DopplerRxRange | 2S32 | 0.788ms | 0.032ms | 14.763W |
| 257 | 4 | 512 | RangeRxDoppler | 2S32 | 0.495ms | 0.033ms | 15.451W |
| 257 | 4 | 729 | DopplerRxRange | 2S32 | 2.217ms | 0.033ms | 16.248W |
| 257 | 4 | 729 | RangeRxDoppler | 2S32 | 2.214ms | 0.033ms | 14.633W |
| 257 | 4 | 400 | DopplerRxRange | 2S32 | 0.791ms | 0.028ms | 16.755W |
| 257 | 4 | 640 | DopplerRxRange | 2S32 | 0.992ms | 0.033ms | 15.256W |
| 257 | 4 | 600 | DopplerRxRange | 2S32 | 1.417ms | 0.031ms | 16.451W |
| 257 | 4 | 625 | DopplerRxRange | 2S32 | 2.043ms | 0.034ms | 16.444W |
| 257 | 4 | 1000 | DopplerRxRange | 2S32 | 2.586ms | 0.034ms | 15.748W |
| 257 | 4 | 400 | RangeRxDoppler | 2S32 | 0.788ms | 0.030ms | 16.038W |
| 257 | 4 | 640 | RangeRxDoppler | 2S32 | 0.872ms | 0.033ms | 17.045W |
| 257 | 4 | 600 | RangeRxDoppler | 2S32 | 1.414ms | 0.033ms | 15.336W |
| 257 | 4 | 625 | RangeRxDoppler | 2S32 | 2.040ms | 0.033ms | 14.132W |
| 257 | 4 | 1000 | RangeRxDoppler | 2S32 | 2.582ms | 0.033ms | 15.234W |
| 257 | 8 | 512 | DopplerRxRange | 2S32 | 1.616ms | 0.032ms | 13.357W |
| 257 | 8 | 512 | RangeRxDoppler | 2S32 | 1.019ms | 0.033ms | 15.349W |
| 257 | 8 | 512 | RangeDopplerRx | 2S32 | 1.289ms | 0.036ms | 14.647W |
| 257 | 8 | 729 | DopplerRxRange | 2S32 | 4.410ms | 0.034ms | 15.748W |
| 257 | 8 | 729 | RangeRxDoppler | 2S32 | 4.406ms | 0.032ms | 14.633W |
| 257 | 8 | 729 | RangeDopplerRx | 2S32 | 5.497ms | 0.038ms | 14.038W |
| 257 | 8 | 900 | RangeDopplerRx | 2S32 | 6.102ms | 0.038ms | 15.038W |
| 257 | 8 | 400 | DopplerRxRange | 2S32 | 1.548ms | 0.030ms | 15.053W |
| 257 | 8 | 640 | DopplerRxRange | 2S32 | 2.194ms | 0.032ms | 14.154W |
| 257 | 8 | 600 | DopplerRxRange | 2S32 | 2.799ms | 0.032ms | 15.349W |
| 257 | 8 | 625 | DopplerRxRange | 2S32 | 4.056ms | 0.033ms | 16.146W |
| 257 | 8 | 1000 | DopplerRxRange | 2S32 | 5.136ms | 0.033ms | 15.248W |
| 257 | 8 | 400 | RangeRxDoppler | 2S32 | 1.545ms | 0.033ms | 16.038W |
| 257 | 8 | 640 | RangeRxDoppler | 2S32 | 1.712ms | 0.032ms | 16.443W |
| 257 | 8 | 600 | RangeRxDoppler | 2S32 | 2.796ms | 0.035ms | 15.437W |
| 257 | 8 | 625 | RangeRxDoppler | 2S32 | 4.054ms | 0.034ms | 14.132W |
| 257 | 8 | 1000 | RangeRxDoppler | 2S32 | 5.130ms | 0.035ms | 15.234W |
| 257 | 8 | 400 | RangeDopplerRx | 2S32 | 1.771ms | 0.035ms | 15.038W |
| 257 | 8 | 640 | RangeDopplerRx | 2S32 | 2.672ms | 0.037ms | 15.747W |
| 257 | 8 | 600 | RangeDopplerRx | 2S32 | 3.689ms | 0.038ms | 15.14W |
| 257 | 8 | 625 | RangeDopplerRx | 2S32 | 4.997ms | 0.038ms | 14.437W |
| 513 | 1 | 512 | DopplerRxRange | 2S32 | 0.404ms | 0.029ms | 14.759W |
| 513 | 1 | 512 | RangeRxDoppler | 2S32 | 0.241ms | 0.028ms | 15.552W |
| 513 | 1 | 729 | DopplerRxRange | 2S32 | 1.105ms | 0.031ms | 16.146W |
| 513 | 1 | 729 | RangeRxDoppler | 2S32 | 1.102ms | 0.030ms | 14.633W |
| 513 | 1 | 400 | DopplerRxRange | 2S32 | 0.405ms | 0.028ms | 16.755W |
| 513 | 1 | 640 | DopplerRxRange | 2S32 | 0.506ms | 0.030ms | 15.958W |
| 513 | 1 | 600 | DopplerRxRange | 2S32 | 0.716ms | 0.030ms | 17.052W |
| 513 | 1 | 625 | DopplerRxRange | 2S32 | 1.021ms | 0.029ms | 15.943W |
| 513 | 1 | 1000 | DopplerRxRange | 2S32 | 1.293ms | 0.032ms | 17.349W |
| 513 | 1 | 400 | RangeRxDoppler | 2S32 | 0.404ms | 0.027ms | 15.937W |
| 513 | 1 | 640 | RangeRxDoppler | 2S32 | 0.446ms | 0.029ms | 17.44W |
| 513 | 1 | 600 | RangeRxDoppler | 2S32 | 0.713ms | 0.030ms | 15.336W |
| 513 | 1 | 625 | RangeRxDoppler | 2S32 | 1.019ms | 0.029ms | 14.132W |
| 513 | 1 | 1000 | RangeRxDoppler | 2S32 | 1.289ms | 0.033ms | 15.133W |
| 513 | 4 | 512 | DopplerRxRange | 2S32 | 1.554ms | 0.031ms | 13.256W |
| 513 | 4 | 512 | RangeRxDoppler | 2S32 | 0.984ms | 0.033ms | 15.451W |
| 513 | 4 | 729 | DopplerRxRange | 2S32 | 4.345ms | 0.032ms | 15.246W |
| 513 | 4 | 729 | RangeRxDoppler | 2S32 | 4.339ms | 0.034ms | 14.633W |
| 513 | 4 | 400 | DopplerRxRange | 2S32 | 1.526ms | 0.031ms | 14.553W |
| 513 | 4 | 640 | DopplerRxRange | 2S32 | 2.219ms | 0.033ms | 13.553W |
| 513 | 4 | 600 | DopplerRxRange | 2S32 | 2.758ms | 0.033ms | 14.846W |
| 513 | 4 | 625 | DopplerRxRange | 2S32 | 3.996ms | 0.033ms | 16.045W |
| 513 | 4 | 1000 | DopplerRxRange | 2S32 | 5.060ms | 0.032ms | 15.248W |
| 513 | 4 | 400 | RangeRxDoppler | 2S32 | 1.523ms | 0.034ms | 16.035W |
| 513 | 4 | 640 | RangeRxDoppler | 2S32 | 1.687ms | 0.033ms | 16.443W |
| 513 | 4 | 600 | RangeRxDoppler | 2S32 | 2.754ms | 0.034ms | 15.336W |
| 513 | 4 | 625 | RangeRxDoppler | 2S32 | 3.993ms | 0.034ms | 14.132W |
| 513 | 4 | 1000 | RangeRxDoppler | 2S32 | 5.053ms | 0.033ms | 15.336W |
| 513 | 8 | 512 | DopplerRxRange | 2S32 | 3.966ms | 0.032ms | 12.858W |
| 513 | 8 | 512 | RangeRxDoppler | 2S32 | 1.967ms | 0.032ms | 13.85W |
| 513 | 8 | 512 | RangeDopplerRx | 2S32 | 2.493ms | 0.038ms | 13.945W |
| 513 | 8 | 729 | DopplerRxRange | 2S32 | 8.662ms | 0.030ms | 15.349W |
| 513 | 8 | 729 | RangeRxDoppler | 2S32 | 8.657ms | 0.032ms | 14.633W |
| 513 | 8 | 729 | RangeDopplerRx | 2S32 | 10.829ms | 0.041ms | 14.14W |
| 513 | 8 | 900 | RangeDopplerRx | 2S32 | 12.044ms | 0.044ms | 14.639W |
| 513 | 8 | 1000 | DopplerRxRange | 2S32 | 10.080ms | 0.034ms | 15.053W |
| 513 | 8 | 400 | DopplerRxRange | 2S32 | 3.048ms | 0.033ms | 14.452W |
| 513 | 8 | 640 | DopplerRxRange | 2S32 | 5.052ms | 0.031ms | 13.655W |
| 513 | 8 | 600 | DopplerRxRange | 2S32 | 5.480ms | 0.032ms | 14.951W |
| 513 | 8 | 625 | DopplerRxRange | 2S32 | 7.962ms | 0.032ms | 15.748W |
| 513 | 8 | 1000 | RangeRxDoppler | 2S32 | 10.072ms | 0.032ms | 15.64W |
| 513 | 8 | 400 | RangeRxDoppler | 2S32 | 3.014ms | 0.034ms | 16.14W |
| 513 | 8 | 640 | RangeRxDoppler | 2S32 | 3.345ms | 0.033ms | 14.944W |
| 513 | 8 | 600 | RangeRxDoppler | 2S32 | 5.476ms | 0.034ms | 15.437W |
| 513 | 8 | 625 | RangeRxDoppler | 2S32 | 7.958ms | 0.033ms | 14.531W |
| 513 | 8 | 400 | RangeDopplerRx | 2S32 | 3.455ms | 0.036ms | 15.436W |
| 513 | 8 | 640 | RangeDopplerRx | 2S32 | 5.245ms | 0.039ms | 14.647W |
| 513 | 8 | 600 | RangeDopplerRx | 2S32 | 7.238ms | 0.040ms | 15.14W |
| 513 | 8 | 625 | RangeDopplerRx | 2S32 | 9.835ms | 0.040ms | 14.437W |

References