RadarBartlettBeamformerDoA (Direction of Arrival)#
Overview#
Bartlett beamforming (conventional beamforming) [1] is a Direction of Arrival (DOA) estimation technique for radar signal processing. Unlike FFT-based methods that require uniform half-wavelength antenna spacing, Bartlett beamforming uses pre-computed steering vectors to support arbitrary antenna array geometries. The algorithm computes a spatial power spectrum by correlating antenna snapshots with steering vectors across a grid of azimuth and elevation angles, where the peak indicates the target direction.
This operator performs DOA angle estimation using Bartlett beamforming and outputs the target count, target angles (azimuth, elevation, and optional peak power), and a target index map. The operator supports both cuPVA and CUDA stream submission. For complete target processing (velocity, range, Cartesian coordinates), chain the output with the standalone Target Processing operator.
Fundamental Principle#
Spatial Sampling Principle
Antenna Array:    [Ant1] --d--> [Ant2] --d--> [Ant3] --d--> [Ant4]
                    ↓             ↓             ↓             ↓
Received Signals:  s₁(t)         s₂(t)         s₃(t)         s₄(t)
Key Relationship:
θ (angle) → Δt (time delay) → Δφ (phase shift)
θ (Angle of Arrival): The direction of the incoming wavefront, measured from the array broadside.
Δt (Time Delay): The difference in arrival time of the wavefront at adjacent antenna elements.
Δφ (Phase Shift): The phase difference that results from the time delay.
These relationships form the basis for angle estimation.
Time Delay Relationship:
\[ \Delta t = \frac{d \sin\theta}{c} \]
where:
\(d\) = element spacing
\(c\) = speed of light
Phase Shift Relationship:
\[ \Delta\phi = 2\pi \frac{c\,\Delta t}{\lambda} = \frac{2\pi d \sin\theta}{\lambda} \]
where:
\(λ\) = wavelength
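As a quick worked example, take half-wavelength spacing (\(d = \lambda/2\)) and a target at \(\theta = 30^\circ\):

```latex
\Delta\phi = \frac{2\pi d \sin\theta}{\lambda}
           = \frac{2\pi \,(\lambda/2)\, \sin 30^\circ}{\lambda}
           = 2\pi \cdot \frac{1}{2} \cdot \frac{1}{2}
           = \frac{\pi}{2}\ \text{rad} = 90^\circ
```

Adjacent elements therefore see a quarter-cycle phase step, and a steering vector built for 30° encodes exactly this progression across the array.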
This phase shift relationship is what the steering vectors encode. For each candidate angle, the steering vector contains the expected phase shifts at each antenna element. Bartlett beamforming finds the angle whose steering vector best matches the observed phase pattern in the received snapshot.
Mathematical Foundation#
The Bartlett beamformer computes a spatial power spectrum by projecting the received signal onto steering vectors for each candidate angle. The power spectrum is given by:
\[ P(\theta, \phi) = \left| \mathbf{a}^H(\theta, \phi)\, \mathbf{x} \right|^2 \]
where:
\(P(\theta, \phi)\) - power at azimuth \(\theta\) and elevation \(\phi\)
\(\mathbf{a}(\theta, \phi)\) - steering vector for the given angle
\(\mathbf{x}\) - complex snapshot vector from the antenna array
\((\cdot)^H\) - Hermitian (conjugate transpose)
Steering Vector Definition:
For a planar antenna array (z = 0), the steering vector encodes the expected phase shifts across antenna elements:
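\[ [\mathbf{a}(\theta, \phi)]_n = e^{-j 2\pi \left( x_n \sin\theta \cos\phi + y_n \sin\phi \right)} \]
where: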
\(\theta\) - azimuth angle
\(\phi\) - elevation angle
\((x_n, y_n)\) - position of the \(n\)-th antenna element in wavelengths
Angle Estimation:
The estimated direction is obtained by finding the peak in the power spectrum:
\[ (\hat{\theta}, \hat{\phi}) = \arg\max_{\theta, \phi} P(\theta, \phi) \]
Algorithm Description#
The operator performs DOA angle estimation via Bartlett beamforming.
1. Processing Inputs#
The operator requires the following inputs:
detectionCount, scalar count of valid detections at runtime, shape [1], data type S32.
snapshots, complex antenna samples for each detection, shape [numDetections × numTx × numRx], data type 2S32 (complex SQ11.20 fixed-point). The tensor is sized for the maximum capacity; the actual runtime detection count is provided via the detectionCount tensor.
steeringVectors, pre-computed steering vectors, shape [numAzimuthBins × numElevationBins × numTx × numRx], data type 2S16 (complex Q15 fixed-point). Steering vectors can be computed with the pvaDoaBartlettBeamformingComputeSteeringVectors() utility, or pre-calibrated steering vectors can be passed to the operator.
azimuthBins, elevationBins, arrays of discrete angle values (in degrees) that define the DOA search grid, data type F32.
detectionList, range and Doppler bin indices for each detection, shape [numDetections × 2], data type S32. Each row contains [rangeIdx, dopplerIdx].
ddmDopplerOffsets, DDM (Doppler-division multiplexing) Doppler offsets for multi-TX radar, shape [numDopplerFolds], data type F32. Currently only the first element (TX1 offset) is used.
2. Steering Vector Computation#
Steering vectors can be pre-computed with the pvaDoaBartlettBeamformingComputeSteeringVectors() utility and reused across submissions, or pre-calibrated steering vectors can be passed directly to the operator. The utility function requires:
virtualArrayLocations, virtual antenna array positions in wavelengths [numTx × numRx], data type 2F32. Each element stores [x, y] position coordinates.
azimuthBins and elevationBins, the angle grid tensors (the same ones used for submission).
The function computes steering vectors as \(e^{-j \cdot 2\pi \cdot \text{phase}}\), where:
\[ \text{phase} = x_n \sin\theta \cos\phi + y_n \sin\phi \]
for virtual channel \(n\) at position \((x_n, y_n)\) (in wavelengths), azimuth \(\theta\), and elevation \(\phi\).
The resulting steering vectors are stored in Q15 fixed-point format and conjugated for efficient inner-product computation on the VPU.
3. DOA Angle Estimation#
For each detection, the algorithm performs:
Beamforming: Compute the power spectrum by correlating the snapshot with each steering vector:
For each candidate angle (azimuth, elevation), compute \(P(\theta,\phi) = |\mathbf{a}^H(\theta,\phi) \cdot \mathbf{x}|^2\)
This measures how well the received phase pattern matches the expected pattern for that angle
Peak Search: Find the angle bin with maximum power in the 2D spectrum
Interpolation (optional): Apply quadratic interpolation using neighboring bins to achieve sub-bin accuracy (see Angle Peak Interpolation)
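For checking results on a host, a floating-point reference of the beamforming and peak-search steps (without interpolation) can be sketched in C99 as follows. The function name `bartlett_spectrum` and the [az][el][vc] layout are assumptions. The steering vectors here are unconjugated, so the code applies `conj()` itself, unlike the stored Q15 vectors.

```c
#include <complex.h>
#include <stddef.h>

/* Floating-point reference for one detection (not the fixed-point VPU kernel).
 * sv holds unconjugated steering vectors a(az, el), assumed laid out as
 * [numAz][numEl][numVc]; x holds the numVc snapshot samples.
 * Writes P = |a^H x|^2 for every bin into power (numAz * numEl entries) and
 * returns the flat index of the peak: az = idx / numEl, el = idx % numEl. */
size_t bartlett_spectrum(const double complex *sv, const double complex *x,
                         size_t numAz, size_t numEl, size_t numVc,
                         double *power)
{
    size_t peak = 0;
    for (size_t k = 0; k < numAz * numEl; ++k) {
        double complex acc = 0.0;
        for (size_t n = 0; n < numVc; ++n)
            acc += conj(sv[k * numVc + n]) * x[n];     /* a^H x */
        power[k] = creal(acc) * creal(acc) + cimag(acc) * cimag(acc);
        if (power[k] > power[peak])                   /* peak search */
            peak = k;
    }
    return peak;
}
```

A reference like this is useful for comparing the operator's fixed-point angles against a double-precision result on the same steering-vector grid.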
4. Processing Outputs#
targetCount, number of valid targets, shape [1], data type S32. Equals the runtime detection count, clamped to the target capacity.
targetAngles, estimated angles per target, shape [H × maxTargets], data type F32:
Row 0: Azimuth angle (degrees)
Row 1: Elevation angle (degrees)
Row 2: Peak power in dB (only when enablePowerOutput is true)
H = 2 when enablePowerOutput is false, H = 3 when true.
targetIndexMap, maps each target to its corresponding detection index, shape [maxTargets], data type S32.
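A host-side sketch of unpacking these outputs follows. It assumes targetAngles is stored row-major, with each row holding maxTargets floats, and uses targetIndexMap to look up each target's [rangeIdx, dopplerIdx] row in detectionList. `target_info_t` and `unpack_targets` are hypothetical names.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* One unpacked target (hypothetical host-side type). */
typedef struct {
    float azDeg, elDeg;
    float powerDb;                  /* NAN when H == 2 (no power row) */
    int32_t rangeIdx, dopplerIdx;   /* from detectionList via targetIndexMap */
} target_info_t;

/* Unpack targetAngles (assumed row-major [H][maxTargets]) and resolve each
 * target's detection row. Returns the number of targets written to out. */
int32_t unpack_targets(int32_t targetCount, const float *targetAngles, int H,
                       int32_t maxTargets, const int32_t *targetIndexMap,
                       const int32_t *detectionList, target_info_t *out)
{
    const int32_t count = targetCount < maxTargets ? targetCount : maxTargets;
    const size_t stride = (size_t)maxTargets;      /* one row per quantity */
    for (int32_t t = 0; t < count; ++t) {
        const size_t det = (size_t)targetIndexMap[t];
        out[t].azDeg = targetAngles[0 * stride + (size_t)t];
        out[t].elDeg = targetAngles[1 * stride + (size_t)t];
        out[t].powerDb = (H >= 3) ? targetAngles[2 * stride + (size_t)t] : NAN;
        out[t].rangeIdx = detectionList[2 * det];
        out[t].dopplerIdx = detectionList[2 * det + 1];
    }
    return count;
}
```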
Note
To obtain complete target information (velocity, range, Cartesian coordinates), chain the outputs of this operator with the standalone Target Processing operator.
Angle Peak Interpolation#
When enableInterpolation is true, quadratic interpolation is applied to the 2D power spectrum to achieve sub-bin angle accuracy. Using the three neighboring power values around the peak in each dimension:
\[ \delta = \frac{Y_0 - Y_2}{2\,(Y_0 - 2Y_1 + Y_2)} \]
where \(Y_0, Y_1, Y_2\) are the power values at the previous, peak, and next bins. The fractional offset \(\delta\) (in bins, with \(|\delta| \le 0.5\) when \(Y_1\) is the maximum) is added to the peak bin index before mapping to the physical angle. Azimuth and elevation offsets are computed simultaneously using double-vector operations.
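A scalar C sketch of the offset calculation and one possible bin-to-angle mapping follows. The mapping is an assumption: it interpolates linearly between neighboring grid entries, which for a uniform grid equals `bins[0] + (k + δ) × step`. Both function names are hypothetical, and edge bins (no neighbor on one side) are left to the caller.

```c
#include <stddef.h>

/* Vertex of the parabola through (-1, y0), (0, y1), (+1, y2), in bins
 * relative to the peak bin. When y1 is the maximum, |delta| <= 0.5; the clamp
 * only guards against misuse. */
double quadratic_peak_offset(double y0, double y1, double y2)
{
    const double denom = y0 - 2.0 * y1 + y2;
    if (denom == 0.0)
        return 0.0;                      /* flat neighbourhood: no refinement */
    double delta = 0.5 * (y0 - y2) / denom;
    if (delta > 0.5) delta = 0.5;
    if (delta < -0.5) delta = -0.5;
    return delta;
}

/* Hypothetical mapping from fractional bin k + delta to degrees by linear
 * interpolation between neighbouring grid entries. */
double fractional_bin_to_deg(const float *bins, size_t numBins, size_t k,
                             double delta)
{
    if (delta > 0.0 && k + 1 < numBins)
        return bins[k] + delta * (bins[k + 1] - bins[k]);
    if (delta < 0.0 && k > 0)
        return bins[k] + delta * (bins[k] - bins[k - 1]);
    return bins[k];
}
```

For samples of \(-(x - 0.25)^2\) at \(x = -1, 0, 1\), the offset recovers 0.25 bins exactly, so a peak at the 0° bin of a 5° grid maps to 1.25°.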
Note
Quadratic interpolation (enableInterpolation = true) is strongly recommended when the bin spacing is coarse, to achieve sub-degree accuracy. Without interpolation, estimates are quantized to the angle grid, so the angle error can reach half the bin spacing.
Peak Power Computation#
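When enablePowerOutput is true, the peak beamforming power is converted to dB and included in the target angles output (row 2):
\[ P_{\text{dB}} = 10 \log_{10}\left( \frac{P_{\text{raw}}}{2^{46}} \right) \]
where \(P_{\text{raw}}\) is the peak power computed from the fixed-point beamforming output. The normalization factor \(2^{46} = 2^{2 \times 23}\) (where 23 = BEAMFORMING_QBITS) follows from the fixed-point formats: the SQ11.20 × Q15 product carries 20 + 15 = 35 fractional bits, the 12-bit right shift leaves 23, and squaring to form power doubles this to 46.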
Separable Bartlett Mode#
Overview#
When separableParams.enableSeparable is true, the operator decomposes the full 2D azimuth–elevation search into two sequential 1D scans, reducing computational complexity from \(O(N_{\text{az}} \times N_{\text{el}})\) to \(O(N_{\text{az}} + (2N+1) \times N_{\text{el}})\) where \(N\) is the azimuth neighborhood half-width.
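The complexity expressions can be checked by counting steering-vector inner products per detection (the per-beam cost over virtual channels is the same in both modes). The helper names are illustrative. For a 161 × 51 grid, full 2D would need 8211 beams, which also exceeds the full 2D limit of 2048 bins, while separable mode with N = 2 needs 416. For an 81 × 8 grid, the counts are 648 versus 121.

```c
/* Steering-vector inner products per detection (virtual channels excluded). */
long full2d_beams(long numAz, long numEl)
{
    return numAz * numEl;
}

long separable_beams(long numAz, long numEl, long n)
{
    return numAz + (2 * n + 1) * numEl;   /* Step 1 + Step 2 */
}
```

The measured speedups in the Performance table are smaller than these ratios. For example, 1000 detections on 8 × 8 channels with an 81 × 8 grid take 19.926 ms in Full2D and 6.034 ms in Sep_N2, so beam count is only a first-order guide.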
Algorithm#
Step 1 — Azimuth Scan: Beamform over all azimuth bins at the elevation row closest to 0° (pre-computed as elZeroIdx) and find the coarse azimuth peak index. An optional channel mask (azChannelMask) restricts which virtual channels participate in this step.
Step 2 — Elevation Scan: Beamform over all elevation bins using a small azimuth neighborhood (±N bins) centered on the Step 1 peak. The joint peak of this local 2D patch yields the refined DOA estimate.
Angle Extraction: The final azimuth and elevation estimates depend on the azSource parameter:
PVA_BARTLETT_AZ_FROM_NEIGHBORHOOD (default): Both azimuth and elevation come from the Step 2 joint peak.
PVA_BARTLETT_AZ_FROM_EL0: Azimuth comes from Step 1 (at el ≈ 0°); only elevation is taken from Step 2. This can give better azimuth accuracy for targets near the horizon but may degrade at steep elevation angles.
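The two steps and the azSource choice can be sketched in C as follows. For illustration only, the power is read from a precomputed grid `power[az * numEl + el]`, whereas the operator beamforms on the fly. When the peak sits near a grid edge, the neighborhood window is shifted to stay inside the grid; this edge policy is an assumption. `bin_pair_t` and `separable_search` are hypothetical names.

```c
#include <math.h>

typedef struct { int az, el; } bin_pair_t;

/* Index of the elevation bin closest to 0 degrees (the elZeroIdx role). */
static int closest_to_zero(const float *elDeg, int numEl)
{
    int best = 0;
    for (int e = 1; e < numEl; ++e)
        if (fabsf(elDeg[e]) < fabsf(elDeg[best]))
            best = e;
    return best;
}

/* Two-step separable search over a precomputed power grid
 * power[az * numEl + el] (illustration only). n is the neighbourhood
 * half-width; azFromEl0 != 0 mimics PVA_BARTLETT_AZ_FROM_EL0.
 * Requires numAz >= 2 * n + 1. */
bin_pair_t separable_search(const double *power, const float *elDeg,
                            int numAz, int numEl, int n, int azFromEl0)
{
    const int elZero = closest_to_zero(elDeg, numEl);

    /* Step 1: running max over all azimuths at el ~= 0. */
    int azPeak = 0;
    for (int a = 1; a < numAz; ++a)
        if (power[a * numEl + elZero] > power[azPeak * numEl + elZero])
            azPeak = a;

    /* Step 2: joint peak over 2n+1 azimuths x all elevations. Shifting the
     * window to stay inside the grid is an assumed edge policy. */
    int lo = azPeak - n;
    if (lo < 0) lo = 0;
    if (lo + 2 * n > numAz - 1) lo = numAz - 1 - 2 * n;
    bin_pair_t peak = { lo, 0 };
    for (int a = lo; a <= lo + 2 * n; ++a)
        for (int e = 0; e < numEl; ++e)
            if (power[a * numEl + e] > power[peak.az * numEl + peak.el]) {
                peak.az = a;
                peak.el = e;
            }

    if (azFromEl0)
        peak.az = azPeak;          /* keep Step 1 azimuth, Step 2 elevation */
    return peak;
}
```

On a tilted power surface whose azimuth peak moves with elevation, the neighborhood mode follows the joint peak, while the FROM_EL0 mode keeps the azimuth found at el ≈ 0°. This mirrors the accuracy trade-off described above.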
Parameters#
The separable mode is configured via PVASeparableBartlettParams:
enableSeparable (bool, default false): Enable the two-step separable search.
azNeighborhood (PVABartlettAzNeighborhood): Half-width N of the azimuth neighborhood used in Step 2:
PVA_BARTLETT_AZ_NEIGHBORHOOD_0 (0): Peak bin only (1 az bin). Complexity ≈ \(N_{\text{az}} + N_{\text{el}}\).
PVA_BARTLETT_AZ_NEIGHBORHOOD_1 (1): Peak ± 1 bin (3 az bins). Complexity ≈ \(N_{\text{az}} + 3 \times N_{\text{el}}\).
PVA_BARTLETT_AZ_NEIGHBORHOOD_2 (2): Peak ± 2 bins (5 az bins). Complexity ≈ \(N_{\text{az}} + 5 \times N_{\text{el}}\).
azSource (PVABartlettAzSource): Controls which step supplies the final azimuth estimate (see Algorithm above).
azChannelMask (uint64_t, default 0): Bitmask over virtual channels for Step 1. When 0 (default), all channels participate. When non-zero, bit i = 1 keeps virtual channel i and bit i = 0 zeroes it out. Step 2 always uses all channels regardless of this mask. Example: 0xFF retains only VCs 0–7 in a 64-channel array.
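The azChannelMask rule can be sketched as follows. `csq_t` and `apply_channel_mask` are hypothetical, and the {re, im} int32 layout for SQ11.20 samples is an assumption. The function zeroes channels in place, so a caller would apply it to a copy of the snapshot, because Step 2 uses all channels.

```c
#include <stdint.h>

/* Complex SQ11.20 snapshot sample; the {re, im} layout is an assumption. */
typedef struct { int32_t re, im; } csq_t;

/* Hypothetical helper mirroring the azChannelMask rule for Step 1:
 * mask == 0 keeps every channel; otherwise bit i == 0 zeroes channel i.
 * Modifies snap in place, so pass a copy (Step 2 uses all channels). */
void apply_channel_mask(csq_t *snap, int numVc, uint64_t mask)
{
    if (mask == 0)
        return;
    for (int i = 0; i < numVc && i < 64; ++i) {
        if (((mask >> i) & 1u) == 0) {
            snap[i].re = 0;
            snap[i].im = 0;
        }
    }
}
```

Zeroing a channel removes its term from the inner product, which is equivalent to beamforming with a smaller sub-array while keeping the vector length a multiple of 8.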
Separable Mode Constraints#
numAzimuthBins ≤ 256
numElevationBins ≤ 255
(2 × N + 1) × numElevationBins ≤ 4096 (the Step 2 power spectrum must fit in VMEM)
numAzimuthBins ≥ 2 × N + 1 (enough bins to form the neighborhood)
Configuration Parameters#
PVADoaBartlettBeamformingParams#
enableInterpolation (bool, default true): Enable quadratic interpolation for sub-bin angle accuracy.
enablePowerOutput (bool, default true): Include peak power (dB) as row 2 in the target angles output. When true, the target angles tensor must have H ≥ 3.
separableParams (PVASeparableBartlettParams): Separable mode configuration (see above).
Design Requirements#
Maximum number of detections/targets: 8192.
Maximum virtual channels: numTxAntennas × numRxAntennas ≤ 64.
Number of virtual channels must be divisible by 8.
Full 2D mode: numAzimuthBins × numElevationBins ≤ 2048 (each dimension ≤ 2048).
Separable mode: numAzimuthBins ≤ 256, numElevationBins ≤ 255, (2 × N + 1) × numElevationBins ≤ 4096, numAzimuthBins ≥ 2 × N + 1 (see Separable Mode Constraints).
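The following sketch is a hypothetical pre-submission check that encodes the limits listed above. `bartlett_config_ok` is not part of the operator API. Here numDetections means the tensor capacity, and the neighborhood half-width n is restricted to the three documented values.

```c
#include <stdbool.h>

/* Hypothetical pre-submission check of the documented limits.
 * numDetections is the tensor capacity; n is the azimuth neighbourhood
 * half-width (0, 1, or 2) and is ignored in full 2D mode. */
bool bartlett_config_ok(int numDetections, int numTx, int numRx,
                        int numAz, int numEl, bool separable, int n)
{
    const int numVc = numTx * numRx;
    if (numDetections < 0 || numDetections > 8192)
        return false;
    if (numVc < 8 || numVc > 64 || numVc % 8 != 0)
        return false;
    if (numAz < 1 || numEl < 1)
        return false;
    if (!separable)
        return numAz * numEl <= 2048;   /* implies each dimension <= 2048 */
    if (n < 0 || n > 2)
        return false;
    return numAz <= 256 && numEl <= 255 &&
           (2 * n + 1) * numEl <= 4096 && numAz >= 2 * n + 1;
}
```

For example, a 161 × 51 grid is rejected in full 2D mode (8211 bins) but accepted in separable mode, which is consistent with the Performance table listing that grid only for Sep_N0 and Sep_N2.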
Fixed-Point Arithmetic#
The beamforming pipeline uses fixed-point arithmetic to maximize VPU throughput:
| Data | Format | Description |
|---|---|---|
| Steering vectors | Q15 (16-bit) | Complex values, stored conjugated for efficient inner-product computation |
| Snapshots | SQ11.20 (32-bit) | Complex fixed-point from range-Doppler processing |
| Beamformer accumulator | 48-bit | Virtual channel accumulation before reduction |
| Beamformer output | SQ18.23 | After 12-bit right shift |
| Power spectrum | float | Computed as real² + imag² after accumulation |
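To make the format bookkeeping concrete, here is a scalar C sketch of one steering-vector inner product in these formats. It is not the VPU kernel, and it makes three assumptions. First, `int64_t` stands in for the 48-bit accumulator, and saturation is not modeled. Second, the 12-bit shift is applied once after accumulation; the table does not say whether the kernel shifts per product or per sum. Third, `>>` on negative values is an arithmetic shift, as on common compilers. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { int16_t re, im; } cq15_t;   /* stored conj(a_n), Q15 */
typedef struct { int32_t re, im; } csq_t;    /* snapshot sample, SQ11.20 */

/* Fixed-point inner product sum_n s_n * x_n, where s_n = conj(a_n) is the
 * stored (already conjugated) steering vector. Products carry 15 + 20 = 35
 * fractional bits; a 12-bit right shift leaves 23 (SQ18.23). Returns the raw
 * power re^2 + im^2 (46 fractional bits) as a double. */
double bartlett_power_fixed(const cq15_t *s, const csq_t *x, int numVc)
{
    int64_t accRe = 0, accIm = 0;             /* stands in for 48-bit acc */
    for (int n = 0; n < numVc; ++n) {
        accRe += (int64_t)s[n].re * x[n].re - (int64_t)s[n].im * x[n].im;
        accIm += (int64_t)s[n].re * x[n].im + (int64_t)s[n].im * x[n].re;
    }
    const double re = (double)(accRe >> 12);  /* assumed: shift after sum */
    const double im = (double)(accIm >> 12);
    return re * re + im * im;
}
```

With 8 channels of unit snapshots and unit steering weights, the result divided by \(2^{46}\) is close to 64 (that is, \(8^2\)). It falls slightly short because the Q15 maximum is 32767/32768.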
Implementation Details#
Full 2D Mode#
Dataflow Configuration:
Use CmdMemcpy to copy steering vectors from DRAM to L2 SRAM (one-time transfer before processing).
Use 1 SQDF (SequenceDataflow) to transfer the detection count from DRAM to VMEM.
Use 1 SQDF (SequenceDataflow) to transfer the snapshot data from DRAM to VMEM.
Use 1 SQDF (SequenceDataflow) to stream steering vectors from L2 SRAM to VMEM in batches.
Use 1 SQDF (SequenceDataflow) to transfer the output target angles (azimuth, elevation, optional power) from VMEM to DRAM.
Use 1 SQDF (SequenceDataflow) to transfer the target count and target index map from VMEM to DRAM.
Steering Vector Caching:
Steering vectors use a two-level caching strategy: they are first copied from DRAM to L2 SRAM before processing begins, then streamed from L2 SRAM to VMEM in batches of 128 angles during processing. L2 allocation: numVC × 2048 × 4 bytes (4 bytes per complex Q15 value; for example, 64 virtual channels require 64 × 2048 × 4 = 524,288 bytes, or 512 KiB). This reduces DRAM bandwidth and allows steering vectors to be reused across detections.
Beamforming Kernel:
The beamforming kernel processes 4 angle bins simultaneously. For each angle bin, it loads snapshot and steering vector pairs (8 complex values per vector load), performs complex multiply-accumulate with 12-bit shift for overflow prevention, and reduces the accumulated result to power (real² + imag²). The power spectrum (up to 2048 floats) is stored in VMEM.
Peak Search:
Vectorized peak search using vmaxrid_s finds the maximum power bin in the 2D spectrum. A scalar loop handles any remainder elements.
Separable Mode#
Dataflow Configuration:
Steering vectors stream directly from DRAM (no L2 copy). This avoids the L2 size limitation of full 2D mode.
Use 1 SQDF for Step 1 steering vectors (azimuth scan at el ≈ 0°), pre-loaded up to 256 azimuths.
Use 1 SQDF for Step 2 steering vectors (elevation scan in azimuth neighborhood), streamed in batches of 128.
VMEM power spectrum layout:
azSpectrum (up to 256 floats) + localSpectrum ((2N+1) × numEl floats), total ≤ 4096 floats.
Step 1 — Azimuth Scan:
Pre-loads all azimuth steering vectors (at the elevation row closest to 0°) into VMEM via strided DMA. Beamforms across all azimuth bins using a chunked running-max approach (no full azimuth spectrum storage needed). Optional channel masking is applied by zeroing masked channels in the snapshot before the inner product.
Step 2 — Elevation Scan:
Streams steering vectors for the azimuth neighborhood × all elevations from DRAM in batches of 128 angles. Computes the local power spectrum and performs joint peak search over the (2N+1) × numEl patch.
Angle Extraction:
Peak indices in the local patch are mapped back to global azimuth/elevation bin indices. When azSource == PVA_BARTLETT_AZ_FROM_EL0, the Step 1 azimuth is used directly and only elevation comes from Step 2. Quadratic interpolation is applied in 1D or 2D depending on the azSource setting.
Output Transfer#
After DOA estimation completes, the target angles, target index map, and target count are transferred from VMEM to DRAM via SQDF. These outputs can then be chained with the standalone Target Processing operator for velocity, range, and coordinate computation.
Performance#
The operator performance scales linearly with:
Number of detections
Number of angle bins (azimuth × elevation for full 2D, or azimuth + neighborhood × elevation for separable)
Number of virtual channels
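The implementation is compute-bound for large angle grids. Key optimizations include L2 caching of steering vectors (full 2D mode), 4-way angle unrolling, and double-buffered DMA. Separable mode provides a significant speedup for large angle grids by reducing the search space. For example, with 1000 detections, 8 Tx × 8 Rx, and an 81 × 8 grid, the table below shows 6.034 ms for Sep_N2 versus 19.926 ms for Full2D.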
Execution Time is the average time required to execute the operator on a single VPU core.
Note that each PVA contains two VPU cores, which can operate in parallel to process two streams simultaneously, or reduce execution time by approximately half by splitting the workload between the two cores.
Total Power represents the average total power consumed by the module when the operator is executed concurrently on both VPU cores.
Idle power is approximately 7W when the PVA is not processing data.
For detailed information on interpreting the performance table below and understanding the benchmarking setup, see Performance Benchmark.
| Detections | Tx | Rx | AzBins | ElBins | Mode | Interpolation | Execution Time (ms) | Submit Latency (ms) | Total Power (W) |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 2 | 4 | 81 | 8 | Sep_N2 | Enabled | 0.265 | 0.035 | N/A |
| 100 | 2 | 4 | 81 | 8 | Full2D | Enabled | 0.536 | 0.099 | 9.849 |
| 100 | 4 | 4 | 81 | 8 | Sep_N2 | Enabled | 0.306 | 0.038 | 10.072 |
| 100 | 4 | 4 | 128 | 16 | Sep_N2 | Enabled | 0.371 | 0.037 | N/A |
| 100 | 4 | 4 | 64 | 8 | Full2D | Enabled | 0.582 | 0.099 | N/A |
| 100 | 4 | 4 | 81 | 8 | Full2D | Enabled | 0.718 | 0.100 | 10.232 |
| 100 | 4 | 4 | 128 | 8 | Full2D | Enabled | 1.031 | 0.100 | N/A |
| 100 | 4 | 4 | 128 | 16 | Full2D | Enabled | 1.927 | 0.099 | N/A |
| 100 | 8 | 8 | 81 | 8 | Sep_N2_Mask | Enabled | 0.647 | 0.036 | N/A |
| 100 | 8 | 8 | 81 | 8 | Sep_N2 | Enabled | 0.687 | 0.037 | N/A |
| 100 | 8 | 8 | 128 | 16 | Sep_N2 | Enabled | 0.941 | 0.039 | 10.072 |
| 100 | 8 | 8 | 64 | 8 | Full2D | Enabled | 1.667 | 0.100 | N/A |
| 100 | 8 | 8 | 81 | 8 | Full2D | Enabled | 2.070 | 0.098 | 9.748 |
| 100 | 8 | 8 | 128 | 8 | Full2D | Enabled | 3.153 | 0.100 | N/A |
| 100 | 8 | 8 | 128 | 16 | Full2D | Enabled | 6.156 | 0.100 | 9.547 |
| 1000 | 2 | 4 | 161 | 51 | Sep_N2 | Enabled | 3.866 | 0.041 | N/A |
| 1000 | 2 | 4 | 81 | 8 | Full2D | Enabled | 4.515 | 0.101 | N/A |
| 1000 | 4 | 4 | 81 | 8 | Sep_N0 | Enabled | 2.168 | 0.039 | N/A |
| 1000 | 4 | 4 | 161 | 51 | Sep_N0 | Enabled | 3.197 | 0.040 | N/A |
| 1000 | 4 | 4 | 161 | 51 | Sep_N2 | Enabled | 4.935 | 0.041 | N/A |
| 1000 | 4 | 4 | 81 | 8 | Full2D | Enabled | 6.360 | 0.100 | 9.648 |
| 1000 | 4 | 4 | 161 | 51 | Sep_N0 | Disabled | 3.197 | 0.040 | N/A |
| 1000 | 4 | 4 | 161 | 51 | Sep_N2 | Disabled | 4.934 | 0.041 | N/A |
| 1000 | 8 | 8 | 161 | 51 | Sep_N2 | Enabled | 14.636 | 0.042 | N/A |
| 1000 | 8 | 8 | 81 | 8 | Full2D | Enabled | 19.926 | 0.101 | N/A |
| 1000 | 8 | 8 | 81 | 8 | Sep_N0 | Enabled | 5.123 | 0.043 | 10.756 |
| 1000 | 8 | 8 | 81 | 8 | Sep_N2 | Enabled | 6.034 | 0.042 | 10.656 |
| 1000 | 8 | 8 | 161 | 51 | Sep_N0 | Enabled | 8.759 | 0.043 | N/A |
| 1000 | 8 | 8 | 161 | 51 | Sep_N2 | Disabled | 14.634 | 0.041 | N/A |
| 1000 | 8 | 8 | 81 | 8 | Full2D | Disabled | 19.925 | 0.100 | N/A |
| 1000 | 8 | 8 | 81 | 8 | Sep_N0 | Disabled | 5.121 | 0.040 | 10.857 |
| 1000 | 8 | 8 | 161 | 51 | Sep_N0 | Disabled | 8.758 | 0.042 | N/A |
Reference#
[1] M. S. Bartlett, “Smoothing periodograms from time-series with continuous spectra”, Nature, vol. 161, pp. 686–687, 1948.