Radar Processing Pipeline#

Radar Pipeline Architecture Overview#

The radar PVA pipeline processes raw radar data (ADC samples) and generates a point cloud. It consists of five stages. The Range FFT, Doppler FFT, and Peak Detection stages are implemented in fixed-point arithmetic. The NCI and DOA stages take a hybrid approach: they use fixed-point arithmetic for computational efficiency and fall back to floating-point operations where an operation has no native hardware support.

Figure: Radar PVA pipeline architecture (../../_images/radar-pva-pipeline-architecture.png)

Range FFT Processing -> Converts time-domain samples to range-frequency domain using windowed FFT. The PVA operator used for this stage is pvaRangeFFT.
Doppler FFT Processing -> Processes the range data to extract velocity of targets. The PVA operator used for this stage is pvaDopplerFFT.
NCI Processing -> Performs noise estimation and non-coherent integration to improve signal-to-noise ratio. The PVA operator used is pvaNci.
Peak Detection -> Performs local maximum detection in both the range and Doppler dimensions, followed by transmitter-specific peak identification and snapshot data extraction. The PVA operator used for this stage is pvaPeakDetection.
DOA Processing -> Calculates the direction of arrival (DOA) of the radar signals. It takes input tensors including the peak count, calibration vector, and range-Doppler snapshots, and calculates target information such as azimuth, elevation, range, velocity, and 3D points. The output is a 2D tensor with shape [7, DOA_MAX_TARGET_COUNT]. The PVA operator used for this stage is pvaDOA.

Radar fixed-point pipeline sample application#

The radar fixed-point pipeline sample application is a reference implementation that processes radar data through multiple computational stages on NVIDIA’s PVA (Programmable Vision Accelerator) platform.

Key Characteristics:

Fixed-Point Arithmetic: The pipeline uses a mix of fixed-point and floating-point data types (S32, 2S32, U32, F32) to keep computation deterministic, reduce memory bandwidth, and make good use of PVA hardware acceleration
Multi-Stage Processing: Five sequential processing stages that progressively refine raw radar data into target detection and tracking information
Hardware Acceleration: Leverages NVIDIA PVA for parallel processing of computationally intensive operations like FFT, non-coherent integration, and peak detection
Reference Validation: Each stage includes CPU-based reference implementations for accuracy validation and regression testing
Batch Processing: Supports processing multiple radar data files in sequence with efficient resource reuse
Configurable Parameters: Flexible configuration for different radar configurations (transmit/receive channels, range bins, sampling rates)

Pipeline Data Flow:

The pipeline processes radar data through the following transformation sequence:

Raw ADC Data → Range FFT → Doppler FFT → NCI Processing → Peak Detection → DOA Processing → Target Information

Technical Implementation:

The implementation follows a modular, object-oriented design pattern with clear separation of concerns:

Resource Management: RAII-based tensor and operator lifecycle management with automatic cleanup
Memory Optimization: Uses PVA-optimized memory allocators for efficient data placement and access patterns. In production code, users can leverage NvSci and CUDA interop APIs to eliminate memory copies required for input and output data in the end-to-end pipeline.
Synchronization: PVA SDK-based synchronization primitives for coordinating PVA and CPU operations
Error Handling: Comprehensive error checking with graceful degradation and resource cleanup
Validation Framework: Built-in reference implementations and tolerance-based output validation

Supported Radar Configurations:

Multi-Channel Processing: PVA_RADAR_RX_ANTENNA_COUNT (4) receive channels and PVA_RADAR_TX_ANTENNA_COUNT (4) transmit channels
Configurable Resolution: Adjustable range bins (Nb), sample count (sampleCount), and Doppler bins (chirpCount)
Doppler Fold Processing: Support for non-coherent integration across multiple pulse repetition intervals (PVA_RADAR_DOPPLER_FOLD_COUNT)
Peak Capacity: Handles up to PVA_RADAR_MAX_TARGET_COUNT (8192) simultaneous target detections
Target Properties: Outputs PVA_RADAR_NUM_TARGET_DETECTION_PROPERTIES (7) detection properties per target
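
To make these sizes concrete, the sketch below works out how many bytes a DOA output buffer would need for the values quoted in this list. The function name is hypothetical and not part of the PVA SDK, and the calculation assumes a densely packed 32-bit floating-point tensor with no row padding, which the real allocator may not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a PVA SDK API): bytes needed for a dense
// [propertyCount][targetCount] tensor of 32-bit floats, assuming no padding.
inline std::size_t denseDoaOutputBytes(std::int32_t propertyCount, std::int32_t targetCount)
{
    assert(propertyCount > 0 && targetCount > 0);
    return static_cast<std::size_t>(propertyCount) *
           static_cast<std::size_t>(targetCount) * sizeof(float);
}
```

With 7 detection properties and 8192 targets, as quoted above, this gives 7 * 8192 * 4 = 229376 bytes on platforms where a float is 4 bytes.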

Performance Features:

Pipeline Mapping: Maps the pipeline stages onto the PVA hardware resources to maximize throughput
Deterministic Timing: Fixed-point operations provide consistent execution times for real-time applications

Design Details#

The radar pipeline consists of the following main components:

RadarPipelineTensorRequirements: Holds tensor requirements without allocation (used for operator creation)
RadarPipelineTensors: Manages all tensor allocations for processing a single radar data file
RadarPipelineOperators: Manages PVA operator handles (created once, reused across multiple files)
RadarPipelinePVAWorkloadParams: Holds the CUPVA synchronization objects and the allocator

Fixed-Point Data Types and Precision:

The pipeline uses carefully selected fixed-point data types optimized for radar signal processing:

NVCV_DATA_TYPE_S32: 32-bit signed integers for input radar samples and window coefficients
NVCV_DATA_TYPE_2S32: Complex 32-bit signed integers (real + imaginary) for FFT operations
NVCV_DATA_TYPE_U32: 32-bit unsigned integers for NCI outputs and peak detection results
NVCV_DATA_TYPE_F32: 32-bit floating-point for final DOA calculations requiring high precision

Quantization and Q-Format:

The implementation uses Q-format fixed-point arithmetic with configurable precision:

Range FFT: Requires QBits = 20 for the input tensor to preserve precision for subsequent pipeline stages such as NCI, Peak Detection, and DOA processing
Peak Detection: Integer arithmetic for robust threshold comparison and peak identification
Doppler Fold Count: PVA_RADAR_DOPPLER_FOLD_COUNT defines the non-coherent integration factor
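
As a rough illustration of Q-format arithmetic, the sketch below converts between real values and a Q20 representation stored in a 32-bit signed integer. The helper names (toQ, fromQ, mulQ) are hypothetical and not part of the PVA SDK. Only the QBits = 20 value comes from the text above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only, not PVA SDK APIs.
constexpr std::int32_t kQBits = 20;  // QBits value required for the Range FFT input

// Convert a real value to Qn fixed point, rounding to the nearest integer.
inline std::int32_t toQ(double value, std::int32_t qBits = kQBits)
{
    assert(qBits >= 0 && qBits < 31);
    const double scale = static_cast<double>(std::int64_t{1} << qBits);
    return static_cast<std::int32_t>(std::lround(value * scale));
}

// Convert a Qn fixed-point value back to a real value.
inline double fromQ(std::int32_t fixed, std::int32_t qBits = kQBits)
{
    assert(qBits >= 0 && qBits < 31);
    return static_cast<double>(fixed) / static_cast<double>(std::int64_t{1} << qBits);
}

// Multiply two Qn values. The 64-bit intermediate avoids overflow before the
// result is scaled back down; the right shift truncates the low bits.
inline std::int32_t mulQ(std::int32_t a, std::int32_t b, std::int32_t qBits = kQBits)
{
    assert(qBits >= 0 && qBits < 31);
    const std::int64_t product = static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
    return static_cast<std::int32_t>(product >> qBits);
}
```

In Q20, the value 1.0 is stored as 1 << 20 = 1048576, so toQ(0.5) yields 524288 and mulQ(toQ(0.5), toQ(0.5)) yields the Q20 encoding of 0.25.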

Memory Layout and Tensor Organization:

Tensors are organized with optimal memory layouts for PVA processing:

HCW Layout: Height-Channel-Width organization for efficient parallel processing
Transpose: Uses transpose operations to optimize memory access patterns
Stride Optimization: Memory strides aligned to PVA requirements for maximum throughput
Buffer Reuse: Input tensors for subsequent stages point to output tensors of previous stages (e.g., inDopplerFFTTensorHandle = outRangeFFTTensorHandle)
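
The HCW organization can be pictured as a three-level index calculation. The sketch below computes the linear element offset of (h, c, w) in a densely packed HCW tensor. The struct and function are hypothetical, and real PVA tensors may use padded strides, in which case the per-dimension strides reported by the tensor should be used instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical shape descriptor for illustration; not an NVCV or PVA SDK type.
struct HcwShape
{
    std::int32_t height;
    std::int32_t channels;
    std::int32_t width;
};

// Linear element offset of (h, c, w), assuming a dense layout where width is
// the fastest-varying dimension and height the slowest.
inline std::size_t hcwOffset(const HcwShape &shape, std::int32_t h, std::int32_t c, std::int32_t w)
{
    assert(h >= 0 && h < shape.height);
    assert(c >= 0 && c < shape.channels);
    assert(w >= 0 && w < shape.width);
    return (static_cast<std::size_t>(h) * static_cast<std::size_t>(shape.channels) +
            static_cast<std::size_t>(c)) * static_cast<std::size_t>(shape.width) +
           static_cast<std::size_t>(w);
}
```

For a [512][4][224] tensor, moving one step in the height dimension advances the offset by 4 * 224 = 896 elements.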

Execution Model and Workload Management:

The pipeline uses a three-phase execution model optimized for batch processing with operator reuse:

Phase 1: Operator Creation (Once per Configuration)

PVA operators created once using RadarPipelineTensorRequirements
Tensor requirements calculated without actual memory allocation
Operators shared across all data files for efficiency
Geometric parameters (GP) initialized with calibration data

Phase 2: Per-File Processing

RadarPipelineTensors created for each file with actual memory allocation
RAII (Resource Acquisition Is Initialization) ensures automatic cleanup
Input data loaded into tensors
All pipeline stages submitted sequentially to PVA
PVA SDK-based synchronization ensures proper execution ordering

Phase 3: Reference Validation

CPU-based reference implementations execute after PVA processing
Tolerance-based comparison validates numerical accuracy
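
A tolerance-based comparison of this kind might look like the following sketch. It assumes an element-wise absolute tolerance; whether the sample application uses an absolute or a relative criterion is not stated here, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical validation helper, not part of the sample application.
// Returns true when both buffers have the same length and every element of
// `actual` lies within `tolerance` (absolute difference) of `reference`.
inline bool withinTolerance(const std::vector<float> &actual,
                            const std::vector<float> &reference,
                            float tolerance)
{
    assert(tolerance >= 0.0f);
    if (actual.size() != reference.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i)
    {
        const float diff = std::fabs(actual[i] - reference[i]);
        // Written as a negated <= so that NaN differences are rejected.
        if (!(diff <= tolerance))
        {
            return false;
        }
    }
    return true;
}
```

A tolerance of 0.0f turns the check into an exact-match comparison, which suits fixed-point stages, while a small positive tolerance suits stages with floating-point outputs.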

Multi-File Batch Processing:

The implementation supports efficient processing of multiple radar data files with operator reuse:

// Create operators once (shared across all files)
RadarPipelineOperators operators{};
RadarPipelineTensorRequirements tensorReqs(sampleCount, rxAntennaCount,
                                           chirpCount, dopplerFoldCount, txAntennaCount);
radar_create_pva_workloads(operators, tensorReqs, txAntennaCount, dopplerFoldCount);

// Process each file
for (const auto &file : files) {
    // Tensors created per file (RAII cleanup)
    RadarPipelineTensors tensors(allocatorHandle, sampleCount, rxAntennaCount,
                                 chirpCount, dopplerFoldCount, txAntennaCount);

    // Input data from `file` is loaded into the tensors at this point (not shown)

    // Submit workloads and execute
    radar_submit_pva_workloads(operators, tensors, pvaWorkloadParams);
    radar_execute_ref_workloads(tensors, txAntennaCount, rxAntennaCount, dopplerFoldCount);
}  // Tensors automatically destroyed here
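
The per-file cleanup in the loop above relies on RAII. As a generic illustration of that pattern, the sketch below wraps an arbitrary handle together with its destroy function so that the handle is released when the wrapper leaves scope. ScopedHandle is a hypothetical class, not the implementation of RadarPipelineTensors.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical RAII wrapper for illustration. It owns one handle and calls
// the supplied destroy function exactly once, in the destructor.
template <typename Handle>
class ScopedHandle
{
public:
    ScopedHandle(Handle handle, std::function<void(Handle)> destroy)
        : handle_(handle), destroy_(std::move(destroy))
    {
        assert(static_cast<bool>(destroy_));
    }

    ~ScopedHandle()
    {
        if (destroy_)
        {
            destroy_(handle_);
        }
    }

    // Non-copyable so that the handle cannot be destroyed twice.
    ScopedHandle(const ScopedHandle &) = delete;
    ScopedHandle &operator=(const ScopedHandle &) = delete;

    Handle get() const { return handle_; }

private:
    Handle handle_;
    std::function<void(Handle)> destroy_;
};
```

Declaring such an object inside the loop body means the destroy function runs at the end of every iteration, including when an exception or an early `continue` skips the rest of the body.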

Pipeline Stages#

Range FFT Processing
- Purpose: Converts time-domain radar samples to range-frequency domain for target range estimation
- Algorithm: Windowed FFT with Hanning window (PVA_BATCH_FFT_WINDOW_HANNING) to minimize spectral leakage
- Input: Raw radar ADC samples [sampleCount=512][rxAntennaCount][chirpCount=512] as 32-bit signed integers
- Output: Complex range FFT data [chirpCount=512][rxAntennaCount][NbNci=224] as complex 32-bit signed integers
- Window Function: Pre-computed Hanning window coefficients applied to reduce sidelobe artifacts
- Validation: Reference implementation comparison with configurable tolerance (0.0f for exact match)
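
As an illustration of pre-computed window coefficients, the sketch below generates a symmetric Hann window and quantizes it to a Q-format integer. The function name is hypothetical, and whether the PVA operator expects the symmetric or the periodic form of the window, and which Q-format it expects for the coefficients, are assumptions that should be checked against the operator documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: symmetric Hann window
//   w[n] = 0.5 * (1 - cos(2 * pi * n / (N - 1))),  n = 0 .. N-1,
// quantized to Qn fixed point with round-to-nearest.
inline std::vector<std::int32_t> makeHannWindowQ(std::int32_t length, std::int32_t qBits)
{
    assert(length > 1);
    assert(qBits >= 0 && qBits < 31);
    const double pi    = std::acos(-1.0);
    const double scale = static_cast<double>(std::int64_t{1} << qBits);
    std::vector<std::int32_t> window(static_cast<std::size_t>(length));
    for (std::int32_t n = 0; n < length; ++n)
    {
        const double w = 0.5 * (1.0 - std::cos(2.0 * pi * n / (length - 1)));
        window[static_cast<std::size_t>(n)] = static_cast<std::int32_t>(std::lround(w * scale));
    }
    return window;
}
```

The window is zero at both ends and reaches 1.0 (1 << qBits after quantization) at the center for odd lengths, which is what tapers the samples and reduces leakage.
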
Doppler FFT Processing
- Purpose: Extracts target velocity information through coherent processing across pulse repetition intervals
- Algorithm: Windowed FFT with output transpose (transposeOutput = 1) for optimal memory layout
- Input: Range FFT output [chirpCount][rxAntennaCount][224] as complex 32-bit signed integers
- Output: Doppler FFT data [224][rxAntennaCount][chirpCount] as complex 32-bit signed integers (transposed)
- Window Function: Hanning window applied along the chirp dimension to suppress Doppler sidelobes in velocity processing
- Memory Optimization: Transpose operation optimizes subsequent NCI processing access patterns
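
The effect of the output transpose on the data layout can be sketched on the CPU as follows. The function is a hypothetical reference-style helper that swaps the outer and inner dimensions of a dense 3D buffer, turning [chirp][rx][range] into [range][rx][chirp]. It is not the PVA implementation, which performs the transpose as part of the operator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reorders a dense [d0][d1][d2] buffer into [d2][d1][d0].
template <typename T>
std::vector<T> transposeOuterInner(const std::vector<T> &in,
                                   std::size_t d0, std::size_t d1, std::size_t d2)
{
    assert(in.size() == d0 * d1 * d2);
    std::vector<T> out(in.size());
    for (std::size_t i0 = 0; i0 < d0; ++i0)
    {
        for (std::size_t i1 = 0; i1 < d1; ++i1)
        {
            for (std::size_t i2 = 0; i2 < d2; ++i2)
            {
                // Element (i0, i1, i2) of the input becomes (i2, i1, i0) of the output.
                out[(i2 * d1 + i1) * d0 + i0] = in[(i0 * d1 + i1) * d2 + i2];
            }
        }
    }
    return out;
}
```

After the transpose, all chirp-dimension values for a given range bin and receive channel are contiguous, which is the access pattern a per-range-bin integration step would read.
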
NCI (Non-Coherent Integration) Processing
- Purpose: Performs noise estimation and non-coherent integration to improve signal-to-noise ratio
- Algorithm: Multi-output non-coherent integration with configurable Doppler fold processing
- Input: Doppler FFT output [224][rxAntennaCount][chirpCount] with 20-bit Q-format precision
- Processing Parameters:
  - dopplerFoldCount = PVA_RADAR_DOPPLER_FOLD_COUNT: Non-coherent integration factor for SNR improvement
- Outputs:
  - NCI RX: [Nb=224][chirpCount=512] - Receive-channel non-coherent integration results
  - NCI Final: [Nb=224][chirpCount/dopplerFoldCount=64] - Final non-coherent integration across Doppler folds
  - Noise Estimate: [Nb=224] - Range-dependent noise floor estimation
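
The general idea of non-coherent integration can be sketched for a single range bin as below. This is a simplified CPU illustration under stated assumptions, not the pvaNci algorithm: it sums the squared magnitude across receive channels, then folds the Doppler axis by accumulating bins that are a multiple of the folded length apart. The fold mapping, the 64-bit accumulators, and all names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical complex sample type mirroring a pair of 32-bit signed integers.
struct ComplexS32
{
    std::int32_t re;
    std::int32_t im;
};

// power[c] = sum over rx of |x[rx][c]|^2 for one range bin.
// `x` is a dense [rxCount][chirpCount] buffer.
inline std::vector<std::uint64_t> nciAcrossRx(const std::vector<ComplexS32> &x,
                                              std::size_t rxCount, std::size_t chirpCount)
{
    assert(x.size() == rxCount * chirpCount);
    std::vector<std::uint64_t> power(chirpCount, 0);
    for (std::size_t rx = 0; rx < rxCount; ++rx)
    {
        for (std::size_t c = 0; c < chirpCount; ++c)
        {
            const std::int64_t re = x[rx * chirpCount + c].re;
            const std::int64_t im = x[rx * chirpCount + c].im;
            power[c] += static_cast<std::uint64_t>(re * re) + static_cast<std::uint64_t>(im * im);
        }
    }
    return power;
}

// folded[d] = sum over k of power[d + k * foldedCount], an assumed fold mapping.
inline std::vector<std::uint64_t> foldDoppler(const std::vector<std::uint64_t> &power,
                                              std::size_t foldCount)
{
    assert(foldCount > 0 && power.size() % foldCount == 0);
    const std::size_t foldedCount = power.size() / foldCount;
    std::vector<std::uint64_t> folded(foldedCount, 0);
    for (std::size_t i = 0; i < power.size(); ++i)
    {
        folded[i % foldedCount] += power[i];
    }
    return folded;
}
```

With 512 chirps, a fold count of 8 produces the 64 folded Doppler bins listed for the NCI Final output. The value 8 is inferred from the shapes above rather than stated in the text.
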
Peak Detection
- Purpose: Identifies potential targets by detecting peaks above noise threshold
- Algorithm: Configurable threshold-based peak detection with spatial clustering and Doppler disambiguation
- Input: All NCI outputs (original input, RX, final, noise estimate)
- Processing Parameters:
  - txAntennaCount: Number of transmit channels for beamforming considerations
  - dopplerFoldCount: Non-coherent integration factor for detection threshold calculation
- Outputs:
  - Peak Count: [1] - Number of detected peaks (up to PVA_RADAR_MAX_TARGET_COUNT = 8192)
  - Peak Indices: [PVA_RADAR_PEAKDET_NUM_PEAK_INDICES][8192] - Range, Doppler, and receive channel indices for each peak
  - Peak Snap: [8192][16] - Local neighborhood data around each peak for sub-bin interpolation
- Capacity: Supports up to 8192 simultaneous target detections
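
Local-maximum detection over a range-Doppler map can be sketched as follows. This is a minimal CPU illustration, not the pvaPeakDetection algorithm: it assumes a per-range-bin threshold, strict comparison against the four direct neighbors, wrap-around along the Doppler axis, and a hard cap on the number of reported peaks. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical peak record: indices into the range-Doppler map.
struct Peak
{
    std::int32_t range;
    std::int32_t doppler;
};

// Scans a dense [rangeCount][dopplerCount] power map and reports cells that
// exceed the threshold of their range bin and are strictly greater than their
// range neighbors (where present) and Doppler neighbors (with wrap-around).
inline std::vector<Peak> detectPeaks(const std::vector<std::uint32_t> &power,
                                     std::int32_t rangeCount, std::int32_t dopplerCount,
                                     const std::vector<std::uint32_t> &threshold,
                                     std::size_t maxPeaks)
{
    assert(rangeCount > 0 && dopplerCount > 1);
    assert(power.size() == static_cast<std::size_t>(rangeCount) * static_cast<std::size_t>(dopplerCount));
    assert(threshold.size() == static_cast<std::size_t>(rangeCount));

    auto at = [&](std::int32_t r, std::int32_t d) {
        return power[static_cast<std::size_t>(r) * static_cast<std::size_t>(dopplerCount) +
                     static_cast<std::size_t>(d)];
    };

    std::vector<Peak> peaks;
    for (std::int32_t r = 0; r < rangeCount; ++r)
    {
        for (std::int32_t d = 0; d < dopplerCount; ++d)
        {
            if (peaks.size() >= maxPeaks)
            {
                return peaks;  // Capacity reached; remaining cells are not scanned.
            }
            const std::uint32_t v = at(r, d);
            if (v <= threshold[static_cast<std::size_t>(r)])
            {
                continue;
            }
            const bool rangeMax = (r == 0 || v > at(r - 1, d)) &&
                                  (r == rangeCount - 1 || v > at(r + 1, d));
            const bool dopplerMax = v > at(r, (d + dopplerCount - 1) % dopplerCount) &&
                                    v > at(r, (d + 1) % dopplerCount);
            if (rangeMax && dopplerMax)
            {
                peaks.push_back(Peak{r, d});
            }
        }
    }
    return peaks;
}
```

Passing 8192 as maxPeaks mirrors the capacity quoted above. In this sketch, peaks found after the cap is reached are simply dropped.
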
DOA (Direction of Arrival) Processing
- Purpose: Calculates target position, velocity, and angular information using array processing
- Algorithm: FFT-based digital beamforming and dual-aperture interferometry with geometric parameter (GP) calibration
- Input Components:
  - Peak detection outputs (count, indices, snap data)
  - Calibration vector [1][PVA_RADAR_NUM_TOTAL_ANTENNA_ELEMENTS] - Complex antenna array calibration coefficients
  - NCI final data for amplitude/phase reference
  - Geometric parameters (PVARadarGP GP) for coordinate transformation
- Calibration: populateCalibVector(calibVector, 30) - 30-element calibration for array processing
- Output: DOA results [PVA_RADAR_NUM_TARGET_DETECTION_PROPERTIES][PVA_RADAR_MAX_TARGET_COUNT] containing:
  - Velocity: Target radial velocity (m/s)
  - Range: Target distance (m)
  - Azimuth: Horizontal angle (degrees)
  - Elevation: Vertical angle (degrees)
  - X, Y, Z: Cartesian coordinates (m)
- Validation: Tolerance-based comparison (2e-4f) between PVA and reference implementations
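
The relationship between the angular outputs and the Cartesian coordinates can be illustrated with a standard spherical-to-Cartesian conversion. The axis convention used here (x forward, y to the side, z up, azimuth measured in the x-y plane, elevation measured from that plane) is an assumption made for the example. The convention used by pvaDOA depends on the geometric parameters and should be taken from the operator documentation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical point type for illustration.
struct Point3
{
    double x;
    double y;
    double z;
};

// Converts range (m), azimuth (degrees), and elevation (degrees) to Cartesian
// coordinates under the assumed axis convention described above.
inline Point3 sphericalToCartesian(double rangeM, double azimuthDeg, double elevationDeg)
{
    assert(rangeM >= 0.0);
    const double degToRad = std::acos(-1.0) / 180.0;
    const double az = azimuthDeg * degToRad;
    const double el = elevationDeg * degToRad;
    return Point3{rangeM * std::cos(el) * std::cos(az),
                  rangeM * std::cos(el) * std::sin(az),
                  rangeM * std::sin(el)};
}
```

Under this convention a target at 10 m with zero azimuth and elevation lies on the x axis, and a 90 degree azimuth moves it onto the y axis.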

Classes and Components#

RadarPipelineTensorRequirements#

A C++ class that holds tensor requirements without actual memory allocation. Used for operator creation.

Constructor Parameters:

sampleCount (int32_t): Number of samples per chirp
rxAntennaCount (int32_t): Number of receive antenna channels
chirpCount (int32_t): Number of chirps per frame
dopplerFoldCount (int32_t): Doppler fold factor for non-coherent integration
txAntennaCount (int32_t): Number of transmit antenna channels

Key Components:

Tensor Requirements: Requirements for all pipeline tensors (shape, layout, data type)
NCI Output Requirements Vector: outNciTensorReqs - Pointer array for NCI output requirements
DOA Input Requirements Vector: inDOATensorReqs - Pointer array for DOA input requirements
Geometric Parameters: PVARadarGP GP - Radar system geometric calibration

RadarPipelineTensors#

A C++ class that manages all tensor allocations for processing a single radar data file. Uses RAII for automatic cleanup.

Constructor Parameters:

allocatorHandle (NVCVAllocatorHandle): PVA memory allocator
sampleCount (int32_t): Number of samples per chirp
rxAntennaCount (int32_t): Number of receive antenna channels
chirpCount (int32_t): Number of chirps per frame
dopplerFoldCount (int32_t): Doppler fold factor for non-coherent integration
txAntennaCount (int32_t): Number of transmit antenna channels

Key Methods:

void initRadarPipelineTensors();
void cleanupRadarPipelineTensors();
void createRadarPipelineTensors(NVCVAllocatorHandle allocatorHandle,
                                const int32_t sampleCount,
                                const int32_t rxAntennaCount,
                                const int32_t chirpCount,
                                const int32_t dopplerFoldCount,
                                const int32_t txAntennaCount);

Tensor Categories:

Range FFT Tensors: Input, window, output, and reference tensors
Doppler FFT Tensors: Input, window, output, and reference tensors
NCI Tensors: Input and multiple output tensors for different NCI stages
Peak Detection Tensors: Output tensors for peak count, indices, and snap data
DOA Tensors: Input calibration vectors and output result tensors
Geometric Parameters: PVARadarGP GP - Shared with requirements object

RadarPipelineOperators#

A C++ class that manages PVA operator handles. Created once and reused across multiple files.

Key Components:

Operator Handles: For each processing stage (Range FFT, Doppler FFT, NCI, Peak Detection, DOA)
Lifetime: Created once per radar configuration, destroyed at end of application

Key Methods:

void initRadarPipelineOperators();
void cleanupRadarPipelineOperators();

RadarPipelinePVAWorkloadParams#

A C++ class that holds the CUPVA synchronization objects and the allocator.

Key Components:

Allocator Handle: NVCVAllocatorHandle - PVA memory allocator for tensor allocation
CUPVA Objects: Synchronization objects (sync, fence, stream) for PVA execution

Key Methods:

void initRadarPipelinePVAWorkloadParams();
void cleanupRadarPipelinePVAWorkloadParams();

API Functions#

Workload Management Functions#