64T64R MU-MIMO Support with Static and Dynamic Beamforming#

This document describes beamforming in the SCF 5G FAPI layer for 64T64R MU-MIMO. Two modes are supported: static beamforming, where L2 provides a fixed set of beam weights at cell setup, and dynamic beamforming, where L1 computes weights per slot from SRS channel estimates. Both rely on correct Cell_Config TLVs, YAML configuration, and slot-level timing between L2 and L1. The following sections explain what L2 must configure, how each mode works end-to-end, and how SRS and BFW messages fit into the slot timeline.


Configuration (cuphycontroller_xxx.yaml)#

Several options in cuphycontroller_xxx.yaml control whether and how beamforming runs.

  • mMIMO_enable
    Must be set for 64T64R MU-MIMO. Both static and dynamic beamforming require this flag. When enabled, L1 uses the NUM_TX_PORT and NUM_RX_PORT from Cell_Config and runs the DL/UL beamforming paths; dynamic BFW and SRS-based weight computation also depend on it.

  • enable_srs
    Must be set for SRS channel estimation and reporting. When enabled, L1 allocates SRS channel estimate buffers (per TLV 0xA019 or default), processes SRS PDUs, and sends SRS.IND to L2. Dynamic beamforming requires SRS, so both mMIMO_enable and enable_srs must be set for dynamic BFW.

  • total_num_srs_chest_buffers
    Total number of SRS channel estimate buffers available to L2 across the system (the current maximum is 6144). L1 pre-allocates this many buffers at bring-up. L2 reserves a share of these buffers for each cell via TLV 0xA019 (NUM_SRS_CHEST_BUFFERS) in Cell_Config (up to 1024 per cell).

  • static_beam_id_start and static_beam_id_end
    Define the range of beam IDs that L2 may use for static beamforming (e.g. 1 and 16527). L1 only accepts and looks up static beam IDs within this range when building C-Plane ExtnType=11 from the DBT.

  • dynamic_beam_id_start and dynamic_beam_id_end
    Define the range of beam IDs that L1 may assign when it encodes dynamically computed weights in the C-Plane (e.g. 16528 and 32767). L1 assigns IDs from this range so they do not collide with static beam IDs.
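
The dependencies between these options can be checked before bring-up. The following is a minimal sketch, assuming the YAML has already been loaded into a plain dict keyed by the option names above; the function names are hypothetical and are not part of Aerial.

```python
def check_beam_config(cfg):
    """Return a list of problems found in the beam ID ranges of cfg."""
    problems = []
    s0, s1 = cfg["static_beam_id_start"], cfg["static_beam_id_end"]
    d0, d1 = cfg["dynamic_beam_id_start"], cfg["dynamic_beam_id_end"]
    if s0 > s1:
        problems.append("static beam ID range is empty")
    if d0 > d1:
        problems.append("dynamic beam ID range is empty")
    # Two closed ranges overlap when each one starts before the other ends.
    if s0 <= d1 and d0 <= s1:
        problems.append("static and dynamic beam ID ranges overlap")
    return problems


def dynamic_bfw_allowed(cfg):
    """Dynamic BFW needs both mMIMO_enable and enable_srs."""
    return bool(cfg.get("mMIMO_enable")) and bool(cfg.get("enable_srs"))


def classify_beam_id(cfg, beam_id):
    """Say which configured range (if any) a beam ID falls into."""
    if cfg["static_beam_id_start"] <= beam_id <= cfg["static_beam_id_end"]:
        return "static"
    if cfg["dynamic_beam_id_start"] <= beam_id <= cfg["dynamic_beam_id_end"]:
        return "dynamic"
    return "out_of_range"


# Example values taken from the option descriptions above.
example_cfg = {
    "mMIMO_enable": 1,
    "enable_srs": 1,
    "static_beam_id_start": 1,
    "static_beam_id_end": 16527,
    "dynamic_beam_id_start": 16528,
    "dynamic_beam_id_end": 32767,
}
```

With the example values, check_beam_config(example_cfg) returns an empty list, and beam ID 16528 is classified as dynamic.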

SRS channel estimation and reporting (algorithm parameters):

  • srs_chest_algo_type (default: 0)
    Selects the SRS channel estimation algorithm used by L1. Supported values (from code): 0 = MMSE (SRS_CH_EST_ALGO_TYPE_MMSE), 1 = RKHS (SRS_CH_EST_ALGO_TYPE_RKHS). This affects the channel estimates that L1 stores and later uses for dynamic beamforming weight computation.

  • srs_chest_tol2_normalization_algo_type (default: 1)
    Controls how L1 normalizes the SRS channel estimates before sending them to L2 in SRS.IND. Supported values (from API/code): 0 = disabled (no normalization applied), 1 = constant scaler (each estimate is scaled by srs_chest_tol2_constant_scaler), 2 = auto (L1 chooses normalization automatically). A typical config uses 1 so that L2 receives channel estimates on a consistent scale.

  • srs_chest_tol2_constant_scaler (default: 32768.0)
    Used when srs_chest_tol2_normalization_algo_type is 1. L1 multiplies the channel estimate (e.g. I/Q) by this factor before putting it in SRS.IND. Type: float. 32768 (2^15) is a common choice for 16-bit fixed-point representation and keeps values in a range suitable for L2 and for later use in BFW computation.

  • bfw_power_normalization_alg_selector (default: 1)
    Selects the algorithm used to normalize the power of the beamforming weights that L1 computes for DL and UL dynamic BFW. The weights are scaled so that transmit/receive power constraints are met before they are encoded in the C-Plane (ExtnType=11). Supported values (from code): 0 = Frobenius-norm–based normalization (group reduce of coefficient power, then scale by reciprocal of Frobenius norm), 1 = per-layer / per-antenna scaling (reciprocal of layer scaling factors, then per-antenna energy computation). Value 1 is the default in YAML configs.

  • bfw_beta_prescaler (default: 16384)
    Beamforming beta prescaling factor used when L1 computes DL and UL dynamic BFW coefficients. Passed to the BFW computation as the beta parameter; 16384 (2^14) is the default and is commonly used for fixed-point scaling of the weight computation. Type: float.
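
The two normalization ideas described above (constant scaling of the SRS estimates sent to L2, and Frobenius-norm scaling of the weights) can be illustrated as follows. This is a conceptual sketch in plain Python with hypothetical function names. The real L1 computation runs on the GPU and also involves bfw_beta_prescaler and fixed-point conversion, which are not modeled here.

```python
import math


def scale_chest_for_l2(estimates, algo_type=1, constant_scaler=32768.0):
    """Normalize complex channel estimates before reporting (sketch only)."""
    if algo_type == 0:
        # 0 = disabled: estimates are passed through unchanged.
        return list(estimates)
    if algo_type == 1:
        # 1 = constant scaler: every I/Q sample is multiplied by the scaler.
        return [h * constant_scaler for h in estimates]
    # 2 = auto: the selection rule is not described in this document.
    raise NotImplementedError("auto normalization is not modeled here")


def frobenius_normalize(weights):
    """Scale a weight matrix (list of rows) by 1 / Frobenius norm."""
    # Sum of coefficient power over all antennas and layers.
    power = sum(abs(w) ** 2 for row in weights for w in row)
    if power == 0.0:
        raise ValueError("all-zero weight matrix cannot be normalized")
    inv_norm = 1.0 / math.sqrt(power)
    return [[w * inv_norm for w in row] for row in weights]
```

After frobenius_normalize, the total power of the matrix (the sum of squared magnitudes) is 1 up to floating-point rounding.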


Fronthaul (C-Plane BFW transmission)#

The following options in cuphycontroller_xxx.yaml control how BFW (beamforming weight) C-Plane messages are sent over the fronthaul. They affect timing, queue allocation, and how weight data is handed off to the C-Plane send path. The current Aerial SW aims to meet the timing requirements of Entry 1 of the ORAN IoT spec. For multi-cell use cases, these parameters must be configured carefully (e.g. per-cell division of the BFW transmission window, dedicated BFW TX queues, and backoff thresholds) so that C-Plane deadlines are met across all cells.

Timing (backoff thresholds):

  • sendCPlane_ulbfw_backoff_th_ns (default: 300000)
    UL BFW processing must finish no later than the C-Plane send time minus this backoff, in nanoseconds. In other words, the latest completion time for ULBFW is (C-Plane send time − sendCPlane_ulbfw_backoff_th_ns). Example: 300000 ns = 0.3 ms.

  • sendCPlane_dlbfw_backoff_th_ns (default: 300000)
    Same as above for DL BFW: DLBFW must be completed by C-Plane send time minus this backoff (ns). Example: 300000 ns = 0.3 ms.
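
The deadline arithmetic for both thresholds is the same. A minimal sketch, with hypothetical function names and all times expressed in nanoseconds on a common clock:

```python
def bfw_completion_deadline_ns(cplane_send_time_ns, backoff_th_ns=300_000):
    """Latest time by which BFW computation must be complete."""
    return cplane_send_time_ns - backoff_th_ns


def bfw_meets_deadline(bfw_done_time_ns, cplane_send_time_ns,
                       backoff_th_ns=300_000):
    """True if the weights were ready at or before the deadline."""
    deadline = bfw_completion_deadline_ns(cplane_send_time_ns, backoff_th_ns)
    return bfw_done_time_ns <= deadline
```

For a C-Plane send time of 10 000 000 ns and the default backoff, the weights must be ready by 9 700 000 ns.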

Per-cell division of BFW C-Plane transmission window:

  • dlc_bfw_enable_divide_per_cell (default: 0)
    0 = disabled: DL BFW C-Plane transmission window is shared across cells in a slot. 1 = enabled: the DL BFW C-Plane transmission window is divided/spread per cell in the slot (e.g. to stagger or separate transmission per cell).

  • ulc_bfw_enable_divide_per_cell (default: 0)
    0 = disabled: UL BFW C-Plane transmission window is shared across cells. 1 = enabled: the UL BFW C-Plane transmission window is divided/spread per cell in the slot.

Dedicated BFW C-Plane TX queues:

  • dlc_alloc_cplane_bfw_txq (default: 0)
    0 = use shared C-Plane TX queue for DL. 1 = allocate a second, dedicated TX queue for DL BFW C-Plane packets.

  • ulc_alloc_cplane_bfw_txq (default: 1)
    0 = use shared C-Plane TX queue for UL. 1 = allocate a second, dedicated TX queue for UL BFW C-Plane packets. Many configs use 1 for UL.

BFW C-Plane chaining mode:

  • bfw_c_plane_chaining_mode (default: 2)
    How BFW weight data is passed into the C-Plane send path. 0 = default: memcpy of BFW weights into the C-Plane buffer. 1 = CPU mbuf chaining: BFW weights are referenced via CPU mbuf chaining. 2 = GPU mbuf chaining: BFW weights are referenced via GPU mbuf chaining (avoids extra copy from GPU). Value 2 is typical when weights reside in GPU memory.


Required TLVs during Cell_Config#

CONFIG.req must carry certain vendor TLVs so that L1 can support 64T64R and beamforming.

For dynamic beamforming and mMIMO:

  • TLV 0xA016 (NUM_TX_PORT, uint8_t)
    Tells L1 how many downlink baseband ports the PHY uses. This is required when mMIMO is enabled so that L1 can size and handle DL precoding correctly.

  • TLV 0xA017 (NUM_RX_PORT, uint8_t)
    Tells L1 how many uplink baseband ports the PHY uses. Required for mMIMO so that UL receive beamforming and SRS processing use the right port count.

  • TLV 0xA019 (NUM_SRS_CHEST_BUFFERS, uint32_t)
    Defines how many SRS channel estimate buffers are reserved for this cell. L1 uses these buffers to store per-UE channel estimates that are later used for dynamic BFW. L2 configures this per cell (up to 1024 buffers per cell; indexes 0–1023). The system-wide total is capped (e.g. 6144) via YAML. If this TLV is omitted when SRS is enabled, L1 uses a default per-cell count.
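
L2 can sanity-check its per-cell NUM_SRS_CHEST_BUFFERS values against the limits above before sending CONFIG.req. The sketch below uses a hypothetical helper and assumes that the per-cell reservations must together fit within total_num_srs_chest_buffers.

```python
MAX_SRS_CHEST_BUFFERS_PER_CELL = 1024  # per-cell limit for TLV 0xA019


def check_srs_buffer_budget(per_cell_counts, total_num_srs_chest_buffers=6144):
    """per_cell_counts maps cell ID -> value L2 plans to send in TLV 0xA019."""
    problems = []
    for cell_id, count in sorted(per_cell_counts.items()):
        if not 0 < count <= MAX_SRS_CHEST_BUFFERS_PER_CELL:
            problems.append(f"cell {cell_id}: count {count} outside 1..1024")
    requested = sum(per_cell_counts.values())
    # Assumption: the sum over all cells may not exceed the YAML total.
    if requested > total_num_srs_chest_buffers:
        problems.append(
            f"requested {requested} buffers, only "
            f"{total_num_srs_chest_buffers} pre-allocated"
        )
    return problems
```

Six cells with 1024 buffers each exactly use the 6144 buffers of the example total; a seventh cell would exceed it.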

For static beamforming:

  • TLV 0xA010 (DIGITAL_BEAM_TABLE_PDU)
    Carries the Digital Beam Table (DBT) PDU: the set of predefined beam weights that L1 will use whenever L2 requests a static beam ID. The format is per FAPI 222.10.04 Table 3-61. L2 may send this PDU using the NVIPC buffer pool cpu_large (typically on the order of 4 MB), which gives the flexibility to configure a large number of beam IDs in the DBT.


Static Beamforming#

Static beamforming uses a fixed set of beam weights provided by L2 at cell setup. L2 sends these weights once in a Digital Beam Table (DBT); L1 stores them and, on each slot, uses the beam IDs that L2 indicates in the channel PDUs to look up weights and fill the C-Plane.

Cell setup and DBT

At cell setup, L2 includes the DBT PDU in the FAPI Cell_Config_Req (TLV 0xA010). The DBT lists predefined weights for each beam ID. L1 receives this during Cell Setup and stores the table in memory, keyed by beam ID. To support a large number of beam IDs, L2 may send the DBT in an NVIPC buffer from the cpu_large pool (on the order of 4 MB).

Which channels use static beamforming

In the downlink, static beamforming is supported for PBCH (SSB), PDCCH, PDSCH, and CSI-RS. For each of these, L2 sends the beam ID inside the Tx Precoding and Beamforming PDU of the corresponding channel PDU (SSB_PDU, PDCCH_PDU, PDSCH_PDU, CSI_RS PDU) in DL_TTI.Req. The PDU carries one beam ID per digBFInterface / layer / eAxC_Id as required by the spec. L1 decodes these IDs and looks up the matching weights from the DBT.

L1 then builds the C-Plane message for the ORAN interface. The beam IDs and their weights are sent in the ExtnType=11 section. If L2 sends a beam ID that is not in the DBT (e.g. a “predefined” or out-of-range ID), L1 still encodes the beam ID in the Section Header but does not attach weights for that beam; the RU may treat it as a predefined beam.
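
The lookup behaviour described above can be sketched as a table keyed by beam ID. The data layout and function name here are hypothetical and only illustrate the two cases (ID found in the DBT, ID not found); they do not reflect the actual C-Plane encoding.

```python
def lookup_static_beams(dbt, beam_ids):
    """Resolve beam IDs from a channel PDU against the stored DBT.

    dbt maps beam ID -> list of complex weights. The result has one entry
    per requested ID; weights is None when the ID is not in the DBT, in
    which case only the beam ID itself would be carried.
    """
    entries = []
    for beam_id in beam_ids:
        entries.append({"beam_id": beam_id, "weights": dbt.get(beam_id)})
    return entries


# Toy DBT with two beams of two weights each (real tables are much larger).
example_dbt = {
    1: [1 + 0j, 0 + 1j],
    2: [0 + 1j, 1 + 0j],
}
```

Calling lookup_static_beams(example_dbt, [1, 9]) yields weights for beam 1 and None for beam 9, which is not in the table.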

For static beamforming, only numPRGs = 1 is supported. The PRG size is the same as the RB allocation for that channel.

In the uplink, static beamforming is supported for PUCCH and PUSCH. L2 encodes the beam ID in the Rx Beamforming PDU of the corresponding channel PDU in UL_TTI.Req, again per digBFInterface / layers / eAxC_Id. RACH uses only static beam weights (no dynamic BFW). SRS does not use static beamforming; it has its own reporting path.

TLV and beam ID range

TLV 0xA010 (DIGITAL_BEAM_TABLE_PDU) is required in CONFIG.request when static beamforming is used. The DBT format follows FAPI 222.10.04 Table 3-61. The valid range of static beam IDs is configured in cuphycontroller_xxx.yaml via static_beam_id_start and static_beam_id_end (e.g. 1 and 16527). L2 should only use IDs in this range in the channel PDUs when expecting DBT-based weights in the C-Plane.


SRS Channel Estimation#

SRS (Sounding Reference Signal) is the uplink reference that L1 uses to estimate the channel per UE. Those estimates are then used for dynamic beamforming weight computation when L2 sends DLBFW or ULBFW CVI requests. So SRS is the bridge between “what the UE channel looks like” and “what weights L1 should apply” for that UE.

How L2 requests SRS reports

L2 requests SRS processing by sending UL_TTI.Req with an SRS PDU. Aerial SW supports the FAPI 10.04 definition of this PDU so that channel estimation results can be reported back to L2 in a standard way. The SRS PDU indicates which UEs and resources to sound; L1 runs channel estimation and fills the SRS channel estimate buffers.

Supported SRS usage

Supported SRS usage types are BEAM_MANAGEMENT, CODEBOOK, and NON_CODEBOOK. These cover the main use cases for UL scheduling and for feeding dynamic BFW. Antenna switching is not supported; L2 must not request it, otherwise SRS.IND may fail.

How L1 keeps channel estimates for dynamic BFW

L1 does not discard the SRS channel estimates after sending SRS.IND. It keeps a copy in GPU memory. When L2 later sends a DLBFW_CVI or ULBFW_CVI request for a given UE, L2 references the same SRS buffer index that was used for that UE’s SRS. L1 then uses the stored channel estimate for that buffer as input to the beamforming weight computation. So the lifecycle is: SRS PDU (with handle = buffer index) → L1 computes and stores channel estimate → SRS.IND (same handle) → later, BFW request (same handle) → L1 reads channel estimate and computes weights.

A fixed number of SRS channel estimate buffers are allocated at bring-up (or at cell config when SRS is enabled). That total is configured via cuphycontroller_xxx.yaml (e.g. up to 6144 across all cells). L2 decides how many of these to assign per cell using TLV 0xA019 (up to 1024 per cell, indexes 0–1023).

L2 buffer management rules

L2 must manage buffer indexes so that they are unique and not reused too early:

  • L1 pre-allocates total_num_srs_chest_buffers (YAML; max 6144 system-wide). Per cell, L2 sets the count with TLV 0xA019 (max 1024, indexes 0–1023).

  • L2 assigns a buffer index to each UE (or SRS instance) and must keep track of which index is in use until the corresponding SRS.IND has been received.

  • In the UL_TTI SRS PDU, L2 puts the buffer index in the handle field: bits 8–23 are the buffer index; bits 0–7 and 24–31 are reserved.

  • L1 returns the same buffer index in the handle field of the SRS.IND so L2 can match the report to the right UE and buffer.

  • L2 must not reuse a buffer index for a new SRS until it has received the SRS.INDICATION for the previous use of that index. If L2 sends a DLBFW or ULBFW CVI request that references a buffer index before the corresponding SRS.IND has been sent, L1 drops that BFW request and may report an error. So the order is: SRS PDU (handle = N) → … → SRS.IND (handle = N) → only then may L2 use handle N in a BFW request or reuse N for another SRS.
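
The handle encoding and the ordering rule above can be sketched on the L2 side as follows. The class and function names are hypothetical. The tracker is deliberately conservative: once an index is re-sounded, it is treated as unusable for BFW until the new SRS.IND arrives.

```python
BUFFER_INDEX_SHIFT = 8       # buffer index occupies handle bits 8..23
BUFFER_INDEX_MASK = 0xFFFF   # 16-bit field; other bits are reserved


def encode_handle(buffer_index):
    """Place the buffer index in bits 8..23; reserved bits stay zero."""
    if not 0 <= buffer_index <= BUFFER_INDEX_MASK:
        raise ValueError("buffer index does not fit in 16 bits")
    return buffer_index << BUFFER_INDEX_SHIFT


def decode_handle(handle):
    """Extract the buffer index, ignoring the reserved bits."""
    return (handle >> BUFFER_INDEX_SHIFT) & BUFFER_INDEX_MASK


class SrsBufferTracker:
    """Tracks which buffer indexes L2 may reuse or reference in BFW."""

    def __init__(self, num_buffers):
        self.num_buffers = num_buffers
        self.pending = set()  # SRS PDU sent, SRS.IND not yet received
        self.ready = set()    # SRS.IND received, estimate usable for BFW

    def send_srs(self, index):
        if not 0 <= index < self.num_buffers:
            raise ValueError("buffer index outside the per-cell range")
        if index in self.pending:
            raise RuntimeError("index reused before its SRS.IND arrived")
        self.ready.discard(index)
        self.pending.add(index)
        return encode_handle(index)

    def on_srs_ind(self, handle):
        index = decode_handle(handle)
        self.pending.discard(index)
        self.ready.add(index)
        return index

    def can_request_bfw(self, index):
        return index in self.ready
```

For example, buffer index 5 is encoded as handle 0x500, and can_request_bfw(5) only becomes true after on_srs_ind has been called with that handle.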


Dynamic Beamforming#

In dynamic beamforming, L1 computes the beamforming weights per slot from the SRS channel estimates. L2 tells L1 “for the next DL (or UL) slot, use the channel estimate in buffer X (and optionally others) to compute weights for these UEs.” L1 computes the weights, assigns a dynamic beam ID, and encodes both in the C-Plane (ExtnType=11). So L2 does not send beam IDs or weights for dynamic BFW; it only sends the SRS buffer index(es) and scheduling info.

BFW request messages

L2 uses two vendor-specific FAPI messages to request weight calculation:

  • SCF_FAPI_DL_BFW_CVI_REQUEST (Message ID 0x90) — for downlink (PDSCH). Often called DLBFW_CVI.request.

  • SCF_FAPI_UL_BFW_CVI_REQUEST (Message ID 0x91) — for uplink (PUSCH). Often called ULBFW_CVI.request.

The message structure is the same for both; the message ID and context (DL vs UL) tell L1 which direction and which slot to prepare. These messages are available when the build uses the SCF_FAPI_10_04 compile flag.

Timing: one slot before the data

L2 must send DLBFW_CVI.request and ULBFW_CVI.request one slot before the slot in which the corresponding PDSCH or PUSCH is scheduled. For example, for PDSCH in slot [N+1, 0], L2 sends DLBFW_CVI.request in slot [N, 19] (the last slot of the previous frame when a frame has 20 slots). For PUSCH in slot [N+1, 4], L2 sends ULBFW_CVI.request in slot [N+1, 3]. This lead time lets L1 compute the weights and have them ready when it builds the C-Plane for the data slot. The weights are then encoded in the ExtnType=11 message for that slot.
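
The "one slot before" rule, including the wrap across a frame boundary, can be sketched as below. The function name is hypothetical. The sketch assumes 20 slots per frame (30 kHz subcarrier spacing), which matches the [N, 19] example, and an SFN that wraps modulo 1024.

```python
SLOTS_PER_FRAME = 20  # assumption: 30 kHz SCS, as in the [N, 19] example
NUM_SFN = 1024        # SFN counts 0..1023 and then wraps


def bfw_request_slot(data_sfn, data_slot):
    """Return the [SFN, slot] in which the BFW CVI request must be sent."""
    if not 0 <= data_slot < SLOTS_PER_FRAME:
        raise ValueError("slot index out of range")
    if data_slot > 0:
        return data_sfn, data_slot - 1
    # Slot 0: the previous slot is the last slot of the previous frame.
    return (data_sfn - 1) % NUM_SFN, SLOTS_PER_FRAME - 1
```

With N = 10, PDSCH in [11, 0] gives a request slot of (10, 19), and PUSCH in [11, 4] gives (11, 3).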

SRS buffer and ordering

In each DLBFW/ULBFW CVI request, L2 identifies which SRS channel estimate(s) to use by putting the SRS buffer index in the handle field (same encoding as in the SRS PDU). L1 uses that buffer’s channel estimate as input to the weight computation. If L2 references a buffer index that has not yet been filled (i.e. SRS.INDICATION for that index has not been sent), L1 drops that BFW request. So L2 must ensure SRS has been requested, SRS.IND has been received for the buffer, and only then send the BFW request for the next slot.

C-Plane and dynamic beam IDs

The weights computed by L1 are sent to the RU in the C-Plane ExtnType=11 section. L1 assigns the beam ID for these dynamic weights from the configured dynamic range (dynamic_beam_id_start to dynamic_beam_id_end in cuphycontroller_xxx.yaml, e.g. 16528–32767). L2 does not choose dynamic beam IDs; L1 does, so that they stay within the reserved range and do not overlap with static beam IDs.


Call Sequence and Slot Timing#

Slot indices are [SFN, Slot] (e.g. [N, 3], [N+1, 0]). L1 sends Slot.Ind at the start of each slot; L2 sends requests in the same slot or the next, depending on the message type.

SRS: L2 sends UL_TTI.Req with SRS PDU after Slot.Ind. L1 runs channel estimation, fills the buffer indexed by the handle, and sends SRS.IND with the same handle later. L2 uses that handle in DLBFW/ULBFW when requesting weights for that UE.

DL-BFW: L2 sends DLBFW CVI.Req one slot before the PDSCH slot so L1 can compute weights in time. In the PDSCH slot, L2 sends DL_TTI.Req (PDSCH.PDU) and TX_Data; L1 already has the weights for the C-Plane (ExtnType=11).

UL-BFW: L2 sends ULBFW CVI.Req one slot before the PUSCH slot. In the PUSCH slot, L2 sends UL_TTI.Req (PUSCH.PDU); L1 applies the precomputed receive weights. CRC.IND and RX_Data.IND refer back to that PUSCH slot.


Appendix – Sequence diagram source#

Source for the sequence diagrams referenced in this document.

1. SRS (example: SRS PDU in [N,3] and [N,13]; SRS.IND after Slot.Ind [N,13] and [N+5,13]):

SRS Diagram

2. DL-BFW and PDSCH (one slot before):

DL-BFW and PDSCH Diagram

3. UL-BFW and PUSCH (one slot before):

UL-BFW and PUSCH Diagram