DMA Configuration Details#
Overview#
Note
The ConfigDataFlow and DynamicDataFlow APIs are deprecated.
SequenceDataFlow and GatherScatterDataFlow are the preferred replacement APIs.
With SequenceDataFlow and GatherScatterDataFlow, the user is not required to understand the details covered in this document.
The ConfigDataFlow and DynamicDataFlow APIs make it possible to reconfigure the DMA engine on the fly from VPU device code. This section provides a reference for each configurable DMA field and explains how to use the relevant Device APIs to modify DMA engine behavior from VPU code.
The VPUC Table#
Configuration of the DMA engine from VPU is marshaled through a block of VMEM called the VPUC table. The basic layout of a VPUC table is as follows:
Header: [63:32 0xDEADC0DE | 31:0 Number of entries]
Entry : [63:32 Address in DMA RAM | 31:0 Value to write]
... N contiguous entries up to the number specified in the header.
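As a rough illustration of this layout, the following host-runnable C sketch packs the header and entries as 64-bit words. The names `VPUC_MAGIC`, `vpuc_header`, `vpuc_entry`, and `vpuc_fill` are hypothetical helpers for illustration only and are not part of the cuPVA API. Viewing each row as one 64-bit word assumes the low 32 bits are stored first, which matches the `[i][0]`/`[i][1]` indexing used by the ConfigDataFlow examples in this document.

```c
#include <assert.h>
#include <stdint.h>

#define VPUC_MAGIC 0xDEADC0DEu /* magic value from the header layout above */

/* Header word: magic in bits 63:32, number of entries in bits 31:0. */
static uint64_t vpuc_header(uint32_t num_entries)
{
    return ((uint64_t)VPUC_MAGIC << 32) | num_entries;
}

/* Entry word: DMA RAM address in bits 63:32, value to write in bits 31:0. */
static uint64_t vpuc_entry(uint32_t dma_ram_addr, uint32_t value)
{
    return ((uint64_t)dma_ram_addr << 32) | value;
}

/* Fill a table of `capacity` words with one header followed by `n` entries. */
static void vpuc_fill(uint64_t *tbl, uint32_t capacity,
                      const uint32_t *addrs, const uint32_t *values,
                      uint32_t n)
{
    assert(n < capacity); /* the header occupies one slot */
    tbl[0] = vpuc_header(n);
    for (uint32_t i = 0; i < n; ++i) {
        tbl[i + 1u] = vpuc_entry(addrs[i], values[i]);
    }
}
```

A table sized for the largest batch that VPU code will ever write needs one slot for the header plus one slot per entry.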
ConfigDataFlow users are required to manually set up the VPUC table.
It should be declared as a block of VMEM large enough to hold the header and all field entries that need to be modified in a single batch by VPU code.
A device pointer to the table is provided to the ConfigDataFlow::src() method on the host side.
On the VPU side, user code must set up the VPUC table including the header and all (value, address) pair entries.
With DynamicDataFlow, there is an internally managed VPUC table that should be declared as a block of VMEM using the DDF_PARALLEL_TBL_SIZE or DDF_TBL_SIZE helpers.
Field values should only be manipulated through the DynamicDataFlow Device APIs.
Only a subset of DMA fields can be modified with DynamicDataFlow, as described in the DDFPayload struct.
DynamicDataFlow abstracts the VPUC header, the address part of each entry, and also most of the bit packing for the values.
DMA Fields#
There are 6 core DMA fields which can be modified via ConfigDataFlow or DynamicDataFlow, and several advanced fields which can only be modified by ConfigDataFlow.
The field names used here correspond to the names of members in the PvaDmaDescriptor struct.
In this section, we provide details on the meaning of each field and how to set up VPUC entries for them.
Basic Fields#
These basic fields are supported by both ConfigDataFlow and DynamicDataFlow. Understanding these fields allows you to implement any typical dynamic access pattern in up to 2 dimensions, such as gather and scatter of tiles.
DESCR_CNTL [31:24 DST_ADDR1 | 23:16 SRC_ADDR1 | 15:8 LINK_DID | 7:7 DST_TF | 6:4 DDTM | 3:3 SRC_TF | 2:0 DSTM]#
The descriptor control field contains multiple pieces of info packed tightly into 32 bits.
SRC_ADDR1/DST_ADDR1 hold the upper 8 bits of the source/destination address, respectively.
LINK_DID holds the descriptor ID of the next descriptor in the linked list.
SRC_TF/DST_TF hold the source/destination transfer format: either 0 for pitch linear or 1 for block linear.
DSTM/DDTM hold the source/destination transfer mode as defined in the DmaTransferModeType enum.
DynamicDataFlow manages these fields internally when any of these related APIs are called: cupvaDDFOpen, cupvaDDFParallelOpen, cupvaDDFUpdateSrcAddr, cupvaDDFUpdateDstAddr, cupvaDDFUpdateLdid, cupvaDDFUpdateDstm, cupvaDDFUpdateDdtm.
ConfigDataFlow must bit-pack the field manually. Here is an example which configures a StaticDataFlow to write to VMEM from pitch linear DRAM:
...
/* vpuc_tbl is declared as a 2D array of uint32_t in VMEM */
/* dramAddr is declared as VMEM_POINTER and set from host code */
uint64_t srcAddr = dramAddr.base + dramAddr.offset;
uint32_t srcAddr1 = (srcAddr >> 32) & 0xFF; /* extract upper 8 bits of srcAddr */
/* srcDescId is declared as a uint8_t in VMEM and the value is set with the output of StaticDataFlow::id() on host side */
/* set up the header */
vpuc_tbl[0][0] = 1; /* the table will have one entry */
vpuc_tbl[0][1] = 0xDEADC0DE; /* magic VPUC header */
/* set up table entries */
/* Notes specific to this example:
* - VMEM uses 32b addresses, so no need to set the high bits of DST_ADDR1
* - Both buffers are pitch linear layout, no need to set SRC_TF/DST_TF
* - There is only one descriptor (StaticDataFlow) involved, so no LINK_DID
*/
vpuc_tbl[1][0] = CUPVA_DMA_FIELD(DESCR_CNTL, SRC_ADDR1, srcAddr1) |
CUPVA_DMA_FIELD(DESCR_CNTL, DDTM, DMA_TRANS_MODE_VMEM) |
CUPVA_DMA_FIELD(DESCR_CNTL, DSTM, DMA_TRANS_MODE_DRAM);
vpuc_tbl[1][1] = CUPVA_GET_DESC_ATTR_ADDR(srcDescId, DESCR_CNTL); /* derive the address from descriptor ID */
/* trigger and sync the ConfigDataFlow to ensure the table entries have been written to DMA RAM */
cupvaDataFlowTrig(cfgTrig);
cupvaDataFlowSync(cfgTrig);
/* trigger the source data flow that was modified */
cupvaDataFlowTrig(srcTrig);
...
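To make the bit layout in the DESCR_CNTL heading concrete, here is a minimal host-runnable sketch that packs all seven sub-fields by hand. The `DescrCntl` struct and `pack_descr_cntl` function are hypothetical names introduced for illustration; they are not cuPVA APIs and do not replace `CUPVA_DMA_FIELD`. The numeric transfer mode values come from the DmaTransferModeType enum and are not reproduced here, so the sketch treats them as opaque 3-bit inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the unpacked DESCR_CNTL sub-fields. */
typedef struct {
    uint8_t dst_addr1; /* bits 31:24, upper 8 bits of destination address */
    uint8_t src_addr1; /* bits 23:16, upper 8 bits of source address */
    uint8_t link_did;  /* bits 15:8, ID of the next linked descriptor */
    uint8_t dst_tf;    /* bit 7, 0 = pitch linear, 1 = block linear */
    uint8_t ddtm;      /* bits 6:4, destination transfer mode */
    uint8_t src_tf;    /* bit 3, 0 = pitch linear, 1 = block linear */
    uint8_t dstm;      /* bits 2:0, source transfer mode */
} DescrCntl;

/* Pack the sub-fields into the 32-bit value written through a VPUC entry. */
static uint32_t pack_descr_cntl(DescrCntl f)
{
    assert(f.dst_tf <= 1u && f.src_tf <= 1u); /* single-bit fields */
    assert(f.ddtm <= 7u && f.dstm <= 7u);     /* 3-bit fields */
    return ((uint32_t)f.dst_addr1 << 24) |
           ((uint32_t)f.src_addr1 << 16) |
           ((uint32_t)f.link_did << 8) |
           ((uint32_t)f.dst_tf << 7) |
           ((uint32_t)f.ddtm << 4) |
           ((uint32_t)f.src_tf << 3) |
           (uint32_t)f.dstm;
}
```

Sub-fields that are left at zero, such as LINK_DID or the transfer format bits in the example above, simply contribute nothing to the packed word.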
SRC_ADR/DST_ADR [31:0]#
The source/destination address fields contain the lower 32 bits of the respective address value.
DynamicDataFlow manages these fields internally when using these APIs: cupvaDDFOpen, cupvaDDFParallelOpen, cupvaDDFUpdateSrcAddr, cupvaDDFUpdateDstAddr.
Example of manual VPUC entry setup for ConfigDataFlow, building on the above example. The triggering is omitted for brevity since it follows the same pattern as earlier:
/* Note: It's assumed the DESCR_CNTL field is unchanged from the prior example, so we update 2 fields this time. */
vpuc_tbl[0][0] = 2;
vpuc_tbl[1][0] = srcAddr & 0xFFFFFFFF;
vpuc_tbl[1][1] = CUPVA_GET_DESC_ATTR_ADDR(srcDescId, SRC_ADR);
vpuc_tbl[2][0] = dstAddr; /* dstAddr is the destination buffer declared in VMEM */
vpuc_tbl[2][1] = CUPVA_GET_DESC_ATTR_ADDR(srcDescId, DST_ADR);
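Because the address is split between SRC_ADR/DST_ADR (lower 32 bits) and the SRC_ADDR1/DST_ADDR1 ranges of DESCR_CNTL (upper 8 bits), it can help to see the split and its inverse side by side. The sketch below is host-runnable C; `SplitAddr`, `split_dma_addr`, and `join_dma_addr` are hypothetical names used only for illustration, and the 40-bit limit is inferred from the 32 + 8 bit field widths described in this document.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair holding the two pieces of a DMA address. */
typedef struct {
    uint32_t lo32; /* value for SRC_ADR or DST_ADR */
    uint8_t  hi8;  /* value for SRC_ADDR1 or DST_ADDR1 in DESCR_CNTL */
} SplitAddr;

/* Split an address into its lower 32 bits and upper 8 bits. */
static SplitAddr split_dma_addr(uint64_t addr)
{
    SplitAddr s;
    assert((addr >> 40) == 0u); /* only 32 + 8 = 40 bits can be encoded */
    s.lo32 = (uint32_t)(addr & 0xFFFFFFFFu);
    s.hi8 = (uint8_t)((addr >> 32) & 0xFFu);
    return s;
}

/* Rebuild the full address from the two pieces. */
static uint64_t join_dma_addr(SplitAddr s)
{
    return ((uint64_t)s.hi8 << 32) | s.lo32;
}
```

If a new address differs from the previous one in its upper 8 bits, the DESCR_CNTL entry has to be rewritten along with SRC_ADR or DST_ADR; otherwise updating the lower 32 bits alone is enough.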
TILE_CNTL [31:16 TY | 15:0 TX]#
The tile control field contains the tile size TY in the upper 16 bits and TX in the lower 16 bits. Tile size is defined in units of pixels.
DynamicDataFlow sets this through the following APIs: cupvaDDFUpdateTileCntl, cupvaDDFUpdateTx, cupvaDDFUpdateTy.
Example of manual VPUC entry setup, again building on prior examples, is as follows:
/* some new tile dimensions, derived in VPU code */
uint32_t tx = newTx;
uint32_t ty = newTy;
vpuc_tbl[0][0] = 1;
vpuc_tbl[1][0] = (ty << 16) | (tx & 0xFFFF);
vpuc_tbl[1][1] = CUPVA_GET_DESC_ATTR_ADDR(srcDescId, TILE_CNTL);
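The packing in the example above masks TX to 16 bits, which silently truncates an out-of-range value. A slightly more defensive variant, sketched below as host-runnable C, checks both dimensions before packing and also shows the inverse operation. The function names are hypothetical and are not part of the cuPVA API.

```c
#include <assert.h>
#include <stdint.h>

/* Pack tile width (TX, bits 15:0) and height (TY, bits 31:16), in pixels. */
static uint32_t pack_tile_cntl(uint32_t tx, uint32_t ty)
{
    assert(tx <= 0xFFFFu && ty <= 0xFFFFu); /* each must fit in 16 bits */
    return (ty << 16) | tx;
}

/* Recover TX from a packed TILE_CNTL value. */
static uint32_t tile_cntl_tx(uint32_t tile_cntl)
{
    return tile_cntl & 0xFFFFu;
}

/* Recover TY from a packed TILE_CNTL value. */
static uint32_t tile_cntl_ty(uint32_t tile_cntl)
{
    return tile_cntl >> 16;
}
```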
LP_CNTL [31:16 DST_LP | 15:0 SRC_LP]#
The line pitch control field contains the destination line pitch in the upper 16 bits and the source line pitch in the lower 16 bits. Line pitch is defined in units of pixels.
DynamicDataFlow sets these values through the following APIs: cupvaDDFUpdateLpCntl, cupvaDDFUpdateLpSrc, cupvaDDFUpdateLpDst.
Example manual VPUC entry setup:
/* some new line pitches, derived in VPU code */
uint32_t lpSrc = newSrcLp;
uint32_t lpDst = newDstLp;
vpuc_tbl[0][0] = 1;
vpuc_tbl[1][0] = (lpDst << 16) | (lpSrc & 0xFFFF);
vpuc_tbl[1][1] = CUPVA_GET_DESC_ATTR_ADDR(srcDescId, LP_CNTL);
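Each VPUC entry carries a full 32-bit value, so changing only one of the two pitches presumably means rewriting the whole LP_CNTL word. One way to handle that, assuming the write replaces the entire word, is to keep a shadow copy of the last value written and patch one half of it. The helpers below are hypothetical, host-runnable C for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Replace DST_LP (bits 31:16) and keep SRC_LP from the shadow copy. */
static uint32_t lp_cntl_set_dst(uint32_t shadow, uint32_t dst_lp)
{
    assert(dst_lp <= 0xFFFFu); /* pitch must fit in 16 bits */
    return (shadow & 0x0000FFFFu) | (dst_lp << 16);
}

/* Replace SRC_LP (bits 15:0) and keep DST_LP from the shadow copy. */
static uint32_t lp_cntl_set_src(uint32_t shadow, uint32_t src_lp)
{
    assert(src_lp <= 0xFFFFu); /* pitch must fit in 16 bits */
    return (shadow & 0xFFFF0000u) | src_lp;
}
```

The returned value would then be stored both in the shadow variable and in the value slot of the VPUC entry for LP_CNTL.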
TRANS_CNTL [31:27 SBADR | 26:26 SCBM | 25:25 DCBM | 24:24 PREFEN | 23:23 ITC | (22:22 RSVD) | 21:21 TTS | 20:20 BPE | 19:19 PYDIR | 18:18 PXDIR | 17:16 BPP | 15:8 PY | 7:0 PX]#
The transfer control field contains multiple pieces of info packed tightly into 32 bits.
SBADR holds the surface base address for block linear transfers.
SCBM/DCBM bits control whether the source/destination are a circular buffer (1) or not (0).
PREFEN controls prefetch behavior and is not supported by DDF or CDF (set to 0 always).
ITC controls whether intermediate transfer completion signaling is enabled. When enabled, DMA only signals VPU that a transfer is complete after all outstanding writes are complete. Otherwise, some writes may still be in flight when VPU is signaled.
TTS controls transfer size for certain internal memories and is not supported for DDF or CDF (set to 0 always).
BPE controls boundary pixel extension enabled (1) or disabled (0).
PYDIR controls whether to pad the top (0) or bottom (1) of tiles.
PXDIR controls whether to pad the left (0) or right (1) of tiles.
BPP encodes the number of bytes per pixel: 0 = 1 byte, 1 = 2 bytes, 2 = 4 bytes (a value of 3 is unsupported).
PY/PX hold the padding amount in the Y and X dimension respectively.
DynamicDataFlow initializes this field on the host side, but modifies some ranges through the following APIs: cupvaDDFUpdateSrcAddr, cupvaDDFUpdateDstAddr, cupvaDDFUpdatePx, cupvaDDFUpdatePy, cupvaDDFUpdatePxDir, cupvaDDFUpdatePyDir, cupvaDDFUpdateItc, cupvaDDFUpdateTransCntl.
It is not typical, and potentially dangerous, to modify the TRANS_CNTL field dynamically with ConfigDataFlow. Use caution when doing so. Here is a simple but typical example of modifying padding dynamically:
/* Initialize a TRANS_CNTL field with some default values for this usecase:
* - DRAM pitch linear 16b surface as source and non-circular buffer in VMEM as destination
* - utilizing boundary pixel extension when padding is in use
*/
uint32_t transCntl = CUPVA_DMA_FIELD(TRANS_CNTL, ITC, 1) |
CUPVA_DMA_FIELD(TRANS_CNTL, BPE, 1) |
CUPVA_DMA_FIELD(TRANS_CNTL, BPP, 1);
/* later on, modify the padding while maintaining the other bits as default */
vpuc_tbl[0][0] = 1;
vpuc_tbl[1][0] = transCntl |
CUPVA_DMA_FIELD(TRANS_CNTL, PY, padY) |
CUPVA_DMA_FIELD(TRANS_CNTL, PX, padX);
vpuc_tbl[1][1] = CUPVA_GET_DESC_ATTR_ADDR(srcDescId, TRANS_CNTL);
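Two details of TRANS_CNTL are easy to get wrong when packing by hand: BPP is an encoded value rather than a byte count, and OR-ing new padding into a word only works if the old PY/PX bits are zero. The host-runnable sketch below illustrates both points using the bit positions from the heading above. `bpp_code` and `trans_cntl_set_padding` are hypothetical helper names, not cuPVA APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Map bytes per pixel to the BPP code: 1 -> 0, 2 -> 1, 4 -> 2.
 * Returns -1 for sizes that have no supported encoding. */
static int bpp_code(unsigned bytes_per_pixel)
{
    switch (bytes_per_pixel) {
    case 1u: return 0;
    case 2u: return 1;
    case 4u: return 2;
    default: return -1;
    }
}

/* Replace PY (bits 15:8) and PX (bits 7:0), keeping bits 31:16 of `base`.
 * Any padding already present in `base` is cleared first. */
static uint32_t trans_cntl_set_padding(uint32_t base, uint32_t pad_x,
                                       uint32_t pad_y)
{
    assert(pad_x <= 0xFFu && pad_y <= 0xFFu); /* 8-bit fields */
    return (base & 0xFFFF0000u) | (pad_y << 8) | pad_x;
}
```

With ITC (bit 23), BPE (bit 20), and a BPP code of 1 (bits 17:16) set as in the example above, the base word is 0x00910000, and padding of X = 3, Y = 2 yields 0x00910203.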
Advanced Fields#
When more exotic control over DMA is required, the following advanced fields may need to be modified using ConfigDataFlow.
SRCPT{1,2,3}_CNTL/DSTPT{1,2,3}_CNTL [31:24 NS{1,2,3}/ND{1,2,3} | 23:0 ST{1,2,3}/DT{1,2,3}]#
These are the source/destination pointer advancement fields, with one field for each of the 3 dimensions. The upper 8 bits of each field hold the number of times to repeat the transfer in the given dimension, while the lower 24 bits hold the number of pixels by which to advance the source/destination pointer.
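A minimal sketch of packing one of these fields is shown below as host-runnable C. `pack_pt_cntl` is a hypothetical helper, not a cuPVA API. It passes the raw field values through unchanged and treats the 24-bit step as unsigned; whether the hardware interprets the count or the step with any offset or sign is not stated in this document, so both are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a repeat count (bits 31:24) and a pointer step in pixels (bits 23:0).
 * The same layout applies to SRCPT{1,2,3}_CNTL and DSTPT{1,2,3}_CNTL. */
static uint32_t pack_pt_cntl(uint32_t repeat, uint32_t step_pixels)
{
    assert(repeat <= 0xFFu);          /* 8-bit field */
    assert(step_pixels <= 0xFFFFFFu); /* 24-bit field */
    return (repeat << 24) | step_pixels;
}
```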
BFSTART_CNTL [31:16 DB_START | 15:0 SB_START]#
The circular buffer start address offset in bytes for each of the source (low 16b) and destination (high 16b) circular buffers.
BFSIZE_CNTL [31:16 DB_SIZE | 15:0 SB_SIZE]#
The circular buffer size in bytes for each of the source (low 16b) and destination (high 16b) circular buffers.
NDTM_CNTL [31:16 FRDA | (15:8 RSVD) | 7:6 DB_SIZEUB | 5:4 SB_SIZEUB | 3:2 DB_STARTUB | 1:0 SB_STARTUB]#
The NDTM_CNTL field holds two unrelated pieces of information:
FRDA is the full replication destination address: An address in VMEM where an exact duplicate of the transfer is placed. Must be aligned to 64B.
The SB/DB_SIZEUB/STARTUB ranges hold one extra bit to extend the range of the corresponding BFSTART/BFSIZE fields. The low bit of each range is active, while the high bit is reserved (ignored).
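The relationship between BFSIZE_CNTL and the SIZEUB ranges can be sketched as follows, assuming the extra bit acts as bit 16 of a 17-bit size (the document says only that it extends the range, so this placement is an assumption). The helpers are hypothetical, host-runnable C and are not cuPVA APIs; the same pattern would apply to BFSTART_CNTL with the STARTUB bits at positions 0 and 2.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 16 bits of each size go into BFSIZE_CNTL:
 * DB_SIZE in bits 31:16, SB_SIZE in bits 15:0. */
static uint32_t pack_bfsize_cntl(uint32_t sb_size, uint32_t db_size)
{
    assert(sb_size <= 0x1FFFFu && db_size <= 0x1FFFFu); /* 17-bit sizes */
    return ((db_size & 0xFFFFu) << 16) | (sb_size & 0xFFFFu);
}

/* Bit 16 of each size goes into NDTM_CNTL:
 * the low bit of SB_SIZEUB is bit 4, the low bit of DB_SIZEUB is bit 6. */
static uint32_t ndtm_size_ub_bits(uint32_t sb_size, uint32_t db_size)
{
    assert(sb_size <= 0x1FFFFu && db_size <= 0x1FFFFu); /* 17-bit sizes */
    return (((db_size >> 16) & 1u) << 6) | (((sb_size >> 16) & 1u) << 4);
}
```

Under this assumption, a size that crosses the 64 KiB boundary requires two VPUC entries, one for BFSIZE_CNTL and one for NDTM_CNTL, and the NDTM_CNTL value must also preserve FRDA and the STARTUB bits.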
EVENT_CNTL [31:30 PRTM | (29:29 RSVD) | 28:28 DSCLOAD | 27:25 TRIG_SW_EVENTS | 24:22 TRIG_MISC_HW_EVENTS | 21:18 TRIG_VPU_HW_EVENTS | 17:16 ECET | 15:0 TRIG_CH_EVENTS]#
The event control field contains multiple pieces of info packed tightly in 32 bits.
PRTM controls special privileged transfer settings (not supported by CDF, leave as 0).
DSCLOAD controls whether a descriptor is reloaded after any SW/HW/CH event trigger (1), or loaded before a trigger (0, default).
TRIG_SW_EVENTS controls which events from R5 trigger the transfer (not supported by CDF, leave as 0).
TRIG_MISC_HW_EVENTS controls DMA performance monitoring (not supported by CDF, leave as 0).
TRIG_VPU_HW_EVENTS controls which VPU trigger signal is enabled for this transfer (it is unusual to change this dynamically):
0: none
1: READ0, 2: STORE0
3: VPU CONFIG
4: READ1, 5: STORE1
6: READ2, 7: STORE2
8: READ3, 9: STORE3
10: READ4, 11: STORE4
12: READ5, 13: STORE5
14: READ6, 15: STORE6
ECET controls continuous event trigger setting:
0: disabled (transfer does not wait for trigger once started, streams all dimensions)
1: Every 4th dimension (after DIM2 repetitions)
2: Every 3rd dimension (after DIM1 repetitions)
3: Every tile (most typical)
TRIG_CH_EVENTS controls special DMA channel trigger events (not supported by CDF, leave as 0).
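The following host-runnable C sketch packs the EVENT_CNTL ranges that this section describes as usable with ConfigDataFlow (DSCLOAD, TRIG_VPU_HW_EVENTS, and ECET) and leaves the unsupported ranges at 0. It also derives the TRIG_VPU_HW_EVENTS code for a given READ/STORE index from the table above. All names here (`EcetMode`, `vpu_trig_read`, `vpu_trig_store`, `pack_event_cntl`) are hypothetical and introduced for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the ECET settings listed above. */
enum EcetMode {
    ECET_DISABLED = 0,   /* stream all dimensions without waiting */
    ECET_EVERY_DIM4 = 1, /* after DIM2 repetitions */
    ECET_EVERY_DIM3 = 2, /* after DIM1 repetitions */
    ECET_EVERY_TILE = 3  /* most typical */
};

/* TRIG_VPU_HW_EVENTS code for READn: READ0 = 1, READ1..READ6 = 4, 6, ..., 14. */
static uint32_t vpu_trig_read(unsigned n)
{
    assert(n <= 6u);
    return (n == 0u) ? 1u : 2u + 2u * n;
}

/* TRIG_VPU_HW_EVENTS code for STOREn: STORE0 = 2, STORE1..STORE6 = 5, 7, ..., 15. */
static uint32_t vpu_trig_store(unsigned n)
{
    assert(n <= 6u);
    return (n == 0u) ? 2u : 3u + 2u * n;
}

/* Pack DSCLOAD (bit 28), TRIG_VPU_HW_EVENTS (bits 21:18), and ECET (bits 17:16).
 * PRTM, TRIG_SW_EVENTS, TRIG_MISC_HW_EVENTS, and TRIG_CH_EVENTS stay 0. */
static uint32_t pack_event_cntl(uint32_t dscload, uint32_t vpu_trig,
                                uint32_t ecet)
{
    assert(dscload <= 1u && vpu_trig <= 15u && ecet <= 3u);
    return (dscload << 28) | (vpu_trig << 18) | (ecet << 16);
}
```

For example, triggering on READ1 for every tile with DSCLOAD left at its default gives `pack_event_cntl(0, vpu_trig_read(1), ECET_EVERY_TILE)`, which evaluates to 0x00130000.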