SortPayload#
Overview#
SortPayload primitive performs sorting operations on a 1D array of elements. It treats consecutive elements as key-payload pairs and sort them based on the key values. It implements an optimized version of the bitonic sort algorithm using VPU vector instructions.
Requirements#
SortPayload primitive has the following requirements:
The key and payload must have the same data type.
The data type must be one of: uint32_t, int32_t.
The number of elements to sort (Size) must be one of: 512, 1024, 2048, 4096, 8192.
The SortPayload primitive only supports ascending sort based on key values.
Because we need transpose operations, the input and output buffers need to be larger than Size.
The size of input buffer should be (vecw + 2) * (Size / vecw) * sizeof(DataType) bytes.
The size of output buffer should be (Size / vecw + 1) * vecw * sizeof(DataType) bytes.
where Size = the number of elements to sort, vecw = 16 for 32-bit data.
The input data should be consecutively stored in a 1D array with no pitch.
The output data will be stored in a 2D buffer with:
Width = 2 * Size / vecw
Height = vecw / 2
Pitch = 2 * Size / vecw + 2
where Size = the number of elements to sort, vecw = 16 for 32-bit data.
Implementation Details#
The SortPayload primitive implements bitonic sort algorithm in the same way as the Sort primitive. The only difference is that the SortPayload primitive treats consecutive elements as key-payload pairs and sort them based on the key values.
The SortPayload primitive can be broken down into two phases: first it uses even-odd merge sort to sort each group of 8 vectors, then it uses bitonic merge to merge two groups of 8 sorted vectors into one group of 16 sorted vectors. This process is recursively applied until the entire array is sorted.
Intrinsics#
The SortPayload primitive uses the vsort2pl
intrinsic to sort two vectors of key-payload pairs.
void vsort2pl(vintx src1, vintx src2, vintx & dst1, vintx & dst2);
In each src
vector, key and payload are lane-interleaved. Even lanes carry key, odd lanes carry payload.
The vsort2pl
intrinsic performs a 2-way merge sort on two input vectors. For each key-payload pair in the vector, it compares the key values and swaps both key and payload values accordingly.
The SortPayload primitive also uses dvint_load_perm_transp2
to load data with reverse order permutation. This helps to form a bitonic sequence.
dvintx dvint_load_perm_transp2( agen& agen1, vshortx src);
The dvint_load_perm_transp2
intrinsic loads pairs of data with the permutation specified by src
vector. By setting src
to decreasing order (e.g. [15, 14, 13, … 1, 0]), it can load pairs in reverse order.
Please refer to Sort primitive documentation for more details about the VPU implementation of the bitonic sort algorithm.
Compatibility#
Requires PVA SDK 2.6.0 and later.