SortPayload#

Overview#

SortPayload primitive performs sorting operations on a 1D array of elements. It treats consecutive elements as key-payload pairs and sort them based on the key values. It implements an optimized version of the bitonic sort algorithm using VPU vector instructions.

Requirements#

SortPayload primitive has the following requirements:

  1. The key and payload must have the same data type.

  2. The data type must be one of: uint32_t, int32_t.

  3. The number of elements to sort (Size) must be one of: 512, 1024, 2048, 4096, 8192.

  4. The SortPayload primitive only supports ascending sort based on key values.

  5. Because we need transpose operations, the input and output buffers need to be larger than Size.

    • The size of input buffer should be (vecw + 2) * (Size / vecw) * sizeof(DataType) bytes.

    • The size of output buffer should be (Size / vecw + 1) * vecw * sizeof(DataType) bytes.

where Size = the number of elements to sort, vecw = 16 for 32-bit data.

  1. The input data should be consecutively stored in a 1D array with no pitch.

  2. The output data will be stored in a 2D buffer with:

    • Width = 2 * Size / vecw

    • Height = vecw / 2

    • Pitch = 2 * Size / vecw + 2

where Size = the number of elements to sort, vecw = 16 for 32-bit data.

Implementation Details#

The SortPayload primitive implements bitonic sort algorithm in the same way as the Sort primitive. The only difference is that the SortPayload primitive treats consecutive elements as key-payload pairs and sort them based on the key values.

The SortPayload primitive can be broken down into two phases: first it uses even-odd merge sort to sort each group of 8 vectors, then it uses bitonic merge to merge two groups of 8 sorted vectors into one group of 16 sorted vectors. This process is recursively applied until the entire array is sorted.

Intrinsics#

The SortPayload primitive uses the vsort2pl intrinsic to sort two vectors of key-payload pairs.

void vsort2pl(vintx src1, vintx src2, vintx & dst1, vintx & dst2);

In each src vector, key and payload are lane-interleaved. Even lanes carry key, odd lanes carry payload.

The vsort2pl intrinsic performs a 2-way merge sort on two input vectors. For each key-payload pair in the vector, it compares the key values and swaps both key and payload values accordingly.

The SortPayload primitive also uses dvint_load_perm_transp2 to load data with reverse order permutation. This helps to form a bitonic sequence.

dvintx dvint_load_perm_transp2( agen& agen1, vshortx src);

The dvint_load_perm_transp2 intrinsic loads pairs of data with the permutation specified by src vector. By setting src to decreasing order (e.g. [15, 14, 13, … 1, 0]), it can load pairs in reverse order.

Please refer to Sort primitive documentation for more details about the VPU implementation of the bitonic sort algorithm.

Compatibility#

Requires PVA SDK 2.6.0 and later.