cuPHY
0.1
CUDA PHY Layer Acceration Library
|
This section describes the error correction functions of the cuPHY application programming interface. More...
Data Structures | |
struct | cuphyLDPCDecodeConfigDesc_t |
struct | cuphyLDPCDecodeDesc_t |
LDPC Decoder Descriptor. More... | |
Typedefs | |
typedef struct cuphyLDPCDecoder * | cuphyLDPCDecoder_t |
Functions | |
cuphyStatus_t | cuphyCreateLDPCDecoder (cuphyContext_t context, cuphyLDPCDecoder_t *pdecoder, unsigned int flags) |
Allocates and initializes a cuPHY LDPC decoder instance. More... | |
cuphyStatus_t | cuphyDestroyLDPCDecoder (cuphyLDPCDecoder_t decoder) |
Destroys a cuPHY LDPC decoder object. More... | |
cuphyStatus_t | cuphyErrorCorrectionLDPCDecode (cuphyLDPCDecoder_t decoder, cuphyTensorDescriptor_t tensorDescDst, void *dstAddr, cuphyTensorDescriptor_t tensorDescLLR, const void *LLRAddr, const cuphyLDPCDecodeConfigDesc_t *config, cudaStream_t strm) |
Perform a bulk LDPC decode operation on a tensor of soft input values. More... | |
cuphyStatus_t | cuphyErrorCorrectionLDPCDecodeGetLaunchDescriptor (cuphyLDPCDecoder_t decoder, cuphyLDPCDecodeLaunchConfig_t *launchConfig) |
Populates a launch configuration for the LDPC decoder. More... | |
cuphyStatus_t | cuphyErrorCorrectionLDPCDecodeGetWorkspaceSize (cuphyLDPCDecoder_t decoder, const cuphyLDPCDecodeConfigDesc_t *config, int numCodeWords, size_t *sizeInBytes) |
Returns the workspace size for and LDPC decode operation. More... | |
cuphyStatus_t | cuphyErrorCorrectionLDPCDecodeSetNormalization (cuphyLDPCDecoder_t decoder, cuphyLDPCDecodeConfigDesc_t *decodeDesc) |
Sets the min-sum normalization constant for a given LDPC configuration. More... | |
cuphyStatus_t | cuphyErrorCorrectionLDPCTransportBlockDecode (cuphyLDPCDecoder_t decoder, const cuphyLDPCDecodeDesc_t *decodeDesc, cudaStream_t strm) |
Perform a bulk LDPC decode operation on a tensor of soft input values. More... | |
typedef struct cuphyLDPCDecoder* cuphyLDPCDecoder_t |
cuPHY LDPC decoder handle
cuphyStatus_t cuphyCreateLDPCDecoder | ( | cuphyContext_t | context, |
cuphyLDPCDecoder_t * | pdecoder, | ||
unsigned int | flags | ||
) |
Allocates a cuPHY decoder instance and returns a handle in the address provided by the caller.
Returns CUPHY_STATUS_INVALID_ARGUMENT if pdecoder
is NULL.
Returns CUPHY_STATUS_ALLOC_FAILED if an LDPC decoder cannot be allocated on the host.
Returns CUPHY_STATUS_SUCCESS if allocation and initialization were successful.
context | - cuPHY context |
pdecoder | - Address for the new cuphyLDPCDecoder_t instance |
flags | - Creation flags (currently unused) |
cuphyStatus_t cuphyDestroyLDPCDecoder | ( | cuphyLDPCDecoder_t | decoder | ) |
Destroys a cuPHY LDPC decoder object that was previously created by a call to cuphyCreateLDPCDecoder. The handle provided to this function should not be used for any operations after this function returns.
Returns CUPHY_STATUS_INVALID_ARGUMENT if decoder
is NULL.
Returns CUPHY_STATUS_SUCCESS if destruction was successful.
decoder | - previously allocated cuphyLDPCDecoder_t instance |
cuphyStatus_t cuphyErrorCorrectionLDPCDecode | ( | cuphyLDPCDecoder_t | decoder, |
cuphyTensorDescriptor_t | tensorDescDst, | ||
void * | dstAddr, | ||
cuphyTensorDescriptor_t | tensorDescLLR, | ||
const void * | LLRAddr, | ||
const cuphyLDPCDecodeConfigDesc_t * | config, | ||
cudaStream_t | strm | ||
) |
Performs a bulk LDPC decode operation on an input tensor of "soft" log likelihood ratio (LLR) values.
decoder | - cuPHY LDPC decoder instance |
tensorDescDst | - tensor descriptor for LDPC output |
dstAddr | - address for LDPC output |
tensorDescLLR | - tensor descriptor for soft input LLR values |
LLRAddr | - address for soft input LLR values |
config | - LDPC configuration structure |
strm | - CUDA stream for LDPC execution |
If the value of algoIndex
is zero, the library will choose the "best" algorithm for the given LDPC configuration.
The type of input tensor descriptor tensorDescLLR
must be either CUPHY_R_32F or CUPHY_R_16F, and the rank must be 2.
The type of output tensor descriptor tensorDescDst
must be CUPHY_BIT, and the rank must be 2.
For input LLR tensors of type CUPHY_R_16F, loads occur as multiples of 8 elements (i.e. 16 bytes). Therefore, memory allocation should be performed such that the number of LLR elements that can be read is a multiple of 8 for each codeword. This can be done by specifying a stride that is multiple of 8 for the second dimension, or by using the CUPHY_TENSOR_ALIGN_COALESCE flag when allocating the tensor. Values read from this padded memory will not be used, and do not need to be zeroed or cleared.
For input LLR tensors of type CUPHY_R_32F, loads occur as multiples of 4 elements (i.e. 16 bytes). Therefore, memory allocation should be performed such that the number of LLR elements that can be read is a multiple of 4 for each codeword. This can be done by specifying a stride that is multiple of 4 for the second dimension, or by using the CUPHY_TENSOR_ALIGN_COALESCE flag when allocating the tensor. Values read from this padded memory will not be used, and do not need to be zeroed or cleared.
The union member of the normalization value in the configuration config
must match the LLR type in config
. In other words, if the LLR type is CUPHY_R_32F, the normalization value should be populated using the f32 union member, and if the LLR type is CUPHY_R_16F, both halves of the f16x2 union member should be set with the same normalization value in fp16 format. The CUDA __float2half2_rn() function can be used to convert a float value to a pair of fp16 values. Alternatively, if the cuphyErrorCorrectionLDPCDecodeSetNormalization() function is used, the correct union member will be set automatically by that function.
Returns CUPHY_STATUS_INVALID_ARGUMENT if:
decoder
is NULL BG
, Kb
, mb
, and Z
do not represent a valid LDPC configuration maxNumIterations
<= 0 tensorDescDst
is NULL tensorDescLLR
is NULL dstAddr
NULL LLRAddr
is NULL the
data type of tensorDescDst
and llr_type in config
do not match Returns CUPHY_STATUS_UNSUPPORTED_CONFIG if the combination of the LDPC configuration (BG
, Kb
, mb
, and Z
) is not supported for a given LLR tensor and/or algorithm index (algoIndex
).
Returns CUPHY_STATUS_UNSUPPORTED_RANK if either the input tensor descriptor (tensorDescLLR
) or output tensor descriptor (tensorDescDst
) do not have a rank of 2.
Returns CUPHY_STATUS_UNSUPPORTED_TYPE if the output tensor descriptor (tensorDescLLR
) is not of type CUPHY_BIT, or if the input tensor descriptor is not one of (CUPHY_R_32F or CUPHY_R_16F)
Returns CUPHY_STATUS_SUCCESS if the decode operation was submitted to the stream successfully.
cuphyStatus_t cuphyErrorCorrectionLDPCDecodeGetLaunchDescriptor | ( | cuphyLDPCDecoder_t | decoder, |
cuphyLDPCDecodeLaunchConfig_t * | launchConfig | ||
) |
decoder | - decoder object created by cuphyCreateLDPCDecoder |
launchConfig | - launch structure with a populated config (see cuphyLDPCDecodeConfigDesc_t) |
Returns CUPHY_STATUS_INVALID_ARGUMENT if:
decoder
is not a valid cuphyLDPCDecoder_t instance launchConfig
is NULL Returns CUPHY_STATUS_UNSUPPORTED_CONFIG if the combination of the LDPC configuration (BG
, Kb
, mb
, and Z
) is not supported for a given LLRtype
and/or algorithm index (algo
).
Returns CUPHY_STATUS_SUCCESS if the launch configuration was populated successfully
cuphyStatus_t cuphyErrorCorrectionLDPCDecodeGetWorkspaceSize | ( | cuphyLDPCDecoder_t | decoder, |
const cuphyLDPCDecodeConfigDesc_t * | config, | ||
int | numCodeWords, | ||
size_t * | sizeInBytes | ||
) |
Calculates the workspace size (in bytes) required to perform an LDPC decode operation for the given LDPC configuration.
If the algoIndex
parameter is -1, the function will return the maximum workspace size for all numbers of parity nodes less than or equal to the value of the mb
parameter (for the given lifting size Z
). This is useful for determining the maximum workspace size across different code rates.
decoder | - decoder object created by cuphyCreateLDPCDecoder |
config | - LDPC decoder configuration |
numCodeWords | - number of codewords to decode simultaneously |
sizeInBytes | - output address for calculated workspace size |
Different LDPC decoding algorithms may have different workspace requirements. If the value of algoIndex
is zero, the library will choose the "best" algorithm for the given LDPC configuration.
Returns CUPHY_STATUS_INVALID_ARGUMENT if:
BG
, Kb
, mb
, and Z
do not represent a valid LDPC configuration numCodeWords
<= 0 sizeInBytes
is NULL Returns CUPHY_STATUS_UNSUPPORTED_CONFIG if the combination of the LDPC configuration (BG
, Kb
, mb
, and Z
) is not supported for a given LLRtype
and/or algorithm index (algoIndex
).
Returns CUPHY_STATUS_SUCCESS if the size calculation was successful.
cuphyStatus_t cuphyErrorCorrectionLDPCDecodeSetNormalization | ( | cuphyLDPCDecoder_t | decoder, |
cuphyLDPCDecodeConfigDesc_t * | decodeDesc | ||
) |
Determines an appropriate LDPC decoder min-sum normalization constant, given the LLR type and num_parity_nodes fields of the input configuration. Note that if the llr_type field of the configuration is CUPHY_R_16F, the field will be set to a pair of __half values (as is expected by the LDPC decoder kernel).
decoder | - decoder object created by cuphyCreateLDPCDecoder |
decodeDesc | - decode descriptor with valid llr_type and num_parity_nodes fields |
Returns CUPHY_STATUS_INVALID_ARGUMENT if:
llr_type
or num_parity_nodes fields do not represent a valid LDPC configuration Returns CUPHY_STATUS_UNSUPPORTED_CONFIG if the combination of the LDPC configuration (BG
, Kb
, mb
, and Z
) is not supported for a given LLRtype
and/or algorithm index (algo
).
Returns CUPHY_STATUS_SUCCESS if the constant was set successfully.
cuphyStatus_t cuphyErrorCorrectionLDPCTransportBlockDecode | ( | cuphyLDPCDecoder_t | decoder, |
const cuphyLDPCDecodeDesc_t * | decodeDesc, | ||
cudaStream_t | strm | ||
) |
Performs a bulk LDPC decode operation on "soft" log likelihood ratio (LLR) values for one or more transport blocks
decoder | - cuPHY LDPC decoder instance |
decodeDesc | - LDPC decode descriptor |
strm | - CUDA stream for LDPC execution |
If the value of algo field of the descriptor decodeDesc
is zero, the library will choose the "best" algorithm for the given LDPC configuration.
The llr_type field of the decodeDesc
must be either CUPHY_R_32F or CUPHY_R_16F.
For input LLR buffers of type CUPHY_R_16F, loads occur as multiples of 8 elements (i.e. 16 bytes). Therefore, memory allocation should be performed such that the number of LLR elements that can be read is a multiple of 8 for each codeword. The memory need only be addressable. (For a multi-codeword case, the memory can lie in the next codeword.) Values read from padded memory will not be used, and do not need to be zeroed or cleared.
For input LLR tensors of type CUPHY_R_32F, loads occur as multiples of 4 elements (i.e. 16 bytes). Therefore, memory allocation should be performed such that the number of LLR elements that can be read is a multiple of 4 for each codeword. Values read from padded memory will not be used, and do not need to be zeroed or cleared.
The union member of the normalization value in the configuration config
must match the LLR type in the decode descriptor configuration. In other words, if the LLR type is CUPHY_R_32F, the normalization value should be populated using the f32 union member, and if the LLR type is CUPHY_R_16F, both halves of the f16x2 union member should be set with the same normalization value in fp16 format. The CUDA __float2half2_rn() function can be used to convert a float value to a pair of fp16 values. Alternatively, if the cuphyErrorCorrectionLDPCDecodeSetNormalization() function is used, the correct union member will be set automatically by that function.
Returns CUPHY_STATUS_INVALID_ARGUMENT if:
decoder
is NULL BG
, Kb
, mb
, and Z
do not represent a valid LDPC configuration maxNumIterations
<= 0 Returns CUPHY_STATUS_UNSUPPORTED_CONFIG if the combination of the LDPC configuration (BG
, Kb
, mb
, and Z
) is not supported for a given LLR tensor and/or algorithm index.
Returns CUPHY_STATUS_SUCCESS if the decode operation was submitted to the stream successfully.