LDPC#

Low-Density Parity-Check (LDPC) coding modules for 5G NR channel decoding.

Overview#

The LDPC library provides GPU-accelerated modules for 5G NR LDPC channel decoding, implementing the 3GPP TS 38.212 specification. Built on highly optimized cuPHY LDPC CUDA kernels, these modules deliver efficient runtime performance on NVIDIA GPUs. The library consists of three pipeline modules that work together to decode received data:

LDPC Derate Matching: Reverses rate matching to produce LLRs suitable for LDPC decoding
LDPC Decoder: Performs LDPC decoding on Log-Likelihood Ratios (LLRs)
CRC Decoder: Validates CRC checksums and concatenates code blocks into transport blocks

Each module implements the pipeline::IModule interface from the pipeline library, enabling integration into larger processing pipelines with standardized configuration, memory management, and execution patterns.

Core Concepts#

Module Configuration#

All LDPC modules are configured with static parameters at construction time. These parameters define maximum capacities and processing modes.
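When sizing a capacity such as max_num_cbs_per_tb, it helps to know how many code blocks a transport block segments into. The sketch below follows the segmentation rule in 3GPP TS 38.212 Section 5.2.2. The names BaseGraph and num_code_blocks are hypothetical helpers for illustration and are not part of the LDPC library API.

```cpp
#include <cstdint>

// Hypothetical helper types, not part of the LDPC library.
enum class BaseGraph { Bg1, Bg2 };

// Number of code blocks for a transport block of `tb_bits_with_crc` bits
// (payload plus transport block CRC), per 3GPP TS 38.212 Section 5.2.2.
constexpr std::uint32_t num_code_blocks(std::uint32_t tb_bits_with_crc, BaseGraph bg) {
    // Maximum code block size K_cb: 8448 for base graph 1, 3840 for base graph 2.
    const std::uint32_t k_cb = (bg == BaseGraph::Bg1) ? 8448U : 3840U;
    if (tb_bits_with_crc <= k_cb) {
        return 1U; // No segmentation and no per-code-block CRC
    }
    // Each segment carries a 24-bit code block CRC, leaving k_cb - 24 payload bits.
    const std::uint32_t payload_per_cb = k_cb - 24U;
    return (tb_bits_with_crc + payload_per_cb - 1U) / payload_per_cb; // ceiling division
}
```

For example, num_code_blocks(1278016U, BaseGraph::Bg1) evaluates to 152, the same value used for max_num_cbs_per_tb in the examples in this section.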

Creating an LDPC Decoder#

// Configure LDPC decoder with static parameters
const ran::ldpc::LdpcDecoderModule::StaticParams decoder_params{
        .clamp_value = 20.0F,
        .max_num_iterations = 20,
        .max_num_cbs_per_tb = 152,
        .max_num_tbs = 1,
        .normalization_factor = 0.125F,
        .max_iterations_method = ran::ldpc::LdpcMaxIterationsMethod::Fixed,
        .max_num_ldpc_het_configs = ran::ldpc::LDPC_MAX_HET_CONFIGS};

// Create decoder module instance
const auto decoder =
        std::make_unique<ran::ldpc::LdpcDecoderModule>("ldpc_decoder", decoder_params);

Creating a Derate Match Module#

// Configure LDPC derate matching module
const ran::ldpc::LdpcDerateMatchModule::StaticParams derate_params{
        .enable_scrambling = true,
        .max_num_tbs = 1,
        .max_num_cbs_per_tb = 152,
        .max_num_rm_llrs_per_cb = 27000,
        .max_num_ue_grps = 1};

// Create derate matching module
const auto derate_match =
        std::make_unique<ran::ldpc::LdpcDerateMatchModule>("ldpc_derate_match", derate_params);

Creating a CRC Decoder#

// Configure CRC decoder module
const ran::ldpc::CrcDecoderModule::StaticParams crc_params{
        .reverse_bytes = true, .max_num_cbs_per_tb = 152, .max_num_tbs = 1};

// Create CRC decoder module
const auto crc_decoder =
        std::make_unique<ran::ldpc::CrcDecoderModule>("crc_decoder", crc_params);

LDPC Decoder Parameters#

// Configure decoder with custom parameters
const ran::ldpc::LdpcDecoderModule::StaticParams custom_params{
        .clamp_value = 15.0F,          // Custom LLR clamping value
        .max_num_iterations = 10,      // Reduce max iterations for performance
        .max_num_cbs_per_tb = 100,     // Maximum code blocks per transport block
        .max_num_tbs = 4,              // Support multiple transport blocks
        .normalization_factor = 0.15F, // Custom normalization
        .max_iterations_method = ran::ldpc::LdpcMaxIterationsMethod::Fixed,
        .max_num_ldpc_het_configs = ran::ldpc::LDPC_MAX_HET_CONFIGS};

const auto decoder =
        std::make_unique<ran::ldpc::LdpcDecoderModule>("custom_decoder", custom_params);

Key parameters include:

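clamp_value: Maximum absolute value for input LLRs; larger magnitudes are saturated to this value
early_termination: Stop decoding when convergence is detected
max_num_iterations: Maximum decoder iterations per code block
max_iterations_method: How the iteration limit is chosen (Fixed uses max_num_iterations; Lut uses a lookup table based on spectral efficiency)
normalization_factor: LLR normalization applied during decoding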
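To make clamp_value and normalization_factor more concrete, here is a minimal sketch of how such parameters typically act in a normalized min-sum decoder. It assumes a normalized min-sum formulation, which this page does not state, and the functions clamp_llr and check_node_update are hypothetical illustrations rather than the cuPHY kernels.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Saturate an LLR to the range [-clamp_value, +clamp_value].
inline float clamp_llr(float llr, float clamp_value) {
    return std::clamp(llr, -clamp_value, clamp_value);
}

// Normalized min-sum check-node update (illustrative only).
// For each edge i, the outgoing message is
//   normalization_factor * (product of signs of the other inputs)
//                        * (minimum magnitude of the other inputs).
// Expects at least two incoming messages.
inline std::vector<float> check_node_update(
        const std::vector<float> &in, float normalization_factor) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        float sign = 1.0F;
        float min_mag = std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < in.size(); ++j) {
            if (j == i) {
                continue; // Exclude the edge being updated
            }
            if (in[j] < 0.0F) {
                sign = -sign;
            }
            min_mag = std::min(min_mag, std::fabs(in[j]));
        }
        out[i] = normalization_factor * sign * min_mag;
    }
    return out;
}
```

With inputs {2, -4, 8} and a factor of 0.5, the outgoing messages are {-2, 1, -1}: each edge sees the scaled minimum magnitude and the sign product of the other two.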

Module Ports#

Each module exposes input and output ports for data flow:

const ran::ldpc::LdpcDecoderModule::StaticParams params{
        .clamp_value = 20.0F,
        .max_num_iterations = 20,
        .max_num_cbs_per_tb = 152,
        .max_num_tbs = 1,
        .normalization_factor = 0.125F,
        .max_iterations_method = ran::ldpc::LdpcMaxIterationsMethod::Fixed,
        .max_num_ldpc_het_configs = ran::ldpc::LDPC_MAX_HET_CONFIGS};

const auto decoder = std::make_unique<ran::ldpc::LdpcDecoderModule>("decoder", params);

// Query module input and output ports
const auto input_ports = decoder->get_input_port_names();
const auto output_ports = decoder->get_output_port_names();

CRC Decoder Ports#

The CRC decoder processes decoded bits and outputs CRC results for both code blocks and transport blocks:

const ran::ldpc::CrcDecoderModule::StaticParams params{
        .reverse_bytes = true, .max_num_cbs_per_tb = 152, .max_num_tbs = 1};

const auto crc = std::make_unique<ran::ldpc::CrcDecoderModule>("crc", params);

// CRC decoder has one input port for decoded bits
const auto inputs = crc->get_input_port_names();

// CRC decoder has three output ports
const auto outputs = crc->get_output_port_names();

The CRC decoder takes decoded bits as input and produces three outputs: code block CRCs, transport block CRCs, and transport block payloads after CRC validation and concatenation.
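The check performed at this stage can be illustrated with a small bitwise CRC. This is a sketch based on the 24-bit generator polynomials in 3GPP TS 38.212 Section 5.1 (CRC24A and CRC24B), with a zero initial register. The function names are hypothetical, and the GPU implementation in the library is likely organized very differently.

```cpp
#include <cstdint>
#include <vector>

// Low 24 bits of the generator polynomials from 3GPP TS 38.212 Section 5.1
// (the implicit D^24 term is handled by the shift register).
constexpr std::uint32_t CRC24A_POLY = 0x864CFBU;
constexpr std::uint32_t CRC24B_POLY = 0x800063U;

// Bitwise CRC over a sequence of bits (one bit per element, MSB first).
inline std::uint32_t crc24(const std::vector<std::uint8_t> &bits, std::uint32_t poly) {
    std::uint32_t reg = 0U; // Zero initial state
    for (const std::uint8_t bit : bits) {
        const std::uint32_t feedback = ((reg >> 23U) & 1U) ^ (bit & 1U);
        reg = (reg << 1U) & 0xFFFFFFU;
        if (feedback != 0U) {
            reg ^= poly;
        }
    }
    return reg;
}

// Append the 24 parity bits (MSB first) to a copy of the payload.
inline std::vector<std::uint8_t> attach_crc24(
        std::vector<std::uint8_t> bits, std::uint32_t poly) {
    const std::uint32_t crc = crc24(bits, poly);
    for (int i = 23; i >= 0; --i) {
        bits.push_back(static_cast<std::uint8_t>((crc >> i) & 1U));
    }
    return bits;
}

// A block passes when the CRC over payload plus parity bits is zero.
inline bool crc24_ok(const std::vector<std::uint8_t> &bits_with_crc, std::uint32_t poly) {
    return crc24(bits_with_crc, poly) == 0U;
}
```

Running crc24_ok on the output of attach_crc24 returns true, and flipping any single bit of that output makes it return false.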

Memory Management#

Modules report memory requirements that must be allocated before use:

const ran::ldpc::LdpcDerateMatchModule::StaticParams params{
        .enable_scrambling = true,
        .max_num_tbs = 1,
        .max_num_cbs_per_tb = 152,
        .max_num_rm_llrs_per_cb = 27000,
        .max_num_ue_grps = 1};

const auto module = std::make_unique<ran::ldpc::LdpcDerateMatchModule>("derate_match", params);

// Query memory requirements before allocation
const auto requirements = module->get_requirements();

Processing Flow#

A typical LDPC decoding flow involves three stages:

Derate Matching: Reverses rate matching on the received LLRs to produce LLRs suitable for LDPC decoding
LDPC Decoding: Decodes LLRs to produce hard-decision bits
CRC Validation: Checks CRC and assembles transport blocks

Each stage is configured with dynamic parameters (transport block configuration) via the configure_io() method and executed via execute() or integrated into a CUDA graph.
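As one concrete piece of what reversing rate matching involves, the sketch below inverts the bit interleaver defined in 3GPP TS 38.212 Section 5.4.2.2, where f[i + j*Qm] = e[i*(E/Qm) + j]. It is an illustration of the specification step only. The function names are hypothetical, and it does not show how the derate matching module is implemented.

```cpp
#include <cstddef>
#include <vector>

// Transmit-side bit interleaver: f[i + j*qm] = e[i*(E/qm) + j].
// Assumes e.size() is a multiple of qm (bits per modulation symbol).
template <typename T>
std::vector<T> bit_interleave(const std::vector<T> &e, std::size_t qm) {
    const std::size_t cols = e.size() / qm; // E / Qm
    std::vector<T> f(e.size());
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < qm; ++i) {
            f[i + j * qm] = e[i * cols + j];
        }
    }
    return f;
}

// Receive-side inverse, applied to LLRs: e[i*(E/qm) + j] = f[i + j*qm].
template <typename T>
std::vector<T> bit_deinterleave(const std::vector<T> &f, std::size_t qm) {
    const std::size_t cols = f.size() / qm;
    std::vector<T> e(f.size());
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < qm; ++i) {
            e[i * cols + j] = f[i + j * qm];
        }
    }
    return e;
}
```

For eight values 0 through 7 and qm = 2, interleaving gives {0, 4, 1, 5, 2, 6, 3, 7}, and deinterleaving restores the original order.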

Execution Modes#

Modules support two execution modes:

Stream Execution: Direct kernel launch via IStreamExecutor::execute()
CUDA Graph Execution: Graph node creation via IGraphNodeProvider::add_node_to_graph()

CUDA graph mode reduces per-launch overhead: the graph is instantiated once and then launched repeatedly, which lowers latency for repeated operations.

Stream Execution#

Stream execution launches kernels directly on a CUDA stream. After setting inputs, call configure_io() and then execute(). The snippet below handles both execution modes; the first branch is the stream path:

// Configure I/O with dynamic parameters
ldpc_module.configure_io(params, stream.get());

// Execute based on mode
if (execution_mode == pipeline::ExecutionMode::Stream) {
    // Stream mode: Execute directly
    RT_LOG_DEBUG("Executing LDPC decoder module in stream mode");
    ldpc_module.execute(stream.get());
} else {
    // Graph mode: Create graph, add node, instantiate, and launch
    auto graph_manager = std::make_unique<pipeline::GraphManager>();

    auto *graph_node_provider = ldpc_module.as_graph_node_provider();

    // Add module node to graph with no dependencies
    const std::vector<CUgraphNode> no_deps{};
    const auto nodes = graph_manager->add_kernel_node(
            gsl_lite::not_null<pipeline::IGraphNodeProvider *>(graph_node_provider), no_deps);

    // Instantiate and upload graph
    graph_manager->instantiate_graph();
    graph_manager->upload_graph(stream.get());

    // Update graph node parameters
    auto *const exec = graph_manager->get_exec();
    graph_node_provider->update_graph_node_params(exec, params);

    // Launch graph
    RT_LOG_DEBUG("Executing LDPC decoder module in graph mode");
    graph_manager->launch_graph(stream.get());
}

The configure_io() method sets up internal state based on transport block parameters, then execute() launches the kernel on the provided stream.

CUDA Graph Execution#

Modules implementing IGraphNodeProvider can be integrated into CUDA graphs for lower-latency execution. First, create a graph manager and get the module’s graph interface:

// Graph mode: Create graph manager and get module's graph interface
auto graph_manager = std::make_unique<pipeline::GraphManager>();

auto *graph_node_provider = ldpc_module.as_graph_node_provider();

Next, add the module’s kernel node to the graph:

// Add module node(s) to graph with no dependencies
const std::vector<CUgraphNode> no_deps{};
const auto nodes = graph_manager->add_kernel_node(
        gsl_lite::not_null<pipeline::IGraphNodeProvider *>(graph_node_provider), no_deps);

Finally, instantiate the graph, upload it to the GPU, update parameters, and launch:

// Instantiate and upload graph
graph_manager->instantiate_graph();
graph_manager->upload_graph(stream.get());

// Update graph node parameters
auto *const exec = graph_manager->get_exec();
graph_node_provider->update_graph_node_params(exec, params);

// Launch graph
RT_LOG_DEBUG("Executing LDPC derate match module in graph mode");
graph_manager->launch_graph(stream.get());

The graph is instantiated once and can be launched repeatedly with updated parameters. This approach minimizes kernel launch overhead for repeated operations.

Additional Examples#

For complete working examples with full setup and validation, see the test files:

LDPC Decoder Tests: ran/runtime/ldpc/tests/ldpc_decoder_module_test.cpp - Full decoder module tests with H5 test vectors
Derate Match Tests: ran/runtime/ldpc/tests/ldpc_derate_match_module_test.cpp - Stream and graph execution modes
CRC Decoder Tests: ran/runtime/ldpc/tests/crc_decoder_module_test.cpp - CRC validation and transport block assembly

These test files demonstrate complete workflows including memory allocation, input/output setup, and result validation.

API Reference#

enum class ran::ldpc::ModulationOrder : std::uint32_t#

Modulation order enumeration for type-safe modulation scheme specification

Values:

enumerator Qpsk#: QPSK modulation (2 bits per symbol)

enumerator Qam16#: 16-QAM modulation (4 bits per symbol)

enumerator Qam64#: 64-QAM modulation (6 bits per symbol)

enumerator Qam256#: 256-QAM modulation (8 bits per symbol)

enum class ran::ldpc::NewDataIndicator : std::uint32_t#

New Data Indicator enumeration for type-safe transmission type specification

Values:

enumerator Retransmission#: Retransmission of previous data.

enumerator NewTransmission#: New data transmission.

enum class ran::ldpc::LdpcMaxIterationsMethod : std::uint8_t#

Method for determining maximum LDPC decoding iterations

Values:

enumerator Fixed#: Use fixed max_num_iterations value.

enumerator Lut#: Use lookup table based on spectral efficiency.

constexpr float ran::ldpc::LDPC_CLAMP_VALUE = 32.0F#: Clamp value for LLRs.

constexpr std::size_t ran::ldpc::LDPC_MAX_ITERATIONS = 10#: Maximum number of LDPC decoding iterations.

constexpr float ran::ldpc::LDPC_NORMALIZATION_FACTOR = 0.8125F#: Normalization factor.

constexpr std::size_t ran::ldpc::LDPC_MAX_HET_CONFIGS = 32#: Maximum number of heterogeneous LDPC configurations.

constexpr std::size_t ran::ldpc::MAX_NUM_RM_LLRS_PER_CB = 26112#: Maximum number of rate matching LLRs per CB.

static constexpr int ran::ldpc::BITS_PER_BYTE = 8#

ran::ldpc::DECLARE_LOG_COMPONENT(LdpcComponent, LdpcParams, OuterRxParams, DerateMatch, LdpcDecoder, CrcDecoder, LdpcDecoderModuleFactory, LdpcDerateMatchModuleFactory, CrcDecoderModuleFactory)#: Declare logging components for LDPC subsystem

inline std::uint32_t ran::ldpc::get_scrambling_init(std::uint32_t rnti, std::uint32_t data_scram_id)#

Get the scrambling initialization value for a given RNTI and data scrambling ID.

The scrambling initialization value is calculated per 3GPP TS 38.211 Section 6.3.1.1: c_init = n_RNTI * 2^15 + n_ID

Where:

n_RNTI is the Radio Network Temporary Identifier
n_ID is the data scrambling identity (data_scram_id)
2^15 = 32768 is the multiplier defined in the specification, equivalent to shifting n_RNTI left by 15 bits

The scrambling sequence generator uses a length-31 Gold sequence with initialization value c_init.
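The formula and the Gold sequence can be sketched together as follows. The function scrambling_init is a hypothetical stand-in that mirrors the formula above, not the library's get_scrambling_init, and gold_sequence follows the generic pseudo-random sequence definition in 3GPP TS 38.211 Section 5.2.1 in a simple, unoptimized form.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// c_init = n_RNTI * 2^15 + n_ID (hypothetical stand-in for illustration).
constexpr std::uint32_t scrambling_init(std::uint32_t rnti, std::uint32_t data_scram_id) {
    return (rnti << 15U) + data_scram_id;
}

// Length-31 Gold sequence per 3GPP TS 38.211 Section 5.2.1:
//   c(n)     = x1(n + Nc) xor x2(n + Nc), with Nc = 1600
//   x1(n+31) = x1(n+3) xor x1(n)
//   x2(n+31) = x2(n+3) xor x2(n+2) xor x2(n+1) xor x2(n)
// x1 starts as 1, 0, 0, ...; x2 starts with the 31 bits of c_init (LSB first).
inline std::vector<std::uint8_t> gold_sequence(std::uint32_t c_init, std::size_t length) {
    constexpr std::size_t NC = 1600;
    const std::size_t total = NC + length + 31;
    std::vector<std::uint8_t> x1(total, 0);
    std::vector<std::uint8_t> x2(total, 0);
    x1[0] = 1;
    for (std::size_t i = 0; i < 31; ++i) {
        x2[i] = static_cast<std::uint8_t>((c_init >> i) & 1U);
    }
    for (std::size_t n = 0; n + 31 < total; ++n) {
        x1[n + 31] = static_cast<std::uint8_t>(x1[n + 3] ^ x1[n]);
        x2[n + 31] = static_cast<std::uint8_t>(x2[n + 3] ^ x2[n + 2] ^ x2[n + 1] ^ x2[n]);
    }
    std::vector<std::uint8_t> c(length);
    for (std::size_t n = 0; n < length; ++n) {
        c[n] = static_cast<std::uint8_t>(x1[n + NC] ^ x2[n + NC]);
    }
    return c;
}
```

For example, scrambling_init(1, 0) evaluates to 32768, and passing that value to gold_sequence yields the scrambling bits, one per vector element.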