Pipeline#
Modular data processing framework for GPU-accelerated computational pipelines.
Overview#
The Pipeline library provides a high-performance framework for building modular, GPU-accelerated processing pipelines. It enables composing complex data transformations from reusable modules with efficient CUDA stream management and optional CUDA graph execution for minimal latency.
Key Features#
Modular Architecture: Compose pipelines from reusable processing modules (IModule)
Flexible Execution: Support for both stream-based and CUDA graph execution modes
Factory Pattern: Configuration-driven pipeline and module creation via PipelineSpec
Memory Management: Unified memory allocation with PipelineMemoryManager
Zero-Copy Optimization: Direct data flow between modules without intermediate copies
Module Routing: Automatic data routing between modules based on port connections
TensorRT Integration: First-class support for TensorRT engines
CUDA Graph Capture: Automatic CUDA graph construction for low-latency execution
Quick Start#
Creating a Module#
// Create a module with configuration parameters
const SimpleModule::StaticParams params{.tensor_size = 2048};
auto module = std::make_unique<SimpleModule>("my_module", params);
// Query module properties
const auto type_id = module->get_type_id();
const auto instance_id = module->get_instance_id();
Querying Module Ports#
// Query input and output ports
const auto input_ports = module->get_input_port_names();
const auto output_ports = module->get_output_port_names();
// Get tensor information for a specific port
const auto input_info = module->get_input_tensor_info("input");
const auto output_info = module->get_output_tensor_info("output");
Using a Module Factory#
// Create a module factory
auto factory = std::make_unique<SimpleModuleFactory>();
// Check if a module type is supported
const bool supported = factory->supports_module_type("simple_module");
// Create a module using the factory
const SimpleModule::StaticParams params{.tensor_size = 512};
auto module = factory->create_module("simple_module", "factory_module", std::any(params));
Core Concepts#
Pipeline Architecture#
A Pipeline orchestrates the execution of multiple Modules connected through named Ports. Data flows through the pipeline according to defined connections, with modules processing data on GPU streams.
External Input → Module A → Module B → Module C → External Output
↓ ↓ ↓
Port Connections (defined in PipelineSpec)
Key abstractions:
IPipeline: Coordinates module execution and manages data flow
IModule: Individual processing unit with defined inputs/outputs
PortInfo: Describes tensor data at module inputs/outputs
PipelineSpec: Configuration structure for pipeline creation
Modules#
A Module (IModule) is the fundamental processing unit in a pipeline.
Each module:
Has named input and output ports
Implements one of several execution interfaces:
IStreamExecutor: Direct CUDA stream execution
IGraphNodeProvider: CUDA graph node provider for graph mode
IAllocationInfoProvider: Memory requirements for allocation
Receives its memory allocation from PipelineMemoryManager
Processes data independently without knowledge of other modules
Modules are created via IModuleFactory and configured with static parameters at construction time. Dynamic parameters (per-iteration data) are provided via configure_io().
Pipeline Specification#
A PipelineSpec defines the complete pipeline configuration including
modules, connections, and execution mode.
Basic Pipeline Specification#
// Create a pipeline specification
pipeline::PipelineSpec spec;
spec.pipeline_name = "MyPipeline";
spec.execution_mode = pipeline::ExecutionMode::Stream;
// Define module configuration
const SimpleModule::StaticParams module_params{.tensor_size = 1024};
const pipeline::ModuleSpec module_spec(pipeline::ModuleCreationInfo{
.module_type = "simple_module",
.instance_id = "module_1",
.init_params = std::any(module_params)});
spec.modules.push_back(module_spec);
// Define external I/O
spec.external_inputs = {"input"};
spec.external_outputs = {"output"};
Pipeline with Module Connections#
pipeline::PipelineSpec spec;
spec.pipeline_name = "TwoModulePipeline";
// Add two modules
const SimpleModule::StaticParams params{.tensor_size = 1024};
spec.modules.emplace_back(pipeline::ModuleCreationInfo{
.module_type = "simple_module",
.instance_id = "module_a",
.init_params = std::any(params)});
spec.modules.emplace_back(pipeline::ModuleCreationInfo{
.module_type = "simple_module",
.instance_id = "module_b",
.init_params = std::any(params)});
// Connect module_a output to module_b input
const pipeline::PortConnection connection{
.source_module = "module_a",
.source_port = "output",
.target_module = "module_b",
.target_port = "input"};
spec.connections.push_back(connection);
Execution Modes#
The pipeline supports two execution modes configured via ExecutionMode:
// Stream mode - flexible addressing, suitable for development
pipeline::PipelineSpec stream_spec;
stream_spec.execution_mode = pipeline::ExecutionMode::Stream;
// Graph mode - fixed addressing, optimal for production
pipeline::PipelineSpec graph_spec;
graph_spec.execution_mode = pipeline::ExecutionMode::Graph;
Stream Mode (ExecutionMode::Stream):
Sequential module execution on a CUDA stream
Flexible addressing - modules accept different tensor addresses per iteration
Supports dynamic topology changes
Suitable for development and debugging
Graph Mode (ExecutionMode::Graph):
Pre-built CUDA graph executed as a single unit
Fixed addressing - tensor addresses captured during warmup
Lower latency through single graph launch
Requires stable tensor addresses before graph build
Suitable for production deployments
Pipeline Lifecycle#
A typical pipeline follows this lifecycle:
1. Construction: Create pipeline and modules via factory
2. Setup: Allocate memory and initialize modules (setup())
3. I/O Configuration: Establish connections and set inputs (configure_io())
4. Warmup: One-time initialization - load models, capture graphs (warmup())
5. Graph Build (graph mode only): Build CUDA graph (build_graph() or automatic)
6. Execution: Process data (execute_stream() or execute_graph())
7. Iteration: Repeat steps 3 and 6 with new data
Warmup Phase:
The warmup() method performs expensive one-time initialization:
Loading models to device memory (TensorRT engines)
Initializing module-specific resources
This is called once after the first configure_io() and before execution.
For TensorRT modules, graph capture requires a non-default CUDA stream.
Data Flow#
Port Information#
// Allocate device memory for a tensor
const std::size_t tensor_size = 1024;
void *device_ptr{};
cudaMalloc(&device_ptr, tensor_size * sizeof(float));
// Create tensor info describing the data
const tensor::TensorInfo tensor_info{tensor::TensorInfo::DataType::TensorR32F, {tensor_size}};
// Create device tensor wrapper
const pipeline::DeviceTensor device_tensor{
.device_ptr = device_ptr, .tensor_info = tensor_info};
// Create port info for external input
pipeline::PortInfo port_info{.name = "input0", .tensors = {device_tensor}};
PortInfo describes tensor data at module ports:
name: Port identifier (e.g., “input0”, “output”)
tensors: Vector of device tensors with metadata
Data flow sequence per iteration:
External inputs provided to pipeline via PortInfo
configure_io() calls set_inputs() on first module
Module processes and provides outputs via get_outputs()
Router passes outputs to next module’s inputs
Process repeats through all modules
Final outputs mapped to external outputs
DynamicParams can be passed to configure_io() to provide per-iteration
configuration to modules. Modules receive these parameters and can use them to
update their per-iteration state.
Memory Management#
Connection Copy Modes#
// Configure zero-copy mode for a connection
const auto module_params = SimpleModule::StaticParams{.tensor_size = 1024};
auto module = std::make_unique<SimpleModule>("my_module", module_params);
// Set connection to use zero-copy (if supported)
module->set_connection_copy_mode("input", pipeline::ConnectionCopyMode::ZeroCopy);
// Or configure to always copy data
module->set_connection_copy_mode("input", pipeline::ConnectionCopyMode::Copy);
ConnectionCopyMode controls data transfer between modules:
Copy: Allocate buffer and copy data via cudaMemcpy
ZeroCopy: Use upstream pointer directly (no copy)
Memory Characteristics#
// Query input memory characteristics
const auto input_chars = module->get_input_memory_characteristics("input");
// Query output memory characteristics
const auto output_chars = module->get_output_memory_characteristics("output");
// Check if zero-copy is possible
const bool can_use_zero_copy = pipeline::can_zero_copy(output_chars, input_chars);
Modules declare their memory capabilities:
provides_fixed_address_for_zero_copy: Whether output addresses are stable
requires_fixed_address_for_zero_copy: Whether inputs need fixed addresses
The helper function can_zero_copy() determines if zero-copy is possible
for a connection based on upstream and downstream characteristics.
Zero-Copy Optimization#
The pipeline supports zero-copy data flow where downstream modules use
upstream pointers directly, eliminating intermediate cudaMemcpy operations.
TensorRT Modules:
Zero-copy requires upstream modules to provide fixed/stable device addresses. TensorRT fixes I/O addresses during graph capture, so addresses must be known ahead of time.
Non-TensorRT Modules:
Zero-copy is supported in both Graph and Stream modes. Device pointers can be
updated per-iteration via configure_io(), enabling flexible addressing.
However, when upstream module pointers change, data must be copied to the
downstream module’s input buffer.
General Rule:
Zero-copy possible when: upstream modules provide fixed address OR downstream modules accept dynamic addresses.
Benefits:
Reduced latency (eliminates copy overhead)
Lower memory usage (no duplicate buffers)
Better throughput (less memory bandwidth consumption)
Factory Pattern#
The pipeline library uses the factory pattern for flexible creation:
Module Factory (IModuleFactory):
Creates modules by type identifier (string)
Receives static configuration parameters as std::any
Returns std::unique_ptr<IModule>
Pipeline Factory (IPipelineFactory):
Creates pipelines by type identifier
Receives module factory and PipelineSpec
Constructs complete pipeline with all modules and connections
Complete Example#
The sample pipeline in framework/pipeline/samples/ demonstrates a
complete two-module pipeline chaining TensorRT execution with a CUDA kernel:
External Input 0 ┐
├─→ Module A (TensorRT Add) ─→ Module B (ReLU) ─→ Output
External Input 1 ┘
Creating the Pipeline#
const auto spec =
create_pipeline_spec(tensor_size, engine_path, pipeline::ExecutionMode::Stream);
auto pipeline = pipeline_factory_->create_pipeline("sample", "test_pipeline", spec);
// Setup pipeline (allocates memory, initializes modules)
pipeline->setup();
Configuring and Executing#
// Configure I/O with external inputs/outputs
RT_LOG_INFO("Calling configure_io()");
const pipeline::DynamicParams params{.module_specific_params = {}};
pipeline->configure_io(params, external_inputs, external_outputs, stream_);
// Perform one-time warmup (loads TRT engine, captures CUDA graph)
RT_LOG_INFO("Calling warmup()");
pipeline->warmup(stream_);
// Execute pipeline in stream mode
RT_LOG_INFO("Calling execute_stream()");
pipeline->execute_stream(stream_);
// Synchronize to ensure completion
FRAMEWORK_CUDA_RUNTIME_CHECK_THROW(cudaStreamSynchronize(stream_));
RT_LOG_INFO("Stream execution completed");
Graph Mode Execution#
// Step 1: configure_io FIRST (provides tensor addresses, establishes
// connections)
RT_LOG_INFO("Calling configure_io() - establishes connections");
const pipeline::DynamicParams params{.module_specific_params = {}};
pipeline->configure_io(params, external_inputs, external_outputs, stream_);
// Step 2: warmup() (loads TRT engine, captures CUDA graph)
RT_LOG_INFO("Calling warmup() - loads engine, captures graph");
pipeline->warmup(stream_);
// Step 3: execute graph (build_graph() called automatically on first
// execution)
RT_LOG_INFO("Calling execute_graph()");
pipeline->execute_graph(stream_);
// Synchronize
FRAMEWORK_CUDA_RUNTIME_CHECK_THROW(cudaStreamSynchronize(stream_));
RT_LOG_INFO("Graph execution completed");
Additional Examples#
For complete working examples with full setup and validation, see the test files:
Documentation Examples: framework/pipeline/tests/pipeline_sample_tests.cpp - Clean examples demonstrating core concepts with focused test cases
Complete Pipeline: framework/pipeline/samples/tests/sample_pipeline_test.cpp - Full-featured pipeline with TensorRT and CUDA kernels, including stream and graph execution modes
These test files demonstrate complete workflows including memory allocation, module creation, pipeline setup, warmup, execution, and result validation.
API Reference#
-
enum class framework::pipeline::ConnectionCopyMode#
Mode for data transfer between modules
Specifies whether data copying is required between connected modules.
Values:
-
enumerator Copy#
Allocate buffer and copy data (cudaMemcpy)
-
enumerator ZeroCopy#
Use upstream address directly (no copy)
-
enum class framework::pipeline::ExecutionMode#
Pipeline execution mode determining addressing and memory allocation strategy
The execution mode is a static configuration set at pipeline construction time and cannot change during the pipeline’s lifetime. It determines:
Memory allocation strategy (fixed vs. dynamic addressing)
Zero-copy optimization possibilities
TRT engine configuration (graph capture vs. stream mode)
Values:
-
enumerator Stream#
Stream mode: flexible addressing, supports dynamic set_tensor_address() per iteration, enables zero-copy with dynamic upstream addresses
-
enumerator Graph#
Graph mode: fixed addressing, tensor addresses captured during warmup, pipeline launched as a single pre-built CUDA graph
-
using framework::pipeline::DualKernelLaunchConfig = KernelLaunchConfig<2>#
Type alias for dual kernel configuration (2 parameters)
-
using framework::pipeline::ModuleSpec = fluent::NamedType<ModuleCreationInfo, struct ModuleSpecTag>#
Strong type for module specifications using NamedType
Provides type safety and clearer intent when working with module specifications in pipeline configurations.
-
template<std::size_t NUM_PARAMS>
inline void framework::pipeline::setup_kernel_function(KernelLaunchConfig<NUM_PARAMS> &config, const void *kernel_func)#
Helper to set up the kernel function in a launch configuration
- Parameters:
config – Kernel launch configuration to update
kernel_func – Pointer to the kernel function
- Throws:
std::runtime_error – if cudaGetFuncBySymbol fails
-
template<std::size_t NUM_PARAMS>
inline void framework::pipeline::setup_kernel_dimensions(KernelLaunchConfig<NUM_PARAMS> &config, const dim3 grid_dim, const dim3 block_dim, const std::size_t shared_mem_bytes = 0)#
Helper to set up kernel dimensions in a launch configuration
- Parameters:
config – Kernel launch configuration to update
grid_dim – Grid dimensions
block_dim – Block dimensions
shared_mem_bytes – Shared memory size in bytes (default: 0)
-
template<std::size_t NUM_PARAMS, typename ...Args>
void framework::pipeline::setup_kernel_arguments(KernelLaunchConfig<NUM_PARAMS> &config, Args&... args)#
Helper to set up kernel arguments in a launch configuration
- Parameters:
config – Kernel launch configuration to update
args – Variable number of device pointer arguments
-
inline CUresult framework::pipeline::launch_kernel(const CUDA_KERNEL_NODE_PARAMS &kernel_node_params, cudaStream_t stream)#
Launch a kernel using the pre-configured parameters.
This function wraps cuLaunchKernel with the parameters from CUDA_KERNEL_NODE_PARAMS. It enables consistent kernel launching across all modules using the cuBB pattern.
- Parameters:
kernel_node_params – Pre-configured kernel launch parameters
stream – CUDA stream for kernel execution
- Returns:
CUresult indicating success (CUDA_SUCCESS) or failure
-
std::size_t framework::pipeline::align_memory_offset(std::size_t offset, std::size_t alignment)#
Align offset to specified boundary
- Parameters:
offset – [in] Current offset
alignment – [in] Alignment requirement
- Returns:
Aligned offset
-
ModuleMemoryRequirements framework::pipeline::calculate_pipeline_memory_requirements(const std::vector<ModuleMemoryRequirements> &module_requirements)#
Calculate total memory requirements for a collection of modules
This function simulates the allocation process to account for alignment padding between modules. It ensures the calculated total matches what will actually be needed during allocation.
- Parameters:
module_requirements – [in] Vector of memory requirements from each module
- Returns:
Total memory requirements including alignment padding
-
ModuleMemoryRequirements framework::pipeline::calculate_pipeline_memory_requirements(const std::vector<IModule*> &modules)#
Calculate total memory requirements for a collection of modules
This overload extracts requirements from the modules directly.
- Parameters:
modules – [in] Vector of modules to calculate requirements for
- Throws:
std::runtime_error – if any module doesn’t implement IAllocationInfoProvider
- Returns:
Total memory requirements including alignment padding
-
inline std::ostream &framework::pipeline::operator<<(std::ostream &oss, const ModuleMemoryRequirements &req)#
Stream output operator for ModuleMemoryRequirements
- Parameters:
oss – [inout] Output stream to write to
req – [in] ModuleMemoryRequirements to output
- Returns:
Reference to the output stream
-
inline std::ostream &framework::pipeline::operator<<(std::ostream &oss, const ModuleMemorySlice &req)#
Stream output operator for ModuleMemorySlice
- Parameters:
oss – [inout] Output stream to write to
req – [in] ModuleMemorySlice to output
- Returns:
Reference to the output stream
-
inline bool framework::pipeline::can_zero_copy(const OutputPortMemoryCharacteristics &upstream, const InputPortMemoryCharacteristics &downstream)#
Helper function to determine if zero-copy is possible for a connection.
Zero-copy decision matrix:
| Upstream Provides Fixed | Downstream Requires Fixed | Zero-Copy? | Explanation |
|---|---|---|---|
| true | true | YES | Graph mode: downstream uses upstream’s fixed address (no allocation) |
| true | false | YES | Stream mode: downstream uses upstream’s fixed address each tick |
| false | false | YES | Stream mode: downstream uses upstream’s changing address each tick |
| false | true | NO | Incompatible: downstream needs fixed address but upstream changes → must copy |
The ONLY case requiring allocation + copy: upstream dynamic AND downstream requires fixed (e.g., graph mode with changing external inputs)
- Parameters:
upstream – [in] Output characteristics from the producing module
downstream – [in] Input characteristics from the consuming module
- Returns:
true if zero-copy is possible, false if copy is required
-
struct DeviceTensor#
- #include <types.hpp>
Represents a tensor with its device memory location and metadata.
Each device tensor contains the device pointer to tensor data along with its associated metadata (dimensions, type, etc.). This allows ports to contain multiple tensors, each with its own device memory address.
-
class DynamicKernelLaunchConfig : public framework::pipeline::IKernelLaunchConfig#
- #include <dynamic_kernel_launch_config.hpp>
Dynamic kernel launch configuration supporting arbitrary number of arguments
This class extends the base kernel launch configuration to support kernels with variable numbers of arguments. Use this when the number of kernel arguments is not known at compile time or varies significantly. For known fixed sizes, prefer KernelLaunchConfig<N>.
Public Functions
-
virtual void setup_kernel_function(const void *kernel_func) override#
Setup kernel function pointer
- Parameters:
kernel_func – [in] Pointer to the kernel function
- Throws:
std::runtime_error – if cudaGetFuncBySymbol fails
-
virtual void setup_kernel_dimensions(const dim3 grid_dim, const dim3 block_dim, const std::size_t shared_mem_bytes = 0) override#
Setup kernel dimensions and shared memory
- Parameters:
grid_dim – [in] Grid dimensions
block_dim – [in] Block dimensions
shared_mem_bytes – [in] Shared memory size in bytes (default: 0)
- Throws:
std::invalid_argument – if shared memory size exceeds maximum
-
virtual CUresult launch(cudaStream_t stream) const override#
Launch kernel using the configured parameters
- Parameters:
stream – [in] CUDA stream for kernel execution
- Returns:
CUresult indicating success (CUDA_SUCCESS) or failure
-
struct DynamicParams#
- #include <types.hpp>
Container for dynamic parameters that can change per iteration.
This structure holds parameters that may vary for each iteration or unit of work being processed through the pipeline. The module_specific_params field allows modules to receive custom parameters for dynamic updates like matrix dimensions, kernel launch configurations, or other per-iteration variations.
Public Members
-
std::any module_specific_params#
Module-specific parameters for dynamic updates (e.g., matrix dimensions, kernel configs)
-
class Graph : public framework::pipeline::IGraph#
- #include <graph.hpp>
Concrete implementation of IGraph for CUDA graph management.
This class provides a complete implementation of the IGraph interface, managing the lifecycle of CUDA graphs including creation, node addition, instantiation, upload, and launch operations.
Public Functions
-
Graph() = default#
Default constructor.
-
~Graph() override#
Destructor - cleans up CUDA graph resources.
-
virtual void create() override#
Creates a CUDA graph.
-
inline virtual bool is_created() const override#
Checks if the graph has been created.
- Returns:
true if created, false otherwise
-
inline virtual CUgraph handle() const override#
Returns the graph handle.
- Returns:
The CUDA graph handle
-
inline virtual CUgraphNode root_node() const override#
Returns the root node of the graph.
- Returns:
The root node of the graph
-
virtual CUgraphNode add_kernel_node(std::span<const CUgraphNode> deps, const CUDA_KERNEL_NODE_PARAMS &params) override#
Adds a kernel node to the graph.
- Parameters:
deps – [in] Dependency nodes that must complete before this node executes
params – [in] Kernel launch parameters for the node
- Returns:
The created graph node handle
-
virtual CUgraphNode add_child_graph_node(std::span<const CUgraphNode> deps, CUgraph child_graph) override#
Adds a child graph node to the graph.
- Parameters:
deps – [in] Dependency nodes that must complete before this node executes
child_graph – [in] The CUDA graph to add as a child
- Returns:
The created graph node handle
-
inline virtual CUgraphExec exec_handle() const override#
Returns the graph execution handle.
- Returns:
The graph execution handle
-
virtual void instantiate(unsigned int flags) override#
Instantiates the graph for execution.
- Parameters:
flags – [in] Instantiation flags passed to cuGraphInstantiate
-
virtual void upload(cudaStream_t stream) override#
Uploads the graph to the device.
- Parameters:
stream – [in] CUDA stream for upload passed to cuGraphUpload
-
virtual void launch(cudaStream_t stream) override#
Launches the graph on the specified stream.
- Parameters:
stream – [in] CUDA stream for graph execution passed to cuGraphLaunch
-
class GraphManager : public framework::pipeline::IGraphManager#
- #include <graph_manager.hpp>
Concrete implementation of IGraphManager.
This class manages the lifecycle of a CUDA graph, providing a high-level interface for graph operations. It owns a single Graph instance for standard pipeline execution.
Note
Future extensions could support multiple graphs for conditional execution, DGL (Dynamic Graph Launch), or segmented execution modes.
Public Functions
-
GraphManager()#
Default constructor. Creates the internal graph instance.
-
~GraphManager() override = default#
Destructor.
-
GraphManager(const GraphManager&) = delete#
-
GraphManager &operator=(const GraphManager&) = delete#
-
GraphManager(GraphManager&&) = delete#
-
GraphManager &operator=(GraphManager&&) = delete#
-
virtual void instantiate_graph() const override#
Instantiates the graph for execution.
-
virtual void upload_graph(cudaStream_t stream) const override#
Uploads the graph to the device.
- Parameters:
stream – [in] CUDA stream for upload operation
-
virtual void launch_graph(cudaStream_t stream) const override#
Launches the graph on the specified stream.
- Parameters:
stream – [in] CUDA stream for graph execution
-
virtual CUgraphExec get_exec() const override#
Get the graph execution handle.
Returns the CUgraphExec handle needed for dynamic parameter updates via cuGraphExecKernelNodeSetParams.
- Throws:
std::runtime_error – if graph is not instantiated
- Returns:
The graph execution handle
-
virtual std::span<const CUgraphNode> add_kernel_node(gsl_lite::not_null<IGraphNodeProvider*> graph_node_provider, std::span<const CUgraphNode> deps) override#
Adds kernel node(s) to the graph via a graph node provider.
- Parameters:
graph_node_provider – [in] Provider that will add nodes to the graph
deps – [in] Dependency nodes that must complete before these nodes execute
- Returns:
Span of created graph node handles (can contain single or multiple nodes)
-
class IAllocationInfoProvider#
- #include <iallocation_info_provider.hpp>
Interface for providing memory allocation requirements.
This interface allows modules to specify their memory allocation needs using the detailed ModuleMemoryRequirements structure that aligns with cuBB’s memory management patterns.
Subclassed by framework::pipelines::samples::SampleModuleA, framework::pipelines::samples::SampleModuleB, ran::fronthaul::OrderKernelModule, ran::ldpc::CrcDecoderModule, ran::ldpc::LdpcDecoderModule, ran::ldpc::LdpcDerateMatchModule, ran::pusch::InnerRxModule
Public Functions
-
virtual ~IAllocationInfoProvider() = default#
Virtual destructor.
-
IAllocationInfoProvider(const IAllocationInfoProvider&) = delete#
-
IAllocationInfoProvider(IAllocationInfoProvider&&) = delete#
-
IAllocationInfoProvider &operator=(const IAllocationInfoProvider&) = delete#
-
IAllocationInfoProvider &operator=(IAllocationInfoProvider&&) = delete#
-
virtual ModuleMemoryRequirements get_requirements() const = 0#
Get the memory requirements for a module.
- Returns:
The module’s memory requirements including static/dynamic descriptors and device tensors
-
class IGraph#
- #include <igraph.hpp>
Abstract base class for all graph types. Focuses on the essential operations needed for graph building.
Subclassed by framework::pipeline::Graph
Public Functions
-
IGraph() = default#
Default constructor.
-
virtual ~IGraph() = default#
Virtual Destructor.
-
virtual void create() = 0#
Creates a graph.
-
virtual bool is_created() const = 0#
Checks if the graph has been created.
- Returns:
true if created, false otherwise
-
virtual CUgraph handle() const = 0#
Returns the graph handle.
- Returns:
The graph handle
-
virtual CUgraphNode root_node() const = 0#
Returns the root node of the graph.
- Returns:
The root node of the graph
-
virtual CUgraphNode add_kernel_node(std::span<const CUgraphNode> deps, const CUDA_KERNEL_NODE_PARAMS &params) = 0#
Adds a kernel node to the graph.
Creates a kernel node in the CUDA graph with the specified dependencies and parameters. Returns the created node handle which should be stored by the caller for later parameter updates.
- Parameters:
deps – [in] Dependency nodes that must complete before this node executes
params – [in] Kernel launch parameters for the node
- Throws:
std::runtime_error – if CUDA operation fails
- Returns:
The created graph node handle
-
virtual CUgraphNode add_child_graph_node(std::span<const CUgraphNode> deps, CUgraph child_graph) = 0#
Adds a child graph node to the graph.
Integrates a pre-captured CUDA graph as a child node within this graph. This is commonly used for TensorRT engines or other stream-captured execution units. Returns the created node handle which should be stored by the caller for later parameter updates.
- Parameters:
deps – [in] Dependency nodes that must complete before this node executes
child_graph – [in] The CUDA graph to add as a child (must be a valid, captured graph)
- Throws:
std::runtime_error – if CUDA operation fails
- Returns:
The created graph node handle
-
virtual CUgraphExec exec_handle() const = 0#
Returns the graph execution handle.
- Returns:
The graph execution handle
-
virtual void instantiate(unsigned int flags) = 0#
Instantiates the graph.
- Parameters:
flags – [in] The flags passed to cuGraphInstantiate
-
virtual void upload(cudaStream_t stream) = 0#
Uploads the graph to the device.
- Parameters:
stream – [in] The stream passed to cuGraphUpload
-
virtual void launch(cudaStream_t stream) = 0#
Launches the graph.
- Parameters:
stream – [in] The stream passed to cuGraphLaunch
-
class IGraphManager#
- #include <igraph_manager.hpp>
Interface for managing CUDA graph lifecycle.
This interface abstracts the creation, instantiation, uploading, and launching of CUDA graphs. It provides a high-level API for pipeline implementations to build and execute computational graphs efficiently.
Subclassed by framework::pipeline::GraphManager
Public Functions
-
IGraphManager() = default#
Default constructor.
-
virtual ~IGraphManager() = default#
Virtual destructor.
-
IGraphManager(const IGraphManager&) = delete#
-
IGraphManager(IGraphManager&&) = delete#
-
IGraphManager &operator=(const IGraphManager&) = delete#
-
IGraphManager &operator=(IGraphManager&&) = delete#
-
virtual void instantiate_graph() const = 0#
Instantiate the graph for execution.
Converts the graph definition into an executable form. Must be called after all nodes have been added and before launching.
- Throws:
std::runtime_error – if instantiation fails
-
virtual void upload_graph(cudaStream_t stream) const = 0#
Upload the graph to the device.
Prepares the graph for efficient execution on the GPU. Should be called after instantiation and before the first launch.
- Parameters:
stream – [in] CUDA stream for upload operation
- Throws:
std::runtime_error – if upload fails
-
virtual void launch_graph(cudaStream_t stream) const = 0#
Launch the graph on the specified stream.
Executes the pre-built graph on the given CUDA stream. The graph must have been created, instantiated, and uploaded before calling this method.
- Parameters:
stream – [in] CUDA stream for graph execution
- Throws:
std::runtime_error – if launch fails
-
virtual CUgraphExec get_exec() const = 0#
Get the executable graph handle.
Returns the CUDA graph execution handle, which can be used for dynamic parameter updates via cuGraphExecKernelNodeSetParams.
- Throws:
std::runtime_error – if graph is not instantiated
- Returns:
The graph execution handle
-
virtual std::span<const CUgraphNode> add_kernel_node(gsl_lite::not_null<IGraphNodeProvider*> graph_node_provider, std::span<const CUgraphNode> deps) = 0#
Add kernel node(s) to the graph via a graph node provider.
Delegates to the provided graph node provider to add its kernel node(s) to the managed graph. The provider specifies dependencies and returns the created node handles.
- Parameters:
graph_node_provider – [in] Provider that will add nodes to the graph
deps – [in] Dependency nodes that must complete before these nodes execute
- Throws:
std::runtime_error – if node addition fails
- Returns:
Span of created graph node handles (can contain single or multiple nodes)
-
class IGraphNodeProvider#
- #include <igraph_node_provider.hpp>
Interface for providing a graph of nodes.
This interface provides methods for providing a graph of nodes, which are connected by data dependencies. It also supports dynamic updates to graph node parameters for scenarios requiring cuGraphExecKernelNodeSetParams.
Subclassed by framework::pipelines::samples::SampleModuleA, framework::pipelines::samples::SampleModuleB, ran::fronthaul::OrderKernelModule, ran::ldpc::CrcDecoderModule, ran::ldpc::LdpcDecoderModule, ran::ldpc::LdpcDerateMatchModule, ran::pusch::InnerRxModule
Public Functions
-
IGraphNodeProvider() = default#
Default constructor.
-
virtual ~IGraphNodeProvider() = default#
Virtual destructor.
-
IGraphNodeProvider(IGraphNodeProvider&&) = default#
Move constructor.
-
IGraphNodeProvider &operator=(IGraphNodeProvider&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
IGraphNodeProvider(const IGraphNodeProvider&) = delete#
-
IGraphNodeProvider &operator=(const IGraphNodeProvider&) = delete#
-
virtual std::span<const CUgraphNode> add_node_to_graph(gsl_lite::not_null<IGraph*> graph, const std::span<const CUgraphNode> deps) = 0#
Add node(s) to the graph.
- Parameters:
graph – [in] The graph to add the node(s) to
deps – [in] The dependencies of the node(s)
- Throws:
std::runtime_error – if CUDA graph node creation fails
- Returns:
Span of created graph node handles (can contain single or multiple nodes)
-
virtual void update_graph_node_params(CUgraphExec exec, const DynamicParams &params) = 0#
Update graph node parameters for dynamic iteration changes.
This method enables dynamic updates to kernel launch parameters using cuGraphExecKernelNodeSetParams. Modules can extract their specific parameters from params.module_specific_params and update their graph nodes accordingly (e.g., changing grid dimensions, shared memory size).
- Parameters:
exec – The executable graph to update
params – Dynamic parameters containing module-specific parameters
- Throws:
std::runtime_error – if cuGraphExecKernelNodeSetParams fails
-
class IKernelLaunchConfig#
- #include <ikernel_launch_config.hpp>
Interface for kernel launch configuration
This interface provides a polymorphic approach to kernel launch configuration, supporting variable numbers of kernel arguments and different configuration types while maintaining backward compatibility and clean architecture.
Subclassed by framework::pipeline::DynamicKernelLaunchConfig, framework::pipeline::KernelLaunchConfig< NUM_PARAMS >
Public Functions
-
IKernelLaunchConfig() = default#
Default constructor
-
virtual ~IKernelLaunchConfig() = default#
Virtual destructor
-
IKernelLaunchConfig(const IKernelLaunchConfig&) = default#
Copy constructor
-
IKernelLaunchConfig(IKernelLaunchConfig&&) = default#
Move constructor
-
IKernelLaunchConfig &operator=(const IKernelLaunchConfig&) = default#
Copy assignment operator
- Returns:
Reference to this object
-
IKernelLaunchConfig &operator=(IKernelLaunchConfig&&) = default#
Move assignment operator
- Returns:
Reference to this object
-
virtual CUresult launch(cudaStream_t stream) const = 0#
Launch kernel using the configured parameters
- Parameters:
stream – [in] CUDA stream for kernel execution
- Returns:
CUresult indicating success (CUDA_SUCCESS) or failure
-
virtual void setup_kernel_function(const void *kernel_func) = 0#
Setup kernel function pointer
- Parameters:
kernel_func – [in] Pointer to the kernel function
-
virtual void setup_kernel_dimensions(const dim3 grid_dim, const dim3 block_dim, const std::size_t shared_mem_bytes = 0) = 0#
Setup kernel dimensions and shared memory
- Parameters:
grid_dim – [in] Grid dimensions
block_dim – [in] Block dimensions
shared_mem_bytes – [in] Shared memory size in bytes (default: 0)
-
template<typename ...Args>
inline void setup_kernel_arguments(Args&&... args)#
Setup kernel arguments using variadic template
This method provides a compile-time interface for setting up kernel arguments. It clears existing arguments, adds each argument, and finalizes the configuration.
- Parameters:
args – [in] Variadic arguments to pass to the kernel (must be pointers)
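The variadic setup can be modeled as flattening each argument's address into the `void*` array that CUDA launch APIs (cuLaunchKernel, CUDA_KERNEL_NODE_PARAMS) expect. ArgStore below is a hypothetical simplified holder, not the library's class; it uses a pointer parameter pack to enforce the "must be pointers" requirement.

```cpp
#include <cstddef>
#include <vector>

// Simplified sketch of a variadic kernel-argument store. The launch API later
// reads through these pointers when the kernel is actually launched.
class ArgStore {
public:
    template <typename... Args>
    void setup_kernel_arguments(Args*... args) {
        // Clears existing arguments and stores the address of each new one,
        // mirroring the clear / add / finalize sequence described above.
        args_ = {static_cast<void*>(args)...};
    }
    void** data() { return args_.data(); }
    std::size_t size() const { return args_.size(); }

private:
    std::vector<void*> args_;
};

// Usage sketch:
//   int n = 4; float* buf = /* device pointer */;
//   store.setup_kernel_arguments(&n, &buf);
```

Because the store keeps addresses rather than copies, the pointed-to variables must stay alive until the launch completes.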
-
class IModule#
- #include <imodule.hpp>
Base interface for all algorithm modules.
This interface defines the contract that all modules in the processing pipeline must adhere to, including initialization, tensor information, and capability interfaces.
Subclassed by framework::pipelines::samples::SampleModuleA, framework::pipelines::samples::SampleModuleB, ran::fronthaul::OrderKernelModule, ran::ldpc::CrcDecoderModule, ran::ldpc::LdpcDecoderModule, ran::ldpc::LdpcDerateMatchModule, ran::pusch::InnerRxModule
Public Functions
-
IModule() = default#
Default constructor.
-
IModule &operator=(const IModule&) = default#
Copy assignment operator.
- Returns:
Reference to this object
-
IModule &operator=(IModule&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
virtual ~IModule() = default#
Virtual destructor.
-
virtual std::string_view get_type_id() const = 0#
Get the type identifier of the module.
- Returns:
The type ID as a string_view
-
virtual std::string_view get_instance_id() const = 0#
Get the instance identifier of the module.
- Returns:
The instance ID as a string_view
-
virtual void setup_memory(const ModuleMemorySlice &memory_slice) = 0#
Perform one-time setup after memory allocation.
- Parameters:
memory_slice – [in] Memory slice allocated by PipelineMemoryManager
-
virtual std::vector<tensor::TensorInfo> get_input_tensor_info(std::string_view port_name) = 0#
Get the input tensor information for a specified port.
- Parameters:
port_name – [in] The name of the input port
- Returns:
Vector of tensor information for all tensors on this port
-
virtual std::vector<tensor::TensorInfo> get_output_tensor_info(std::string_view port_name) = 0#
Get the output tensor information for a specified port.
- Parameters:
port_name – [in] The name of the output port
- Returns:
Vector of tensor information for all tensors on this port
-
virtual std::vector<std::string> get_input_port_names() const = 0#
Get the names of all input ports.
- Returns:
A vector of port names
-
virtual std::vector<std::string> get_output_port_names() const = 0#
Get the names of all output ports.
- Returns:
A vector of port names
-
virtual void set_inputs(std::span<const PortInfo> inputs) = 0#
Set input connections for the module.
This method is called by the pipeline to connect input ports to their data sources. The module should validate that all required inputs are provided and that the port names match expected inputs.
Note
This is typically called during pipeline configuration after all modules are created
- Parameters:
inputs – [in] Span of port information with device pointers to input data
- Throws:
std::invalid_argument – if required inputs are missing or port names don’t match
-
virtual std::vector<PortInfo> get_outputs() const = 0#
Get output port information.
Returns information about all output ports including their device pointers and tensor metadata. This is used by the pipeline to route data between modules.
Note
Device pointers are only valid after setup_memory() has been called
- Returns:
Vector of port information for all outputs
-
inline virtual void warmup(cudaStream_t stream)#
Perform one-time warmup and initialization after connections are established.
This method is called once after set_inputs() to perform any expensive one-time initialization that requires knowledge of input/output connections. Examples include:
Loading machine learning models to device memory (TensorRT, PyTorch)
Capturing CUDA graphs for graph-mode execution
Allocating and initializing lookup tables
Typical pipeline lifecycle:
setup() - allocate memory, initialize data structures
set_inputs() - establish data flow connections (lightweight)
warmup(stream) - one-time initialization (expensive, called once)
loop: configure_io() + execute() - process data
Note
Default implementation is no-op (most modules don’t need warmup)
Note
This should only be called once after the first set_inputs() call
Note
Warmup must complete before build_graph() for graph-mode execution
Note
TensorRT graph capture requires a non-default stream (cannot use cudaStreamDefault)
- Parameters:
stream – [in] CUDA stream to use for warmup operations (e.g., graph capture)
- Throws:
std::runtime_error – if warmup fails
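The lifecycle ordering described above can be exercised with a recording stub. RecordingModule and run_lifecycle are hypothetical test scaffolding that mirror the IModule method names; they demonstrate the documented call order, not the real implementation.

```cpp
#include <string>
#include <vector>

// Minimal stub that records the lifecycle calls a pipeline makes on a module.
struct RecordingModule {
    std::vector<std::string> calls;
    void setup_memory() { calls.push_back("setup_memory"); }  // one-time setup
    void set_inputs()   { calls.push_back("set_inputs"); }    // lightweight wiring
    void warmup()       { calls.push_back("warmup"); }        // once, after set_inputs()
    void configure_io() { calls.push_back("configure_io"); }  // per iteration
    void execute()      { calls.push_back("execute"); }       // per iteration
};

// Drives the documented order:
//   setup -> set_inputs -> warmup -> N x (configure_io, execute)
inline void run_lifecycle(RecordingModule& m, int iterations) {
    m.setup_memory();
    m.set_inputs();
    m.warmup();  // expensive one-time work happens here, exactly once
    for (int i = 0; i < iterations; ++i) {
        m.configure_io();
        m.execute();
    }
}
```

Keeping warmup() out of the iteration loop is the point of the split: graph capture and model loading happen once, while configure_io()/execute() stay cheap per slot.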
-
virtual void configure_io(const DynamicParams &params, cudaStream_t stream) = 0#
Configure I/O for the current iteration.
This method is called before execute() to update any parameters that change per execution. The module should use this opportunity to update internal state, kernel parameters, or any iteration-specific configuration. The subsequent execute() call will use this prepared state.
Execution flow for both stream and graph modes:
configure_io(params, stream) - prepare internal state
execute(stream) - launch work using prepared state
Note
This is called after warmup() has completed one-time initialization
Note
For graph mode, this is called before each graph launch to update captured parameters
- Parameters:
params – [in] Dynamic parameters for the current iteration
stream – [in] CUDA stream for async operations during configuration
-
inline virtual InputPortMemoryCharacteristics get_input_memory_characteristics(std::string_view port_name)#
Get memory characteristics for input ports (for zero-copy optimization).
Allows modules to declare whether they require fixed input addresses for zero-copy. Called during pipeline setup to optimize memory allocation strategy.
IMPORTANT: Only the requires_fixed_address_for_zero_copy field is used.
Note
Default implementation: flexible (doesn’t require fixed addresses)
- Parameters:
port_name – [in] Input port name
- Returns:
Input port memory characteristics
-
inline virtual OutputPortMemoryCharacteristics get_output_memory_characteristics(std::string_view port_name)#
Get memory characteristics for output ports (for zero-copy optimization).
Allows modules to declare whether they provide fixed addresses for outputs. Called during pipeline setup to optimize memory allocation strategy.
IMPORTANT: Only the provides_fixed_address_for_zero_copy field is used.
Note
Default implementation: provides fixed addresses (typical case)
- Parameters:
port_name – [in] Output port name
- Returns:
Output port memory characteristics
-
inline virtual void set_connection_copy_mode(std::string_view port_name, ConnectionCopyMode mode)#
Configure connection copy mode for an input port (for zero-copy optimization).
Called by the pipeline during setup() to inform the module about input characteristics. This allows modules to optimize memory allocation in get_requirements():
ConnectionCopyMode::Copy: Module must allocate input buffer and copy data
ConnectionCopyMode::ZeroCopy: Module can use input address directly (skip allocation)
Typical flow:
Pipeline analyzes input/output memory characteristics
Pipeline calls set_connection_copy_mode() to configure each input port
Module’s get_requirements() uses this info to calculate memory needs
Module allocates only what it needs (skips buffers for zero-copy inputs)
Note
Default implementation is no-op (modules that don’t support zero-copy inputs ignore)
Note
Must be called before get_requirements() for memory optimization to work
- Parameters:
port_name – [in] Input port name
mode – [in] Connection copy mode (Copy or ZeroCopy)
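A module honoring this flow might look like the sketch below: it records the mode per port and skips staging buffers for zero-copy inputs when computing its memory needs. CopyAwareModule and the byte-count shape of get_requirements() are simplified assumptions for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

enum class ConnectionCopyMode { Copy, ZeroCopy };  // mirrors the library enum

// Sketch of a module that honors set_connection_copy_mode() when computing
// memory requirements: zero-copy inputs need no staging buffer.
class CopyAwareModule {
public:
    void set_connection_copy_mode(const std::string& port, ConnectionCopyMode mode) {
        modes_[port] = mode;  // must be called before get_requirements()
    }

    // Returns total staging bytes given each input port's tensor size.
    std::size_t get_requirements(const std::map<std::string, std::size_t>& port_sizes) const {
        std::size_t bytes = 0;
        for (const auto& [port, size] : port_sizes) {
            auto it = modes_.find(port);
            const bool zero_copy =
                it != modes_.end() && it->second == ConnectionCopyMode::ZeroCopy;
            if (!zero_copy) bytes += size;  // allocate only when data must be copied in
        }
        return bytes;
    }

private:
    std::map<std::string, ConnectionCopyMode> modes_;
};
```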
-
template<typename T>
inline std::add_pointer_t<const std::remove_reference_t<T>> as_type()#
Cast the module to a specific type, using dynamic_cast.
- Template Parameters:
T – The type to cast to
- Returns:
The casted module
-
virtual IGraphNodeProvider *as_graph_node_provider() = 0#
Cast the module to a IGraphNodeProvider.
- Returns:
The casted module
-
virtual IStreamExecutor *as_stream_executor() = 0#
Cast the module to a IStreamExecutor.
- Returns:
The casted module
-
class IModuleFactory#
- #include <imodule_factory.hpp>
Interface for creating modules dynamically.
This interface defines the contract for factories that can create different types of modules based on type identifiers and parameters.
Subclassed by framework::pipeline::ModuleFactory, framework::pipelines::samples::SampleModuleAFactory, framework::pipelines::samples::SampleModuleBFactory, framework::pipelines::samples::SampleModuleFactory, ran::fronthaul::OrderKernelModuleFactory, ran::ldpc::CrcDecoderModuleFactory, ran::ldpc::LdpcDecoderModuleFactory, ran::ldpc::LdpcDerateMatchModuleFactory, ran::pusch::InnerRxModuleFactory, ran::pusch::PuschModuleFactory
Public Functions
-
IModuleFactory() = default#
Default constructor.
-
virtual ~IModuleFactory() = default#
Virtual destructor.
-
IModuleFactory(IModuleFactory&&) = default#
Move constructor.
-
IModuleFactory &operator=(IModuleFactory&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
IModuleFactory(const IModuleFactory&) = delete#
-
IModuleFactory &operator=(const IModuleFactory&) = delete#
-
virtual std::unique_ptr<IModule> create_module(std::string_view module_type, const std::string &instance_id, const std::any &static_params) = 0#
Create a module of the specified type.
- Parameters:
module_type – [in] The type of module to create (e.g., “gemm”)
instance_id – [in] The unique instance identifier for this module
static_params – [in] Type-erased static parameters for module initialization
- Throws:
std::invalid_argument – if module_type is not supported
std::bad_any_cast – if static_params type doesn’t match module requirements
- Returns:
Unique pointer to the created module
-
virtual bool supports_module_type(std::string_view module_type) = 0#
Check if a module type is supported by this factory.
- Parameters:
module_type – [in] The type of module to check
- Returns:
true if the module type is supported, false otherwise
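A registry-based factory satisfying this contract can be sketched as follows. MiniModuleFactory and the Module base class are simplified stand-ins (the real factory returns std::unique_ptr&lt;IModule&gt;), but the std::any parameter passing and the documented error behavior are the same.

```cpp
#include <any>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in base class for this sketch; the real factory produces IModule.
struct Module {
    explicit Module(std::string id) : instance_id(std::move(id)) {}
    virtual ~Module() = default;
    std::string instance_id;
};

class MiniModuleFactory {
public:
    using Creator =
        std::function<std::unique_ptr<Module>(const std::string&, const std::any&)>;

    void register_module_type(const std::string& type, Creator creator) {
        creators_[type] = std::move(creator);
    }
    bool supports_module_type(const std::string& type) const {
        return creators_.count(type) != 0;
    }
    std::unique_ptr<Module> create_module(const std::string& type,
                                          const std::string& instance_id,
                                          const std::any& static_params) const {
        auto it = creators_.find(type);
        if (it == creators_.end())
            throw std::invalid_argument("unsupported module type: " + type);
        // std::any_cast inside the creator throws std::bad_any_cast on a
        // parameter-type mismatch, matching the documented exceptions.
        return it->second(instance_id, static_params);
    }

private:
    std::map<std::string, Creator> creators_;
};
```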
-
struct InputPortMemoryCharacteristics#
- #include <types.hpp>
Memory characteristics for a module INPUT port (for zero-copy optimization)
Describes what an input port requires from its upstream connection to enable zero-copy. Used by get_input_memory_characteristics() to declare input requirements.
Public Members
-
bool requires_fixed_address_for_zero_copy = {false}#
Whether this input port requires fixed upstream addresses for zero-copy optimization.
This field determines if the module can zero-copy with any upstream or only with fixed ones:
false = Can zero-copy with ANY upstream (fixed OR dynamic addresses)
Module is flexible: accepts any address, even if it changes per iteration
Example: TRT engine in STREAM MODE (uses set_tensor_address() per iteration)
Example: CUDA kernel with dynamic descriptors (pointer updated per iteration)
Zero-copy: ALWAYS possible! Just use whatever upstream provides
If upstream fixed: same address every iteration (no copy)
If upstream dynamic: different address each iteration (still no copy!)
true = Can ONLY zero-copy if upstream provides fixed addresses
Module MUST have fixed address BEFORE warmup()
Reason: Address needed for CUDA graph capture or other pre-warmup operations
Example: TRT engine in GRAPH MODE (address captured during graph warmup)
Zero-copy: Only possible if upstream provides_fixed_address_for_zero_copy=true (downstream uses upstream’s fixed address, no allocation needed)
If upstream provides_fixed_address_for_zero_copy=false: MUST allocate + cudaMemcpy each iteration
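The rules above reduce to a single pipeline-side predicate. The two structs below mirror the documented fields; the function name and everything else are assumptions for this sketch.

```cpp
// Simplified mirrors of the documented characteristics structs.
struct InputPortMemoryCharacteristics {
    bool requires_fixed_address_for_zero_copy = false;
};
struct OutputPortMemoryCharacteristics {
    bool provides_fixed_address_for_zero_copy = true;
};

// Zero-copy is possible unless the consumer needs a fixed address that the
// producer cannot guarantee; in that one case the pipeline must copy.
inline bool can_zero_copy(const OutputPortMemoryCharacteristics& upstream,
                          const InputPortMemoryCharacteristics& downstream) {
    if (!downstream.requires_fixed_address_for_zero_copy)
        return true;  // flexible consumer: any upstream address works
    return upstream.provides_fixed_address_for_zero_copy;  // fixed-only consumer
}
```

Of the four combinations, only (upstream dynamic, downstream requires fixed) forces an allocation plus per-iteration cudaMemcpy.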
-
class IPipeline#
- #include <ipipeline.hpp>
Base interface for all processing pipelines.
This interface defines the contract that all pipelines must adhere to, including initialization, setup, and execution phases. Pipelines coordinate the execution of multiple modules and manage data flow between them.
Subclassed by framework::pipelines::samples::SamplePipeline, ran::fronthaul::OrderKernelPipeline, ran::pusch::PuschPipeline
Public Functions
-
virtual ~IPipeline() = default#
Virtual destructor.
-
virtual std::string_view get_pipeline_id() const = 0#
Get the pipeline identifier.
- Returns:
The pipeline ID as a string view
-
IPipeline() = default#
Default constructor.
-
IPipeline &operator=(const IPipeline&) = default#
Copy assignment operator.
- Returns:
Reference to this IPipeline
-
IPipeline &operator=(IPipeline&&) = default#
Move assignment operator.
- Returns:
Reference to this IPipeline
-
virtual void setup() = 0#
Perform one-time setup after initialization.
This method should create modules, allocate memory, and establish data flow connections between modules.
-
inline virtual void warmup(cudaStream_t stream)#
Perform one-time warmup and initialization of all modules.
This method calls warmup() on all modules after connections are established. It should be called once after the first set_inputs() call and before any execution. This is where expensive one-time operations occur, such as:
Loading models to device memory (TensorRT engines)
Capturing CUDA graphs for graph-mode execution
Typical pipeline lifecycle:
setup() - allocate memory, create modules
configure_io() - establish connections, set external inputs (first call)
warmup(…, stream) - one-time initialization (expensive, called once)
loop: configure_io() + execute() - process data
Note
Default implementation is no-op (pipelines can override if needed)
Note
Must be called before build_graph() for graph-mode execution
Note
Should only be called once in the pipeline lifecycle
Note
TensorRT graph capture requires a non-default stream (cannot use cudaStreamDefault)
Note
Stream parameter is last to follow standard C++ convention
- Parameters:
stream – [in] CUDA stream to use for warmup operations (passed to modules)
- Throws:
std::runtime_error – if any module warmup fails
-
virtual void configure_io(const DynamicParams &params, std::span<const PortInfo> external_inputs, std::span<PortInfo> external_outputs, cudaStream_t stream) = 0#
Configure the pipeline I/O for the next iteration.
This method configures external inputs/outputs and updates dynamic parameters for all modules. It must be called before execute().
Note
For the first call, this establishes connections between modules
Note
After the first call, warmup() should be called before execution
Note
Stream parameter is last to follow standard C++ convention
- Parameters:
params – [in] Dynamic parameters for the current iteration
external_inputs – [in] Span of external input port information
external_outputs – [out] Span of external output port information. Caller pre-allocates span storage; pipeline writes PortInfo metadata (including device_ptr). Pipeline retains ownership of device memory; caller must not free device pointers. Device pointers remain valid until pipeline destruction or next setup() call.
stream – [in] CUDA stream to use for I/O configuration operations (e.g., descriptor copies)
-
virtual void execute_stream(cudaStream_t stream) = 0#
Execute the pipeline using CUDA streams.
This method launches all module kernels sequentially using the parameters configured in configure_io(). The pipeline must have been initialized, setup, and configure_io must have been called before this method.
- Parameters:
stream – [in] The CUDA stream to execute on
- Throws:
std::runtime_error – if execution fails
-
virtual void execute_graph(cudaStream_t stream) = 0#
Execute the pipeline using CUDA graphs.
This method launches the pre-built CUDA graph. The graph must have been created, instantiated, and uploaded during the setup() phase. The pipeline must have been initialized, setup, and configure_io must have been called before this method.
- Parameters:
stream – [in] The CUDA stream to launch the graph on
- Throws:
std::runtime_error – if graph execution is not supported or fails
-
virtual std::size_t get_num_external_inputs() const = 0#
Get the number of external inputs required by this pipeline.
- Returns:
Number of external input tensors needed
-
virtual std::size_t get_num_external_outputs() const = 0#
Get the number of external outputs produced by this pipeline.
- Returns:
Number of external output tensors produced
-
inline virtual std::vector<PortInfo> get_outputs() const#
Get pipeline output port information.
Provides access to the pipeline’s output buffers. This allows external components to access output buffer addresses without executing the pipeline. Typical use case: getting fixed buffer addresses after warmup for zero-copy data passing between pipelines.
Note
Buffer addresses are typically stable after warmup()
Note
Not all pipelines need to implement this - only those that expose outputs for external consumption
- Throws:
std::logic_error – Default implementation throws - must be overridden by pipelines that need to expose outputs
- Returns:
Vector of output PortInfo describing each output port
-
class IPipelineFactory#
- #include <ipipeline_factory.hpp>
Interface for creating pipelines dynamically.
This interface defines the contract for factories that can create different types of pipelines based on type identifiers and specifications. The factory pattern enables configuration-driven pipeline construction and supports extensibility through runtime registration of pipeline types.
Subclassed by framework::pipeline::PipelineFactory, framework::pipelines::samples::SamplePipelineFactory, ran::fronthaul::OrderKernelPipelineFactory
Public Functions
-
IPipelineFactory() = default#
Default constructor.
-
virtual ~IPipelineFactory() = default#
Virtual destructor.
-
IPipelineFactory(IPipelineFactory&&) = default#
Move constructor.
-
IPipelineFactory &operator=(IPipelineFactory&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
IPipelineFactory(const IPipelineFactory&) = delete#
-
IPipelineFactory &operator=(const IPipelineFactory&) = delete#
-
virtual std::unique_ptr<IPipeline> create_pipeline(std::string_view pipeline_type, const std::string &pipeline_id, const PipelineSpec &spec) = 0#
Create a pipeline from a specification.
This method constructs a complete pipeline based on the provided specification, which includes module definitions, connections, and external I/O configuration.
- Parameters:
pipeline_type – [in] The type of pipeline to create (e.g., “skeleton”, “multi_module”)
pipeline_id – [in] Unique identifier for this pipeline instance
spec – [in] Complete pipeline specification including modules and connections
- Throws:
std::invalid_argument – if pipeline_type is not supported
std::runtime_error – if pipeline creation fails
- Returns:
Unique pointer to the created pipeline
-
virtual bool is_pipeline_type_supported(std::string_view pipeline_type) = 0#
Check if a pipeline type is supported by this factory.
- Parameters:
pipeline_type – [in] The type of pipeline to check
- Returns:
true if the pipeline type is supported, false otherwise
-
virtual std::vector<std::string> get_supported_pipeline_types() = 0#
Get all supported pipeline types.
- Returns:
Vector of supported pipeline type identifiers
-
class IPipelineOutputProvider#
- #include <ipipeline_output_provider.hpp>
Interface for accessing pipeline output buffer addresses.
This interface provides access to stable output buffer addresses from pipelines (e.g., Order Kernel) for zero-copy integration with downstream consumers. Addresses remain valid after pipeline warmup for the pipeline’s lifetime.
Thread-safety: Implementations must ensure thread-safe access if called from multiple threads.
Subclassed by ran::fronthaul::Fronthaul
Public Functions
-
IPipelineOutputProvider() = default#
Default constructor.
-
virtual ~IPipelineOutputProvider() = default#
Virtual destructor for proper cleanup of derived classes.
-
IPipelineOutputProvider(IPipelineOutputProvider&&) = default#
Move constructor.
-
IPipelineOutputProvider &operator=(IPipelineOutputProvider&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
IPipelineOutputProvider(const IPipelineOutputProvider&) = delete#
Deleted copy constructor (non-copyable).
-
IPipelineOutputProvider &operator=(const IPipelineOutputProvider&) = delete#
Deleted copy assignment operator (non-copyable).
-
virtual std::span<const PortInfo> get_order_kernel_outputs() = 0#
Get Order Kernel pipeline output addresses
Provides access to stable output buffer addresses captured after Order Kernel warmup. These addresses can be used for zero-copy data passing to downstream pipelines (e.g., PUSCH pipeline).
Note
Addresses are stable after warmup and remain valid for pipeline lifetime
Note
Thread-safe if implementation provides thread-safety guarantees
- Returns:
Span of PortInfo describing Order Kernel outputs. Empty span indicates outputs are not available (e.g., pipeline not initialized).
-
class IStreamExecutor#
- #include <istream_executor.hpp>
Interface for executing operations directly on a CUDA stream.
This interface provides a mechanism for executing operations on a CUDA stream, typically when CUDA graph execution is not available or suitable. This interface is designed to be used via composition rather than inheritance, allowing modules to delegate execution to concrete executor implementations.
Subclassed by framework::pipeline::NullStreamExecutor, framework::pipeline::StreamExecutor, framework::pipelines::samples::SampleModuleA, framework::pipelines::samples::SampleModuleB, ran::fronthaul::OrderKernelModule, ran::ldpc::CrcDecoderModule, ran::ldpc::LdpcDecoderModule, ran::ldpc::LdpcDerateMatchModule, ran::pusch::InnerRxModule
Public Functions
-
IStreamExecutor() = default#
Default constructor.
-
virtual ~IStreamExecutor() = default#
Virtual destructor.
-
IStreamExecutor(const IStreamExecutor&) = default#
Copy constructor.
-
IStreamExecutor(IStreamExecutor&&) = default#
Move constructor.
-
IStreamExecutor &operator=(const IStreamExecutor&) = default#
Copy assignment operator.
- Returns:
Reference to this object
-
IStreamExecutor &operator=(IStreamExecutor&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
virtual void execute(cudaStream_t stream) = 0#
Execute operations on a CUDA stream.
This method launches the module’s GPU operations using parameters previously set by setup_tick(). The separation between setup_tick() and execute() allows the same pattern to work for both stream and graph execution modes.
Note
setup_tick() must be called before execute() to prepare parameters
- Parameters:
stream – [in] The CUDA stream to execute on
-
class KernelDescriptorAccessor#
- #include <kernel_descriptor_accessor.hpp>
Kernel descriptor accessor for type-safe parameter handling
Provides access to both static and dynamic kernel parameter structures in pinned memory. Each module owns its own KernelDescriptorAccessor instance and is responsible for copying descriptors from CPU to GPU when ready.
Public Functions
-
explicit KernelDescriptorAccessor(const ModuleMemorySlice &memory_slice)#
Constructor
- Parameters:
memory_slice – [in] Memory slice containing kernel descriptor regions
-
KernelDescriptorAccessor(const KernelDescriptorAccessor&) = delete#
-
KernelDescriptorAccessor &operator=(const KernelDescriptorAccessor&) = delete#
-
KernelDescriptorAccessor(KernelDescriptorAccessor&&) = delete#
-
KernelDescriptorAccessor &operator=(KernelDescriptorAccessor&&) = delete#
-
~KernelDescriptorAccessor() = default#
Destructor
-
template<typename T>
inline T &create_static_param(const std::size_t offset_bytes)#
Create type-safe static kernel parameter at specific offset
In-place construction (placement new) so no UB when accessing the descriptor.
Note
The created object’s destructor is not called automatically. This is safe for POD types typically used as kernel parameters. Users must ensure proper cleanup for non-POD types.
- Template Parameters:
T – Kernel parameter structure type
- Parameters:
offset_bytes – [in] Byte offset within the module’s static descriptor region
- Throws:
std::runtime_error – if allocation exceeds slice bounds
- Returns:
Reference to constructed kernel parameter object (CPU memory)
-
template<typename T>
inline T &create_dynamic_param(const std::size_t offset_bytes)#
Create type-safe dynamic kernel parameter at specific offset
- Template Parameters:
T – Kernel parameter structure type
- Parameters:
offset_bytes – [in] Byte offset within the module’s dynamic descriptor region
- Throws:
std::runtime_error – if allocation exceeds slice bounds
- Returns:
Reference to constructed kernel parameter object (CPU memory)
-
void copy_static_descriptors_to_device(cudaStream_t stream) const#
Copy static descriptors from CPU to GPU memory (async)
Module calls this when static descriptors are ready. Typically called once during module initialization.
- Parameters:
stream – [in] CUDA stream for async operation
-
void copy_dynamic_descriptors_to_device(cudaStream_t stream) const#
Copy dynamic descriptors from CPU to GPU memory (async)
Module calls this when dynamic descriptors are ready. Typically called every frame/slot when parameters change.
- Parameters:
stream – [in] CUDA stream for async operation
-
template<typename T>
inline T *get_static_device_ptr(const std::size_t offset_bytes)#
Get GPU device pointer for static kernel parameters
- Template Parameters:
T – Kernel parameter structure type
- Parameters:
offset_bytes – [in] Byte offset within the static descriptor region
- Returns:
Device pointer for kernel launch
-
template<typename T>
inline T *get_dynamic_device_ptr(const std::size_t offset_bytes)#
Get GPU device pointer for dynamic kernel parameters
- Template Parameters:
T – Kernel parameter structure type
- Parameters:
offset_bytes – [in] Byte offset within the dynamic descriptor region
- Returns:
Device pointer for kernel launch
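The offset-based, placement-new construction that create_static_param()/create_dynamic_param() describe can be modeled on a plain byte buffer. DescriptorRegion is a simplified stand-in (the real class works on pinned host memory from a ModuleMemorySlice and pairs each region with a device copy), but the bounds check and in-place construction semantics match the documentation.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Simplified model of create_*_param(): placement-new a descriptor at a byte
// offset inside a pre-allocated region, with bounds checking.
class DescriptorRegion {
public:
    explicit DescriptorRegion(std::size_t bytes) : storage_(bytes) {}

    template <typename T>
    T& create_param(std::size_t offset_bytes) {
        static_assert(std::is_trivially_destructible_v<T>,
                      "destructor is never run; POD-like types only");
        if (offset_bytes + sizeof(T) > storage_.size())
            throw std::runtime_error("descriptor exceeds region bounds");
        // In-place construction avoids UB from reinterpreting raw bytes.
        return *new (storage_.data() + offset_bytes) T{};
    }

private:
    std::vector<std::byte> storage_;
};
```

The static_assert encodes the documented caveat that the created object's destructor is never called, so only trivially destructible parameter structs are safe here.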
-
template<std::size_t NUM_PARAMS>
class KernelLaunchConfig : public framework::pipeline::IKernelLaunchConfig#
- #include <kernel_launch_config.hpp>
Template-based kernel launch configuration
This class is designed for CUDA Graph compatibility and efficient kernel launching using the CUDA Driver API. It stores pre-configured kernel launch parameters that can be reused across multiple kernel invocations.
Usage pattern:
During setup(): Configure the CUDA_KERNEL_NODE_PARAMS once
During execute(): Launch kernel with pre-configured parameters
This approach enables:
CUDA Graph capture and replay
Reduced overhead (configuration done once, not per execution)
Consistent error handling with CUresult
Clean separation between setup and execution phases
- Template Parameters:
NUM_PARAMS – Number of kernel parameters (must be > 0)
Public Functions
-
inline virtual void setup_kernel_function(const void *kernel_func) override#
Setup kernel function pointer
- Parameters:
kernel_func – [in] Pointer to the kernel function
- Throws:
std::runtime_error – if cudaGetFuncBySymbol fails
-
inline virtual void setup_kernel_dimensions(const dim3 grid_dim, const dim3 block_dim, const std::size_t shared_mem_bytes = 0) override#
Setup kernel dimensions and shared memory
- Parameters:
grid_dim – [in] Grid dimensions
block_dim – [in] Block dimensions
shared_mem_bytes – [in] Shared memory size in bytes (default: 0)
- Throws:
std::invalid_argument – if shared memory size exceeds maximum
-
virtual CUresult launch(cudaStream_t stream) const override#
Launch kernel using the configured parameters
- Parameters:
stream – [in] CUDA stream for kernel execution
- Returns:
CUresult indicating success (CUDA_SUCCESS) or failure
-
inline const CUDA_KERNEL_NODE_PARAMS &get_kernel_params() const#
Get const reference to kernel node parameters for graph node creation
- Returns:
Const reference to CUDA_KERNEL_NODE_PARAMS
-
class KernelNodeHelper#
- #include <kernel_node_helper.hpp>
Helper class for managing CUDA kernel node parameters
This class encapsulates common CUDA kernel launch configuration logic that can be reused across different kernel launch configurations via composition instead of inheritance.
Public Functions
-
void setup_kernel_function(const void *kernel_func)#
Setup kernel function pointer
- Parameters:
kernel_func – [in] Pointer to the kernel function
- Throws:
std::runtime_error – if cudaGetFuncBySymbol fails
-
void setup_kernel_dimensions(const dim3 grid_dim, const dim3 block_dim, const std::size_t shared_mem_bytes = 0)#
Setup kernel dimensions and shared memory
- Parameters:
grid_dim – [in] Grid dimensions
block_dim – [in] Block dimensions
shared_mem_bytes – [in] Shared memory size in bytes (default: 0)
- Throws:
std::invalid_argument – if shared memory size exceeds maximum
-
const CUDA_KERNEL_NODE_PARAMS &get_kernel_params() const#
Get const reference to kernel node parameters
- Returns:
Const reference to CUDA_KERNEL_NODE_PARAMS for kernel launch
-
void clear_kernel_params()#
Clear kernel node parameters
-
void set_kernel_params_ptr(void **kernel_params_ptr)#
Set kernel node parameters pointer
Note
Caller must ensure the pointer is valid for the duration of the kernel launch.
- Parameters:
kernel_params_ptr – [in] Pointer to kernel parameters
-
struct ModuleCreationInfo#
- #include <types.hpp>
Information needed to create a module instance.
Contains all information needed to create and initialize a module through the factory pattern. This is the underlying data for ModuleSpec.
-
class ModuleFactory : public framework::pipeline::IModuleFactory#
- #include <module_factory.hpp>
Concrete implementation of IModuleFactory with runtime registration.
This factory uses a registry pattern allowing module types to be registered at runtime. This provides flexibility for different applications to register their specific module types without modifying the factory implementation.
Example usage:
ModuleFactory factory;
factory.register_module_type("gemm",
    [](const std::string& id, const std::any& params) {
        return std::make_unique<GemmModule>(id, std::any_cast<GemmConfig>(params));
    });
auto module = factory.create_module("gemm", "gemm_0", gemm_config);
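The registry pattern behind this factory can be demonstrated with a minimal, self-contained sketch. The types here (Registry, Module, EchoModule) are hypothetical simplifications of ModuleFactory/IModule, not the real classes:

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Minimal stand-in for IModule.
struct Module {
    virtual ~Module() = default;
    virtual std::string id() const = 0;
};

// Hypothetical registry-based factory mirroring ModuleFactory's pattern:
// creators are registered at runtime under a type key, and creation
// dispatches through the registry with type-erased parameters.
class Registry {
public:
    using Creator =
        std::function<std::unique_ptr<Module>(const std::string&, const std::any&)>;

    void register_type(const std::string& type, Creator c) {
        if (!creators_.emplace(type, std::move(c)).second) {
            throw std::invalid_argument("type already registered: " + type);
        }
    }
    std::unique_ptr<Module> create(const std::string& type,
                                   const std::string& id,
                                   const std::any& params) const {
        auto it = creators_.find(type);
        if (it == creators_.end()) {
            throw std::invalid_argument("unsupported type: " + type);
        }
        return it->second(id, params);  // may throw std::bad_any_cast
    }
    bool supports(const std::string& type) const { return creators_.count(type) != 0; }

private:
    std::map<std::string, Creator> creators_;
};

// Example concrete module for the sketch.
struct EchoModule : Module {
    explicit EchoModule(std::string id) : id_(std::move(id)) {}
    std::string id() const override { return id_; }
    std::string id_;
};
```

Applications register their own module types against the registry without touching the factory implementation, which is the flexibility the paragraph above describes.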
Public Functions
-
ModuleFactory() = default#
Default constructor.
-
~ModuleFactory() override = default#
Destructor.
-
ModuleFactory(const ModuleFactory&) = delete#
-
ModuleFactory &operator=(const ModuleFactory&) = delete#
-
ModuleFactory(ModuleFactory&&) = default#
Move constructor.
-
ModuleFactory &operator=(ModuleFactory&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
template<ModuleCreator Creator>
inline void register_module_type(std::string_view module_type, Creator &&creator)#
Register a module type with its creator function.
- Parameters:
module_type – [in] Type identifier for the module
creator – [in] Function that creates instances of this module type
- Throws:
std::invalid_argument – if module_type is already registered
-
virtual std::unique_ptr<IModule> create_module(std::string_view module_type, const std::string &instance_id, const std::any &static_params)#
Create a module of the specified type.
- Parameters:
module_type – [in] The type of module to create
instance_id – [in] The unique instance identifier for this module
static_params – [in] Type-erased static parameters for module initialization
- Throws:
std::invalid_argument – if module_type is not supported
std::bad_any_cast – if static_params type doesn’t match module requirements
- Returns:
Unique pointer to the created module
-
virtual bool supports_module_type(std::string_view module_type)#
Check if a module type is supported by this factory.
- Parameters:
module_type – [in] The type of module to check
- Returns:
true if the module type is supported, false otherwise
-
struct ModuleMemoryRequirements#
- #include <types.hpp>
Memory allocation requirements for a module
Follows cuBB’s pattern with static/dynamic kernel descriptors and device tensor allocation.
Public Members
-
std::size_t static_kernel_descriptor_bytes = {0}#
Size of static kernel parameters (set once)
-
std::size_t dynamic_kernel_descriptor_bytes{0}#
Size of dynamic kernel parameters (updated per frame)
-
std::size_t device_tensor_bytes = {0}#
Size of module’s device tensor allocation (for any use)
-
std::size_t alignment = {DEFAULT_ALIGNMENT}#
Memory alignment requirement.
Public Static Attributes
-
static constexpr std::size_t DEFAULT_ALIGNMENT = 128#
Memory alignment requirement in bytes.
-
struct ModuleMemorySlice#
- #include <types.hpp>
Memory slice assigned to a module
Contains pointers to memory regions allocated by the pipeline. Follows cuBB pattern: CPU/GPU descriptor pairs + device tensor slice.
Public Members
-
std::byte *static_kernel_descriptor_cpu_ptr{nullptr}#
Pinned memory for static kernel parameters (CPU)
-
std::byte *static_kernel_descriptor_gpu_ptr{nullptr}#
Device memory for static kernel parameters (GPU)
-
std::byte *dynamic_kernel_descriptor_cpu_ptr{nullptr}#
Pinned memory for dynamic kernel parameters (CPU)
-
std::byte *dynamic_kernel_descriptor_gpu_ptr{nullptr}#
Device memory for dynamic kernel parameters (GPU)
-
std::byte *device_tensor_ptr = {nullptr}#
Device memory for module’s tensor data (intermediate/output/scratch)
-
std::size_t static_kernel_descriptor_bytes{0}#
Size of static descriptor slices (same for CPU and GPU)
-
std::size_t dynamic_kernel_descriptor_bytes{0}#
Size of dynamic descriptor slices (same for CPU and GPU)
-
std::size_t device_tensor_bytes = {0}#
Size of device tensor slice.
-
class ModuleRouter#
- #include <module_router.hpp>
Manages routing configuration between modules in a pipeline.
This class manages how modules are connected via their input/output ports, providing efficient lookup of connections involving specific modules.
Internal Architecture: The router uses an index-based lookup optimization for efficient connection queries.
connections_: Vector storing all connections sequentially
module_to_connection_indices_: Maps each module ID to indices in connections_
When a connection A->B is added:
Connection is appended to connections_ at index N
Index N is added to both moduleA’s and moduleB’s index lists
This allows lookup of a module’s connections in time proportional to that module’s own connection count, rather than O(n) scanning of all connections.
Example after adding A->B, B->C, A->C:
connections_ = [
  0: {A, output0, B, input0},
  1: {B, output0, C, input0},
  2: {A, output1, C, input1}
]
module_to_connection_indices_ = {
  "A": [0, 2],  // A is involved in connections 0 and 2
  "B": [0, 1],  // B is involved in connections 0 and 1
  "C": [1, 2]   // C is involved in connections 1 and 2
}
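The index-based lookup can be sketched as a small self-contained router. The names (Router, Connection, for_module) are illustrative, not the actual ModuleRouter API:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative stand-in for PortConnection.
struct Connection {
    std::string src, src_port, dst, dst_port;
};

// Hypothetical minimal router mirroring the index-based architecture above:
// connections are stored sequentially and each module maps to the indices
// of the connections it participates in.
class Router {
public:
    void add(Connection c) {
        const std::size_t idx = connections_.size();
        index_[c.src].push_back(idx);  // source module references index idx
        index_[c.dst].push_back(idx);  // target module references index idx
        connections_.push_back(std::move(c));
    }

    // All connections touching `module`, found via its index list
    // instead of scanning every connection.
    std::vector<Connection> for_module(const std::string& module) const {
        std::vector<Connection> out;
        auto it = index_.find(module);
        if (it == index_.end()) return out;
        for (std::size_t i : it->second) out.push_back(connections_[i]);
        return out;
    }

private:
    std::vector<Connection> connections_;
    std::unordered_map<std::string, std::vector<std::size_t>> index_;
};
```

Replaying the A->B, B->C, A->C example above against this sketch yields exactly the index lists shown: each of A, B, and C ends up referenced by two connections.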
Public Functions
-
ModuleRouter() = default#
Default constructor.
-
~ModuleRouter() = default#
Destructor.
-
ModuleRouter(const ModuleRouter&) = delete#
-
ModuleRouter &operator=(const ModuleRouter&) = delete#
-
ModuleRouter(ModuleRouter&&) = default#
Move constructor.
-
ModuleRouter &operator=(ModuleRouter&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
void add_connection(const PortConnection &connection)#
Add a connection between two module ports.
- Parameters:
connection – [in] The port connection to add
- Throws:
std::runtime_error – if duplicate connection already exists
-
std::vector<PortConnection> get_module_connections(std::string_view module_id)#
Get all connections for a specific module.
Returns connections where the module is either source or target.
- Parameters:
module_id – [in] The module ID to query
- Returns:
Vector of connections involving this module
-
std::vector<PortConnection> get_input_connections(std::string_view module_id)#
Get input connections for a specific module.
Returns connections where the module is the target (receiving input).
- Parameters:
module_id – [in] The module ID to query
- Returns:
Vector of connections where this module is the target
-
std::vector<PortConnection> get_output_connections(std::string_view module_id)#
Get output connections for a specific module.
Returns connections where the module is the source (providing output).
- Parameters:
module_id – [in] The module ID to query
- Returns:
Vector of connections where this module is the source
-
bool has_connections(std::string_view module_id) const#
Check if a module has any connections.
- Parameters:
module_id – [in] The module ID to check
- Returns:
true if the module has connections, false otherwise
-
std::unordered_set<std::string> get_all_module_ids() const#
Get all unique module IDs in the routing configuration.
- Returns:
Set of module IDs
-
void validate() const#
Validate the routing configuration.
Checks for duplicate connections and invalid port configurations.
- Throws:
std::runtime_error – if configuration is invalid
-
class NullStreamExecutor : public framework::pipeline::IStreamExecutor#
- #include <null_stream_executor.hpp>
Null implementation of IStreamExecutor for testing and placeholder scenarios.
This class provides a no-op implementation of the IStreamExecutor interface. It can be used in unit tests or as a placeholder when stream execution is not needed.
Public Functions
-
inline virtual void execute(cudaStream_t stream) override#
Execute operation - does nothing.
- Parameters:
stream – [in] The CUDA stream to execute on (ignored)
-
struct OutputPortMemoryCharacteristics#
- #include <types.hpp>
Memory characteristics for a module OUTPUT port (for zero-copy optimization)
Describes what an output port provides to its downstream connections to enable zero-copy. Used by get_output_memory_characteristics() to declare output capabilities.
Public Members
-
bool provides_fixed_address_for_zero_copy = {true}#
Whether this output port provides fixed device addresses (for zero-copy optimization).
true = Address allocated once in setup_memory(), never changes
Enables downstream zero-copy (if downstream can accept it)
This is the typical case for most modules
false = Address may change per iteration
Examples: external inputs, ping-pong buffers
Limits zero-copy to flexible consumers only
-
class PipelineFactory : public framework::pipeline::IPipelineFactory#
- #include <pipeline_factory.hpp>
Concrete implementation of IPipelineFactory with runtime registration.
This factory uses a registry pattern allowing pipeline types to be registered at runtime. Each pipeline creator receives a reference to the module factory for constructing pipeline modules.
Example usage:
ModuleFactory module_factory;
// ... register module types ...
PipelineFactory pipeline_factory(module_factory);
pipeline_factory.register_pipeline_type("skeleton",
    [](IModuleFactory& mf, const std::string& id, const std::any& params) {
        return std::make_unique<SkeletonPipeline>(mf, std::any_cast<PipelineSpec>(params));
    });
auto pipeline = pipeline_factory.create_pipeline("skeleton", "pipeline_0", spec);
Public Types
-
using PipelineCreator = std::function<std::unique_ptr<IPipeline>(IModuleFactory&, const std::string&, const std::any&)>#
Pipeline creator function signature. Takes module factory reference, instance ID, and parameters. Returns unique pointer to pipeline.
Public Functions
-
explicit PipelineFactory(IModuleFactory &module_factory)#
Constructor.
- Parameters:
module_factory – [in] Reference to module factory for creating pipeline modules
-
~PipelineFactory() override = default#
Destructor.
-
PipelineFactory(const PipelineFactory&) = delete#
-
PipelineFactory &operator=(const PipelineFactory&) = delete#
-
PipelineFactory(PipelineFactory&&) = delete#
-
PipelineFactory &operator=(PipelineFactory&&) = delete#
-
template<StorablePipelineCreator Creator>
void register_pipeline_type(std::string_view pipeline_type, Creator &&creator)#
Register a pipeline type with its creator function (concept-constrained).
Accepts any callable (lambda, function pointer, functor, std::function) that satisfies the StorablePipelineCreator concept.
This templated overload enables:
Better compile-time error messages
Zero-overhead for stateless lambdas (avoids std::function wrapper)
Type safety enforced at compile time
- Template Parameters:
Creator – Type of the callable (deduced automatically)
- Parameters:
pipeline_type – [in] Type identifier for the pipeline
creator – [in] Callable that creates instances of this pipeline type
- Throws:
std::invalid_argument – if pipeline_type is already registered
-
void register_pipeline_type(std::string_view pipeline_type, PipelineCreator creator)#
Register a pipeline type with its creator function (std::function overload).
This overload accepts std::function directly for explicit usage. Prefer using the templated overload for better performance with lambdas.
- Parameters:
pipeline_type – [in] Type identifier for the pipeline
creator – [in] Function that creates instances of this pipeline type
- Throws:
std::invalid_argument – if pipeline_type is already registered
-
virtual std::unique_ptr<IPipeline> create_pipeline(std::string_view pipeline_type, const std::string &pipeline_id, const PipelineSpec &spec)#
Create a pipeline from a specification.
- Parameters:
pipeline_type – [in] The type of pipeline to create
pipeline_id – [in] Unique identifier for this pipeline instance
spec – [in] Complete pipeline specification (usually PipelineSpec)
- Throws:
std::invalid_argument – if pipeline_type is not supported
std::runtime_error – if pipeline creation fails
- Returns:
Unique pointer to the created pipeline
-
virtual bool is_pipeline_type_supported(std::string_view pipeline_type)#
Check if a pipeline type is supported by this factory.
- Parameters:
pipeline_type – [in] The type of pipeline to check
- Returns:
true if the pipeline type is supported, false otherwise
-
virtual std::vector<std::string> get_supported_pipeline_types()#
Get all supported pipeline types.
- Returns:
Vector of supported pipeline type identifiers
-
class PipelineMemoryManager#
- #include <pipeline_memory_manager.hpp>
Pipeline-level memory manager
Follows cuBB pattern where pipeline owns large contiguous memory arenas and assigns memory slices to modules. Five-tier allocation:
Static kernel descriptors CPU: Small pinned memory (like cuBB’s m_kernelStatDescr CPU)
Static kernel descriptors GPU: Small device memory (like cuBB’s m_kernelStatDescr GPU)
Dynamic kernel descriptors CPU: Small pinned memory (like cuBB’s m_kernelDynDescr CPU)
Dynamic kernel descriptors GPU: Small device memory (like cuBB’s m_kernelDynDescr GPU)
Device tensors: Large device memory (like cuBB’s m_LinearAlloc)
Note: Pipeline only allocates memory slices. Each module is responsible for copying its own descriptors from CPU to GPU when ready.
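The arena-slicing scheme underlying the manager can be sketched as a bump allocator that hands out aligned offsets from a pre-sized region. This is a minimal illustration of the technique, not the actual PipelineMemoryManager implementation (Arena and its members are hypothetical names; the 128-byte default matches ModuleMemoryRequirements::DEFAULT_ALIGNMENT):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical bump allocator mirroring how the pipeline carves aligned
// slices for modules out of one contiguous arena.
class Arena {
public:
    explicit Arena(std::size_t capacity) : capacity_(capacity) {}

    // Returns the byte offset of the new slice inside the arena.
    std::size_t allocate(std::size_t bytes, std::size_t alignment = 128) {
        // Round the current offset up to the requested alignment.
        const std::size_t aligned = (offset_ + alignment - 1) / alignment * alignment;
        if (aligned + bytes > capacity_) {
            throw std::runtime_error("allocation exceeds arena capacity");
        }
        offset_ = aligned + bytes;
        return aligned;
    }

    std::size_t used() const { return offset_; }

private:
    std::size_t capacity_;
    std::size_t offset_{0};
};
```

In the real manager, one such arena would exist per tier (static descriptors CPU/GPU, dynamic descriptors CPU/GPU, device tensors), and the returned offsets become the pointers stored in each module's ModuleMemorySlice.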
Public Functions
-
PipelineMemoryManager(std::size_t total_static_kernel_descriptor_bytes, std::size_t total_dynamic_kernel_descriptor_bytes, std::size_t total_device_tensor_bytes)#
Constructor
- Parameters:
total_static_kernel_descriptor_bytes – [in] Total memory for all static kernel descriptors (both CPU and GPU)
total_dynamic_kernel_descriptor_bytes – [in] Total memory for all dynamic kernel descriptors (both CPU and GPU)
total_device_tensor_bytes – [in] Total device memory for all module tensor allocations
-
PipelineMemoryManager(const PipelineMemoryManager&) = delete#
-
PipelineMemoryManager &operator=(const PipelineMemoryManager&) = delete#
-
PipelineMemoryManager(PipelineMemoryManager&&) = default#
Move constructor.
-
PipelineMemoryManager &operator=(PipelineMemoryManager&&) = default#
Move assignment operator.
- Returns:
Reference to this object
-
~PipelineMemoryManager() = default#
Destructor.
-
ModuleMemorySlice allocate_module_slice(std::string_view module_id, const ModuleMemoryRequirements &requirements)#
Allocate memory slice for a module
- Parameters:
module_id – [in] Unique identifier for the module
requirements – [in] Memory requirements for the module
- Throws:
std::runtime_error – if allocation fails or exceeds arena capacity
- Returns:
Memory slice assigned to the module
-
const ModuleMemorySlice &get_module_slice(std::string_view module_id)#
Get memory slice for a previously allocated module
- Parameters:
module_id – [in] Module identifier
- Throws:
std::runtime_error – if module not found
- Returns:
Memory slice for the module
-
MemoryUsage get_memory_usage() const#
Get current memory usage statistics.
- Returns:
Memory usage information across all arenas
-
void allocate_all_module_slices(const std::vector<IModule*> &modules)#
Pre-allocate memory slices for all modules
This method calculates requirements and allocates memory slices for all modules in one atomic operation, ensuring consistency between calculation and allocation order. After calling this method, get_module_slice() can be used to retrieve pre-computed slices.
- Parameters:
modules – [in] Vector of modules that need memory allocation
- Throws:
std::runtime_error – if any module doesn’t implement IAllocationInfoProvider
std::runtime_error – if allocation fails
-
void copy_all_static_descriptors_to_device(cudaStream_t stream) const#
Copy all static kernel descriptors to device in one bulk operation
Copies the entire contiguous static descriptor region (all modules) from CPU pinned memory to GPU device memory. This should be called once at the end of pipeline initialization after all modules have initialized their static parameters.
- Parameters:
stream – [in] CUDA stream for async copy operation
- Throws:
std::runtime_error – if copy fails or pointers are null
-
void copy_all_dynamic_descriptors_to_device(cudaStream_t stream) const#
Copy all dynamic kernel descriptors to device in one bulk operation
Copies the entire contiguous dynamic descriptor region (all modules) from CPU pinned memory to GPU device memory. This should be called every iteration in configure_io() after all modules have updated their dynamic parameters.
- Parameters:
stream – [in] CUDA stream for async copy operation
- Throws:
std::runtime_error – if copy fails or pointers are null
Public Static Functions
-
static std::unique_ptr<PipelineMemoryManager> create_for_modules(const std::vector<IModule*> &modules)#
Factory method to create a PipelineMemoryManager sized for the given modules
This method calculates the total memory requirements for all modules (including alignment) and creates a PipelineMemoryManager with the appropriate arena sizes.
- Parameters:
modules – [in] Vector of modules that will use this memory manager
- Throws:
std::runtime_error – if any module doesn’t implement IAllocationInfoProvider
- Returns:
Unique pointer to a properly sized PipelineMemoryManager
-
struct MemoryUsage#
- #include <pipeline_memory_manager.hpp>
Memory usage statistics across all arenas, as returned by get_memory_usage().
Public Members
-
std::size_t static_kernel_descriptor_used{0}#
Used static kernel descriptor memory (CPU + GPU)
-
std::size_t dynamic_kernel_descriptor_used{0}#
Used dynamic kernel descriptor memory (CPU + GPU)
-
std::size_t device_tensor_used = {0}#
Used device tensor memory.
-
std::size_t static_kernel_descriptor_total{0}#
Total static kernel descriptor memory (CPU + GPU)
-
std::size_t dynamic_kernel_descriptor_total{0}#
Total dynamic kernel descriptor memory (CPU + GPU)
-
std::size_t device_tensor_total = {0}#
Total device tensor memory.
-
struct PipelineModuleConfig#
- #include <types.hpp>
Configuration for creating all modules in a pipeline.
Contains a list of module creation specifications that define what modules a pipeline should create and in what order. The order in the vector determines the execution order of the modules.
Public Members
-
std::vector<ModuleSpec> modules#
Ordered list of modules to create.
-
struct PipelineSpec#
- #include <types.hpp>
Complete specification for constructing a pipeline.
Contains all modules, connections, and external I/O specifications needed to construct a complete pipeline through the factory.
Public Members
-
std::string pipeline_name#
Pipeline name for identification.
-
std::vector<ModuleSpec> modules#
Ordered list of modules to create.
-
std::vector<PortConnection> connections#
Module interconnections.
-
std::vector<std::string> external_inputs#
External input identifiers.
-
std::vector<std::string> external_outputs#
External output identifiers.
-
ExecutionMode execution_mode = {ExecutionMode::Graph}#
Execution mode (Graph or Stream), default Graph for backward compatibility
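A spec with the members listed above could be built roughly as follows. The struct definitions here are illustrative stand-ins (the real ModuleSpec, PortConnection, and ExecutionMode live in <types.hpp>; field names inside ModuleSpec and PortConnection are assumed for the sketch):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-ins for the types aggregated by PipelineSpec.
enum class ExecutionMode { Graph, Stream };
struct ModuleSpec     { std::string type, instance_id; };
struct PortConnection { std::string src, src_port, dst, dst_port; };

struct Spec {
    std::string pipeline_name;
    std::vector<ModuleSpec> modules;            // order = execution order
    std::vector<PortConnection> connections;    // module interconnections
    std::vector<std::string> external_inputs;
    std::vector<std::string> external_outputs;
    ExecutionMode execution_mode{ExecutionMode::Graph};  // default Graph
};
```

A two-module pipeline spec then reads declaratively: list the modules in execution order, wire output ports to input ports, and name the external endpoints.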
-
struct PortConnection#
- #include <types.hpp>
Represents a connection between module ports.
-
struct PortInfo#
- #include <types.hpp>
Information about a module’s input or output port
This structure represents a named port containing one or more tensors, each with its own device memory pointer and tensor metadata. Used for module interconnection in pipelines.
Usage contexts:
ModuleRouter: Topology definition and connection routing
IModule::set_inputs(): Receives port info and extracts device pointers
IModule::get_outputs(): Returns port info for routing to other modules
Note: Modules typically extract and cache just the device pointers (void*) from PortInfo for execution, not the entire structure.
Public Members
-
std::string name#
Port name (e.g., “input0”, “matrixA”, “output0”)
-
std::vector<DeviceTensor> tensors#
Vector of device tensors for this port.
-
class StreamExecutor : public framework::pipeline::IStreamExecutor#
- #include <stream_executor.hpp>
Concrete implementation of IStreamExecutor that executes CUDA kernels on a stream.
This class takes a kernel launch configuration and executes it on the provided CUDA stream. It handles the actual kernel launch using the CUDA Driver API.
Public Functions
-
explicit StreamExecutor(const IKernelLaunchConfig *kernel_launch_config)#
Constructor
- Parameters:
kernel_launch_config – [in] Pointer to the kernel launch configuration. Must remain valid for the lifetime of the executor.
-
virtual void execute(cudaStream_t stream) override#
Execute the kernel on a CUDA stream.
Uses kernel parameters previously configured via the kernel_launch_config. Dynamic parameters should be set via setup_tick() on the owning module before calling execute().
- Parameters:
stream – [in] The CUDA stream to execute on
- Throws:
std::runtime_error – if kernel launch fails
-
class StreamExecutorFactory#
- #include <stream_executor_factory.hpp>
Factory for creating IStreamExecutor instances.
This factory can create either real StreamExecutor instances that execute CUDA kernels, or NullStreamExecutor instances for testing or no-op scenarios.
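The Real/Null switch can be illustrated without CUDA by a tiny factory over a common interface. All names here (Executor, RealExecutor, NullExecutor, make_executor) are hypothetical simplifications of the actual classes, and the counter stands in for the effect of a kernel launch:

```cpp
#include <cassert>
#include <memory>

// Minimal stand-in for IStreamExecutor; the counter models an observable
// side effect in place of launching a kernel on a stream.
struct Executor {
    virtual ~Executor() = default;
    virtual void execute(int& counter) = 0;
};

struct RealExecutor : Executor {
    void execute(int& counter) override { ++counter; }  // "launch the kernel"
};

struct NullExecutor : Executor {
    void execute(int&) override {}  // no-op for tests and placeholders
};

enum class ExecutorType { Real, Null };

// Hypothetical factory mirroring StreamExecutorFactory's type switch.
std::unique_ptr<Executor> make_executor(ExecutorType type) {
    if (type == ExecutorType::Null) return std::make_unique<NullExecutor>();
    return std::make_unique<RealExecutor>();
}
```

Because callers only hold the interface, swapping a Null executor in for unit tests requires no changes to pipeline code, which is the point of the factory.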
Public Types
-
enum class ExecutorType#
Executor type enumeration
Values:
-
enumerator Real#
Create real StreamExecutor instances.
-
enumerator Null#
Create NullStreamExecutor instances.
Public Functions
-
explicit StreamExecutorFactory(ExecutorType type)#
Constructor
- Parameters:
type – [in] The type of executors this factory should create
-
std::unique_ptr<IStreamExecutor> create_stream_executor(const IKernelLaunchConfig *kernel_config)#
Create a stream executor instance.
- Parameters:
kernel_config – [in] Pointer to kernel launch configuration. Required for Real executors; may be nullptr for Null executors, which ignore it.
- Returns:
A unique pointer to the created IStreamExecutor instance
-
template<std::size_t NUM_PARAMS>
inline std::unique_ptr<IStreamExecutor> create_stream_executor(const KernelLaunchConfig<NUM_PARAMS> *kernel_config)#
Create a stream executor instance with KernelLaunchConfig<NUM_PARAMS>.
- Template Parameters:
NUM_PARAMS – Number of kernel parameters
- Parameters:
kernel_config – [in] Pointer to kernel launch configuration
- Returns:
A unique pointer to the created IStreamExecutor instance
-
std::unique_ptr<IStreamExecutor> create_stream_executor(const DynamicKernelLaunchConfig *kernel_config)#
Create a stream executor instance with DynamicKernelLaunchConfig.
- Parameters:
kernel_config – [in] Pointer to dynamic kernel launch configuration
- Returns:
A unique pointer to the created IStreamExecutor instance