Memory
Memory management and buffer allocation utilities for CPU and GPU.
Overview
The Memory library provides efficient memory management abstractions for CUDA-enabled applications. It simplifies memory allocation, buffer management, and data transfers between CPU and GPU with RAII semantics.
Key Features
Type-Safe Buffers: Template-based buffer management with automatic cleanup
Multiple Allocators: Device (GPU), pinned host, and monotonic allocators
Smart Pointers: CUDA-aware unique_ptr utilities with custom deleters
Pinned and Zero-Copy Memory: Page-locked host memory for faster CPU-GPU transfers, plus GDRCopy buffers that let the CPU access GPU memory directly
Fast Sequential Allocation: Monotonic allocators for deterministic performance
Core Concepts
Buffers
Buffer is an RAII wrapper for memory allocations that releases its memory automatically. Its allocator template parameter controls where the memory lives (device, pinned host, etc.).
Buffers support:
Automatic memory deallocation on destruction
Copy construction between different memory spaces
Move semantics for zero-cost ownership transfer
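The three properties above can be illustrated with a minimal host-only sketch of the same RAII pattern. This is not the library's Buffer. SketchBuffer and HostAlloc are hypothetical names, and HostAlloc uses std::malloc/std::free so the example builds without CUDA. Like the allocators described below, the sketch assumes the allocator exposes static allocate() and deallocate() functions. Copying is omitted to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical allocator with the same static interface as DeviceAlloc/PinnedAlloc,
// backed by std::malloc/std::free instead of CUDA.
struct HostAlloc {
    static void *allocate(const std::size_t nbytes) {
        void *ptr = std::malloc(nbytes);
        if (ptr == nullptr) {
            throw std::bad_alloc();
        }
        return ptr;
    }
    static void deallocate(void *addr) { std::free(addr); }
};

// Simplified stand-in for Buffer<T, TAlloc>: memory obtained from TAlloc is
// always released through the same TAlloc.
template <typename T, class TAlloc>
class SketchBuffer {
public:
    SketchBuffer() = default;
    explicit SketchBuffer(const std::size_t num_elements)
        : addr_(static_cast<T *>(TAlloc::allocate(num_elements * sizeof(T)))),
          size_(num_elements) {}
    ~SketchBuffer() { deallocate_buffer(); }

    SketchBuffer(const SketchBuffer &) = delete;  // copying omitted in this sketch
    SketchBuffer &operator=(const SketchBuffer &) = delete;

    // Move: take over the pointer and leave the source empty.
    SketchBuffer(SketchBuffer &&other) noexcept
        : addr_(std::exchange(other.addr_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    SketchBuffer &operator=(SketchBuffer &&other) noexcept {
        if (this != &other) {
            deallocate_buffer();  // release the memory we currently own first
            addr_ = std::exchange(other.addr_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    // Safe to call repeatedly: later calls find nullptr and do nothing.
    void deallocate_buffer() noexcept {
        if (addr_ != nullptr) {
            TAlloc::deallocate(addr_);
        }
        addr_ = nullptr;
        size_ = 0;
    }

    T *addr() noexcept { return addr_; }
    std::size_t size() const noexcept { return size_; }

private:
    T *addr_ = nullptr;
    std::size_t size_ = 0;
};
```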
Device Memory Buffer
// Allocate buffer on GPU device
Buffer<float, DeviceAlloc> device_buffer(1024);
// Query buffer properties
const auto size = device_buffer.size();
auto *const addr = device_buffer.addr(); // device pointer: do not dereference it on the host
Pinned Host Memory Buffer
// Allocate pinned host buffer for efficient CPU-GPU transfers
Buffer<int32_t, PinnedAlloc> pinned_buffer(512);
// Access data on host
pinned_buffer.addr()[0] = 42;
pinned_buffer.addr()[1] = 100;
const auto first_value = pinned_buffer.addr()[0];
const auto second_value = pinned_buffer.addr()[1];
Copying Between Memory Spaces
// Create pinned buffer and initialize data
Buffer<float, PinnedAlloc> host_buffer(256);
host_buffer.addr()[0] = 3.14F;
// Copy data to device buffer
const Buffer<float, DeviceAlloc> device_buffer(host_buffer);
Move Semantics
// Create buffer
Buffer<double, DeviceAlloc> buffer1(128);
auto *const original_addr = buffer1.addr();
// Transfer ownership via move
Buffer<double, DeviceAlloc> buffer2(std::move(buffer1));
// buffer1 is now empty, buffer2 owns the memory
auto *const moved_addr = buffer2.addr(); // same address as original_addr
Allocators
DeviceAlloc and PinnedAlloc are allocator types that provide static methods for memory allocation and deallocation. They can be used directly or with Buffers.
DeviceAlloc: Allocates memory on the GPU using cudaMalloc/cudaFree
PinnedAlloc: Allocates pinned host memory using cudaHostAlloc/cudaFreeHost
Direct Allocator Usage
// Direct allocator usage
void *device_mem = DeviceAlloc::allocate(1024);
void *pinned_mem = PinnedAlloc::allocate(2048);
// Manual deallocation
DeviceAlloc::deallocate(device_mem);
PinnedAlloc::deallocate(pinned_mem);
Smart Pointers
The library provides CUDA-aware smart pointer utilities that integrate with std::unique_ptr for automatic memory management.
Device Memory Smart Pointer
// Allocate device memory with automatic cleanup
auto device_ptr = make_unique_device<float>(1024);
// Memory is automatically freed when device_ptr goes out of scope
auto *const ptr_value = device_ptr.get();
Pinned Memory Smart Pointer
// Allocate pinned host memory with automatic cleanup
auto pinned_ptr = make_unique_pinned<int32_t>(2048);
// Access memory on host
pinned_ptr.get()[0] = 123;
const auto value = pinned_ptr.get()[0];
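Both helpers follow the standard std::unique_ptr-with-custom-deleter pattern. The host-only sketch below demonstrates that pattern. It also includes the count * sizeof(T) overflow check that make_unique_device and make_unique_pinned document through std::overflow_error. HostFreeDeleter, UniqueHostPtr, and make_unique_host are hypothetical names, and std::malloc/std::free stand in for the CUDA allocation calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <memory>
#include <new>
#include <stdexcept>

// Hypothetical deleter in the style of DeviceDeleter/PinnedDeleter, but
// releasing host memory with std::free instead of cudaFree/cudaFreeHost.
template <typename T>
struct HostFreeDeleter {
    void operator()(T *ptr) const noexcept { std::free(ptr); }
};

template <typename T>
using UniqueHostPtr = std::unique_ptr<T, HostFreeDeleter<T>>;

// Hypothetical analogue of make_unique_device/make_unique_pinned.
template <typename T>
UniqueHostPtr<T> make_unique_host(const std::size_t count = 1) {
    // Reject a count whose byte size would overflow std::size_t.
    if (count > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
        throw std::overflow_error("make_unique_host: size calculation overflows");
    }
    void *raw = std::malloc(count * sizeof(T));
    if (raw == nullptr) {
        throw std::bad_alloc();
    }
    // Like cudaMalloc, the memory is left uninitialized.
    return UniqueHostPtr<T>(static_cast<T *>(raw));
}
```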
Monotonic Allocator
MonotonicAlloc provides fast, sequential memory allocation from a pre-allocated buffer. This allocator is ideal for temporary allocations with known lifetimes, as it provides:
Very fast allocation (just incrementing an offset)
Guaranteed alignment for all allocations
Bulk deallocation through reset()
No per-allocation overhead
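The bump-pointer idea behind these properties can be shown in a few lines. This is a host-only sketch, not MonotonicAlloc itself. BumpArena is a hypothetical name, and the backing buffer comes from aligned operator new rather than an allocator policy. The sketch also assumes that each allocation starts at the current offset rounded up to the alignment. The library may round differently, for example by padding the requested size instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdexcept>

// Hypothetical host-only bump allocator; the base address is ALIGN-aligned.
template <std::uint32_t ALIGN>
class BumpArena {
    static_assert(ALIGN != 0 && (ALIGN & (ALIGN - 1)) == 0, "ALIGN must be a power of 2");

public:
    explicit BumpArena(const std::size_t bufsize)
        : base_(static_cast<std::byte *>(::operator new(bufsize, std::align_val_t{ALIGN}))),
          size_(bufsize) {}
    ~BumpArena() { ::operator delete(base_, std::align_val_t{ALIGN}); }
    BumpArena(const BumpArena &) = delete;
    BumpArena &operator=(const BumpArena &) = delete;

    void *allocate(const std::size_t nbytes) {
        // Round the current offset up to the next multiple of ALIGN.
        const std::size_t start =
            (offset_ + ALIGN - 1) & ~(static_cast<std::size_t>(ALIGN) - 1);
        if (start > size_ || nbytes > size_ - start) {
            throw std::runtime_error("BumpArena: requested size exceeds capacity");
        }
        offset_ = start + nbytes;  // allocating is just advancing the offset
        return base_ + start;
    }

    void reset() noexcept { offset_ = 0; }  // releases everything at once
    std::size_t offset() const noexcept { return offset_; }
    std::size_t size() const noexcept { return size_; }
    void *address() const noexcept { return base_; }

private:
    std::byte *base_;
    std::size_t size_;
    std::size_t offset_ = 0;
};
```

With a 256-byte alignment, requests of 512, 100, and 8 bytes start at offsets 0, 512, and 768. The second request ends at 612, which is then rounded up to 768 for the third.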
Basic Usage
// Create monotonic allocator with 4KB buffer on device
constexpr std::uint32_t ALIGNMENT = 256;
MonotonicAlloc<ALIGNMENT, DeviceAlloc> allocator(4096);
// Perform fast sequential allocations
void *block1 = allocator.allocate(512);
void *block2 = allocator.allocate(256);
// Check current offset (bytes used)
const auto offset = allocator.offset();
const auto total_size = allocator.size();
Resetting for Reuse
constexpr std::uint32_t ALIGNMENT = 64;
MonotonicAlloc<ALIGNMENT, PinnedAlloc> allocator(2048);
// Allocate some memory
void *block1 = allocator.allocate(1000);
const auto offset_before = allocator.offset();
// Reset allocator to reuse memory
allocator.reset();
const auto offset_after = allocator.offset();
// Memory is available for reuse; block1 is now invalid and must not be used
void *block2 = allocator.allocate(1000);
Additional Examples
For complete working examples with full setup and validation, see:
framework/memory/tests/memory_sample_tests.cpp - Documentation examples and basic usage patterns
API Reference
using framework::memory::UniqueGdrHandle = std::unique_ptr<struct gdr, GdrHandleDeleter>
Type alias for std::unique_ptr with GDRCopy handle deleter
This provides automatic RAII management of GDRCopy handles. The handle is automatically closed when the unique_ptr goes out of scope.
Note: We use a custom deleter that takes gdr_t by value, allowing unique_ptr<struct gdr, Deleter> to directly store the gdr_t handle without heap allocation. The struct gdr is opaque but that’s fine since we never dereference it - we only pass the pointer to gdr_close().
Usage:
auto gdr_handle = make_unique_gdr_handle();
// Use gdr_handle.get() to get the raw gdr_t (struct gdr*)
// Automatically closed when gdr_handle goes out of scope
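The by-value deleter works because std::unique_ptr<T, D> stores a pointer of type T* (unless D declares its own pointer type). For T = struct gdr, that pointer type is exactly gdr_t. The sketch below reproduces the pattern with a hypothetical C-style API so it builds without GDRCopy: fake_ctx, fake_open, and fake_close stand in for struct gdr, gdr_open, and gdr_close. The final size check relies on mainstream standard libraries storing an empty deleter at no cost. The C++ standard does not guarantee this.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// --- Hypothetical C-style API (stands in for gdrapi.h) ---
struct fake_ctx;             // opaque to users, like struct gdr
using fake_t = fake_ctx *;   // like gdr_t, the handle is already a pointer
fake_t fake_open();
int fake_close(fake_t handle);  // returns 0 on success

// --- RAII wrapper in the style of UniqueGdrHandle ---
struct FakeHandleDeleter {
    // Takes the handle by value: unique_ptr<fake_ctx, ...> stores a fake_ctx*,
    // which is exactly fake_t, so no extra indirection or heap allocation is needed.
    void operator()(fake_t handle) const noexcept {
        (void)fake_close(handle);  // best effort; errors are ignored in the deleter
    }
};
using UniqueFakeHandle = std::unique_ptr<fake_ctx, FakeHandleDeleter>;

inline UniqueFakeHandle make_unique_fake_handle() {
    fake_t handle = fake_open();
    if (handle == nullptr) {
        throw std::runtime_error("fake_open failed");
    }
    return UniqueFakeHandle(handle);
}

// --- Hypothetical library internals, present only so the sketch links ---
struct fake_ctx { int id; };
inline int g_open_handles = 0;
fake_t fake_open() { ++g_open_handles; return new fake_ctx{42}; }
int fake_close(fake_t handle) { delete handle; --g_open_handles; return 0; }
```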
template<typename T>
using framework::memory::UniqueDevicePtr = std::unique_ptr<T, DeviceDeleter<T>>
Type alias for std::unique_ptr with device memory deleter
This provides automatic RAII management of CUDA device memory. The memory is automatically freed when the unique_ptr goes out of scope.
- Template Parameters:
T – The type of the pointer being managed
template<typename T>
using framework::memory::UniquePinnedPtr = std::unique_ptr<T, PinnedDeleter<T>>
Type alias for std::unique_ptr with pinned host memory deleter
This provides automatic RAII management of CUDA pinned host memory. The memory is automatically freed when the unique_ptr goes out of scope.
- Template Parameters:
T – The type of the pointer being managed
constexpr std::size_t framework::memory::GPU_MIN_PIN_SIZE = GPU_PAGE_SIZE
Minimum GDRCopy pin size in bytes (one 64 KB GPU page, the GDRCopy pin alignment requirement).
Note: GPU_PAGE_SIZE and GPU_PAGE_MASK are defined in gdrapi.h.
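To illustrate what 64 KB page granularity means for buffer sizes, the helper below rounds a request up to whole GPU pages, with a minimum of one page. The constants are assumed to match the gdrapi.h definitions (GPU_PAGE_SIZE = 1 << 16, GPU_PAGE_MASK = ~(GPU_PAGE_SIZE - 1)) but are redeclared here under hypothetical names. Whether GpinnedBuffer rounds in exactly this way is also an assumption. It is suggested by get_size_free() returning a page-aligned size.

```cpp
#include <cassert>
#include <cstddef>

// Assumed to mirror gdrapi.h: a GPU page is 64 KB.
constexpr std::size_t kGpuPageShift = 16;
constexpr std::size_t kGpuPageSize = std::size_t{1} << kGpuPageShift;
constexpr std::size_t kGpuPageMask = ~(kGpuPageSize - 1);

// Hypothetical helper: round a requested size up to whole GPU pages, never below one page.
constexpr std::size_t round_up_to_gpu_page(const std::size_t nbytes) {
    const std::size_t at_least_one_page = nbytes < kGpuPageSize ? kGpuPageSize : nbytes;
    return (at_least_one_page + kGpuPageSize - 1) & kGpuPageMask;
}
```

Under this assumption, the sizeof(uint32_t) request in the GpinnedBuffer example would still pin a full 64 KB page.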
inline UniqueGdrHandle framework::memory::make_unique_gdr_handle()
Creates a unique_ptr managing a GDRCopy handle
Opens a GDRCopy handle using gdr_open() and wraps it in a unique_ptr with automatic cleanup.
- Throws:
std::runtime_error – if gdr_open() fails
- Returns:
UniqueGdrHandle managing the opened handle
template<typename T>
UniqueDevicePtr<T> framework::memory::make_unique_device(const std::size_t count = 1)
Creates a unique_ptr managing device memory
Allocates device memory using DeviceAlloc and wraps it in a unique_ptr with automatic cleanup. The memory is left uninitialized (cudaMalloc does not clear it).
- Template Parameters:
T – The type of elements to allocate
- Parameters:
count – [in] Number of elements to allocate (default: 1)
- Throws:
CudaRuntimeException – if device allocation fails
std::overflow_error – if size calculation overflows
- Returns:
UniqueDevicePtr managing the allocated memory
template<typename T>
UniquePinnedPtr<T> framework::memory::make_unique_pinned(const std::size_t count = 1)
Creates a unique_ptr managing pinned host memory
Allocates pinned host memory using PinnedAlloc and wraps it in a unique_ptr with automatic cleanup.
- Template Parameters:
T – The type of elements to allocate
- Parameters:
count – [in] Number of elements to allocate (default: 1)
- Throws:
CudaRuntimeException – if pinned allocation fails
std::overflow_error – if size calculation overflows
- Returns:
UniquePinnedPtr managing the allocated memory
template<typename T, class TAlloc>
class Buffer
#include <buffer.hpp>
Generic buffer class for managing memory allocations with different allocator types
This class provides a RAII wrapper for memory buffers that can be allocated using different allocators (host, device, pinned, etc.). It supports copy and move semantics with automatic memory management and CUDA-aware memory transfers.
- Template Parameters:
T – The element type stored in the buffer
TAlloc – The allocator type used for memory allocation/deallocation
Public Types
Public Functions
Buffer() = default
Default constructor creates an empty buffer
inline explicit Buffer(const std::size_t num_elements)
Constructor that allocates memory for the specified number of elements
- Parameters:
num_elements – [in] Number of elements to allocate space for
- Throws:
std::bad_alloc – if allocation fails
inline ~Buffer()
Destructor deallocates the buffer memory
inline void deallocate_buffer()
Manually deallocate the buffer memory and reset internal state
This method can be called multiple times safely. After calling this method, the buffer will be in an empty state.
template<class TAlloc2>
inline explicit Buffer(const Buffer<T, TAlloc2> &other)
Cross-allocator copy constructor
Creates a new buffer by copying data from another buffer with a different allocator type. Uses CUDA memory copy operations to transfer data between different memory spaces.
- Template Parameters:
TAlloc2 – The allocator type of the source buffer
- Parameters:
other – [in] The source buffer to copy from
- Throws:
utils::CudaRuntimeException – if CUDA memory copy fails
inline Buffer(const Buffer &other)
Copy constructor
Creates a new buffer by copying data from another buffer with the same allocator type.
- Parameters:
other – [in] The source buffer to copy from
- Throws:
utils::CudaRuntimeException – if CUDA memory copy fails
inline Buffer &operator=(const Buffer &other)
Copy assignment operator
Copies data from another buffer with the same allocator type. Uses copy-and-swap idiom for strong exception safety.
- Parameters:
other – [in] The source buffer to copy from
- Throws:
utils::CudaRuntimeException – if CUDA memory copy fails
- Returns:
Reference to this buffer
inline Buffer(Buffer &&other) noexcept
Move constructor
Transfers ownership of the buffer from another buffer, leaving the source buffer empty.
- Parameters:
other – [inout] The source buffer to move from (will be left empty)
template<class TAlloc2>
explicit Buffer(Buffer<T, TAlloc2> &&other) = delete
Cross-allocator move constructor (deleted)
Cross-allocator moves are prohibited because memory allocated with one allocator must be deallocated with the same allocator. Moving memory between allocators would violate this contract and lead to undefined behavior.
For cross-allocator operations, use copy semantics instead:
Buffer<T, DestAlloc> dest(source); // Copy constructor
- Template Parameters:
TAlloc2 – The allocator type of the source buffer
- Parameters:
other – [inout] The source buffer (this operation is not allowed)
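This rule can be checked at compile time. The sketch below uses a hypothetical Buf class with the same constructor shape: an explicit cross-allocator copy constructor and a deleted cross-allocator move constructor. AllocA and AllocB are hypothetical allocator tags. For an rvalue source with a different allocator, overload resolution selects the deleted rvalue overload, so the type traits report that construction is not possible.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical allocator tags; only their identity matters here.
struct AllocA {};
struct AllocB {};

template <typename T, class TAlloc>
class Buf {
public:
    Buf() = default;
    Buf(const Buf &) = default;
    Buf(Buf &&) noexcept = default;

    // Cross-allocator copy: allowed (the real class copies bytes between memory spaces).
    template <class TAlloc2>
    explicit Buf(const Buf<T, TAlloc2> & /*other*/) {}

    // Cross-allocator move: deleted, because the adopted pointer would later
    // be released with the wrong allocator.
    template <class TAlloc2>
    explicit Buf(Buf<T, TAlloc2> &&other) = delete;
};
```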
inline explicit Buffer(const std::vector<T> &src_vec)
Constructor from std::vector
Creates a buffer by copying data from a std::vector using CUDA memory operations.
- Parameters:
src_vec – [in] The source vector to copy data from
- Throws:
utils::CudaRuntimeException – if CUDA memory copy fails
inline Buffer &operator=(Buffer &&other) noexcept
Move assignment operator
Transfers ownership of the buffer from another buffer, properly deallocating any existing memory in this buffer first.
- Parameters:
other – [inout] The source buffer to move from (will be left empty)
- Returns:
Reference to this buffer
inline ElementType *addr() noexcept
Get mutable pointer to the buffer memory
- Returns:
Pointer to the first element of the buffer, or nullptr if empty
inline const ElementType *addr() const noexcept
Get const pointer to the buffer memory
- Returns:
Const pointer to the first element of the buffer, or nullptr if empty
inline std::size_t size() const noexcept
Get the number of elements in the buffer
- Returns:
Number of elements the buffer can hold
inline ElementType &operator[](const std::size_t idx)
Indexed access operator for host-accessible buffers
Provides unchecked array-like access to buffer elements. Only enabled for allocators that allow host access (i.e., not device allocators).
Note
This operator is only available for host-accessible allocators
Note
No bounds checking is performed. Use at() for bounds-checked access
- Parameters:
idx – [in] Index of the element to access
- Returns:
Reference to the element at the specified index
- Pre:
idx < size()
inline const ElementType &operator[](const std::size_t idx) const
Read-only element access operator
Provides unchecked const access to elements in the buffer. Only available for allocators that allow host access (i.e., not device allocators).
Note
This operator is only available for host-accessible allocators
Note
No bounds checking is performed. Use at() for bounds-checked access
- Parameters:
idx – [in] Index of the element to access
- Returns:
Const reference to the element at the specified index
- Pre:
idx < size()
inline ElementType &at(const std::size_t idx)
Bounds-checked element access
Provides mutable access to elements in the buffer with bounds checking. Only available for allocators that allow host access.
Note
This function is only available for host-accessible allocators
- Parameters:
idx – [in] Index of the element to access
- Throws:
std::out_of_range – if idx >= size()
- Returns:
Reference to the element at the specified index
inline const ElementType &at(const std::size_t idx) const
Bounds-checked read-only element access
Provides const access to elements in the buffer with bounds checking. Only available for allocators that allow host access.
Note
This function is only available for host-accessible allocators
- Parameters:
idx – [in] Index of the element to access
- Throws:
std::out_of_range – if idx >= size()
- Returns:
Const reference to the element at the specified index
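One way to restrict operator[] and at() to host-accessible allocators is SFINAE on an allocator trait. The sketch below assumes a hypothetical kHostAccessible flag on each allocator type, because the library's actual mechanism is not documented here. It uses std::vector as storage so it runs on the host. has_subscript is a small detection helper for the checks.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator tags carrying a compile-time "host accessible" flag.
struct HostTag { static constexpr bool kHostAccessible = true; };
struct DeviceTag { static constexpr bool kHostAccessible = false; };

template <typename T, class TAlloc>
class IndexedBuffer {
public:
    explicit IndexedBuffer(const std::size_t n) : data_(n) {}  // std::vector stands in for the allocation
    std::size_t size() const noexcept { return data_.size(); }

    // Unchecked access; only usable when TAlloc is host accessible.
    template <class A = TAlloc, std::enable_if_t<A::kHostAccessible, int> = 0>
    T &operator[](const std::size_t idx) { return data_[idx]; }

    // Bounds-checked access with the same restriction.
    template <class A = TAlloc, std::enable_if_t<A::kHostAccessible, int> = 0>
    T &at(const std::size_t idx) {
        if (idx >= size()) {
            throw std::out_of_range("IndexedBuffer::at: index out of range");
        }
        return data_[idx];
    }

private:
    std::vector<T> data_;
};

// Detection helper: is b[0] a valid expression for buffer type B?
template <class B, class = void>
struct has_subscript : std::false_type {};
template <class B>
struct has_subscript<B, std::void_t<decltype(std::declval<B &>()[std::size_t{0}])>>
    : std::true_type {};
```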
template<typename AllocType>
class BufferImpl : public framework::memory::BufferWrapper
#include <buffer.hpp>
Concrete implementation of BufferWrapper for a specific allocator type
This class wraps a Buffer<uint8_t, AllocType> so that buffers with different allocator types can be stored in a single container through BufferWrapper pointers.
class BufferWrapper
#include <buffer.hpp>
Abstract base class for buffer wrappers
This polymorphic design allows silent mixing of different allocator types, which can lead to memory corruption and undefined behavior. The type erasure hides critical allocator information needed for safe memory management.
Dangerous Example:
std::unique_ptr<BufferWrapper> device_wrapper = std::make_unique<BufferImpl<DeviceAlloc>>(1024);
std::unique_ptr<BufferWrapper> pinned_wrapper = std::make_unique<BufferImpl<PinnedAlloc>>(1024);
// Problem: Silent mixing of allocator types!
device_wrapper = std::move(pinned_wrapper);
// Now device_wrapper contains pinned host memory, but the user expects device memory.
// The virtual destructor still frees it with PinnedAlloc, but any code that treats
// device_wrapper->addr() as a device pointer (e.g. when choosing a copy kind) is now wrong.
Recommended Alternative - Variant-Based Approach:
using buffer_variant = std::variant<
    Buffer<uint8_t, DeviceAlloc>,
    Buffer<uint8_t, PinnedAlloc>
>;
// Type-safe usage:
buffer_variant device_buffer = Buffer<uint8_t, DeviceAlloc>(1024);
buffer_variant pinned_buffer = Buffer<uint8_t, PinnedAlloc>(1024);
// The variant records which alternative it holds, so the allocator type is never lost:
// std::holds_alternative<Buffer<uint8_t, DeviceAlloc>>(device_buffer) is true
// Mixing concrete buffer types is a compile-time error:
// Buffer<uint8_t, DeviceAlloc> d = Buffer<uint8_t, PinnedAlloc>(1024); // ERROR
// Safe access with std::visit:
std::visit([](auto& buf) { void* addr = buf.addr(); }, device_buffer);
Why Variant is Better:
Compile-time type safety prevents allocator mixing
No virtual function overhead
Clear type information preserved
std::visit provides type-safe polymorphic operations
No risk of memory corruption from allocator mismatches
If You Must Use This Class:
Never use auto with BufferWrapper pointers
Always use explicit types: std::unique_ptr<BufferWrapper>
Document allocator types clearly in variable names
Consider runtime type checking for additional safety
Consider using std::variant<Buffer<T, Alloc>…> for type safety
Subclassed by framework::memory::BufferImpl< AllocType >
Public Functions
virtual ~BufferWrapper() = default
BufferWrapper() = default
Default constructor
BufferWrapper(const BufferWrapper&) = default
Copy constructor
BufferWrapper(BufferWrapper&&) = default
Move constructor
BufferWrapper &operator=(const BufferWrapper&) = default
Copy assignment operator
- Returns:
Reference to this buffer wrapper
BufferWrapper &operator=(BufferWrapper&&) = default
Move assignment operator
- Returns:
Reference to this buffer wrapper
virtual void *addr() = 0
Get the address of the buffer
- Returns:
Pointer to the first element of the buffer
struct DeviceAlloc
#include <device_allocators.hpp>
Device memory allocator for CUDA GPU memory
Provides static methods for allocating and deallocating memory on the GPU device. Uses cudaMalloc/cudaFree for memory management.
Public Static Functions
static inline void *allocate(const std::size_t nbytes)
Allocate memory on the GPU device
- Parameters:
nbytes – [in] Number of bytes to allocate
- Throws:
CudaRuntimeException – If allocation fails
- Returns:
Pointer to allocated device memory
static inline void deallocate(void *addr)
Deallocate memory on the GPU device
- Parameters:
addr – [in] Pointer to device memory to deallocate
- Throws:
CudaRuntimeException – If deallocation fails
template<typename T>
struct DeviceDeleter
#include <unique_ptr_utils.hpp>
Custom deleter for device memory allocated with CUDA
This deleter is designed to work with std::unique_ptr to provide RAII management of CUDA device memory. It automatically calls cudaFree() when the unique_ptr is destroyed.
- Template Parameters:
T – The type of the pointer being managed (can be array types)
struct GdrHandleDeleter
#include <gdrcopy_buffer.hpp>
Custom deleter for GDRCopy handle
This deleter is designed to work with std::unique_ptr to provide RAII management of GDRCopy handles. It automatically calls gdr_close() when the unique_ptr is destroyed.
The deleter takes gdr_t by value (not pointer) which allows unique_ptr to directly store the gdr_t handle without heap allocation.
Public Functions
inline void operator()(gdr_t handle) const noexcept
Closes the GDRCopy handle using gdr_close
Note
Best-effort cleanup: the deleter never throws
Note
gdr_close returns 0 on success; errors are ignored in the deleter
- Parameters:
handle – [in] GDRCopy handle to be closed (gdr_t is struct gdr*)
class GpinnedBuffer
#include <gdrcopy_buffer.hpp>
RAII wrapper for GDRCopy pinned GPU memory.
Provides CPU-visible access to GPU memory via GDRCopy. The NIC can write directly to this memory, and the CPU can read it without GPU→CPU copies.
This is useful for:
Direct NIC-to-GPU memory access (DOCA GPUNetIO)
CPU polling of GPU-written status flags
Zero-copy data sharing between CPU and GPU
Usage with UniqueGdrHandle (recommended):
auto gdr_handle = make_unique_gdr_handle(); // RAII-managed handle
GpinnedBuffer buf(gdr_handle.get(), sizeof(uint32_t));
// CPU writes to GPU memory
*static_cast<uint32_t*>(buf.get_host_addr()) = 42;
// Kernel reads from GPU memory
uint32_t* device_ptr = static_cast<uint32_t*>(buf.get_device_addr());
kernel<<<...>>>(..., device_ptr, ...);
// Cleanup automatic - buf is destroyed first, then the handle is closed when gdr_handle goes out of scope
Legacy usage with raw gdr_t:
gdr_t handle = gdr_open();
{
    GpinnedBuffer buf(handle, sizeof(uint32_t));
    // ... use buffer ...
} // buf unmaps and unpins here, while the handle is still open
gdr_close(handle); // Manual cleanup required
Public Functions
inline GpinnedBuffer(gdr_t gdr_handle, const std::size_t size_bytes)
Construct a GDRCopy pinned buffer.
- Parameters:
gdr_handle – [in] Non-owning GDRCopy handle (gdr_t is already a pointer)
size_bytes – [in] Requested size in bytes (minimum GPU_MIN_PIN_SIZE)
- Throws:
std::invalid_argument – if gdr_handle is null or size_bytes is 0
std::runtime_error – if GDRCopy operations fail
inline ~GpinnedBuffer()
Destructor - unmaps and unpins GDRCopy buffer.
GpinnedBuffer(const GpinnedBuffer&) = delete
GpinnedBuffer &operator=(const GpinnedBuffer&) = delete
GpinnedBuffer(GpinnedBuffer&&) = delete
GpinnedBuffer &operator=(GpinnedBuffer&&) = delete
inline void *get_host_addr() const
Get host-side pointer for CPU access.
- Returns:
Host pointer (CPU-visible)
inline void *get_device_addr() const
Get device-side pointer for GPU kernel access.
- Returns:
Device pointer (GPU-visible)
inline std::size_t get_size() const
Get requested buffer size.
- Returns:
Original requested size in bytes
inline std::size_t get_size_free() const
Get actual allocated size (page-aligned)
- Returns:
Actual pinned size in bytes
template<class TDstAlloc, class TSrcAlloc>
struct MemcpyHelper
Helper template struct to determine the appropriate CUDA memory copy kind based on allocator types
This template provides a type-safe way to determine the correct cudaMemcpyKind enum value based on the source and destination allocator types. It is specialized for different combinations of DeviceAlloc and PinnedAlloc allocators.
- Template Parameters:
TDstAlloc – Destination allocator type (DeviceAlloc or PinnedAlloc)
TSrcAlloc – Source allocator type (DeviceAlloc or PinnedAlloc)
template<>
struct MemcpyHelper<DeviceAlloc, DeviceAlloc>
#include <memcpy_helper.hpp>
Specialization for device-to-device memory copy operations
This specialization handles memory copy operations between two device memory locations. It provides the appropriate cudaMemcpyKind value for device-to-device transfers.
Public Static Attributes
static constexpr cudaMemcpyKind KIND = cudaMemcpyDeviceToDevice
Memory copy kind for device-to-device transfers
template<>
struct MemcpyHelper<DeviceAlloc, PinnedAlloc>
#include <memcpy_helper.hpp>
Specialization for host-to-device memory copy operations
This specialization handles memory copy operations from pinned host memory to device memory. It provides the appropriate cudaMemcpyKind value for host-to-device transfers.
Public Static Attributes
static constexpr cudaMemcpyKind KIND = cudaMemcpyHostToDevice
Memory copy kind for host-to-device transfers.
template<>
struct MemcpyHelper<PinnedAlloc, DeviceAlloc>
#include <memcpy_helper.hpp>
Specialization for device-to-host memory copy operations
This specialization handles memory copy operations from device memory to pinned host memory. It provides the appropriate cudaMemcpyKind value for device-to-host transfers.
Public Static Attributes
static constexpr cudaMemcpyKind KIND = cudaMemcpyDeviceToHost
Memory copy kind for device-to-host transfers.
template<>
struct MemcpyHelper<PinnedAlloc, PinnedAlloc>
#include <memcpy_helper.hpp>
Specialization for host-to-host memory copy operations
This specialization handles memory copy operations between two pinned host memory locations. It provides the appropriate cudaMemcpyKind value for host-to-host transfers.
Public Static Attributes
static constexpr cudaMemcpyKind KIND = cudaMemcpyHostToHost
Memory copy kind for host-to-host transfers.
template<std::uint32_t ALLOC_ALIGN_BYTES, typename TAlloc>
class MonotonicAlloc
#include <monotonic_alloc.hpp>
A monotonic memory allocator that provides fast, aligned memory allocation from a pre-allocated buffer.
This allocator manages a contiguous block of memory and provides sub-allocations from it in a linear fashion. Once memory is allocated, it cannot be individually freed - only the entire allocator can be reset. This design makes it very fast for temporary allocations with known lifetimes.
Note: CUDA memory allocation alignment guarantees:
• cudaMalloc: From the CUDA Programming Guide (§3.2 of 12.x): the returned pointer “is always at least 256-byte aligned.”
• cudaHostAlloc / cudaMallocHost (pinned host memory): Guarantees alignment to at least the host page size (commonly 4 KB) and never less than 64 bytes.
- Template Parameters:
ALLOC_ALIGN_BYTES – Alignment requirement for all allocations (must be power of 2)
TAlloc – Allocator type that provides static allocate() and deallocate() methods
Public Functions
inline explicit MonotonicAlloc(const std::size_t bufsize)
Constructs a monotonic allocator with the specified buffer size.
- Parameters:
bufsize – [in] Size of the buffer to allocate in bytes
- Throws:
std::bad_alloc – if the underlying allocation fails
inline MonotonicAlloc(MonotonicAlloc &&allocator) noexcept
Move constructor - transfers ownership of the buffer from another allocator.
- Parameters:
allocator – [in] The allocator to move from (will be left in a valid but empty state)
inline MonotonicAlloc &operator=(MonotonicAlloc &&allocator)
Move assignment operator - transfers ownership of the buffer from another allocator.
- Parameters:
allocator – [in] The allocator to move from (will be left in a valid but empty state)
- Returns:
Reference to this allocator
MonotonicAlloc &operator=(const MonotonicAlloc&) = delete
MonotonicAlloc(const MonotonicAlloc&) = delete
inline ~MonotonicAlloc()
Destructor - deallocates the managed buffer if it exists.
inline void reset() noexcept
Resets the allocator to its initial state, making all allocated memory available for reuse.
This does not deallocate the underlying buffer, just resets the allocation offset to zero. All previously returned pointers become invalid after this call.
inline void *allocate(const std::size_t nbytes)
Allocates a block of memory from the linear buffer.
The returned memory is aligned according to ALLOC_ALIGN_BYTES. The allocation is performed by advancing the internal offset pointer, so this operation is very fast.
- Parameters:
nbytes – [in] Number of bytes to allocate
- Throws:
std::runtime_error – if the requested size would exceed the buffer capacity
- Returns:
Pointer to the allocated memory block
inline void memset(const int val, cudaStream_t strm = 0) const
Sets the entire buffer to a specific value using CUDA memory operations.
Note
This method only works with device memory (cudaMalloc) or managed memory (cudaMallocManaged). It will not work with pinned host memory (cudaHostAlloc/ cudaMallocHost) as cudaMemsetAsync cannot operate on host memory.
- Parameters:
val – [in] The value to set each byte to
strm – [in] CUDA stream to use for the asynchronous operation (default: 0)
inline std::size_t size() const noexcept
Gets the total size of the managed buffer.
- Returns:
Total buffer size in bytes
inline std::size_t offset() const noexcept
Gets the current allocation offset within the buffer.
- Returns:
Current offset in bytes from the start of the buffer
inline void *address() const noexcept
Gets the base address of the managed buffer.
- Returns:
Pointer to the start of the buffer, or nullptr if no buffer is allocated
struct PinnedAlloc
#include <device_allocators.hpp>
Pinned host memory allocator for CUDA
Provides static methods for allocating and deallocating pinned (page-locked) host memory. Pinned memory can be transferred to/from the GPU more efficiently than pageable memory. Uses cudaHostAlloc/cudaFreeHost for memory management.
Public Static Functions
static inline void *allocate(const std::size_t nbytes)
Allocate pinned host memory
Note
No tracking of host pinned memory currently
- Parameters:
nbytes – [in] Number of bytes to allocate
- Throws:
CudaRuntimeException – If allocation fails
- Returns:
Pointer to allocated pinned host memory
static inline void deallocate(void *addr)
Deallocate pinned host memory
- Parameters:
addr – [in] Pointer to pinned host memory to deallocate
- Throws:
CudaRuntimeException – If deallocation fails
template<typename T>
struct PinnedDeleter
#include <unique_ptr_utils.hpp>
Custom deleter for pinned host memory allocated with CUDA
This deleter is designed to work with std::unique_ptr to provide RAII management of CUDA pinned host memory. It automatically calls cudaFreeHost() when the unique_ptr is destroyed.
- Template Parameters:
T – The type of the pointer being managed (can be array types)