Memory#

Memory management and buffer allocation utilities for CPU and GPU.

Overview#

The Memory library provides efficient memory management abstractions for CUDA-enabled applications. It simplifies memory allocation, buffer management, and data transfers between CPU and GPU with RAII semantics.

Key Features#

  • Type-Safe Buffers: Template-based buffer management with automatic cleanup

  • Multiple Allocators: Device (GPU), pinned host, and monotonic allocators

  • Smart Pointers: CUDA-aware unique_ptr utilities with custom deleters

  • Zero-Copy Support: Efficient pinned memory for CPU-GPU transfers

  • Fast Sequential Allocation: Monotonic allocators for deterministic performance

Core Concepts#

Buffers#

Buffer is a RAII wrapper for memory allocations that handles cleanup automatically. Buffers can use different allocators to control where memory is allocated (device, pinned host, etc.).

Buffers support:

  • Automatic memory deallocation on destruction

  • Copy construction between different memory spaces

  • Move semantics for zero-cost ownership transfer

Device Memory Buffer#

// Allocate buffer on GPU device
Buffer<float, DeviceAlloc> device_buffer(1024);

// Query buffer properties
const auto size = device_buffer.size();
auto *const addr = device_buffer.addr();

Pinned Host Memory Buffer#

// Allocate pinned host buffer for efficient CPU-GPU transfers
Buffer<int32_t, PinnedAlloc> pinned_buffer(512);

// Access data on host
pinned_buffer.addr()[0] = 42;
pinned_buffer.addr()[1] = 100;

const auto first_value = pinned_buffer.addr()[0];
const auto second_value = pinned_buffer.addr()[1];

Copying Between Memory Spaces#

// Create pinned buffer and initialize data
Buffer<float, PinnedAlloc> host_buffer(256);
host_buffer.addr()[0] = 3.14F;

// Copy data to device buffer
const Buffer<float, DeviceAlloc> device_buffer(host_buffer);

Move Semantics#

// Create buffer
Buffer<double, DeviceAlloc> buffer1(128);
auto *const original_addr = buffer1.addr();

// Transfer ownership via move
Buffer<double, DeviceAlloc> buffer2(std::move(buffer1));

// buffer1 is now empty, buffer2 owns the memory
auto *const moved_addr = buffer2.addr();

Allocators#

DeviceAlloc and PinnedAlloc are allocator types that provide static methods for memory allocation and deallocation. They can be used directly or with Buffers.

  • DeviceAlloc: Allocates memory on the GPU using cudaMalloc/cudaFree

  • PinnedAlloc: Allocates pinned host memory using cudaHostAlloc/cudaFreeHost

Direct Allocator Usage#

// Direct allocator usage
void *device_mem = DeviceAlloc::allocate(1024);
void *pinned_mem = PinnedAlloc::allocate(2048);

// Manual deallocation
DeviceAlloc::deallocate(device_mem);
PinnedAlloc::deallocate(pinned_mem);

Smart Pointers#

The library provides CUDA-aware smart pointer utilities that integrate with std::unique_ptr for automatic memory management.

Device Memory Smart Pointer#

// Allocate device memory with automatic cleanup
auto device_ptr = make_unique_device<float>(1024);

// Memory is automatically freed when device_ptr goes out of scope
auto *const ptr_value = device_ptr.get();

Pinned Memory Smart Pointer#

// Allocate pinned host memory with automatic cleanup
auto pinned_ptr = make_unique_pinned<int32_t>(2048);

// Access memory on host
pinned_ptr.get()[0] = 123;
const auto value = pinned_ptr.get()[0];

Monotonic Allocator#

MonotonicAlloc provides fast, sequential memory allocation from a pre-allocated buffer. This allocator is ideal for temporary allocations with known lifetimes, as it provides:

  • Very fast allocation (just incrementing an offset)

  • Guaranteed alignment for all allocations

  • Bulk deallocation through reset()

  • No per-allocation overhead

Basic Usage#

// Create monotonic allocator with 4KB buffer on device
constexpr std::uint32_t ALIGNMENT = 256;
MonotonicAlloc<ALIGNMENT, DeviceAlloc> allocator(4096);

// Perform fast sequential allocations
void *block1 = allocator.allocate(512);
void *block2 = allocator.allocate(256);

// Check current offset (bytes used)
const auto offset = allocator.offset();
const auto total_size = allocator.size();

Resetting for Reuse#

constexpr std::uint32_t ALIGNMENT = 64;
MonotonicAlloc<ALIGNMENT, PinnedAlloc> allocator(2048);

// Allocate some memory
void *block1 = allocator.allocate(1000);
const auto offset_before = allocator.offset();

// Reset allocator to reuse memory
allocator.reset();
const auto offset_after = allocator.offset();

// Memory is available for reuse
void *block2 = allocator.allocate(1000);

Additional Examples#

For complete working examples with full setup and validation, see:

  • framework/memory/tests/memory_sample_tests.cpp - Documentation examples and basic usage patterns

API Reference#

using framework::memory::UniqueGdrHandle = std::unique_ptr<struct gdr, GdrHandleDeleter>#

Type alias for std::unique_ptr with GDRCopy handle deleter

This provides automatic RAII management of GDRCopy handles. The handle is automatically closed when the unique_ptr goes out of scope.

Note: We use a custom deleter that takes gdr_t by value, allowing unique_ptr<struct gdr, Deleter> to directly store the gdr_t handle without heap allocation. The struct gdr is opaque but that’s fine since we never dereference it - we only pass the pointer to gdr_close().

Usage:

auto gdr_handle = make_unique_gdr_handle();
// Use gdr_handle.get() to get the raw gdr_t (struct gdr*)
// Automatically closed when gdr_handle goes out of scope

template<typename T>
using framework::memory::UniqueDevicePtr = std::unique_ptr<T, DeviceDeleter<T>>#

Type alias for std::unique_ptr with device memory deleter

This provides automatic RAII management of CUDA device memory. The memory is automatically freed when the unique_ptr goes out of scope.

Template Parameters:

T – The type of the pointer being managed

template<typename T>
using framework::memory::UniquePinnedPtr = std::unique_ptr<T, PinnedDeleter<T>>#

Type alias for std::unique_ptr with pinned host memory deleter

This provides automatic RAII management of CUDA pinned host memory. The memory is automatically freed when the unique_ptr goes out of scope.

Template Parameters:

T – The type of the pointer being managed
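
For illustration, a minimal sketch using both aliases as owning members (the ScratchSpace struct and make_scratch helper are hypothetical; make_unique_device and make_unique_pinned are documented below):

// Hypothetical struct that owns device workspace and pinned staging memory
struct ScratchSpace {
    UniqueDevicePtr<float> device_workspace;  // freed with cudaFree
    UniquePinnedPtr<float> host_staging;      // freed with cudaFreeHost
};

ScratchSpace make_scratch(const std::size_t n) {
    return {make_unique_device<float>(n), make_unique_pinned<float>(n)};
}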

constexpr std::size_t framework::memory::GPU_MIN_PIN_SIZE = GPU_PAGE_SIZE#

Minimum GDRCopy pin size in bytes (64 KB page alignment requirement). Note: GPU_PAGE_SIZE and GPU_PAGE_MASK are defined in gdrapi.h.

inline UniqueGdrHandle framework::memory::make_unique_gdr_handle()#

Creates a unique_ptr managing a GDRCopy handle

Opens a GDRCopy handle using gdr_open() and wraps it in a unique_ptr with automatic cleanup.

Throws:

std::runtime_error – if gdr_open() fails

Returns:

UniqueGdrHandle managing the opened handle

template<typename T>
UniqueDevicePtr<T> framework::memory::make_unique_device(
const std::size_t count = 1,
)#

Creates a unique_ptr managing device memory

Allocates device memory using DeviceAlloc and wraps it in a unique_ptr with automatic cleanup. (Note: cudaMalloc leaves device memory uninitialised.)

Template Parameters:

T – The type of elements to allocate

Parameters:

count[in] Number of elements to allocate (default: 1)

Throws:
  • CudaRuntimeException – if device allocation fails

  • std::overflow_error – if size calculation overflows

Returns:

UniqueDevicePtr managing the allocated memory

template<typename T>
UniquePinnedPtr<T> framework::memory::make_unique_pinned(
const std::size_t count = 1,
)#

Creates a unique_ptr managing pinned host memory

Allocates pinned host memory using PinnedAlloc and wraps it in a unique_ptr with automatic cleanup.

Template Parameters:

T – The type of elements to allocate

Parameters:

count[in] Number of elements to allocate (default: 1)

Throws:
  • CudaRuntimeException – if pinned allocation fails

  • std::overflow_error – if size calculation overflows

Returns:

UniquePinnedPtr managing the allocated memory

template<typename T, class TAlloc>
class Buffer#
#include <buffer.hpp>

Generic buffer class for managing memory allocations with different allocator types

This class provides a RAII wrapper for memory buffers that can be allocated using different allocators (host, device, pinned, etc.). It supports copy and move semantics with automatic memory management and CUDA-aware memory transfers.

Template Parameters:
  • T – The element type stored in the buffer

  • TAlloc – The allocator type used for memory allocation/deallocation

Public Types

using ElementType = T#

Element type stored in the buffer.

using AllocatorType = TAlloc#

Allocator type used for memory management.

Public Functions

Buffer() = default#

Default constructor creates an empty buffer

inline explicit Buffer(const std::size_t num_elements)#

Constructor that allocates memory for the specified number of elements

Parameters:

num_elements[in] Number of elements to allocate space for

Throws:

std::bad_alloc – if allocation fails

inline ~Buffer()#

Destructor deallocates the buffer memory

inline void deallocate_buffer()#

Manually deallocate the buffer memory and reset internal state

This method can be called multiple times safely. After calling this method, the buffer will be in an empty state.
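
A brief sketch (the size is illustrative):

// Release device memory before the buffer goes out of scope
Buffer<float, DeviceAlloc> buffer(1024);
buffer.deallocate_buffer();

// Safe to call again; the buffer is already in an empty state
buffer.deallocate_buffer();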

template<class TAlloc2>
inline explicit Buffer(
const Buffer<T, TAlloc2> &other,
)#

Cross-allocator copy constructor

Creates a new buffer by copying data from another buffer with a different allocator type. Uses CUDA memory copy operations to transfer data between different memory spaces.

Template Parameters:

TAlloc2 – The allocator type of the source buffer

Parameters:

other[in] The source buffer to copy from

Throws:

utils::CudaRuntimeException – if CUDA memory copy fails

inline Buffer(const Buffer &other)#

Copy constructor

Creates a new buffer by copying data from another buffer with the same allocator type.

Parameters:

other[in] The source buffer to copy from

Throws:

utils::CudaRuntimeException – if CUDA memory copy fails

inline Buffer &operator=(const Buffer &other)#

Copy assignment operator

Copies data from another buffer with the same allocator type. Uses copy-and-swap idiom for strong exception safety.

Parameters:

other[in] The source buffer to copy from

Throws:

utils::CudaRuntimeException – if CUDA memory copy fails

Returns:

Reference to this buffer
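
A brief sketch of same-allocator copy assignment (sizes are illustrative):

Buffer<float, DeviceAlloc> src(256);
Buffer<float, DeviceAlloc> dst(64);

// dst becomes an independent copy of src (same size and contents)
dst = src;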

inline Buffer(Buffer &&other) noexcept#

Move constructor

Transfers ownership of the buffer from another buffer, leaving the source buffer empty.

Parameters:

other[inout] The source buffer to move from (will be left empty)

template<class TAlloc2>
explicit Buffer(
Buffer<T, TAlloc2> &&other,
) = delete#

Cross-allocator move constructor (deleted)

Cross-allocator moves are prohibited because memory allocated with one allocator must be deallocated with the same allocator. Moving memory between allocators would violate this contract and lead to undefined behavior.

For cross-allocator operations, use copy semantics instead:

  • Buffer<T, DestAlloc> dest(source); // Copy constructor

Template Parameters:

TAlloc2 – The allocator type of the source buffer

Parameters:

other[inout] The source buffer (this operation is not allowed)

inline explicit Buffer(const std::vector<T> &src_vec)#

Constructor from std::vector

Creates a buffer by copying data from a std::vector using CUDA memory operations.

Parameters:

src_vec[in] The source vector to copy data from

Throws:

utils::CudaRuntimeException – if CUDA memory copy fails
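
A brief sketch of building a device buffer from host data (values are illustrative; assumes <vector> is included):

const std::vector<float> host_values = {1.0F, 2.0F, 3.0F};

// Copies the vector's contents into newly allocated device memory
const Buffer<float, DeviceAlloc> device_buffer(host_values);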

inline Buffer &operator=(Buffer &&other) noexcept#

Move assignment operator

Transfers ownership of the buffer from another buffer, properly deallocating any existing memory in this buffer first.

Parameters:

other[inout] The source buffer to move from (will be left empty)

Returns:

Reference to this buffer

inline ElementType *addr() noexcept#

Get mutable pointer to the buffer memory

Returns:

Pointer to the first element of the buffer, or nullptr if empty

inline const ElementType *addr() const noexcept#

Get const pointer to the buffer memory

Returns:

Const pointer to the first element of the buffer, or nullptr if empty

inline std::size_t size() const noexcept#

Get the number of elements in the buffer

Returns:

Number of elements the buffer can hold

inline ElementType &operator[](const std::size_t idx)#

Indexed access operator for host-accessible buffers

Provides unchecked array-like access to buffer elements. Only enabled for allocators that allow host access (i.e., not device allocators).

Note

This operator is only available for host-accessible allocators

Note

No bounds checking is performed. Use at() for bounds-checked access

Parameters:

idx[in] Index of the element to access

Returns:

Reference to the element at the specified index

Pre:

idx < size()

inline const ElementType &operator[](const std::size_t idx) const#

Read-only element access operator

Provides unchecked const access to elements in the buffer. Only available for allocators that allow host access (i.e., not device allocators).

Note

This operator is only available for host-accessible allocators

Note

No bounds checking is performed. Use at() for bounds-checked access

Parameters:

idx[in] Index of the element to access

Returns:

Const reference to the element at the specified index

Pre:

idx < size()

inline ElementType &at(const std::size_t idx)#

Bounds-checked element access

Provides mutable access to elements in the buffer with bounds checking. Only available for allocators that allow host access.

Note

This function is only available for host-accessible allocators

Parameters:

idx[in] Index of the element to access

Throws:

std::out_of_range – if idx >= size()

Returns:

Reference to the element at the specified index

inline const ElementType &at(const std::size_t idx) const#

Bounds-checked read-only element access

Provides const access to elements in the buffer with bounds checking. Only available for allocators that allow host access.

Note

This function is only available for host-accessible allocators

Parameters:

idx[in] Index of the element to access

Throws:

std::out_of_range – if idx >= size()

Returns:

Const reference to the element at the specified index
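
A brief sketch of checked and unchecked element access on a host-accessible buffer:

Buffer<int32_t, PinnedAlloc> buffer(4);
buffer.at(0) = 7;   // bounds-checked access
buffer[1] = 8;      // unchecked access; caller must ensure idx < size()

try {
    buffer.at(10) = 0;  // out of range
} catch (const std::out_of_range &ex) {
    // handle the error
}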

template<typename AllocType>
class BufferImpl : public framework::memory::BufferWrapper#
#include <buffer.hpp>

Concrete implementation of BufferWrapper for a specific allocator type

This class stores a buffer of type Buffer<uint8_t, AllocType> behind the BufferWrapper interface, which allows buffers using different allocator types to be kept in a single container.

Public Functions

inline explicit BufferImpl(const std::size_t size)#

Constructor

Parameters:

size[in] The size of the buffer

inline virtual void *addr() override#

Get the address of the buffer

Returns:

Pointer to the first element of the buffer

class BufferWrapper#
#include <buffer.hpp>

Abstract base class for buffer wrappers

This polymorphic design allows silent mixing of different allocator types, which can lead to memory corruption and undefined behavior. The type erasure hides critical allocator information needed for safe memory management.

Dangerous Example:

std::unique_ptr<BufferWrapper> device_wrapper =
    std::make_unique<BufferImpl<DeviceAlloc>>(1024);
std::unique_ptr<BufferWrapper> pinned_wrapper =
    std::make_unique<BufferImpl<PinnedAlloc>>(1024);

// Problem: Silent mixing of allocator types!
device_wrapper = std::move(pinned_wrapper);
// Now device_wrapper holds pinned host memory while the surrounding code
// still assumes it refers to device memory - a mismatch the type system can no longer catch

Recommended Alternative - Variant-Based Approach:

using buffer_variant = std::variant<
    Buffer<uint8_t, DeviceAlloc>,
    Buffer<uint8_t, PinnedAlloc>
>;

// Type-safe usage:
buffer_variant device_buffer = Buffer<uint8_t, DeviceAlloc>(1024);
buffer_variant pinned_buffer = Buffer<uint8_t, PinnedAlloc>(1024);

// Compile-time error prevents dangerous mixing (the cross-allocator
// constructor is explicit, so no implicit conversion exists):
// std::get<Buffer<uint8_t, DeviceAlloc>>(device_buffer) =
//     Buffer<uint8_t, PinnedAlloc>(1024);  // ERROR: types don't match

// Safe access with std::visit:
std::visit([](auto& buf) {
    void* addr = buf.addr();
}, device_buffer);

Why Variant is Better:

  • Compile-time type safety prevents allocator mixing

  • No virtual function overhead

  • Clear type information preserved

  • std::visit provides type-safe polymorphic operations

  • No risk of memory corruption from allocator mismatches

If You Must Use This Class:

  • Never use auto with BufferWrapper pointers

  • Always use explicit types: std::unique_ptr<BufferWrapper>

  • Document allocator types clearly in variable names

  • Consider runtime type checking for additional safety

Consider using std::variant<Buffer<T, Alloc>…> for type safety

Subclassed by framework::memory::BufferImpl< AllocType >

Public Functions

virtual ~BufferWrapper() = default#

Virtual destructor

BufferWrapper() = default#

Default constructor

BufferWrapper(const BufferWrapper&) = default#

Copy constructor

BufferWrapper(BufferWrapper&&) = default#

Move constructor

BufferWrapper &operator=(const BufferWrapper&) = default#

Copy assignment operator

Returns:

Reference to this buffer wrapper

BufferWrapper &operator=(BufferWrapper&&) = default#

Move assignment operator

Returns:

Reference to this buffer wrapper

virtual void *addr() = 0#

Get the address of the buffer

Returns:

Pointer to the first element of the buffer

struct DeviceAlloc#
#include <device_allocators.hpp>

Device memory allocator for CUDA GPU memory

Provides static methods for allocating and deallocating memory on the GPU device. Uses cudaMalloc/cudaFree for memory management.

Public Static Functions

static inline void *allocate(const std::size_t nbytes)#

Allocate memory on the GPU device

Parameters:

nbytes[in] Number of bytes to allocate

Throws:

CudaRuntimeException – If allocation fails

Returns:

Pointer to allocated device memory

static inline void deallocate(void *addr)#

Deallocate memory on the GPU device

Parameters:

addr[in] Pointer to device memory to deallocate

Throws:

CudaRuntimeException – If deallocation fails

template<typename T>
struct DeviceDeleter#
#include <unique_ptr_utils.hpp>

Custom deleter for device memory allocated with CUDA

This deleter is designed to work with std::unique_ptr to provide RAII management of CUDA device memory. It automatically calls cudaFree() when the unique_ptr is destroyed.

Template Parameters:

T – The type of the pointer being managed (can be array types)

Public Types

using PtrT = std::remove_all_extents_t<T>#

Pointer type after removing array extents

Public Functions

inline void operator()(PtrT *ptr) const noexcept#

Deletes device memory using cudaFree

Note

Best-effort cleanup; never throws exceptions from deleter

Parameters:

ptr[in] Pointer to device memory to be freed
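
Typically this deleter is used indirectly through UniqueDevicePtr and make_unique_device(); a minimal sketch of attaching it to raw device memory by hand:

// Wrap an existing raw device allocation in a unique_ptr
auto *raw = static_cast<float *>(DeviceAlloc::allocate(16 * sizeof(float)));
std::unique_ptr<float, DeviceDeleter<float>> owned(raw);

// cudaFree is invoked via DeviceDeleter when `owned` is destroyed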

struct GdrHandleDeleter#
#include <gdrcopy_buffer.hpp>

Custom deleter for GDRCopy handle

This deleter is designed to work with std::unique_ptr to provide RAII management of GDRCopy handles. It automatically calls gdr_close() when the unique_ptr is destroyed.

The deleter takes gdr_t by value (not pointer) which allows unique_ptr to directly store the gdr_t handle without heap allocation.

Public Functions

inline void operator()(gdr_t handle) const noexcept#

Closes GDRCopy handle using gdr_close

Note

Best-effort cleanup; never throws exceptions from deleter

Note

gdr_close returns 0 on success; errors are ignored in the destructor

Parameters:

handle[in] GDRCopy handle to be closed (gdr_t is struct gdr*)

class GpinnedBuffer#
#include <gdrcopy_buffer.hpp>

RAII wrapper for GDRCopy pinned GPU memory.

Provides CPU-visible access to GPU memory via GDRCopy. The NIC can write directly to this memory, and the CPU can read it without GPU→CPU copies.

This is useful for:

  • Direct NIC-to-GPU memory access (DOCA GPUNetIO)

  • CPU polling of GPU-written status flags

  • Zero-copy data sharing between CPU and GPU

Usage with UniqueGdrHandle (recommended):

auto gdr_handle = make_unique_gdr_handle();  // RAII-managed handle
GpinnedBuffer buf(gdr_handle.get(), sizeof(uint32_t));

// CPU writes to GPU memory
*static_cast<uint32_t*>(buf.get_host_addr()) = 42;

// Kernel reads from GPU memory
uint32_t* device_ptr = static_cast<uint32_t*>(buf.get_device_addr());
kernel<<<>>>(..., device_ptr, ...);

// Cleanup automatic - handle closed when gdr_handle goes out of scope

Legacy usage with raw gdr_t:

gdr_t handle = gdr_open();
GpinnedBuffer buf(handle, sizeof(uint32_t));
// ... use buffer ...
gdr_close(handle);  // Manual cleanup required

Public Functions

inline GpinnedBuffer(gdr_t gdr_handle, const std::size_t size_bytes)#

Construct a GDRCopy pinned buffer.

Parameters:
  • gdr_handle[in] Non-owning GDRCopy handle (gdr_t is already a pointer)

  • size_bytes[in] Requested size in bytes (minimum GPU_MIN_PIN_SIZE)

Throws:
  • std::invalid_argument – if gdr_handle is null or size_bytes is 0

  • std::runtime_error – if GDRCopy operations fail

inline ~GpinnedBuffer()#

Destructor - unmaps and unpins GDRCopy buffer.

GpinnedBuffer(const GpinnedBuffer&) = delete#
GpinnedBuffer &operator=(const GpinnedBuffer&) = delete#
GpinnedBuffer(GpinnedBuffer&&) = delete#
GpinnedBuffer &operator=(GpinnedBuffer&&) = delete#
inline void *get_host_addr() const#

Get host-side pointer for CPU access.

Returns:

Host pointer (CPU-visible)

inline void *get_device_addr() const#

Get device-side pointer for GPU kernel access.

Returns:

Device pointer (GPU-visible)

inline std::size_t get_size() const#

Get requested buffer size.

Returns:

Original requested size in bytes

inline std::size_t get_size_free() const#

Get actual allocated size (page-aligned)

Returns:

Actual pinned size in bytes
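
A brief sketch contrasting the two size queries (requested size is illustrative):

auto gdr_handle = make_unique_gdr_handle();
GpinnedBuffer buf(gdr_handle.get(), sizeof(uint32_t));

const auto requested = buf.get_size();    // the size passed to the constructor
const auto pinned = buf.get_size_free();  // actual page-aligned pinned size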

template<class TDstAlloc, class TSrcAlloc>
struct MemcpyHelper#
#include <memcpy_helper.hpp>

Helper template struct to determine the appropriate CUDA memory copy kind based on allocator types

This template provides a type-safe way to determine the correct cudaMemcpyKind enum value based on the source and destination allocator types. It is specialized for different combinations of DeviceAlloc and PinnedAlloc allocators.

Template Parameters:
  • TDstAlloc – The allocator type of the destination memory

  • TSrcAlloc – The allocator type of the source memory
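
Usage sketch (the copy_buffer helper below is illustrative, not part of the library):

template<typename T, class TDstAlloc, class TSrcAlloc>
void copy_buffer(Buffer<T, TDstAlloc> &dst, const Buffer<T, TSrcAlloc> &src) {
    // KIND resolves to the correct cudaMemcpyKind for this allocator pair
    // (error handling omitted for brevity)
    cudaMemcpy(dst.addr(), src.addr(), src.size() * sizeof(T),
               MemcpyHelper<TDstAlloc, TSrcAlloc>::KIND);
}
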
template<>
struct MemcpyHelper<DeviceAlloc, DeviceAlloc>#
#include <memcpy_helper.hpp>

Specialization for device-to-device memory copy operations

This specialization handles memory copy operations between two device memory locations. It provides the appropriate cudaMemcpyKind value for device-to-device transfers.

Public Static Attributes

static constexpr cudaMemcpyKind KIND = cudaMemcpyDeviceToDevice#

Memory copy kind for device-to-device transfers

template<>
struct MemcpyHelper<DeviceAlloc, PinnedAlloc>#
#include <memcpy_helper.hpp>

Specialization for host-to-device memory copy operations

This specialization handles memory copy operations from pinned host memory to device memory. It provides the appropriate cudaMemcpyKind value for host-to-device transfers.

Public Static Attributes

static constexpr cudaMemcpyKind KIND = cudaMemcpyHostToDevice#

Memory copy kind for host-to-device transfers.

template<>
struct MemcpyHelper<PinnedAlloc, DeviceAlloc>#
#include <memcpy_helper.hpp>

Specialization for device-to-host memory copy operations

This specialization handles memory copy operations from device memory to pinned host memory. It provides the appropriate cudaMemcpyKind value for device-to-host transfers.

Public Static Attributes

static constexpr cudaMemcpyKind KIND = cudaMemcpyDeviceToHost#

Memory copy kind for device-to-host transfers.

template<>
struct MemcpyHelper<PinnedAlloc, PinnedAlloc>#
#include <memcpy_helper.hpp>

Specialization for host-to-host memory copy operations

This specialization handles memory copy operations between two pinned host memory locations. It provides the appropriate cudaMemcpyKind value for host-to-host transfers.

Public Static Attributes

static constexpr cudaMemcpyKind KIND = cudaMemcpyHostToHost#

Memory copy kind for host-to-host transfers.

template<std::uint32_t ALLOC_ALIGN_BYTES, typename TAlloc>
class MonotonicAlloc#
#include <monotonic_alloc.hpp>

A monotonic memory allocator that provides fast, aligned memory allocation from a pre-allocated buffer.

This allocator manages a contiguous block of memory and provides sub-allocations from it in a linear fashion. Once memory is allocated, it cannot be individually freed - only the entire allocator can be reset. This design makes it very fast for temporary allocations with known lifetimes.

Note: CUDA memory allocation alignment guarantees:

  • cudaMalloc: From the CUDA Programming Guide (§3.2 of 12.x), the returned pointer “is always at least 256-byte aligned.”

  • cudaHostAlloc / cudaMallocHost (pinned host memory): Guarantees alignment to at least the host page size (commonly 4 KB) and never less than 64 bytes.

Template Parameters:
  • ALLOC_ALIGN_BYTES – Alignment requirement for all allocations (must be power of 2)

  • TAlloc – Allocator type that provides static allocate() and deallocate() methods

Public Functions

inline explicit MonotonicAlloc(const std::size_t bufsize)#

Constructs a monotonic allocator with the specified buffer size.

Parameters:

bufsize[in] Size of the buffer to allocate in bytes

Throws:

std::bad_alloc – if the underlying allocation fails

inline MonotonicAlloc(MonotonicAlloc &&allocator) noexcept#

Move constructor - transfers ownership of the buffer from another allocator.

Parameters:

allocator[in] The allocator to move from (will be left in a valid but empty state)

inline MonotonicAlloc &operator=(
MonotonicAlloc &&allocator,
) noexcept#

Move assignment operator - transfers ownership of the buffer from another allocator.

Parameters:

allocator[in] The allocator to move from (will be left in a valid but empty state)

Returns:

Reference to this allocator
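
A brief sketch of transferring ownership (alignment and sizes are illustrative):

MonotonicAlloc<256, DeviceAlloc> source(4096);
void *block = source.allocate(512);

// `target` takes over the underlying buffer; `source` is left empty
MonotonicAlloc<256, DeviceAlloc> target(std::move(source));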

MonotonicAlloc &operator=(const MonotonicAlloc&) = delete#
MonotonicAlloc(const MonotonicAlloc&) = delete#
inline ~MonotonicAlloc()#

Destructor - deallocates the managed buffer if it exists.

inline void reset() noexcept#

Resets the allocator to its initial state, making all allocated memory available for reuse.

This does not deallocate the underlying buffer, just resets the allocation offset to zero. All previously returned pointers become invalid after this call.

inline void *allocate(const std::size_t nbytes)#

Allocates a block of memory from the linear buffer.

The returned memory is aligned according to ALLOC_ALIGN_BYTES. The allocation is performed by advancing the internal offset pointer, so this operation is very fast.

Parameters:

nbytes[in] Number of bytes to allocate

Throws:

std::runtime_error – if the requested size would exceed the buffer capacity

Returns:

Pointer to the allocated memory block

inline void memset(const int val, cudaStream_t strm = 0) const#

Sets the entire buffer to a specific value using CUDA memory operations.

Note

This method only works with device memory (cudaMalloc) or managed memory (cudaMallocManaged). It will not work with pinned host memory (cudaHostAlloc/cudaMallocHost), as cudaMemsetAsync cannot operate on host memory.

Parameters:
  • val[in] The value to set each byte to

  • strm[in] CUDA stream to use for the asynchronous operation (default: 0)
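
A brief sketch clearing the entire underlying device buffer on the default stream (size is illustrative):

MonotonicAlloc<256, DeviceAlloc> allocator(4096);

// Zero every byte of the managed device buffer asynchronously
allocator.memset(0);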

inline std::size_t size() const noexcept#

Gets the total size of the managed buffer.

Returns:

Total buffer size in bytes

inline std::size_t offset() const noexcept#

Gets the current allocation offset within the buffer.

Returns:

Current offset in bytes from the start of the buffer

inline void *address() const noexcept#

Gets the base address of the managed buffer.

Returns:

Pointer to the start of the buffer, or nullptr if no buffer is allocated

struct PinnedAlloc#
#include <device_allocators.hpp>

Pinned host memory allocator for CUDA

Provides static methods for allocating and deallocating pinned (page-locked) host memory. Pinned memory can be transferred to/from the GPU more efficiently than pageable memory. Uses cudaHostAlloc/cudaFreeHost for memory management.

Public Static Functions

static inline void *allocate(const std::size_t nbytes)#

Allocate pinned host memory

Note

No tracking of host pinned memory currently

Parameters:

nbytes[in] Number of bytes to allocate

Throws:

CudaRuntimeException – If allocation fails

Returns:

Pointer to allocated pinned host memory

static inline void deallocate(void *addr)#

Deallocate pinned host memory

Parameters:

addr[in] Pointer to pinned host memory to deallocate

Throws:

CudaRuntimeException – If deallocation fails

template<typename T>
struct PinnedDeleter#
#include <unique_ptr_utils.hpp>

Custom deleter for pinned host memory allocated with CUDA

This deleter is designed to work with std::unique_ptr to provide RAII management of CUDA pinned host memory. It automatically calls cudaFreeHost() when the unique_ptr is destroyed.

Template Parameters:

T – The type of the pointer being managed (can be array types)

Public Types

using PtrT = std::remove_all_extents_t<T>#

Pointer type after removing array extents

Public Functions

inline void operator()(PtrT *ptr) const noexcept#

Deletes pinned host memory using cudaFreeHost

Note

Best-effort cleanup; never throws exceptions from deleter

Parameters:

ptr[in] Pointer to pinned host memory to be freed