Holoscan SDK v4.0.0

Class MatXAllocator

class MatXAllocator

Wrap a holoscan::Allocator for use with MatX’s custom allocator interface.

MatX (v0.9.3+) detects custom allocators via SFINAE: any type providing allocate(size_t) -> void* and deallocate(void*, size_t) -> void is accepted. This class bridges the holoscan::Allocator API to satisfy that interface.
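To make the duck-typed interface concrete, here is a minimal illustrative allocator (a stand-in using `std::malloc`, not Holoscan code) that exposes exactly the member-function pair MatX's SFINAE detection looks for:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Illustrative stand-in (not Holoscan code): any type exposing this pair of
// member functions satisfies MatX's duck-typed custom-allocator interface.
struct HostAllocator {
  // allocate(size_t) -> void*
  void* allocate(size_t size) {
    if (size == 0) { return nullptr; }  // zero-size: no allocation
    return std::malloc(size);
  }
  // deallocate(void*, size_t) -> void
  void deallocate(void* ptr, size_t /*size*/) { std::free(ptr); }
};
```

MatXAllocator implements this same pair of member functions, delegating to the wrapped holoscan::Allocator instead of the host heap.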

The adapter supports stream-aware allocation when the underlying allocator is a CudaAllocator (e.g., RMMAllocator, StreamOrderedAllocator). For non-CudaAllocator types (e.g., BlockMemoryPool), synchronous allocation is used, but stream-aware deallocation is still leveraged via the GXF-level free(ptr, stream) when a stream is bound.

Allocator behavior matrix (within each allocator class, rows differ only by whether a CUDA stream is passed to MatXAllocator):

Allocator                              Stream  Allocation  Deallocation
RMMAllocator / StreamOrderedAllocator  Yes     Async       Async
RMMAllocator / StreamOrderedAllocator  No      Sync        Sync
BlockMemoryPool                        Yes     Sync        Deferred (event)
BlockMemoryPool                        No      Sync        Sync
UnboundedAllocator                     Any     Sync        Sync

“Stream” = whether a non-null cudaStream_t is passed to MatXAllocator.

Example usage:


// Inside an operator's compute() method, where allocator_ is a
// Parameter<std::shared_ptr<Allocator>> registered in setup():
holoscan::MatXAllocator matx_alloc(allocator_.get(), cuda_stream);
auto tensor = matx::make_tensor<float>({1024, 1024}, matx_alloc);

// Direct construction via MetaParameter's implicit conversion also works:
holoscan::MatXAllocator matx_alloc2(allocator_, cuda_stream);

Since

4.0.0

Note

Rows 1-2 refer to the same allocator type; the distinction is whether MatXAllocator is constructed with a non-null stream.

Note

“Sync” in the Allocation/Deallocation columns means not stream-ordered (no cudaMallocAsync/cudaFreeAsync). It does NOT mean that each allocation forces a GPU sync. For BlockMemoryPool, allocation from the preallocated pool is CPU bookkeeping only (mutex + stack).

Note

This class does NOT own the Allocator. The caller must ensure the Allocator outlives the MatXAllocator and any tensors allocated through it.

Note

Async allocation (CudaAllocator + stream) only supports device memory (MemoryStorageType::kDevice). Constructing with a non-kDevice storage type and a CudaAllocator + stream throws std::invalid_argument.

Note

MatX’s make_tensor with a custom allocator does NOT accept a CUDA stream parameter. To enable stream-ordered allocation, bind the stream when constructing the MatXAllocator. Use with_stream() to create allocators for different streams without reconstructing from scratch.

Public Functions

inline explicit MatXAllocator(Allocator *allocator, MemoryStorageType storage_type = MemoryStorageType::kDevice, cudaStream_t stream = nullptr)

Construct a MatXAllocator with full control over memory type and stream.

Parameters
  • allocator – Pointer to a holoscan::Allocator (must not be null).

  • storage_type – Memory type for allocations (default: kDevice). When using a CudaAllocator with a stream, only kDevice is supported.

  • stream – Optional CUDA stream for async allocation/deallocation. When non-null and the allocator supports it, async APIs are used.

Throws
  • std::invalid_argument – if allocator is null.

  • std::invalid_argument – if a CudaAllocator + stream is used with a non-kDevice storage type (async allocation is device-only).

inline MatXAllocator(Allocator *allocator, cudaStream_t stream)

Construct with device memory and a CUDA stream (convenience overload).

Equivalent to MatXAllocator(allocator, MemoryStorageType::kDevice, stream).

Parameters
  • allocator – Pointer to a holoscan::Allocator (must not be null).

  • stream – CUDA stream for async allocation/deallocation.

inline explicit MatXAllocator(const std::shared_ptr<Allocator> &allocator, MemoryStorageType storage_type = MemoryStorageType::kDevice, cudaStream_t stream = nullptr)

Construct from a std::shared_ptr<Allocator>.

Extract the raw pointer via shared_ptr::get() and delegate to the Allocator* constructor. The MatXAllocator does NOT retain or extend the lifetime of the shared_ptr — only the raw pointer is stored. The caller must ensure the Allocator outlives the MatXAllocator.

This overload enables ergonomic construction from Parameter<std::shared_ptr<Allocator>>:


holoscan::MatXAllocator alloc(allocator_.get());  // from the shared_ptr
holoscan::MatXAllocator alloc2(allocator_);       // implicit conversion
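The non-owning behavior described above can be demonstrated with a small self-contained model (hypothetical stand-in types, not the Holoscan classes): storing only the raw pointer means the wrapper never increments the shared_ptr's reference count, so the caller is responsible for keeping the allocator alive.

```cpp
#include <cassert>
#include <memory>

// Illustrative stand-in type (not Holoscan code).
struct Allocator { int id = 0; };

// Mirrors the documented behavior: only the raw pointer is stored, so the
// wrapper neither retains nor extends the shared_ptr's lifetime.
class NonOwningWrapper {
 public:
  explicit NonOwningWrapper(const std::shared_ptr<Allocator>& alloc)
      : allocator_(alloc.get()) {}
  Allocator* allocator() const { return allocator_; }

 private:
  Allocator* allocator_;  // raw, non-owning
};
```

Because `use_count()` is unchanged after construction, destroying the last shared_ptr while a wrapper (or a tensor allocated through it) is still live would leave a dangling pointer.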

Parameters
  • allocator – Shared pointer to a holoscan::Allocator (must not be null).

  • storage_type – Memory type for allocations (default: kDevice).

  • stream – Optional CUDA stream for async operations.

Throws
  • std::invalid_argument – if the underlying pointer is null.

  • std::invalid_argument – if a CudaAllocator + stream is used with a non-kDevice storage type.

inline MatXAllocator(const std::shared_ptr<Allocator> &allocator, cudaStream_t stream)

Construct from a std::shared_ptr<Allocator> with a CUDA stream (convenience overload).

Equivalent to MatXAllocator(allocator.get(), MemoryStorageType::kDevice, stream).

Parameters
  • allocator – Shared pointer to a holoscan::Allocator (must not be null).

  • stream – CUDA stream for async allocation/deallocation.

MatXAllocator(const MatXAllocator&) = default
MatXAllocator &operator=(const MatXAllocator&) = default
MatXAllocator(MatXAllocator&&) = default
MatXAllocator &operator=(MatXAllocator&&) = default
inline void *allocate(size_t size)

Allocate memory (satisfy MatX’s allocator interface).

Dispatch strategy:

  1. If size is 0, return nullptr (no allocation needed).

  2. If the underlying allocator is a CudaAllocator and a stream is bound, use allocate_async(size, stream).

  3. Otherwise, use allocate(size, storage_type) (synchronous).

Parameters

size – Number of bytes to allocate. Zero returns nullptr without error.

Throws

std::bad_alloc – if allocation fails.

Returns

Pointer to allocated memory.

inline void deallocate(void *ptr, size_t size) noexcept

Deallocate memory (satisfy MatX’s allocator interface).

This method is noexcept to be safe when called from destructors (e.g., when a MatX tensor is destroyed). Any internal errors are logged but not propagated.

Dispatch strategy:

  1. If the underlying allocator is a CudaAllocator and a stream is bound, call the GXF-level free_async(ptr, stream) for status checking, with fallback to synchronous free(ptr) on failure.

  2. Else if a stream is bound (e.g., BlockMemoryPool), call GXF-level free(ptr, stream) for stream-aware deferred deallocation. Fall back to synchronous free(ptr) if unavailable.

  3. Otherwise, use free(ptr) (synchronous).
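As with allocation, the deallocation dispatch can be sketched as a decision function (hypothetical names, illustrative only; the fallback-on-failure paths are omitted for brevity):

```cpp
#include <cassert>

enum class FreePath { kNoOp, kFreeAsync, kStreamFree, kSyncFree };

// Sketch of the documented deallocate() dispatch order (illustrative only).
FreePath choose_free_path(bool is_cuda_allocator, bool has_stream,
                          const void* ptr) {
  if (ptr == nullptr) return FreePath::kNoOp;                 // null is a no-op
  if (is_cuda_allocator && has_stream) return FreePath::kFreeAsync;  // 1. free_async(ptr, stream)
  if (has_stream) return FreePath::kStreamFree;               // 2. free(ptr, stream)
  return FreePath::kSyncFree;                                 // 3. free(ptr)
}
```

In the real method, paths 1 and 2 additionally fall back to synchronous free(ptr) on failure, as described above.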

Note

The fallback from stream-aware free to synchronous free is safe for all current Holoscan allocators: BlockMemoryPool uses CUDA-event-based deferred free and UnboundedAllocator’s default free_abi(ptr, stream) delegates to free_abi(ptr). If a future allocator differentiates these, revisit this logic.

Note

The underlying allocator’s free() implementation must not throw. If it does, the exception is caught and logged but the pointer may be leaked. All current Holoscan allocators satisfy this requirement.

Note

If the GXF allocator handle is unavailable, deallocation falls back to allocator_->free() as a best-effort path. In this case, status cannot be queried from GXF.

Parameters
  • ptr – Pointer to memory to deallocate (null is a no-op).

  • size – Size of allocation (required by MatX interface, unused).

inline Allocator *allocator() const noexcept

Return the underlying Holoscan allocator.

inline MemoryStorageType storage_type() const noexcept

Return the configured memory storage type.

inline cudaStream_t stream() const noexcept

Return the bound CUDA stream (nullptr if none).

inline MatXAllocator with_stream(cudaStream_t stream) const

Create a copy of this allocator bound to a different CUDA stream.

Return a new MatXAllocator sharing the same underlying Allocator and storage type, but associated with a different stream. Useful in multi-stream pipelines where the same allocator serves multiple streams.

Parameters

stream – The CUDA stream to bind to the new allocator.

Throws

std::invalid_argument – if the new stream + storage_type combination is invalid (see primary constructor).

Returns

A new MatXAllocator bound to the given stream.
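The value semantics of with_stream() can be modeled with a small stand-in (hypothetical types, not the Holoscan class): the copy shares the same allocator pointer and storage type, only the bound stream differs, and the original allocator is left untouched.

```cpp
#include <cassert>

// Hypothetical model (not the Holoscan class) of with_stream() semantics.
struct Alloc {};
using Stream = int;  // stand-in for cudaStream_t

struct MatXAllocModel {
  Alloc* allocator;
  Stream stream;

  // Returns a copy sharing the same allocator, bound to a new stream.
  MatXAllocModel with_stream(Stream s) const { return MatXAllocModel{allocator, s}; }
};
```

In a multi-stream pipeline this means one MatXAllocator can be constructed per operator and cheaply re-bound per stream, e.g. `auto alloc_s2 = matx_alloc.with_stream(stream2);` before calling matx::make_tensor on the second stream.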

© Copyright 2022-2026, NVIDIA. Last updated on Mar 9, 2026