NVIDIA NvNeural SDK
2022.2
GPU inference framework for NVIDIA Nsight Deep Learning Designer
Generic interface for CUDA device memory allocation.
#include <nvneural/CudaTypes.h>
Public Member Functions
virtual NeuralResult allocateMemoryBlock (MemoryHandle *pHandleOut, std::size_t byteCount, MemorySemantic semantic) noexcept=0
Allocates a new memory block and returns a handle to it.
virtual NeuralResult compactMemory () noexcept=0
Signals the allocator to release unused memory blocks back to the system.
virtual NeuralResult freeMemoryBlock (MemoryHandle handle) noexcept=0
Frees a memory block.
virtual void * getAddressForMemoryBlock (MemoryHandle handle) const noexcept=0
Converts a memory handle to a GPU virtual address.
virtual std::size_t getSizeForMemoryBlock (MemoryHandle handle) const noexcept=0
Returns the buffer size associated with a memory handle.
virtual NeuralResult lockMemoryBlock (MemoryHandle handle) noexcept=0
Adds a lock to a preexisting memory block.
virtual NeuralResult unlockMemoryBlock (MemoryHandle handle) noexcept=0
Removes a lock from a preexisting memory block.
Public Member Functions inherited from IRefObject
virtual RefCount addRef () const noexcept=0
Increments the object's reference count.
virtual const void * queryInterface (TypeId interface) const noexcept=0
This is an overloaded member function, provided for convenience. It differs from the above function only in the arguments it accepts.
virtual void * queryInterface (TypeId interface) noexcept=0
Retrieves a new object interface pointer.
virtual RefCount release () const noexcept=0
Decrements the object's reference count and destroys the object if the reference count reaches zero.
Static Public Attributes
static const IRefObject::TypeId typeID = 0x121e6098096e5c97ul
Interface TypeId for InterfaceOf purposes.
Static Public Attributes inherited from IRefObject
static const TypeId typeID = 0x14ecc3f9de638e1dul
Interface TypeId for InterfaceOf purposes.
Additional Inherited Members
Public Types inherited from IRefObject
using RefCount = std::uint32_t
Typedef used to track the number of active references to an object.
using TypeId = std::uint64_t
Every interface must define a unique TypeId. This should be randomized.
Protected Member Functions inherited from IRefObject
virtual ~IRefObject ()=default
A protected destructor prevents accidental stack-allocation of IRefObjects or use with other smart pointer classes like std::unique_ptr.
Generic interface for CUDA device memory allocation.
NvNeural provides StandardCudaAllocator as a default implementation, but applications may wish to customize their memory allocation policies.
Alternatively, applications that run inference on multiple networks in round-robin fashion may wish to draw from a common pool of device memory, so that blocks freed by one network can be reused by the others instead of sitting idle.
For this scenario, use a single allocator object and pass it to each backend using INetworkBackendCuda::setAllocator.
allocateMemoryBlock() [pure virtual, noexcept]
Allocates a new memory block and returns a handle to it.
The returned memory has a lock count of 1; see lockMemoryBlock for details.
Parameters:
pHandleOut: Variable receiving a new MemoryHandle object
byteCount: Size of buffer to allocate
semantic: Description of how the buffer will be used
compactMemory() [pure virtual, noexcept]
Signals the allocator to release unused memory blocks back to the system.
Existing MemoryHandles must not be invalidated by this operation: a cudaMalloc-based allocator should return unused blocks to the system with cudaFree, but (for example) an arena allocator should not clear its entire memory region, since that would invalidate existing handles inside the region. Such an operation belongs in an allocator-type-specific interface.
The compactMemory function is not called by NvNeural; host applications should compact at times convenient to them.
freeMemoryBlock() [pure virtual, noexcept]
Frees a memory block.
It is an error to free memory that has multiple outstanding locks. Freeing the null block is explicitly permitted and does nothing.
Parameters:
handle: Memory handle to free
getAddressForMemoryBlock() [pure virtual, noexcept]
Converts a memory handle to a GPU virtual address.
Parameters:
handle: Handle to dereference
getSizeForMemoryBlock() [pure virtual, noexcept]
Returns the buffer size associated with a memory handle.
Note that allocators are allowed to over-allocate buffers and recycle free blocks. It is not guaranteed that the value passed to allocateMemoryBlock is the same value returned by this function, though the return value from this function is guaranteed to be no smaller than the buffer size passed to allocateMemoryBlock.
Parameters:
handle: Handle to query
lockMemoryBlock() [pure virtual, noexcept]
Adds a lock to a preexisting memory block.
Locking a memory block is treated similarly to adding a reference to an IRefObject, and allows for shared ownership of memory blocks where a single-ownership unique_ptr approach (one specific owner calls freeMemoryBlock) is impractical or would result in a separate reference table.
The typical scenario for memory locking is tensor memory inside INetwork; layer tensors may be consumed by multiple dependents and so the easiest way to ensure proper lifetime management is to add extra locks to the handle representing the layer's output tensor.
Parameters:
handle: Handle to lock
unlockMemoryBlock() [pure virtual, noexcept]
Removes a lock from a preexisting memory block.
Unlocking a memory block is treated similarly to releasing an IRefObject reference, and in fact can automatically free a memory block if the last reference is released.
Developers are encouraged to use either a single-ownership allocate/free model or a shared-ownership allocate/lock/.../unlock model.
Parameters:
handle: Handle to unlock