GPU Allocator
AllocatorFlag
- tensorrt.AllocatorFlag
Members:
RESIZABLE : TensorRT may call realloc() on this allocation
IGpuAllocator
- class tensorrt.IGpuAllocator(self: tensorrt.tensorrt.IGpuAllocator) → None
Application-implemented class for controlling allocation on the GPU.
To implement a custom allocator, ensure that you explicitly instantiate the base class in __init__():

class MyAllocator(trt.IGpuAllocator):
    def __init__(self):
        trt.IGpuAllocator.__init__(self)
        ...
Note that all of the methods below (allocate, reallocate, deallocate, allocate_async, deallocate_async) must be overridden in the custom allocator, or else pybind11 will not be able to dispatch the call to the overriding method.
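As a sketch of the required overrides, the following uses a plain-Python stand-in base class and a dict of bytearrays in place of real device memory (both are illustrative assumptions, not TensorRT API; real code subclasses trt.IGpuAllocator and returns device pointers):

```python
class FakeGpuAllocatorBase:
    """Stand-in for trt.IGpuAllocator (hypothetical, for illustration only)."""
    def __init__(self):
        pass


class MyAllocator(FakeGpuAllocatorBase):
    def __init__(self):
        # With the real base class this must be an explicit
        # trt.IGpuAllocator.__init__(self) call.
        FakeGpuAllocatorBase.__init__(self)
        self._pool = {}        # fake "address" -> bytearray ("device" memory)
        self._next_addr = 256  # fake device addresses

    def allocate(self, size, alignment, flags):
        if size == 0:
            return None        # size-0 requests must return None
        addr = self._next_addr
        self._pool[addr] = bytearray(size)
        self._next_addr += size + 256
        return addr

    def deallocate(self, memory):
        if memory == 0:        # TensorRT may pass 0 back for a size-0 request
            return True
        return self._pool.pop(memory, None) is not None

    # The *_async variants here just wrap the synchronous methods; a truly
    # asynchronous allocator would subclass IGpuAsyncAllocator instead.
    def allocate_async(self, size, alignment, flags, stream):
        return self.allocate(size, alignment, flags)

    def deallocate_async(self, memory, stream):
        return self.deallocate(memory)

    def reallocate(self, address, alignment, new_size):
        old = self._pool.get(address)
        if old is None:
            return None        # cannot fulfil: original allocation stays valid
        new_addr = self.allocate(new_size, alignment, 0)
        n = min(len(old), new_size)
        self._pool[new_addr][:n] = old[:n]
        self.deallocate(address)
        return new_addr
```

The dict-backed pool only emulates the bookkeeping; the contract being illustrated is the one documented below (return None for size 0 or on failure, accept 0 in deallocate, preserve min(old_size, new_size) bytes in reallocate).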
- __init__(self: tensorrt.tensorrt.IGpuAllocator) → None
- allocate(self: tensorrt.tensorrt.IGpuAllocator, size: int, alignment: int, flags: int) → capsule
[DEPRECATED] Deprecated in TensorRT 10.0. Please use allocate_async instead.
A callback implemented by the application to handle acquisition of GPU memory. If an allocation request of size 0 is made, None should be returned. If an allocation request cannot be satisfied, None should be returned.
- Parameters
size – The size of the memory required.
alignment – The required alignment of memory. Alignment will be zero or a power of 2 not exceeding the alignment guaranteed by cudaMalloc. Thus this allocator can be safely implemented with cudaMalloc/cudaFree. An alignment value of zero indicates any alignment is acceptable.
flags – Allocation flags. See AllocatorFlag.
- Returns
The address of the allocated memory
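The alignment rule above (zero, or a power of two) can be illustrated with a small helper that rounds an address up to a power-of-two boundary; align_up is a hypothetical helper for illustration, not part of the TensorRT API:

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment.

    Per the allocate() contract, alignment is either zero (any alignment
    is acceptable) or a power of two, which makes the bit-mask trick valid.
    """
    if alignment == 0:
        return addr              # zero means any alignment is acceptable
    return (addr + alignment - 1) & ~(alignment - 1)
```

For example, align_up(100, 64) returns 128, and an address already on the boundary is returned unchanged.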
- allocate_async(self: tensorrt.tensorrt.IGpuAllocator, size: int, alignment: int, flags: int, stream: int) → capsule
A callback implemented by the application to handle acquisition of GPU memory asynchronously. This is just a wrapper around the synchronous allocate method; for truly asynchronous allocation, use the corresponding IGpuAsyncAllocator class. If an allocation request of size 0 is made, None should be returned. If an allocation request cannot be satisfied, None should be returned.
- Parameters
size – The size of the memory required.
alignment – The required alignment of memory. Alignment will be zero or a power of 2 not exceeding the alignment guaranteed by cudaMalloc. Thus this allocator can be safely implemented with cudaMalloc/cudaFree. An alignment value of zero indicates any alignment is acceptable.
flags – Allocation flags. See AllocatorFlag.
stream – CUDA stream
- Returns
The address of the allocated memory
- deallocate(self: tensorrt.tensorrt.IGpuAllocator, memory: capsule) → bool
[DEPRECATED] Deprecated in TensorRT 10.0. Please use deallocate_async instead.
A callback implemented by the application to handle release of GPU memory.
TensorRT may pass a 0 to this function if it was previously returned by allocate().
- Parameters
memory – The memory address of the memory to release.
- Returns
True if the acquired memory is released successfully.
- deallocate_async(self: tensorrt.tensorrt.IGpuAllocator, memory: capsule, stream: int) → bool
A callback implemented by the application to handle release of GPU memory asynchronously. This is just a wrapper around the synchronous deallocate method; for truly asynchronous deallocation, use the corresponding IGpuAsyncAllocator class.
TensorRT may pass a 0 to this function if it was previously returned by allocate().
- Parameters
memory – The memory address of the memory to release.
stream – CUDA stream
- Returns
True if the acquired memory is released successfully.
- reallocate(self: tensorrt.tensorrt.IGpuAllocator, address: capsule, alignment: int, new_size: int) → capsule
A callback implemented by the application to resize an existing allocation.
Only allocations which were allocated with AllocatorFlag.RESIZABLE will be resized.
Options are one of:
- resize in place, leaving min(old_size, new_size) bytes unchanged, and return the original address
- move min(old_size, new_size) bytes to a new location of sufficient size and return its address
- return None, to indicate that the request could not be fulfilled
If None is returned, TensorRT will assume that resize() is not implemented, and that the allocation at address is still valid.
This method is made available for use cases where delegating the resize strategy to the application provides an opportunity to improve memory management. One possible implementation is to allocate a large virtual device buffer and progressively commit physical memory with cuMemMap. CU_MEM_ALLOC_GRANULARITY_RECOMMENDED is suggested in this case.
TensorRT may call realloc to increase the buffer by relatively small amounts.
- Parameters
address – the address of the original allocation.
alignment – The alignment used by the original allocation.
new_size – The new memory size required.
- Returns
The address of the reallocated memory
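The "move" option above can be sketched with host memory standing in for device allocations; move_realloc, pool, and the fake addresses are hypothetical illustrations, not TensorRT API:

```python
def move_realloc(pool, address, new_size, new_address):
    """Emulate the 'move' option of reallocate(): copy min(old_size, new_size)
    bytes to a new location and return its address.

    pool maps fake addresses to bytearrays standing in for device allocations.
    """
    old = pool.get(address)
    if old is None:
        # Returning None tells TensorRT the request could not be fulfilled
        # and that the allocation at the original address is still valid.
        return None
    new = bytearray(new_size)
    n = min(len(old), new_size)
    new[:n] = old[:n]          # preserve min(old_size, new_size) bytes
    del pool[address]
    pool[new_address] = new
    return new_address
```

A resize-in-place implementation would instead grow the existing allocation (e.g. by committing more physical memory with cuMemMap, as suggested above) and return the original address.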