tritonclient.utils.cuda_shared_memory

Functions

allocated_shared_memory_regions()
    Returns all CUDA shared memory regions that were allocated but not freed.

as_shared_memory_tensor(cuda_shm_handle, ...)

create_shared_memory_region(triton_shm_name, ...)
    Creates a CUDA shared memory region with the specified name and size.

destroy_shared_memory_region(cuda_shm_handle)
    Closes a CUDA shared memory region with the specified handle.

get_contents_as_numpy(cuda_shm_handle, ...)
    Generates a numpy array using the data stored in the CUDA shared memory region specified with the handle.

get_raw_handle(cuda_shm_handle)
    Returns the underlying raw serialized CUDA IPC handle in base64 encoding.

set_shared_memory_region(cuda_shm_handle, ...)
    Copies the contents of the numpy arrays into the CUDA shared memory region.

set_shared_memory_region_from_dlpack(...)

tritonclient.utils.cuda_shared_memory._get_or_create_global_cuda_stream(device_id)
tritonclient.utils.cuda_shared_memory._is_device_supported(device: DLDevice)
tritonclient.utils.cuda_shared_memory._support_uva(shm_device_id, ext_device_id)
tritonclient.utils.cuda_shared_memory.allocated_shared_memory_regions()

Returns all CUDA shared memory regions that were allocated but not freed.

Returns:

The list of CUDA shared memory handles corresponding to the allocated regions.

Return type:

list
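
For example, a minimal sketch of freeing any regions that are still held at process teardown; the "example_region" name and 64-byte size below are illustrative assumptions, not part of this API:

    from tritonclient.utils import cuda_shared_memory as cudashm

    # Allocate a region on GPU 0 (name and size are illustrative).
    cudashm.create_shared_memory_region("example_region", 64, 0)

    # At teardown, destroy anything that was allocated but never freed.
    # list() copies the returned handles so destruction does not mutate
    # the collection being iterated.
    for shm_handle in list(cudashm.allocated_shared_memory_regions()):
        cudashm.destroy_shared_memory_region(shm_handle)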

tritonclient.utils.cuda_shared_memory.as_shared_memory_tensor(cuda_shm_handle, datatype, shape)
tritonclient.utils.cuda_shared_memory.create_shared_memory_region(triton_shm_name, byte_size, device_id)

Creates a CUDA shared memory region with the specified name and size.

Parameters:
  • triton_shm_name (str) – The unique name of the CUDA shared memory region to be created.

  • byte_size (int) – The size in bytes of the CUDA shared memory region to be created.

  • device_id (int) – The GPU device ID of the CUDA shared memory region to be created.

Returns:

cuda_shm_handle – The handle for the CUDA shared memory region.

Return type:

CudaSharedMemoryRegion

Raises:

CudaSharedMemoryException – If unable to create the CUDA shared memory region on the specified device.
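
For example, a minimal sketch of allocating a region sized for a single FP32 tensor and freeing it afterwards; the region name, shape, and device ID are illustrative assumptions:

    import numpy as np
    from tritonclient.utils import cuda_shared_memory as cudashm

    # Size the region for a [1, 16] FP32 tensor.
    input_data = np.zeros([1, 16], dtype=np.float32)

    # Allocate device memory on GPU 0 under a unique region name.
    shm_handle = cudashm.create_shared_memory_region(
        "input_region", input_data.nbytes, 0)

    # ... stage data and run inference here ...

    # Release the device memory when the region is no longer needed.
    cudashm.destroy_shared_memory_region(shm_handle)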

tritonclient.utils.cuda_shared_memory.destroy_shared_memory_region(cuda_shm_handle)

Closes a CUDA shared memory region with the specified handle.

Parameters:

cuda_shm_handle (CudaSharedMemoryRegion) – The handle for the CUDA shared memory region.

Raises:

CudaSharedMemoryException – If unable to close the CUDA shared memory region and free the device memory.

tritonclient.utils.cuda_shared_memory.get_contents_as_numpy(cuda_shm_handle, datatype, shape)

Generates a numpy array using the data stored in the CUDA shared memory region specified with the handle.

Parameters:
  • cuda_shm_handle (CudaSharedMemoryRegion) – The handle for the CUDA shared memory region.

  • datatype (np.dtype) – The datatype of the array to be returned.

  • shape (list) – The list of ints describing the shape of the array to be returned.

Returns:

The numpy array generated using contents from the specified shared memory region.

Return type:

np.ndarray
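
For example, a minimal round-trip sketch that writes a numpy array into a region and reads it back; the region name, dtype, and shape are illustrative assumptions:

    import numpy as np
    from tritonclient.utils import cuda_shared_memory as cudashm

    data = np.arange(8, dtype=np.float32)

    # Allocate a region on GPU 0 large enough for the array and copy it in.
    handle = cudashm.create_shared_memory_region("roundtrip_region", data.nbytes, 0)
    cudashm.set_shared_memory_region(handle, [data])

    # Reconstruct a numpy array from the device memory using the same
    # datatype and shape that were written.
    readback = cudashm.get_contents_as_numpy(handle, np.float32, [8])
    assert np.array_equal(readback, data)

    cudashm.destroy_shared_memory_region(handle)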

tritonclient.utils.cuda_shared_memory.get_raw_handle(cuda_shm_handle)

Returns the underlying raw serialized CUDA IPC handle in base64 encoding.

Parameters:

cuda_shm_handle (CudaSharedMemoryRegion) – The handle for the CUDA shared memory region.

Returns:

The raw serialized CUDA IPC handle of the underlying CUDA shared memory in base64 encoding.

Return type:

bytes
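
For example, a minimal sketch of registering a region with a Triton server over HTTP; the server URL, region name, and byte size are illustrative assumptions, and register_cuda_shared_memory is provided by the tritonclient.http client rather than this module:

    import tritonclient.http as httpclient
    from tritonclient.utils import cuda_shared_memory as cudashm

    byte_size = 64
    shm_handle = cudashm.create_shared_memory_region("output_region", byte_size, 0)

    # Serialize the CUDA IPC handle so the server can open the same device memory.
    raw_handle = cudashm.get_raw_handle(shm_handle)

    client = httpclient.InferenceServerClient(url="localhost:8000")
    client.register_cuda_shared_memory("output_region", raw_handle, 0, byte_size)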

tritonclient.utils.cuda_shared_memory.set_shared_memory_region(cuda_shm_handle, input_values)

Copies the contents of the numpy arrays into the CUDA shared memory region.

Parameters:
  • cuda_shm_handle (CudaSharedMemoryRegion) – The handle for the CUDA shared memory region.

  • input_values (list) – The list of numpy arrays to be copied into the shared memory region.

Raises:

CudaSharedMemoryException – If unable to set values in the CUDA shared memory region.
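
For example, a minimal sketch of staging an input array and pointing an inference request at the region; the input name "INPUT0", datatype, and shapes are illustrative assumptions, and the region is assumed to have already been registered with the server via register_cuda_shared_memory:

    import numpy as np
    import tritonclient.http as httpclient
    from tritonclient.utils import cuda_shared_memory as cudashm

    input_data = np.ones([1, 16], dtype=np.int32)

    # Allocate a region on GPU 0 and copy the host array into it.
    shm_handle = cudashm.create_shared_memory_region(
        "input0_region", input_data.nbytes, 0)
    cudashm.set_shared_memory_region(shm_handle, [input_data])

    # Tell the server to read INPUT0 from the registered region instead of
    # sending the tensor bytes over the wire.
    infer_input = httpclient.InferInput("INPUT0", [1, 16], "INT32")
    infer_input.set_shared_memory("input0_region", input_data.nbytes)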

tritonclient.utils.cuda_shared_memory.set_shared_memory_region_from_dlpack(cuda_shm_handle, input_values)
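
For example, a minimal sketch of copying a GPU-resident PyTorch tensor into a region via the DLPack protocol; the tensor, region name, and the assumption that input_values accepts a list of DLPack-compatible tensors (mirroring set_shared_memory_region) are illustrative, not confirmed by this reference:

    import torch
    from tritonclient.utils import cuda_shared_memory as cudashm

    # A CUDA tensor to stage (framework, shape, and dtype are illustrative).
    gpu_tensor = torch.ones(1, 16, dtype=torch.float32, device="cuda:0")
    byte_size = gpu_tensor.element_size() * gpu_tensor.nelement()

    handle = cudashm.create_shared_memory_region("dlpack_region", byte_size, 0)

    # Copy the tensor into the region through DLPack, assuming a list of
    # tensors is accepted as with set_shared_memory_region.
    cudashm.set_shared_memory_region_from_dlpack(handle, [gpu_tensor])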