tritonclient.utils.cuda_shared_memory

Functions

allocated_shared_memory_regions()
    Returns all CUDA shared memory regions that were allocated but not freed.

as_shared_memory_tensor(cuda_shm_handle, ...)

create_shared_memory_region(triton_shm_name, ...)
    Creates a CUDA shared memory region with the specified name and size.

destroy_shared_memory_region(cuda_shm_handle)
    Closes a CUDA shared memory region with the specified handle.

get_contents_as_numpy(cuda_shm_handle, ...)
    Generates a numpy array using the data stored in the CUDA shared memory region specified with the handle.

get_raw_handle(cuda_shm_handle)
    Returns the underlying raw serialized CUDA IPC handle in base64 encoding.

set_shared_memory_region(cuda_shm_handle, ...)
    Copies the contents of the numpy arrays into the CUDA shared memory region.

set_shared_memory_region_from_dlpack(...)

tritonclient.utils.cuda_shared_memory._get_or_create_global_cuda_stream(device_id)
tritonclient.utils.cuda_shared_memory._is_device_supported(device: DLDevice)
tritonclient.utils.cuda_shared_memory._support_uva(shm_device_id, ext_device_id)
tritonclient.utils.cuda_shared_memory.allocated_shared_memory_regions()

Returns all CUDA shared memory regions that were allocated but not freed.

Returns:

The list of CUDA shared memory handles corresponding to the allocated regions.

Return type:

list
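
For example, a minimal sketch of freeing any regions that are still held at process teardown; the "example_region" name and 64-byte size below are illustrative assumptions, not part of this API:

    from tritonclient.utils import cuda_shared_memory as cudashm

    # Allocate a region on GPU 0 (name and size are illustrative).
    cudashm.create_shared_memory_region("example_region", 64, 0)

    # At teardown, destroy anything that was allocated but never freed.
    # list() copies the returned handles so destruction does not mutate
    # the collection being iterated.
    for shm_handle in list(cudashm.allocated_shared_memory_regions()):
        cudashm.destroy_shared_memory_region(shm_handle)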

tritonclient.utils.cuda_shared_memory.as_shared_memory_tensor(cuda_shm_handle, datatype, shape)
tritonclient.utils.cuda_shared_memory.create_shared_memory_region(triton_shm_name, byte_size, device_id)

Creates a CUDA shared memory region with the specified name and size.

Parameters:
  • triton_shm_name (str) – The unique name of the CUDA shared memory region to be created.

  • byte_size (int) – The size in bytes of the CUDA shared memory region to be created.

  • device_id (int) – The GPU device ID of the CUDA shared memory region to be created.

Returns:

cuda_shm_handle – The handle for the CUDA shared memory region.

Return type:

CudaSharedMemoryRegion

Raises:

CudaSharedMemoryException – If unable to create the CUDA shared memory region on the specified device.
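
For example, a minimal sketch of allocating a region sized for a single FP32 tensor and freeing it afterwards; the region name, shape, and device ID are illustrative assumptions:

    import numpy as np
    from tritonclient.utils import cuda_shared_memory as cudashm

    # Size the region for a [1, 16] FP32 tensor.
    input_data = np.zeros([1, 16], dtype=np.float32)

    # Allocate device memory on GPU 0 under a unique region name.
    shm_handle = cudashm.create_shared_memory_region(
        "input_region", input_data.nbytes, 0)

    # ... stage data and run inference here ...

    # Release the device memory when the region is no longer needed.
    cudashm.destroy_shared_memory_region(shm_handle)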

tritonclient.utils.cuda_shared_memory.destroy_shared_memory_region(cuda_shm_handle)

Closes a CUDA shared memory region with the specified handle.

Parameters:

cuda_shm_handle (CudaSharedMemoryRegion) – The handle for the CUDA shared memory region.

Raises:

CudaSharedMemoryException – If unable to close the CUDA shared memory region and free the device memory.

tritonclient.utils.cuda_shared_memory.get_contents_as_numpy(cuda_shm_handle, datatype, shape)

Generates a numpy array using the data stored in the CUDA shared memory region specified with the handle.

Parameters:
  • cuda_shm_handle (CudaSharedMemoryRegion) – The handle for the CUDA shared memory region.

  • datatype (np.dtype) – The datatype of the array to be returned.

  • shape (list) – The list of ints describing the shape of the array to be returned.

Returns:

The numpy array generated using contents from the specified shared memory region.

Return type:

np.ndarray
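
For example, a minimal round-trip sketch that writes a numpy array into a region and reads it back; the region name, dtype, and shape are illustrative assumptions:

    import numpy as np
    from tritonclient.utils import cuda_shared_memory as cudashm

    data = np.arange(8, dtype=np.float32)

    # Allocate a region on GPU 0 large enough for the array and copy it in.
    handle = cudashm.create_shared_memory_region("roundtrip_region", data.nbytes, 0)
    cudashm.set_shared_memory_region(handle, [data])

    # Reconstruct a numpy array from the device memory using the same
    # datatype and shape that were written.
    readback = cudashm.get_contents_as_numpy(handle, np.float32, [8])
    assert np.array_equal(readback, data)

    cudashm.destroy_shared_memory_region(handle)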

tritonclient.utils.cuda_shared_memory.get_raw_handle(cuda_shm_handle)

Returns the underlying raw serialized CUDA IPC handle in base64 encoding.

Parameters:

cuda_shm_handle (CudaSharedMemoryRegion) – The handle for the CUDA shared memory region.

Returns:

The raw serialized CUDA IPC handle of the underlying CUDA shared memory in base64 encoding.

Return type:

bytes
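
For example, a minimal sketch of registering a region with a Triton server over HTTP; the server URL, region name, and byte size are illustrative assumptions, and register_cuda_shared_memory is provided by the tritonclient.http client rather than this module:

    import tritonclient.http as httpclient
    from tritonclient.utils import cuda_shared_memory as cudashm

    byte_size = 64
    shm_handle = cudashm.create_shared_memory_region("output_region", byte_size, 0)

    # Serialize the CUDA IPC handle so the server can open the same device memory.
    raw_handle = cudashm.get_raw_handle(shm_handle)

    client = httpclient.InferenceServerClient(url="localhost:8000")
    client.register_cuda_shared_memory("output_region", raw_handle, 0, byte_size)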

tritonclient.utils.cuda_shared_memory.set_shared_memory_region(cuda_shm_handle, input_values)

Copies the contents of the numpy arrays into the CUDA shared memory region.

Parameters:
  • cuda_shm_handle (CudaSharedMemoryRegion) – The handle for the CUDA shared memory region.

  • input_values (list) – The list of numpy arrays to be copied into the shared memory region.

Raises:

CudaSharedMemoryException – If unable to set values in the CUDA shared memory region.
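
For example, a minimal sketch of staging an input array and pointing an inference request at the region; the input name "INPUT0", datatype, and shapes are illustrative assumptions, and the region is assumed to have already been registered with the server via register_cuda_shared_memory:

    import numpy as np
    import tritonclient.http as httpclient
    from tritonclient.utils import cuda_shared_memory as cudashm

    input_data = np.ones([1, 16], dtype=np.int32)

    # Allocate a region on GPU 0 and copy the host array into it.
    shm_handle = cudashm.create_shared_memory_region(
        "input0_region", input_data.nbytes, 0)
    cudashm.set_shared_memory_region(shm_handle, [input_data])

    # Tell the server to read INPUT0 from the registered region instead of
    # sending the tensor bytes over the wire.
    infer_input = httpclient.InferInput("INPUT0", [1, 16], "INT32")
    infer_input.set_shared_memory("input0_region", input_data.nbytes)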

tritonclient.utils.cuda_shared_memory.set_shared_memory_region_from_dlpack(cuda_shm_handle, input_values)
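
For example, a minimal sketch of copying a GPU-resident PyTorch tensor into a region via the DLPack protocol; the tensor, region name, and the assumption that input_values accepts a list of DLPack-compatible tensors (mirroring set_shared_memory_region) are illustrative, not confirmed by this reference:

    import torch
    from tritonclient.utils import cuda_shared_memory as cudashm

    # A CUDA tensor to stage (framework, shape, and dtype are illustrative).
    gpu_tensor = torch.ones(1, 16, dtype=torch.float32, device="cuda:0")
    byte_size = gpu_tensor.element_size() * gpu_tensor.nelement()

    handle = cudashm.create_shared_memory_region("dlpack_region", byte_size, 0)

    # Copy the tensor into the region through DLPack, assuming a list of
    # tensors is accepted as with set_shared_memory_region.
    cudashm.set_shared_memory_region_from_dlpack(handle, [gpu_tensor])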