Multi-GPU CAGRA

Python module: cuvs.neighbors.mg.cagra

Index

cdef class Index

Multi-GPU CAGRA index object. Stores the trained multi-GPU CAGRA index state, which can be used to perform nearest-neighbor searches across multiple GPUs.

Members

| Name | Kind |
| --- | --- |
| trained | property |

trained

def trained(self)

IndexParams

cdef class IndexParams(SingleGpuIndexParams)

Parameters for building a multi-GPU CAGRA index for efficient search. Extends the single-GPU IndexParams with multi-GPU-specific parameters.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| distribution_mode | str, default = "sharded" | Distribution mode for multi-GPU setup. Valid values: ["replicated", "sharded"] |
| **kwargs | | Additional parameters passed to single-GPU IndexParams |

Note

CAGRA currently supports only the "sqeuclidean" and "inner_product" metrics.
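As a rough intuition for the "sharded" value of distribution_mode, the NumPy-only sketch below splits a dataset into shards, searches each shard on its own, and merges the per-shard top-k lists into one global result. This is a conceptual illustration, not the cuVS implementation: the helper names (`brute_force_topk`, `sharded_topk`) are hypothetical, exact brute-force search stands in for the approximate CAGRA graph search, and everything runs on the CPU. It assumes "sharded" means each GPU holds one slice of the dataset, while "replicated" means each GPU holds a full copy.

```python
import numpy as np


def brute_force_topk(dataset, queries, k):
    """Exact k-NN under squared Euclidean distance (stand-in for a real index)."""
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ dataset.T
        + (dataset ** 2).sum(axis=1)
    )
    idx = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx


def sharded_topk(dataset, queries, k, n_shards):
    """Search every shard separately, then merge the per-shard candidates."""
    bounds = np.linspace(0, len(dataset), n_shards + 1).astype(int)
    shard_d, shard_i = [], []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        d, i = brute_force_topk(dataset[start:stop], queries, k)
        shard_d.append(d)
        shard_i.append(i + start)  # shard-local row ids -> global row ids
    d = np.concatenate(shard_d, axis=1)  # (n_queries, n_shards * k) candidates
    i = np.concatenate(shard_i, axis=1)
    order = np.argsort(d, axis=1)[:, :k]  # keep the k best candidates overall
    return np.take_along_axis(d, order, axis=1), np.take_along_axis(i, order, axis=1)


rng = np.random.default_rng(0)
dataset = rng.random((2000, 16))  # float64 here to keep the comparison stable
queries = rng.random((5, 16))

d_full, i_full = brute_force_topk(dataset, queries, k=10)
d_shard, i_shard = sharded_topk(dataset, queries, k=10, n_shards=4)
```

With exact search, the merged sharded result matches the single-index result. With an approximate index such as CAGRA, the two can differ slightly.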

Constructor

def __init__(self, *, distribution_mode="sharded", **kwargs)

Members

| Name | Kind |
| --- | --- |
| get_handle | method |
| distribution_mode | property |

get_handle

def get_handle(self)

distribution_mode

def distribution_mode(self)

SearchParams

cdef class SearchParams(SingleGpuSearchParams)

Parameters for searching a multi-GPU CAGRA index.

Constructor

def __init__(self, *, search_mode="load_balancer", merge_mode="merge_on_root_rank", n_rows_per_batch=1000, **kwargs)

Members

| Name | Kind |
| --- | --- |
| get_handle | method |
| search_mode | property |
| search_mode | method |
| merge_mode | property |
| merge_mode | method |
| n_rows_per_batch | property |
| n_rows_per_batch | method |

get_handle

def get_handle(self)

search_mode

def search_mode(self)

Get the search mode for multi-GPU search.

search_mode

def search_mode(self, value)

Set the search mode for multi-GPU search.

merge_mode

def merge_mode(self)

Get the merge mode for multi-GPU search.

merge_mode

def merge_mode(self, value)

Set the merge mode for multi-GPU search.

n_rows_per_batch

def n_rows_per_batch(self)

Get the number of rows per batch for multi-GPU search.

n_rows_per_batch

def n_rows_per_batch(self, value)

Set the number of rows per batch for multi-GPU search.
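To get a feel for n_rows_per_batch, the sketch below shows only the row-chunking arithmetic: a query matrix is cut into consecutive batches of at most n_rows_per_batch rows. How cuVS schedules those batches across GPUs is not described here, so treat this as an assumption-laden illustration. The helper `iter_query_batches` is hypothetical and not part of the cuVS API.

```python
import numpy as np


def iter_query_batches(queries, n_rows_per_batch=1000):
    """Yield consecutive row slices with at most n_rows_per_batch rows each."""
    for start in range(0, queries.shape[0], n_rows_per_batch):
        yield queries[start:start + n_rows_per_batch]


queries = np.zeros((2500, 50), dtype=np.float32)
batch_sizes = [batch.shape[0] for batch in iter_query_batches(queries, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
```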

build

@auto_sync_multi_gpu_resources

def build(IndexParams index_params, dataset, resources=None)

Build the multi-GPU CAGRA index from the dataset for efficient search.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| index_params | cuvs.neighbors.cagra.IndexParams | |
| dataset | Array interface compliant matrix, shape (n_samples, dim) | Supported dtype [float32, float16, int8, uint8]. IMPORTANT: For multi-GPU CAGRA, the dataset MUST be in host memory (CPU). If using CuPy/device arrays, transfer to host with array.get() or cp.asnumpy(array). |
| resources | cuvs.common.Resources, optional | |
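Because the dataset must live in host memory, it can help to normalize inputs before calling build. The sketch below is one possible pre-flight helper, not part of cuVS: `to_host_matrix` is a hypothetical name, the `.get()` branch relies on the CuPy convention mentioned above, and forcing C-contiguity is a precaution assumed here rather than a documented requirement.

```python
import numpy as np

# Dtypes listed as supported in the parameter table above.
SUPPORTED_DTYPES = tuple(
    np.dtype(t) for t in (np.float32, np.float16, np.int8, np.uint8)
)


def to_host_matrix(array):
    """Return a 2-D, C-contiguous NumPy array with a supported dtype."""
    # CuPy arrays expose .get(), which copies device data into a NumPy array.
    if not isinstance(array, np.ndarray) and hasattr(array, "get"):
        array = array.get()
    host = np.ascontiguousarray(array)  # contiguity: a precaution (assumption)
    if host.ndim != 2:
        raise ValueError(f"expected a 2-D matrix, got {host.ndim} dimension(s)")
    if host.dtype not in SUPPORTED_DTYPES:
        raise TypeError(
            f"unsupported dtype {host.dtype}; cast explicitly, "
            "e.g. array.astype(np.float32)"
        )
    return host


dataset = to_host_matrix(np.random.random_sample((100, 8)).astype(np.float32))
```

The helper raises on float64 input rather than casting silently, so any loss of precision stays an explicit choice by the caller.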

Returns

| Name | Type | Description |
| --- | --- | --- |
| index | cuvs.neighbors.cagra.Index | |

Examples

>>> import numpy as np
>>> from cuvs.neighbors.mg import cagra
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> # For multi-GPU CAGRA, use host (NumPy) arrays
>>> dataset = np.random.random_sample((n_samples, n_features)).astype(
...     np.float32)
>>> build_params = cagra.IndexParams(metric="sqeuclidean")
>>> index = cagra.build(build_params, dataset)
>>> distances, neighbors = cagra.search(cagra.SearchParams(),
...                                     index, dataset, k)
>>> # Results are already in host memory (NumPy arrays)

extend

@auto_sync_multi_gpu_resources

def extend(Index index, new_vectors, new_indices=None, resources=None)

Extend the multi-GPU CAGRA index with new vectors.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| index | cuvs.neighbors.cagra.Index | |
| new_vectors | Array interface compliant matrix, shape (n_new_vectors, dim) | Supported dtype [float32, float16, int8, uint8]. IMPORTANT: For multi-GPU CAGRA, new_vectors MUST be in host memory (CPU). If using CuPy/device arrays, transfer to host with array.get() or cp.asnumpy(array). |
| new_indices | Array interface compliant matrix, shape (n_new_vectors,), optional | If provided, these indices will be used for the new vectors. If not provided, indices will be automatically assigned. IMPORTANT: Must be in host memory (CPU) for multi-GPU CAGRA. Expected dtype: uint32 |
| resources | cuvs.common.Resources, optional | |
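Before passing explicit new_indices, a few cheap host-side checks can catch shape and dtype mistakes early. The sketch below is a defensive helper under stated assumptions, not cuVS behavior: `check_new_indices` is a hypothetical name, the existing vectors are assumed to use ids 0 to n_existing - 1, and whether cuVS itself rejects duplicate or overlapping ids is not stated here.

```python
import numpy as np


def check_new_indices(new_vectors, new_indices, n_existing):
    """Validate ids for extend(); assumes existing ids are 0..n_existing-1."""
    new_indices = np.asarray(new_indices)
    if new_indices.dtype != np.uint32:
        raise TypeError("new_indices must have dtype uint32")
    if new_indices.shape != (new_vectors.shape[0],):
        raise ValueError("expected exactly one index per new vector")
    if np.unique(new_indices).size != new_indices.size:
        raise ValueError("new_indices contains duplicates")
    if new_indices.size and int(new_indices.min()) < n_existing:
        raise ValueError("new_indices overlaps ids already in the index")
    return new_indices


n_samples, n_new_vectors, n_features = 50000, 1000, 50
new_vectors = np.zeros((n_new_vectors, n_features), dtype=np.float32)
new_indices = check_new_indices(
    new_vectors,
    np.arange(n_samples, n_samples + n_new_vectors, dtype=np.uint32),
    n_existing=n_samples,
)
```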

Examples

>>> import numpy as np
>>> from cuvs.neighbors.mg import cagra
>>> n_samples = 50000
>>> n_features = 50
>>> n_new_vectors = 1000
>>> # For multi-GPU CAGRA, use host (NumPy) arrays
>>> dataset = np.random.random_sample((n_samples, n_features)).astype(
...     np.float32)
>>> new_vectors = np.random.random_sample(
...     (n_new_vectors, n_features)).astype(np.float32)
>>> new_indices = np.arange(n_samples, n_samples + n_new_vectors,
...                         dtype=np.uint32)
>>> build_params = cagra.IndexParams(metric="sqeuclidean")
>>> index = cagra.build(build_params, dataset)
>>> cagra.extend(index, new_vectors, new_indices)  # doctest: +SKIP

search

@auto_sync_multi_gpu_resources
@auto_convert_output

def search(SearchParams search_params, Index index, queries, k, neighbors=None, distances=None, resources=None)

Search the multi-GPU CAGRA index for the k-nearest neighbors of each query.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| search_params | cuvs.neighbors.cagra.SearchParams | |
| index | cuvs.neighbors.cagra.Index | |
| queries | Array interface compliant matrix, shape (n_queries, dim) | Supported dtype [float32, float16, int8, uint8]. IMPORTANT: For multi-GPU CAGRA, queries MUST be in host memory (CPU). If using CuPy/device arrays, transfer to host with array.get() or cp.asnumpy(array). |
| k | int | The number of neighbors to search for each query. |
| neighbors | Array interface compliant matrix, shape (n_queries, k), optional | If provided, this array will be filled with the indices of the k-nearest neighbors. If not provided, a new host array will be allocated. IMPORTANT: Must be in host memory (CPU) for multi-GPU CAGRA. Expected dtype: int64 |
| distances | Array interface compliant matrix, shape (n_queries, k), optional | If provided, this array will be filled with the distances to the k-nearest neighbors. If not provided, a new host array will be allocated. IMPORTANT: Must be in host memory (CPU) for multi-GPU CAGRA. |
| resources | cuvs.common.Resources, optional | |
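When search is called repeatedly, the optional neighbors and distances buffers can be allocated once on the host and reused. The sketch below shows a possible allocation helper. `allocate_search_outputs` is a hypothetical name, and float32 for distances is an assumption, since only the int64 dtype for neighbors is stated above.

```python
import numpy as np


def allocate_search_outputs(n_queries, k, distance_dtype=np.float32):
    """Preallocate host output buffers with shape (n_queries, k)."""
    neighbors = np.empty((n_queries, k), dtype=np.int64)  # documented dtype
    distances = np.empty((n_queries, k), dtype=distance_dtype)  # assumed dtype
    return neighbors, distances


neighbors, distances = allocate_search_outputs(n_queries=1000, k=10)

# The buffers would then be passed by keyword, following the signature above:
# cagra.search(search_params, index, queries, 10,
#              neighbors=neighbors, distances=distances)
```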

Returns

| Name | Type | Description |
| --- | --- | --- |
| distances | numpy.ndarray | The distances to the k-nearest neighbors for each query (in host memory). |
| neighbors | numpy.ndarray | The indices of the k-nearest neighbors for each query (in host memory). |

Examples

>>> import numpy as np
>>> from cuvs.neighbors.mg import cagra
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> # For multi-GPU CAGRA, use host (NumPy) arrays
>>> dataset = np.random.random_sample((n_samples, n_features)).astype(
...     np.float32)
>>> queries = np.random.random_sample((n_queries, n_features)).astype(
...     np.float32)
>>> build_params = cagra.IndexParams(metric="sqeuclidean")
>>> index = cagra.build(build_params, dataset)
>>> distances, neighbors = cagra.search(cagra.SearchParams(),
...                                     index, queries, k)
>>> # Results are already in host memory (NumPy arrays)
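Since CAGRA is an approximate method, a common follow-up to a search like the one above is to measure recall against exact neighbors on a small sample. The NumPy-only sketch below shows one way to do that. `exact_neighbors` and `recall_at_k` are hypothetical helpers rather than cuVS functions, and the `found` array stands in for the `neighbors` array returned by search.

```python
import numpy as np


def exact_neighbors(dataset, queries, k):
    """Exact k-NN ids under squared Euclidean distance (small inputs only)."""
    diff = queries[:, None, :] - dataset[None, :, :]  # (n_queries, n_samples, dim)
    d2 = (diff ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(found, expected):
    """Fraction of the expected neighbor ids that appear in the found ids."""
    hits = sum(np.intersect1d(f, e).size for f, e in zip(found, expected))
    return hits / expected.size


rng = np.random.default_rng(0)
dataset = rng.random((500, 8))
queries = rng.random((20, 8))
expected = exact_neighbors(dataset, queries, k=5)

# Stand-in for an approximate result: one of the five neighbors is wrong.
found = expected.copy()
found[:, 0] = -1
recall = recall_at_k(found, expected)
```

The broadcasted distance computation builds an (n_queries, n_samples, dim) array, so it suits only a small validation subset, not the full dataset.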

save

@auto_sync_multi_gpu_resources

def save(Index index, filename, resources=None)

Serialize the multi-GPU CAGRA index to a file.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| index | cuvs.neighbors.cagra.Index | |
| filename | str | The filename to serialize the index to. |
| resources | cuvs.common.Resources, optional | |

Examples

>>> import numpy as np
>>> from cuvs.neighbors.mg import cagra
>>> n_samples = 50000
>>> n_features = 50
>>> # For multi-GPU CAGRA, use host (NumPy) arrays
>>> dataset = np.random.random_sample((n_samples, n_features)).astype(
...     np.float32)
>>> build_params = cagra.IndexParams(metric="sqeuclidean")
>>> index = cagra.build(build_params, dataset)
>>> cagra.save(index, "index.bin")

load

@auto_sync_multi_gpu_resources

def load(filename, resources=None)

Deserialize the multi-GPU CAGRA index from a file.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| filename | str | The filename to deserialize the index from. |
| resources | cuvs.common.Resources, optional | |

Returns

| Name | Type | Description |
| --- | --- | --- |
| index | Index | The deserialized index. |

Examples

>>> from cuvs.neighbors.mg import cagra
>>> index = cagra.load("index.bin")  # doctest: +SKIP

distribute

@auto_sync_multi_gpu_resources

def distribute(filename, resources=None)

Distribute a single-GPU CAGRA index across multiple GPUs from a file.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| filename | str | The filename to distribute the index from. |
| resources | cuvs.common.Resources, optional | |

Returns

| Name | Type | Description |
| --- | --- | --- |
| index | Index | The distributed index. |

Examples

>>> from cuvs.neighbors.mg import cagra
>>> index = cagra.distribute("single_gpu_index.bin")  # doctest: +SKIP