Tiered Index
Python module: cuvs.neighbors.tiered_index
Index
cdef class Index
Tiered Index object.
Members
| Name | Kind |
|---|---|
| trained | property |
trained
def trained(self)
IndexParams
cdef class IndexParams
Parameters for building a Tiered Index for nearest neighbor search.
Parameters
| Name | Type | Description |
|---|---|---|
| metric | str, default = "sqeuclidean" | The distance metric. Valid values: "sqeuclidean", "inner_product", "euclidean", "cosine". "sqeuclidean" is the Euclidean distance without the square root operation: distance(a, b) = \sum_i (a_i - b_i)^2. "euclidean" is the Euclidean distance. "inner_product" is defined as distance(a, b) = \sum_i a_i * b_i. "cosine" is defined as distance(a, b) = 1 - \sum_i a_i * b_i / (\lVert a \rVert_2 * \lVert b \rVert_2). |
| algo | str, default = "cagra" | The algorithm to use for the ANN portion of the tiered index. |
| upstream_params | object, optional | The IndexParams of the upstream ANN algorithm (for example, the CAGRA IndexParams when algo is "cagra"). |
| min_ann_rows | int | The minimum number of rows required to create an ANN index. |
| create_ann_index_on_extend | bool | Whether to create a new ANN index on extend when the number of rows in the incremental (brute-force kNN, bfknn) portion is above min_ann_rows. |
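The four metric definitions in the table can be written out directly. The following is a minimal pure-Python sketch for checking values by hand on small vectors; the function names are illustrative and are not part of cuVS, which computes these distances on the GPU.

```python
import math


def sqeuclidean(a, b):
    """Squared Euclidean distance: sum_i (a_i - b_i)^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance: square root of the squared Euclidean distance."""
    return math.sqrt(sqeuclidean(a, b))


def inner_product(a, b):
    """Inner product: sum_i a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine distance: 1 - <a, b> / (||a||_2 * ||b||_2)."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


a = [3.0, 4.0]
b = [4.0, 3.0]
print(sqeuclidean(a, b))    # 2.0
print(inner_product(a, b))  # 24.0
print(euclidean(a, b))
print(cosine(a, b))
```

Note that "inner_product" grows with similarity while the other three shrink, so neighbor ordering under that metric should be interpreted accordingly.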
Constructor
def __init__(self, *, metric="sqeuclidean", algo="cagra", upstream_params=None, min_ann_rows=None, create_ann_index_on_extend=None)
Members
| Name | Kind |
|---|---|
| metric | property |
| algo | property |
| min_ann_rows | property |
| create_ann_index_on_extend | property |
| upstream_params | property |
metric
def metric(self)
algo
def algo(self)
min_ann_rows
def min_ann_rows(self)
create_ann_index_on_extend
def create_ann_index_on_extend(self)
upstream_params
def upstream_params(self)
build
@auto_sync_resources
def build(IndexParams index_params, dataset, resources=None)
Build the Tiered index from the dataset for efficient search.
Parameters
| Name | Type | Description |
|---|---|---|
| index_params | cuvs.neighbors.tiered_index.IndexParams | |
| dataset | CUDA array interface compliant matrix, shape (n_samples, dim) | Supported dtype: float32. |
| resources | cuvs.common.Resources, optional | |
Returns
| Name | Type | Description |
|---|---|---|
| index | cuvs.neighbors.tiered_index.Index | |
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import cagra, tiered_index
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> build_params = tiered_index.IndexParams(metric="sqeuclidean",
...                                         algo="cagra")
>>> index = tiered_index.build(build_params, dataset)
>>> distances, neighbors = tiered_index.search(cagra.SearchParams(),
...                                            index, dataset, k)
>>> distances = cp.asarray(distances)
>>> neighbors = cp.asarray(neighbors)
extend
@auto_sync_resources
def extend(Index index, new_vectors, resources=None)
Extend an existing index with new vectors.
The input array can be either a CUDA array interface compliant matrix or an array interface compliant matrix in host memory.
Parameters
| Name | Type | Description |
|---|---|---|
| index | tiered_index.Index | Trained tiered_index object. |
| new_vectors | array interface compliant matrix, shape (n_samples, dim) | Supported dtype: float32. |
| resources | cuvs.common.Resources, optional | |
Returns
| Name | Type | Description |
|---|---|---|
| index | cuvs.neighbors.tiered_index.Index | |
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import tiered_index
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> index = tiered_index.build(tiered_index.IndexParams(), dataset)
>>> n_rows = 100
>>> more_data = cp.random.random_sample((n_rows, n_features),
...                                     dtype=cp.float32)
>>> index = tiered_index.extend(index, more_data)
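How extend interacts with min_ann_rows and create_ann_index_on_extend can be illustrated with a small simulation. This is only a sketch of the behavior as worded in the IndexParams descriptions, not the library's internal logic: `should_build_ann` is a hypothetical helper, and the strict "greater than" comparison is an assumption drawn from the word "above".

```python
def should_build_ann(incremental_rows, min_ann_rows, create_ann_index_on_extend):
    """Hypothetical helper mirroring the parameter descriptions only.

    Returns True when an extend call would be expected to trigger an ANN
    build: the flag is set and the incremental (brute-force) portion has
    grown above min_ann_rows. The strict '>' comparison is an assumption.
    """
    return bool(create_ann_index_on_extend) and incremental_rows > min_ann_rows


# Simulate three extend calls of 400 rows each against a threshold of 1000.
min_ann_rows = 1000
incremental_rows = 0
decisions = []
for batch_rows in (400, 400, 400):
    incremental_rows += batch_rows
    decisions.append(should_build_ann(incremental_rows, min_ann_rows, True))

print(decisions)  # [False, False, True]
```

Under these assumptions, small extends stay in the brute-force portion until enough rows accumulate, and with create_ann_index_on_extend disabled no extend call triggers a build.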
search
@auto_sync_resources
@auto_convert_output
def search(search_params, Index index, queries, k, neighbors=None, distances=None, resources=None, filter=None)
Find the k nearest neighbors for each query.
Parameters
| Name | Type | Description |
|---|---|---|
| search_params | SearchParams of the upstream ANN algorithm | For example, cagra.SearchParams when algo is "cagra". |
| index | cuvs.neighbors.tiered_index.Index | Trained Tiered index. |
| queries | CUDA array interface compliant matrix, shape (n_queries, dim) | Supported dtype: float32. |
| k | int | The number of neighbors. |
| neighbors | CUDA array interface compliant matrix, shape (n_queries, k), dtype int64_t, optional | If supplied, neighbor indices will be written here in-place. (default None) |
| distances | CUDA array interface compliant matrix, shape (n_queries, k), optional | If supplied, the distances to the neighbors will be written here in-place. (default None) |
| filter | cuvs.neighbors.cuvsFilter, optional | Can be used to filter neighbors based on a given bitset. (default None) |
| resources | cuvs.common.Resources, optional | |
Examples
>>> import cupy as cp
>>> from cuvs.neighbors import cagra, tiered_index
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build the index
>>> index = tiered_index.build(tiered_index.IndexParams(algo="cagra"),
...                            dataset)
>>>
>>> # Search using the built index
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> k = 10
>>> search_params = cagra.SearchParams()
>>>
>>> distances, neighbors = tiered_index.search(search_params, index,
...                                            queries, k)
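Because the ANN portion of the index returns approximate results, it can help to compare the (n_queries, k) neighbors output against an exact search on a small sample. The sketch below is a pure-Python reference under the "sqeuclidean" metric; `brute_force_knn` and `recall` are hypothetical helper names, not cuVS functions, and the approach is only practical for tiny inputs.

```python
import heapq


def brute_force_knn(dataset, queries, k):
    """Exact k nearest neighbors under squared Euclidean distance.

    Returns (distances, neighbors), each a list of n_queries rows of length k,
    ordered from nearest to farthest.
    """
    all_distances, all_neighbors = [], []
    for query in queries:
        scored = (
            (sum((x - y) ** 2 for x, y in zip(query, row)), i)
            for i, row in enumerate(dataset)
        )
        best = heapq.nsmallest(k, scored)
        all_distances.append([d for d, _ in best])
        all_neighbors.append([i for _, i in best])
    return all_distances, all_neighbors


def recall(found, expected):
    """Fraction of expected neighbor ids that also appear in the found ids."""
    hits = sum(len(set(f) & set(e)) for f, e in zip(found, expected))
    return hits / sum(len(e) for e in expected)


dataset = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
queries = [[0.0, 0.0], [4.0, 4.0]]
exact_distances, exact_neighbors = brute_force_knn(dataset, queries, k=2)
print(exact_neighbors)  # [[0, 1], [3, 2]]
print(exact_distances)  # [[0.0, 1.0], [2.0, 20.0]]

# Stand-in for approximate results that miss one true neighbor.
approx_neighbors = [[0, 1], [3, 1]]
print(recall(approx_neighbors, exact_neighbors))  # 0.75
```

To apply this to real results, the arrays returned by tiered_index.search would first need to be copied to host memory and converted to lists.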