Neighbors Tiered Index

Python module: cuvs.neighbors.tiered_index

Index

cdef class Index

Tiered Index object.

Members

Name	Kind
`trained`	property

trained

def trained(self)

IndexParams

cdef class IndexParams

Parameters for building a Tiered Index for nearest neighbor search.

Parameters

Name	Type	Description
`metric`	`str, default = "sqeuclidean"`	String denoting the metric type. Valid values: "sqeuclidean", "inner_product", "euclidean", "cosine", where: sqeuclidean is the Euclidean distance without the square root operation, i.e. distance(a, b) = \sum_i (a_i - b_i)^2; euclidean is the Euclidean distance, i.e. the square root of sqeuclidean; inner_product is defined as distance(a, b) = \sum_i a_i * b_i; cosine is defined as distance(a, b) = 1 - \sum_i a_i * b_i / (||a||_2 * ||b||_2).
`algo`	`str, default = "cagra"`	The algorithm to use for the ANN portion of the tiered index.
`upstream_params`	`object, optional`	The IndexParams for the upstream ANN algorithm (for example, the CAGRA IndexParams when algo is "cagra").
`min_ann_rows`	`int`	The minimum number of rows necessary to create an ANN index.
`create_ann_index_on_extend`	`bool`	Whether to create a new ANN index on extend when the number of rows in the incremental brute-force k-NN (bfknn) portion is above min_ann_rows.
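
The metric formulas in the table above can be written out on the host as a quick reference. This is a minimal NumPy sketch for illustration only: the function names are hypothetical helpers, not part of cuvs, and the library computes these distances on the GPU.

```python
import numpy as np

def sqeuclidean(a, b):
    # Squared Euclidean distance: sum_i (a_i - b_i)^2
    diff = a - b
    return float(np.dot(diff, diff))

def euclidean(a, b):
    # Euclidean distance: square root of the squared distance
    return float(np.sqrt(sqeuclidean(a, b)))

def inner_product(a, b):
    # Inner product: sum_i a_i * b_i
    return float(np.dot(a, b))

def cosine(a, b):
    # Cosine distance: 1 - <a, b> / (||a||_2 * ||b||_2)
    return 1.0 - float(np.dot(a, b)) / float(np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 2.0], dtype=np.float32)
b = np.array([2.0, 0.0, 2.0], dtype=np.float32)

print(sqeuclidean(a, b))    # 5.0
print(inner_product(a, b))  # 6.0
print(euclidean(a, b))
print(cosine(a, b))
```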

Constructor

def __init__(self, *, metric="sqeuclidean", algo="cagra", upstream_params=None, min_ann_rows=None, create_ann_index_on_extend=None)

Members

Name	Kind
`metric`	property
`algo`	property
`min_ann_rows`	property
`create_ann_index_on_extend`	property
`upstream_params`	property

metric

def metric(self)

algo

def algo(self)

min_ann_rows

def min_ann_rows(self)

create_ann_index_on_extend

def create_ann_index_on_extend(self)

upstream_params

def upstream_params(self)

build

@auto_sync_resources

def build(IndexParams index_params, dataset, resources=None)

Build the Tiered index from the dataset for efficient search.

Parameters

Name	Type	Description
`index_params`	`cuvs.neighbors.tiered_index.IndexParams`
`dataset`	`CUDA array interface compliant matrix shape (n_samples, dim)`	Supported dtype [float32]
`resources`	`cuvs.common.Resources, optional`

Returns

Name	Type	Description
`index`	`cuvs.neighbors.tiered_index.Index`

Examples

>>> import cupy as cp
>>> from cuvs.neighbors import cagra, tiered_index
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> build_params = tiered_index.IndexParams(metric="sqeuclidean",
...                                         algo="cagra")
>>> index = tiered_index.build(build_params, dataset)
>>> distances, neighbors = tiered_index.search(cagra.SearchParams(),
...                                            index, dataset, k)
>>> distances = cp.asarray(distances)
>>> neighbors = cp.asarray(neighbors)

extend

@auto_sync_resources

def extend(Index index, new_vectors, resources=None)

Extend an existing index with new vectors.

The input array can be either a CUDA array interface compliant matrix or an array interface compliant matrix in host memory.

Parameters

Name	Type	Description
`index`	`cuvs.neighbors.tiered_index.Index`	Trained Tiered index.
`new_vectors`	`array interface compliant matrix shape (n_samples, dim)`	Supported dtype [float32]
`resources`	`cuvs.common.Resources, optional`

Returns

Name	Type	Description
`index`	`cuvs.neighbors.tiered_index.Index`

Examples

>>> import cupy as cp
>>> from cuvs.neighbors import tiered_index
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> index = tiered_index.build(tiered_index.IndexParams(), dataset)
>>> n_rows = 100
>>> more_data = cp.random.random_sample((n_rows, n_features),
...                                     dtype=cp.float32)
>>> index = tiered_index.extend(index, more_data)
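
The way extend interacts with the min_ann_rows and create_ann_index_on_extend parameters of IndexParams can be pictured with a toy model that tracks row counts only. This is a sketch under stated assumptions, not the cuVS implementation: `ToyTieredIndex`, `toy_build`, and `toy_extend` are hypothetical names, and the exact thresholds and the rule that all brute-force rows are folded into the ANN portion are assumptions read off the parameter descriptions.

```python
from dataclasses import dataclass

@dataclass
class ToyTieredIndex:
    """Hypothetical stand-in that tracks row counts only; not the cuVS Index."""
    min_ann_rows: int
    create_ann_index_on_extend: bool
    ann_rows: int = 0  # rows covered by the ANN portion
    bf_rows: int = 0   # rows held in the incremental brute-force (bfknn) portion

def toy_build(n_rows, min_ann_rows, create_ann_index_on_extend):
    index = ToyTieredIndex(min_ann_rows, create_ann_index_on_extend)
    # Assumption: an ANN index is created only if there are enough rows.
    if n_rows >= min_ann_rows:
        index.ann_rows = n_rows
    else:
        index.bf_rows = n_rows
    return index

def toy_extend(index, n_new_rows):
    # New vectors always land in the brute-force portion first.
    index.bf_rows += n_new_rows
    # Assumption: once the brute-force portion is above min_ann_rows and the
    # flag is set, its rows are folded into a newly created ANN index.
    if index.create_ann_index_on_extend and index.bf_rows > index.min_ann_rows:
        index.ann_rows += index.bf_rows
        index.bf_rows = 0
    return index

index = toy_build(50, min_ann_rows=100, create_ann_index_on_extend=True)
toy_extend(index, 30)   # 80 rows in the brute-force portion, no ANN index yet
toy_extend(index, 40)   # 120 rows is above 100, so the ANN portion is created
print(index.ann_rows, index.bf_rows)  # 120 0
```

With create_ann_index_on_extend set to False in this toy model, the brute-force portion simply keeps growing on every call to `toy_extend`.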

search

@auto_sync_resources @auto_convert_output

def search(search_params, Index index, queries, k, neighbors=None, distances=None, resources=None, filter=None)

Find the k nearest neighbors for each query.

Parameters

Name	Type	Description
`search_params`	`SearchParams of the upstream ANN index`	For example, cagra.SearchParams() when the index was built with algo="cagra".
`index`	`cuvs.neighbors.tiered_index.Index`	Trained Tiered index.
`queries`	`CUDA array interface compliant matrix shape (n_queries, dim)`	Supported dtype [float32]
`k`	`int`	The number of neighbors.
`neighbors`	`CUDA array interface compliant matrix shape (n_queries, k), optional`	dtype int64_t. If supplied, neighbor indices will be written here in-place. (default None)
`distances`	`CUDA array interface compliant matrix shape (n_queries, k), optional`	If supplied, the distances to the neighbors will be written here in-place. (default None)
`filter`	`cuvs.neighbors.cuvsFilter, optional`	Can be used to filter neighbors based on a given bitset. (default None)
`resources`	`cuvs.common.Resources, optional`

Examples

>>> import cupy as cp
>>> from cuvs.neighbors import cagra, tiered_index
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build the index
>>> index = tiered_index.build(tiered_index.IndexParams(algo="cagra"),
...                            dataset)
>>>
>>> # Search using the built index
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> k = 10
>>> search_params = cagra.SearchParams()
>>>
>>> distances, neighbors = tiered_index.search(search_params, index,
...                                            queries, k)
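
Because the ANN portion of the index returns approximate neighbors, one common sanity check is to compare search results against exact neighbors computed on a small sample. The following is a minimal host-side NumPy sketch of that idea, assuming the "sqeuclidean" metric; `exact_knn` and `recall` are hypothetical helpers, not part of cuvs.

```python
import numpy as np

def exact_knn(dataset, queries, k):
    # Pairwise squared Euclidean distances, shape (n_queries, n_samples).
    d2 = ((queries[:, None, :] - dataset[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances per query, in ascending order.
    idx = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx

def recall(found, expected):
    # Fraction of expected neighbors that also appear in `found`, row by row.
    hits = sum(len(np.intersect1d(f, e)) for f, e in zip(found, expected))
    return hits / expected.size

rng = np.random.default_rng(0)
dataset = rng.random((200, 8), dtype=np.float32)
queries = rng.random((5, 8), dtype=np.float32)

exact_distances, exact_neighbors = exact_knn(dataset, queries, k=10)

# In practice `found` would be the neighbors returned by tiered_index.search,
# copied to the host (for example via cp.asnumpy(cp.asarray(neighbors))).
# Here the exact result is compared with itself to keep the sketch CPU-only.
print(recall(exact_neighbors, exact_neighbors))  # 1.0
```

The broadcasted distance computation materializes an (n_queries, n_samples, dim) array, so this check is only practical on small samples.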