Neighbors


Python module: cuvs.neighbors

refine

@auto_sync_resources
@auto_convert_output
def refine(dataset, queries, candidates, k=None, metric="sqeuclidean", indices=None, distances=None, resources=None)

Refine nearest neighbor search.

Refinement is an operation that follows an approximate nearest neighbor (NN) search. The approximate search has already selected n_candidates (k0 in the parameter table) neighbor candidates for each query; refinement narrows these down to k neighbors. For each query, the exact distance between the query and each of its n_candidates candidates is computed, and the k nearest are selected.
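The selection step can be illustrated with a plain NumPy reference sketch. It is not the cuVS implementation (which runs on the GPU and supports other metrics): it assumes the "sqeuclidean" metric, and the function name `refine_reference` is hypothetical. It returns `(distances, indices)` in the same order as the `refine` call in the Examples.

```python
import numpy as np


def refine_reference(dataset, queries, candidates, k):
    """Re-rank ANN candidates by exact squared Euclidean distance.

    Hypothetical NumPy sketch; returns (distances, indices) like refine.
    """
    # Gather each query's candidate vectors: shape (n_queries, k0, dim).
    cand_vecs = dataset[candidates].astype(np.float32)
    # Broadcast queries to (n_queries, 1, dim) and take exact differences.
    diff = cand_vecs - queries.astype(np.float32)[:, None, :]
    # Squared L2 distance to every candidate: shape (n_queries, k0).
    dists = np.einsum("qkd,qkd->qk", diff, diff)
    # Positions (along the k0 axis) of the k smallest distances, ascending.
    order = np.argsort(dists, axis=1, kind="stable")[:, :k]
    # Map positions back to distances and to dataset row indices.
    out_dist = np.take_along_axis(dists, order, axis=1)
    out_idx = np.take_along_axis(candidates, order, axis=1)
    return out_dist, out_idx
```

In the Examples, this corresponds to going from k0 = 40 IVF-PQ candidates per query down to k = 10 neighbors.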

Input arrays can be either CUDA array interface compliant matrices (device memory) or array interface compliant matrices in host memory. All arrays must be in the same memory space.
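As a rough illustration of the memory-space rule, a hypothetical helper can classify each argument by the protocol it exposes: `__cuda_array_interface__` for device arrays (for example CuPy) and `__array_interface__` for host arrays (for example NumPy). This is a sketch only and is not how cuVS itself validates its inputs.

```python
import numpy as np


def memory_space(arr):
    """Classify an array by the interface protocol it exposes (hypothetical)."""
    if hasattr(arr, "__cuda_array_interface__"):
        return "device"  # e.g. CuPy arrays
    if hasattr(arr, "__array_interface__"):
        return "host"  # e.g. NumPy arrays
    raise TypeError(f"{type(arr).__name__} exposes no array interface")


def check_same_space(*arrays):
    """Return the shared memory space of the non-None arrays, or raise."""
    spaces = {memory_space(a) for a in arrays if a is not None}
    if len(spaces) > 1:
        raise ValueError(f"arrays span several memory spaces: {sorted(spaces)}")
    return spaces.pop() if spaces else None


queries = np.zeros((4, 8), dtype=np.float32)
candidates = np.zeros((4, 16), dtype=np.int64)
print(check_same_space(queries, candidates, None))  # prints: host
```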

Parameters

| Name | Type | Description |
|---|---|---|
| dataset | array interface compliant matrix, shape (n_samples, dim) | Supported dtype: float32, int8, uint8, float16. |
| queries | array interface compliant matrix, shape (n_queries, dim) | Supported dtype: float32, int8, uint8, float16. |
| candidates | array interface compliant matrix, shape (n_queries, k0) | Candidate neighbor indices into dataset, e.g. from an approximate search. Supported dtype: int64. |
| k | int | Number of neighbors to select (k <= k0). Optional if indices or distances is given, in which case k is their second dimension. |
| metric | str | Name of the distance metric to use. Default: "sqeuclidean". |
| indices | optional array interface compliant matrix, shape (n_queries, k) | If supplied, neighbor indices are written here in place. Default: None. Supported dtype: int64. |
| distances | optional array interface compliant matrix, shape (n_queries, k) | If supplied, neighbor distances are written here in place. Default: None. Supported dtype: float32. |
| resources | cuvs.common.Resources, optional | Resources to run the operation with. If omitted, a default one is created and synchronized before the function returns. |
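The rules for `k`, `indices` and `distances` in the table can be mirrored by a small hypothetical helper, shown here together with a preallocation of host output buffers using the documented dtypes. The names `resolve_k` and `allocate_outputs` are illustrative only and are not part of cuVS.

```python
import numpy as np


def resolve_k(candidates, k=None, indices=None, distances=None):
    """Apply the documented k rules (hypothetical helper, not cuVS code)."""
    n_queries, k0 = candidates.shape
    for out in (indices, distances):
        if out is None:
            continue
        if out.shape[0] != n_queries:
            raise ValueError("output arrays need one row per query")
        if k is None:
            k = out.shape[1]  # k is taken from the output's second dimension
        elif out.shape[1] != k:
            raise ValueError("output width does not match k")
    if k is None:
        raise ValueError("k is required when no output arrays are given")
    if k > k0:
        raise ValueError(f"k={k} exceeds the number of candidates k0={k0}")
    return k


def allocate_outputs(n_queries, k):
    """Preallocate host output buffers with the documented dtypes."""
    indices = np.empty((n_queries, k), dtype=np.int64)
    distances = np.empty((n_queries, k), dtype=np.float32)
    return indices, distances
```

For device inputs, the same buffers could instead be allocated with `cupy.empty`, so that every argument stays in one memory space before being passed as `indices=` and `distances=`.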

Examples

>>> import cupy as cp
>>> from cuvs.common import Resources
>>> from cuvs.neighbors import ivf_pq, refine
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> resources = Resources()
>>> index_params = ivf_pq.IndexParams(n_lists=1024,
...                                   metric="sqeuclidean",
...                                   pq_dim=10)
>>> index = ivf_pq.build(index_params, dataset, resources=resources)
>>> # Search using the built index, keeping 40 candidates per query
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> k = 40
>>> _, candidates = ivf_pq.search(ivf_pq.SearchParams(), index,
...                               queries, k, resources=resources)
>>> # Re-rank the candidates by exact distance and keep the 10 nearest
>>> k = 10
>>> distances, neighbors = refine(dataset, queries, candidates, k)