Neighbors Brute Force

Python module: cuvs.neighbors.brute_force

Index

1 cdef class Index

Brute Force index object. This object stores the trained Brute Force which can be used to perform nearest neighbors searches.

Members

Name	Kind
`trained`	property

trained

1 def trained(self)

build

@auto_sync_resources

1 def build(dataset, metric="sqeuclidean", metric_arg=2.0, resources=None)

Build the Brute Force index from the dataset for efficient search.

Parameters

Name	Type	Description
`dataset`	`CUDA array interface compliant matrix shape (n_samples, dim)`	Supported dtype [float32, float16]
`metric`	`Distance metric to use. Default is sqeuclidean`
`metric_arg`	`value of 'p' for Minkowski distances`
`resources`	`cuvs.common.Resources, optional`

Returns

Name	Type	Description
`index`	`cuvs.neighbors.brute_force.Index`

Examples

1 >>> import cupy as cp
2 >>> from cuvs.neighbors import brute_force
3 >>> n_samples = 50000
4 >>> n_features = 50
5 >>> n_queries = 1000
6 >>> k = 10
7 >>> dataset = cp.random.random_sample((n_samples, n_features),
8 ...                                   dtype=cp.float32)
9 >>> index = brute_force.build(dataset, metric="cosine")
10 >>> distances, neighbors = brute_force.search(index, dataset, k)
11 >>> distances = cp.asarray(distances)
12 >>> neighbors = cp.asarray(neighbors)

search

@auto_sync_resources @auto_convert_output

1 def search(Index index, queries, k, neighbors=None, distances=None, resources=None, prefilter=None)

Find the k nearest neighbors for each query.

Parameters

Name	Type	Description
`index`	`Index`	Trained Brute Force index.
`queries`	`CUDA array interface compliant matrix shape (n_samples, dim)`	Supported dtype [float32, float16]
`k`	`int`	The number of neighbors.
`neighbors`	`Optional CUDA array interface compliant matrix shape`	(n_queries, k), dtype int64_t. If supplied, neighbor indices will be written here in-place. (default None)
`distances`	`Optional CUDA array interface compliant matrix shape`	(n_queries, k) If supplied, the distances to the neighbors will be written here in-place. (default None)
`prefilter`	`Optional, cuvs.neighbors.cuvsFilter`	An optional filter to exclude certain query-neighbor pairs using a bitmap or bitset. The filter function should have a row-major layout with logical shape `(n_prefilter_rows, n_samples)`, where: - `n_prefilter_rows == n_queries` when using a bitmap filter. - `n_prefilter_rows == 1` when using a bitset prefilter. Each bit in `n_samples` determines whether `queries[i]` should be considered for distance computation with the index. (default None)
`resources`	`cuvs.common.Resources, optional`

Examples

1 >>> # Example without pre-filter
2 >>> import cupy as cp
3 >>> from cuvs.neighbors import brute_force
4 >>> n_samples = 50000
5 >>> n_features = 50
6 >>> n_queries = 1000
7 >>> dataset = cp.random.random_sample((n_samples, n_features),
8 ...                                   dtype=cp.float32)
9 >>> # Build index
10 >>> index = brute_force.build(dataset, metric="sqeuclidean")
11 >>> # Search using the built index
12 >>> queries = cp.random.random_sample((n_queries, n_features),
13 ...                                   dtype=cp.float32)
14 >>> k = 10
15 >>> # Using a pooling allocator reduces overhead of temporary array
16 >>> # creation during search. This is useful if multiple searches
17 >>> # are performed with same query size.
18 >>> distances, neighbors = brute_force.search(index, queries, k)
19 >>> neighbors = cp.asarray(neighbors)
20 >>> distances = cp.asarray(distances)

1 >>> # Example with pre-filter
2 >>> import numpy as np
3 >>> import cupy as cp
4 >>> from cuvs.neighbors import brute_force, filters
5 >>> n_samples = 50000
6 >>> n_features = 50
7 >>> n_queries = 1000
8 >>> dataset = cp.random.random_sample((n_samples, n_features),
9 ...                                   dtype=cp.float32)
10 >>> # Build index
11 >>> index = brute_force.build(dataset, metric="sqeuclidean")
12 >>> # Search using the built index
13 >>> queries = cp.random.random_sample((n_queries, n_features),
14 ...                                   dtype=cp.float32)
15 >>> # Build filters
16 >>> n_bitmap = np.ceil(n_samples * n_queries / 32).astype(int)
17 >>> # Create your own bitmap as the filter by replacing the random one.
18 >>> bitmap = cp.random.randint(1, 100, size=(n_bitmap,), dtype=cp.uint32)
19 >>> bitmap_prefilter = filters.from_bitmap(bitmap)
20 >>>
21 >>> # or Build bitset prefilter:
22 >>> # n_bitset = np.ceil(n_samples * 1 / 32).astype(int)
23 >>> # # Create your own bitset as the filter by replacing the random one.
24 >>> # bitset = cp.random.randint(1, 100, size=(n_bitset,), dtype=cp.uint32)
25 >>> # bitset_prefilter = filters.from_bitset(bitset)
26 >>>
27 >>> k = 10
28 >>> # Using a pooling allocator reduces overhead of temporary array
29 >>> # creation during search. This is useful if multiple searches
30 >>> # are performed with same query size.
31 >>> distances, neighbors = brute_force.search(index, queries, k,
32 ...                                           prefilter=bitmap_prefilter)
33 >>> neighbors = cp.asarray(neighbors)
34 >>> distances = cp.asarray(distances)

save

@auto_sync_resources

1 def save(filename, Index index, bool include_dataset=True, resources=None)

Saves the index to a file.

The serialization format can be subject to changes, therefore loading an index saved with a previous version of cuvs is not guaranteed to work.

Parameters

Name	Type	Description
`filename`	`string`	Name of the file.
`index`	`Index`	Trained Brute Force index.
`resources`	`cuvs.common.Resources, optional`

Examples

1 >>> import cupy as cp
2 >>> from cuvs.neighbors import brute_force
3 >>> n_samples = 50000
4 >>> n_features = 50
5 >>> dataset = cp.random.random_sample((n_samples, n_features),
6 ...                                   dtype=cp.float32)
7 >>> # Build index
8 >>> index = brute_force.build(dataset)
9 >>> # Serialize and deserialize the brute_force index built
10 >>> brute_force.save("my_index.bin", index)
11 >>> index_loaded = brute_force.load("my_index.bin")

load

@auto_sync_resources

1 def load(filename, resources=None)

Loads index from file.

The serialization format can be subject to changes, therefore loading an index saved with a previous version of cuvs is not guaranteed to work.

Parameters

Name	Type	Description
`filename`	`string`	Name of the file.
`resources`	`cuvs.common.Resources, optional`

Returns

Name	Type	Description
`index`	`Index`

Examples

1 >>> import cupy as cp
2 >>> from cuvs.neighbors import brute_force
3 >>> n_samples = 50000
4 >>> n_features = 50
5 >>> dataset = cp.random.random_sample((n_samples, n_features),
6 ...                                   dtype=cp.float32)
7 >>> # Build index
8 >>> index = brute_force.build(dataset)
9 >>> # Serialize and deserialize the brute_force index built
10 >>> brute_force.save("my_index.bin", index)
11 >>> index_loaded = brute_force.load("my_index.bin")