Brute-force

Brute-force, or a flat index, is the simplest nearest-neighbor index. It stores the dataset and checks every vector during search.

Brute-force works well when exact results matter, the dataset is small enough to scan, or filters remove most of the dataset before search.

Example API Usage

Building an index

C

C++

Python

Java

Rust

Go

1 #include <cuvs/neighbors/brute_force.h>
2 
3 cuvsResources_t res;
4 cuvsBruteForceIndex_t index;
5 DLManagedTensor *dataset;
6 
7 load_dataset(dataset);
8 
9 cuvsResourcesCreate(&res);
10 cuvsBruteForceIndexCreate(&index);
11 
12 cuvsBruteForceBuild(res, dataset, L2Expanded, 0.0f, index);
13 
14 cuvsBruteForceIndexDestroy(index);
15 cuvsResourcesDestroy(res);

Searching an index

C

C++

Python

Java

Rust

Go

1 #include <cuvs/neighbors/brute_force.h>
2 
3 cuvsResources_t res;
4 cuvsBruteForceIndex_t index;
5 DLManagedTensor *queries;
6 DLManagedTensor *neighbors;
7 DLManagedTensor *distances;
8 
9 load_queries(queries);
10 allocate_outputs(neighbors, distances);
11 
12 cuvsResourcesCreate(&res);
13 
14 // ... build or load index ...
15 cuvsFilter prefilter = {0, NO_FILTER};
16 cuvsBruteForceSearch(res, index, queries, neighbors, distances, prefilter);
17 
18 cuvsResourcesDestroy(res);

Saving and loading an index

Serialize a Brute-force index when you want to reuse the same stored dataset without rebuilding it. The examples below save the dataset with the index so the loaded index can be searched immediately.

Rust and Go do not currently expose Brute-force save/load wrappers.

C

C++

Python

Java

1 #include <cuvs/neighbors/brute_force.h>
2 
3 cuvsResources_t res;
4 cuvsBruteForceIndex_t index;
5 cuvsBruteForceIndex_t loaded_index;
6 
7 cuvsResourcesCreate(&res);
8 cuvsBruteForceIndexCreate(&index);
9 cuvsBruteForceIndexCreate(&loaded_index);
10 
11 // ... build index ...
12 cuvsBruteForceSerialize(res, "/tmp/cuvs-bruteforce.bin", index);
13 cuvsBruteForceDeserialize(res, "/tmp/cuvs-bruteforce.bin", loaded_index);
14 
15 cuvsBruteForceIndexDestroy(loaded_index);
16 cuvsBruteForceIndexDestroy(index);
17 cuvsResourcesDestroy(res);

How Brute-force works

Brute-force stores the vectors and compares each query against every vector in the dataset. There is no partitioning step, graph traversal, or approximation step.

The build step is mostly a data setup step. The search step does the real work: for each query, cuVS computes distances to every vector, keeps the best k candidates, and returns exact nearest neighbors.

When to use Brute-force

Use brute-force when exact nearest neighbors are required.

Use brute-force when the dataset fits comfortably in device memory and search latency is acceptable.

Use brute-force as a ground-truth baseline for tuning approximate indexes.

Use brute-force for heavily filtered queries. If a filter excludes most vectors, brute-force can search only the remaining candidates while IVF and graph indexes may skip useful vectors before the filter is applied.

Use an approximate index instead when the dataset is large enough that scanning every vector is too slow.

Using Filters

Brute-force supports filtered search. A filter tells the search which vectors are allowed before distances are computed.

This is useful when queries should only search a small subset of the dataset. For example, if a filter removes 90%-99% of the vectors, brute-force may do much less work while still returning exact results within the allowed subset.

Unlike partitioned or graph-based indexes, brute-force does not need to guess which part of the index might contain the answer. If a vector passes the filter, it can be considered.

The examples below use a bitmap filter. A bitmap can express a different allow list for each query. A bit value of 1 means the query may consider that vector; a bit value of 0 means the vector is filtered out for that query.

These examples cover the Brute-force bindings that currently expose filters. The Rust and Go Brute-force wrappers currently search without a filter argument.

C

C++

Python

Java

1 #include <cuvs/neighbors/brute_force.h>
2 
3 cuvsResources_t res;
4 cuvsBruteForceIndex_t index;
5 DLManagedTensor *dataset;
6 DLManagedTensor *queries;
7 DLManagedTensor *neighbors;
8 DLManagedTensor *distances;
9 
10 cuvsResourcesCreate(&res);
11 cuvsBruteForceIndexCreate(&index);
12 
13 load_dataset(dataset);
14 load_queries(queries);
15 allocate_outputs(neighbors, distances);
16 
17 cuvsBruteForceBuild(res, dataset, L2Expanded, 0.0f, index);
18 
19 // Create a device uint32 bitmap with one bit per (query, vector) pair.
20 // Bit 1 means the pair is allowed; bit 0 means it is filtered out.
21 DLManagedTensor *bitmap = make_device_bitmap(allowed_query_vector_pairs, n_queries, n_vectors);
22 
23 cuvsFilter prefilter;
24 prefilter.type = BITMAP;
25 prefilter.addr = (uintptr_t)bitmap;
26 
27 cuvsBruteForceSearch(res, index, queries, neighbors, distances, prefilter);
28 
29 cuvsBruteForceIndexDestroy(index);
30 cuvsResourcesDestroy(res);

Configuration parameters

Build parameters

Name	Default	Description
`metric`	`L2Expanded` / `sqeuclidean`	Distance metric used for search.
`metric_arg`	`2.0`	Extra argument for metrics that need one, such as Minkowski distance.

Search parameters

The C++ brute_force::search_params struct currently has no tunable fields.

Filters are passed as search function arguments in bindings that expose filtered search, not as fields in brute_force::search_params.

Tuning Considerations

Brute-force has very few tuning knobs. The main decisions are distance metric, query batch size, output k, and whether a filter can reduce the amount of work.

Brute-force is exact, but ties can make result order look different across runs. If many vectors have the same distance near the cutoff, increasing k can make comparisons against ground truth more stable.

For large batches, memory allocation can affect performance. Reuse output buffers and memory resources when the API supports it.

Memory footprint

Brute-force memory is dominated by the dataset, optional vector norms, current query batch, and output buffers. It does not build an additional graph or partitioned index.

Variables:

N: Number of database vectors.
D: Vector dimension.
B: Bytes per dataset value, such as 4 for fp32.
Q: Query batch size.
K: Number of neighbors returned per query.
S_idx: Bytes per returned neighbor ID.
R_norm: 1 when the selected distance metric stores vector norms, otherwise 0.

Scratch and maximum vectors

The scratch term covers temporary distance tiles, allocator padding, CUDA library workspaces, and memory held by the active memory resource. Use H = 0.10 for a first search estimate. If you can measure a representative run, use:

H_{\text{measured}} = \frac{\text{observed\_peak} - \text{formula\_without\_scratch}} {\text{formula\_without\_scratch}}

Then set:

M_{\text{usable}} = (M_{\text{free}} - M_{\text{other}}) \cdot (1 - H)

The capacity variables in this subsection are:

M_free: Free memory in the relevant memory space before the operation starts. Use device memory for GPU-resident formulas and host memory for formulas explicitly marked as host memory.
M_other: Memory reserved for arrays, memory pools, concurrent work, or application buffers that are not included in the formula.
H: Scratch headroom fraction reserved for temporary buffers and allocator overhead.
M_usable: Memory budget left for the formula after subtracting M_other and reserving headroom.
observed_peak: Peak memory observed during a smaller representative run.
formula_without_scratch: Value of the selected peak formula with explicit scratch terms removed and without applying headroom.
peak_without_scratch(count): The selected peak formula rewritten as a function of the count being estimated, excluding scratch and headroom. The count is usually N for rows or vectors and B for K-selection batch rows.
B_per_row / B_per_vector: Bytes added by one more row or vector in the selected formula. For linear formulas, add the coefficients of the count being estimated after fixed values such as D, K, Q, and L are substituted.
B_fixed: Bytes in the selected formula that do not change with the estimated count, such as codebooks, centroids, fixed query batches, capped training buffers, or metadata.
N_max / B_max: Estimated largest row, vector, or batch-row count that fits in M_usable.

For fixed D, Q, and K, solve:

N_{\max} = \left\lfloor \frac{M_{\text{usable}} - B_{\text{fixed}}} {B_{\text{per\_vector}}} \right\rfloor

where B_per_vector = D * B + R_norm * 4. B_fixed includes query and result buffers.

Index footprint

The raw vectors use:

\text{dataset\_size} = N \cdot D \cdot B

Vector norms, for distance metrics that use them, use:

\text{norms\_size} = R_{\text{norm}} \cdot N \cdot 4

Search peak memory usage

Search adds the query batch and output buffers:

\begin{aligned} \text{query\_size} &= Q \cdot D \cdot B \\ \text{result\_size} &= Q \cdot K \cdot (S_{\text{idx}} + 4) \end{aligned}

The total search peak is approximately:

\begin{aligned} \text{search\_peak} \approx&\ \text{dataset\_size} + \text{norms\_size} \\ &+ \text{query\_size} + \text{result\_size} + \text{scratch} \end{aligned}