All Neighbors

View as Markdown

Source header: cuvs/neighbors/all_neighbors.hpp

The all-neighbors algorithm parameters.

GraphBuildParams

The all-neighbors algorithm parameters.

1using GraphBuildParams = std::variant<graph_build_params::ivf_pq_params,
2graph_build_params::nn_descent_params,
3graph_build_params::brute_force_params>;

neighbors::all_neighbors::all_neighbors_params

Parameters used to build an all-neighbors graph (find nearest neighbors for all the

training vectors). For scalability, the all-neighbors graph construction algorithm partitions a set of training vectors into overlapping clusters, computes a local knn graph on each cluster, and merges the local graphs into a single global graph. Device memory usage and accuracy can be configured by changing the overlap_factor and n_clusters. The algorithm used to build each local graph is also configurable.

1struct all_neighbors_params { ... };

Fields

NameTypeDescription
graph_build_paramsGraphBuildParamsParameters for knn graph building algorithm Approximate nearest neighbors methods or a brute force approach are supported to build the knn graph. Currently supported options are ‘IVF-PQ’, ‘NN Descent’, or ‘Brute Force’. IVF-PQ is more accurate, but slower compared to NN Descent. Note that ‘Brute Force’ can also be approximate if n_clusters > 1. Set ivf_pq_params, nn_descent_params, or brute_force_params to select the graph build algorithm and control their parameters.
overlap_factorsize_tNumber of nearest clusters each data point will be assigned to in the batching algorithm. Start with overlap_factor = 2 and gradually increase (2->3->4 …) for better accuracy at the cost of device memory usage.
n_clusterssize_tNumber of total clusters (aka batches) to split the data into. If set to 1, algorithm creates an all-neighbors graph without batching. Start with n_clusters = 4 and increase (4 → 8 → 16…) for less device memory usage at the cost of accuracy. This is independent from overlap_factor as long as overlap_factor < n_clusters. The ratio of overlap_factor / n_clusters determines device memory usage. Approximately (overlap_factor / n_clusters) * num_rows_in_entire_data number of rows will be put on device memory at once. E.g. between (overlap_factor / n_clusters) = 2/10 and 2/20, the latter will use less device memory. Larger overlap_factor results in better accuracy of the final all-neighbors knn graph. E.g. While using similar device memory, (overlap_factor / n_clusters) = 4/20 will have better accuracy than 2/10 at the cost of performance.
metriccuvs::distance::DistanceTypeMetric used.

The all-neighbors knn graph build

neighbors::all_neighbors::build

Builds an approximate all-neighbors knn graph (find nearest neighbors for all the

1void build(
2const raft::resources& handle,
3const all_neighbors_params& params,
4raft::host_matrix_view<const float, int64_t, row_major> dataset,
5raft::device_matrix_view<int64_t, int64_t, row_major> indices,
6std::optional<raft::device_matrix_view<float, int64_t, row_major>> distances = std::nullopt,
7std::optional<raft::device_vector_view<float, int64_t, row_major>> core_distances = std::nullopt,
8float alpha = 1.0);

training vectors)

Usage example:

compute core_distances. If core_distances is given, the resulting indices and distances will be mutual reachability space.

Parameters

NameDirectionTypeDescription
handleinconst raft::resources&raft::resources is an object managing resources
paramsinconst all_neighbors_params&an instance of all_neighbors::all_neighbors_params that are parameters to build all-neighbors knn graph
datasetinraft::host_matrix_view<const float, int64_t, row_major>raft::host_matrix_view input dataset expected to be located in host memory
indicesoutraft::device_matrix_view<int64_t, int64_t, row_major>nearest neighbor indices of shape [n_row x k]
distancesoutstd::optional<raft::device_matrix_view<float, int64_t, row_major>>nearest neighbor distances [n_row x k] Default: std::nullopt.
core_distancesoutstd::optional<raft::device_vector_view<float, int64_t, row_major>>array for core distances of size [n_row]. Requires distances matrix to Default: std::nullopt.
alphainfloatdistance scaling parameter as used in robust single linkage. Default: 1.0.

Returns

void

Additional overload: neighbors::all_neighbors::build

Builds an approximate all-neighbors knn graph (find nearest neighbors for all the training

1void build(
2const raft::resources& handle,
3const all_neighbors_params& params,
4raft::device_matrix_view<const float, int64_t, row_major> dataset,
5raft::device_matrix_view<int64_t, int64_t, row_major> indices,
6std::optional<raft::device_matrix_view<float, int64_t, row_major>> distances = std::nullopt,
7std::optional<raft::device_vector_view<float, int64_t, row_major>> core_distances = std::nullopt,
8float alpha = 1.0);

vectors) params.n_clusters should be 1 for data on device. To use a larger params.n_clusters for efficient device memory usage, put data on host RAM.

Usage example:

compute core_distances. If core_distances is given, the resulting indices and distances will be mutual reachability space.

Parameters

NameDirectionTypeDescription
handleinconst raft::resources&raft::resources is an object managing resources
paramsinconst all_neighbors_params&an instance of all_neighbors::all_neighbors_params that are parameters to build all-neighbors knn graph
datasetinraft::device_matrix_view<const float, int64_t, row_major>raft::device_matrix_view input dataset expected to be located in device memory
indicesoutraft::device_matrix_view<int64_t, int64_t, row_major>nearest neighbor indices of shape [n_row x k]
distancesoutstd::optional<raft::device_matrix_view<float, int64_t, row_major>>nearest neighbor distances [n_row x k] Default: std::nullopt.
core_distancesoutstd::optional<raft::device_vector_view<float, int64_t, row_major>>array for core distances of size [n_row]. Requires distances matrix to Default: std::nullopt.
alphainfloatdistance scaling parameter as used in robust single linkage. Default: 1.0.

Returns

void