Cagra
Source header: cuvs/neighbors/cagra.hpp
Types
neighbors::graph_build_params::ace_params
Specialized parameters for ACE (Augmented Core Extraction) graph build
Fields
CAGRA index build parameters
neighbors::vpq_params
Parameters for VPQ compression.
Fields
neighbors::cagra::hnsw_heuristic_type
A strategy for selecting the graph build parameters based on similar HNSW index parameters.
Defines how cagra::index_params::from_hnsw_params should construct a graph that is to be converted to (used by) a CPU HNSW index.
neighbors::cagra::index_params::from_hnsw_params
Create CAGRA index parameters compatible with an HNSW index.
IMPORTANT NOTE: The reference HNSW index and the corresponding HNSW index generated from CAGRA will NOT produce exactly the same recall and QPS for the same parameter ef, because the graphs are different internally. Depending on the selected heuristic, the QPS-recall curve of the CAGRA-produced graph may be shifted right or left. See the heuristics descriptions for more details.
Usage example:
Parameters
Returns
static cagra::index_params
CAGRA index search parameters
neighbors::cagra::search_algo
CAGRA index search parameters
Values
CAGRA index extend parameters
neighbors::cagra::extend_params
CAGRA index extend parameters
Fields
CAGRA index type
neighbors::cagra::index
CAGRA index.
The index stores the dataset and a kNN graph in device memory.
neighbors::cagra::index::metric
Distance metric used by the index.
Returns
neighbors::cagra::index::size
Total length of the index (number of vectors).
Returns
IdxT
neighbors::cagra::index::dim
Dimensionality of the data.
Returns
uint32_t
neighbors::cagra::index::graph_degree
Graph degree
Returns
uint32_t
neighbors::cagra::index::data
Dataset [size, dim]
Returns
const cuvs::neighbors::dataset<int64_t>&
neighbors::cagra::index::graph
Neighborhood graph [size, graph_degree]
Returns
raft::device_matrix_view<const graph_index_type, int64_t, raft::row_major>
neighbors::cagra::index::source_indices
Mapping from internal graph node indices to the original user-provided indices.
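To make the role of this mapping concrete, the sketch below translates a list of internal node ids into user-provided ids on the host. It is only an illustration of what the mapping represents: the function name `to_user_ids` is hypothetical, the use of `std::vector` instead of device views is a simplification, and treating a missing mapping as the identity is an assumption of this sketch rather than documented behavior.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Translate internal graph node ids into the ids the user originally supplied.
// Assumption of this sketch: when no mapping is present, ids pass through unchanged.
std::vector<std::uint32_t> to_user_ids(
  const std::vector<std::uint32_t>& internal_ids,
  const std::optional<std::vector<std::uint32_t>>& source_indices)
{
  if (!source_indices.has_value()) { return internal_ids; }
  std::vector<std::uint32_t> out;
  out.reserve(internal_ids.size());
  for (const auto id : internal_ids) {
    out.push_back(source_indices->at(id));  // at() throws on out-of-range ids
  }
  return out;
}
```

With a mapping of {10, 42, 7}, the internal ids {2, 0} would translate to {7, 10}.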
Returns
std::optional<raft::device_vector_view<const index_type, int64_t>>
neighbors::cagra::index::dataset_fd
Get the dataset file descriptor (for disk-backed index)
Returns
const std::optional<cuvs::util::file_descriptor>&
neighbors::cagra::index::graph_fd
Get the graph file descriptor (for disk-backed index)
Returns
const std::optional<cuvs::util::file_descriptor>&
neighbors::cagra::index::mapping_fd
Get the mapping file descriptor (for disk-backed index)
Returns
const std::optional<cuvs::util::file_descriptor>&
neighbors::cagra::index::dataset_norms
Dataset norms for cosine distance [size]
Returns
std::optional<raft::device_vector_view<const float, int64_t>>
neighbors::cagra::index::index
Parameters
Returns
void
Additional overload: neighbors::cagra::index::index
Construct an empty index.
Parameters
Returns
void
Additional overload: neighbors::cagra::index::index
Construct an index from dataset and knn_graph arrays
If the dataset and graph are already in GPU memory, then the index is just a thin wrapper around them that stores a non-owning reference to the arrays.
The constructor also accepts host arrays. In that case they are copied to the device, and the device arrays are owned by the index.
If the dataset rows are not 16-byte aligned, a padded copy is created in device memory to ensure alignment for vectorized loads.
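The alignment rule above comes down to simple arithmetic on the row length in bytes. The following is a minimal sketch of that arithmetic, not the library's implementation: the helper names `round_up`, `padded_row_elems`, and `rows_are_aligned` are hypothetical, and the sketch assumes element sizes that divide 16 (1, 2, or 4 bytes).

```cpp
#include <cstddef>

// Round `value` up to the next multiple of `multiple` (multiple must be > 0).
constexpr std::size_t round_up(std::size_t value, std::size_t multiple)
{
  return ((value + multiple - 1) / multiple) * multiple;
}

// Number of elements per row after padding so that every row spans a whole
// number of 16-byte blocks. Assumes elem_size divides 16 (e.g. 1, 2 or 4 bytes).
constexpr std::size_t padded_row_elems(std::size_t dim, std::size_t elem_size)
{
  constexpr std::size_t kAlignBytes = 16;
  return round_up(dim * elem_size, kAlignBytes) / elem_size;
}

// A row needs no padding exactly when its byte length is a multiple of 16.
constexpr bool rows_are_aligned(std::size_t dim, std::size_t elem_size)
{
  return (dim * elem_size) % 16 == 0;
}
```

For example, a 3-dimensional float dataset (12 bytes per row) would be padded to 4 elements per row, while a 128-dimensional float dataset needs no padding.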
Usage examples:
- A CAGRA index is normally created by cagra::build. When a host dataset is passed to build, the returned index owns a device copy of the dataset and the knn_graph. In contrast, if the dataset is passed to build as a device_mdspan, the index only stores a reference to it.
- Constructing an index using an existing knn-graph.
Parameters
Returns
void
neighbors::cagra::index::update_dataset
Replace the dataset with a new dataset.
If the new dataset rows are aligned on 16 bytes, then only a reference to the dataset is stored. It is the caller's responsibility to ensure that the dataset stays alive as long as the index. It is expected that the same set of vectors is used for update_dataset and index build.
Note: This will clear any precomputed dataset norms.
Parameters
Returns
void
Additional overload: neighbors::cagra::index::update_dataset
Set the dataset reference explicitly to a device matrix view with padding.
Parameters
Returns
void
Additional overload: neighbors::cagra::index::update_dataset
Replace the dataset with a new dataset.
A copy of the dataset is created on the device, and the index manages the lifetime of this copy. It is expected that the same set of vectors is used for update_dataset and index build.
Note: This will clear any precomputed dataset norms.
Parameters
Returns
void
Additional overload: neighbors::cagra::index::update_dataset
Replace the dataset with a new dataset. It is expected that the same set of vectors is used for update_dataset and index build.
Note: This will clear any precomputed dataset norms.
Parameters
Returns
std::enable_if_t<std::is_base_of_v<cuvs::neighbors::dataset<dataset_index_type>, DatasetT>>
neighbors::cagra::index::update_graph
Replace the graph with a new graph.
Since the new graph is a device array, we store a reference to that, and it is the caller’s responsibility to ensure that knn_graph stays alive as long as the index.
Parameters
Returns
void
Additional overload: neighbors::cagra::index::update_graph
Replace the graph with a new graph.
We create a copy of the graph on the device. The index manages the lifetime of this copy.
Parameters
Returns
void
neighbors::cagra::index::update_source_indices
Replace the source indices with new source indices, taking ownership of the passed vector.
Parameters
Returns
void
Additional overload: neighbors::cagra::index::update_source_indices
Copy the provided source indices into the index.
Parameters
Returns
void
Additional overload: neighbors::cagra::index::update_dataset
Update the dataset from a disk file using a file descriptor.
This method configures the index to use a disk-based dataset. The dataset file should contain a numpy header followed by vectors in row-major format. The number of rows and dimensionality are read from the numpy header.
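To illustrate how the number of rows and the dimensionality can be recovered from a numpy header, here is a minimal host-side sketch that parses the shape from the leading bytes of a .npy file held in memory. It follows the public .npy layout (magic string, version bytes, little-endian header length, then a Python dict literal containing 'shape'). The names `npy_shape` and `parse_npy_shape` are hypothetical, and the reader used by the library may validate more fields or behave differently.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Shape of a 2-D array as recorded in a .npy header.
struct npy_shape {
  std::size_t rows;
  std::size_t cols;
};

// Parse (rows, cols) from the first bytes of a .npy file.
// Only the first two entries of 'shape' are read; returns std::nullopt when
// the buffer is not a recognizable .npy header with at least two dimensions.
std::optional<npy_shape> parse_npy_shape(const std::string& buf)
{
  const std::string magic("\x93NUMPY", 6);
  if (buf.size() < 10 || buf.compare(0, magic.size(), magic) != 0) { return std::nullopt; }

  // Version 1.x stores the header length in 2 bytes, later versions in 4 (little-endian).
  const auto major = static_cast<unsigned char>(buf[6]);
  const std::size_t len_bytes = (major == 1) ? 2 : 4;
  const std::size_t dict_start = 8 + len_bytes;
  if (buf.size() < dict_start) { return std::nullopt; }

  std::size_t dict_len = 0;
  for (std::size_t i = 0; i < len_bytes; ++i) {
    dict_len |= static_cast<std::size_t>(static_cast<unsigned char>(buf[8 + i])) << (8 * i);
  }
  if (buf.size() - dict_start < dict_len) { return std::nullopt; }
  const std::string dict = buf.substr(dict_start, dict_len);

  // Locate the tuple that follows the 'shape' key, e.g. 'shape': (1000, 128)
  const auto key = dict.find("'shape'");
  if (key == std::string::npos) { return std::nullopt; }
  const auto open = dict.find('(', key);
  if (open == std::string::npos) { return std::nullopt; }
  const auto close = dict.find(')', open);
  const auto comma = dict.find(',', open);
  if (close == std::string::npos || comma == std::string::npos || comma > close) {
    return std::nullopt;
  }

  try {
    const auto rows = std::stoull(dict.substr(open + 1, comma - open - 1));
    const auto cols = std::stoull(dict.substr(comma + 1, close - comma - 1));
    return npy_shape{static_cast<std::size_t>(rows), static_cast<std::size_t>(cols)};
  } catch (const std::exception&) {
    return std::nullopt;  // e.g. a 1-D shape such as (1000,)
  }
}
```

The vector data begins immediately after the header dict, so a reader can seek past the header and address row i at a fixed offset computed from the parsed shape and the element size.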
Parameters
Returns
void
Additional overload: neighbors::cagra::index::update_graph
Update the graph from a disk file using a file descriptor.
This method configures the index to use a disk-based graph. The graph file should contain a numpy header followed by neighbor indices in row-major format. The number of rows and graph degree are read from the numpy header.
Parameters
Returns
void
neighbors::cagra::index::update_mapping
Update the dataset mapping from a disk file using a file descriptor.
This method configures the index to use a disk-based dataset mapping. The mapping file should contain a numpy header followed by index mappings.
Parameters
Returns
void
CAGRA index build functions
neighbors::cagra::build
Build the index from the dataset for efficient search.
The build consists of two steps: building an intermediate knn-graph, and optimizing it to create the final graph. The index_params struct controls the node degree of these graphs.
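The two-step structure can be pictured with a small CPU toy. This is a sketch under strong simplifying assumptions and not the CAGRA algorithm: step 1 is replaced by an exact brute-force kNN graph with squared L2 distance, and step 2 by a naive truncation that keeps the closest neighbors, whereas the real optimization step is more elaborate. The names `brute_force_knn_graph` and `truncate_graph` are hypothetical. The sketch only shows how a larger intermediate degree is reduced to a smaller final degree.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using graph_t = std::vector<std::vector<std::uint32_t>>;

// Step 1 (stand-in): exact kNN graph by brute force with squared L2 distance.
// `data` is row-major [n, dim]; each node gets its `degree` nearest other nodes,
// ordered from closest to farthest.
graph_t brute_force_knn_graph(const std::vector<float>& data,
                              std::size_t n,
                              std::size_t dim,
                              std::size_t degree)
{
  graph_t graph(n);
  for (std::size_t i = 0; i < n; ++i) {
    std::vector<std::pair<float, std::uint32_t>> candidates;
    for (std::size_t j = 0; j < n; ++j) {
      if (j == i) { continue; }
      float dist = 0.f;
      for (std::size_t c = 0; c < dim; ++c) {
        const float diff = data[i * dim + c] - data[j * dim + c];
        dist += diff * diff;
      }
      candidates.emplace_back(dist, static_cast<std::uint32_t>(j));
    }
    std::sort(candidates.begin(), candidates.end());
    const std::size_t keep = std::min(degree, candidates.size());
    for (std::size_t k = 0; k < keep; ++k) { graph[i].push_back(candidates[k].second); }
  }
  return graph;
}

// Step 2 (stand-in): reduce every node to at most `graph_degree` neighbors by
// keeping the closest ones. This naive truncation is an assumption of the sketch.
graph_t truncate_graph(const graph_t& knn, std::size_t graph_degree)
{
  graph_t out = knn;
  for (auto& row : out) {
    if (row.size() > graph_degree) { row.resize(graph_degree); }
  }
  return out;
}
```

A caller would build the intermediate graph with the larger degree first and then pass it to the second step with the final degree, mirroring the two node-degree settings that index_params controls.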
The following distance metrics are supported:
- L2
- InnerProduct (currently only supported with IVF-PQ as the build algorithm)
- CosineExpanded
- L1 (currently only supported with NN-Descent and Iterative Search as the build algorithm)
Usage example:
Parameters
Returns
cuvs::neighbors::cagra::index<float, uint32_t>
Additional overload: neighbors::cagra::build
Build the index from the dataset for efficient search.
The build consists of two steps: building an intermediate knn-graph, and optimizing it to create the final graph. The index_params struct controls the node degree of these graphs.
The following distance metrics are supported:
- L2
- InnerProduct (currently only supported with IVF-PQ as the build algorithm)
- CosineExpanded
- L1 (currently only supported with NN-Descent and Iterative Search as the build algorithm)
Usage example:
Parameters
Returns
cuvs::neighbors::cagra::index<float, uint32_t>
Additional overload: neighbors::cagra::build
Build the index from the dataset for efficient search.
The build consists of two steps: building an intermediate knn-graph, and optimizing it to create the final graph. The index_params struct controls the node degree of these graphs.
The following distance metrics are supported:
- L2
- InnerProduct (currently only supported with IVF-PQ as the build algorithm)
- CosineExpanded (dataset norms are computed as float regardless of input data type)
- L1 (currently only supported with NN-Descent and Iterative Search as the build algorithm)
Usage example:
Parameters
Returns
cuvs::neighbors::cagra::index<half, uint32_t>
Additional overload: neighbors::cagra::build
Build the index from the dataset for efficient search.
The build consists of two steps: building an intermediate knn-graph, and optimizing it to create the final graph. The index_params struct controls the node degree of these graphs.
The following distance metrics are supported:
- L2
- CosineExpanded (dataset norms are computed as float regardless of input data type)
- L1 (currently only supported with NN-Descent and Iterative Search as the build algorithm)
Usage example:
Parameters
Returns
cuvs::neighbors::cagra::index<half, uint32_t>
Additional overload: neighbors::cagra::build
Build the index from the dataset for efficient search.
The build consists of two steps: building an intermediate knn-graph, and optimizing it to create the final graph. The index_params struct controls the node degree of these graphs.
The following distance metrics are supported:
- L2
- CosineExpanded (dataset norms are computed as float regardless of input data type)
- L1 (currently only supported with NN-Descent and Iterative Search as the build algorithm)
- BitwiseHamming (currently only supported with NN-Descent and Iterative Search as the build algorithm, and only for int8_t and uint8_t data types)
Usage example:
Parameters
Returns
cuvs::neighbors::cagra::index<int8_t, uint32_t>
Additional overload: neighbors::cagra::build
Build the index from the dataset for efficient search.
The build consists of two steps: building an intermediate knn-graph, and optimizing it to create the final graph. The index_params struct controls the node degree of these graphs.
The following distance metrics are supported:
- L2
- InnerProduct (currently only supported with IVF-PQ as the build algorithm)
- CosineExpanded (dataset norms are computed as float regardless of input data type)
- L1 (currently only supported with NN-Descent and Iterative Search as the build algorithm)
- BitwiseHamming (currently only supported with NN-Descent and Iterative Search as the build algorithm, and only for int8_t and uint8_t data types)
Usage example:
Parameters
Returns
cuvs::neighbors::cagra::index<int8_t, uint32_t>
Additional overload: neighbors::cagra::build
Build the index from the dataset for efficient search.
The build consists of two steps: building an intermediate knn-graph, and optimizing it to create the final graph. The index_params struct controls the node degree of these graphs.
The following distance metrics are supported:
- L2
- InnerProduct (currently only supported with IVF-PQ as the build algorithm)
- CosineExpanded (dataset norms are computed as float regardless of input data type)
- L1 (currently only supported with NN-Descent and Iterative Search as the build algorithm)
- BitwiseHamming (currently only supported with NN-Descent and Iterative Search as the build algorithm, and only for int8_t and uint8_t data types)
Usage example:
Parameters
Returns
cuvs::neighbors::cagra::index<uint8_t, uint32_t>
Additional overload: neighbors::cagra::build
Build the index from the dataset for efficient search.
The build consists of two steps: building an intermediate knn-graph, and optimizing it to create the final graph. The index_params struct controls the node degree of these graphs.
The following distance metrics are supported:
- L2
- InnerProduct (currently only supported with IVF-PQ as the build algorithm)
- CosineExpanded (dataset norms are computed as float regardless of input data type)
- L1 (currently only supported with NN-Descent and Iterative Search as the build algorithm)
- BitwiseHamming (currently only supported with NN-Descent and Iterative Search as the build algorithm, and only for int8_t and uint8_t data types)
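The restrictions in the list above can be encoded as a small compatibility check. The sketch below mirrors the list exactly as written for this overload. The enum and function names (`metric`, `build_algo`, `data_type`, `is_supported`) are hypothetical and do not correspond to library types; the library itself may enforce additional or different constraints.

```cpp
// Hypothetical enums that mirror the names used in the list above.
enum class metric { L2, InnerProduct, CosineExpanded, L1, BitwiseHamming };
enum class build_algo { IvfPq, NnDescent, IterativeSearch };
enum class data_type { Float, Half, Int8, UInt8 };

// Returns true when the combination is allowed according to the list above.
constexpr bool is_supported(metric m, build_algo a, data_type t)
{
  const bool nn_descent_or_iterative =
    (a == build_algo::NnDescent) || (a == build_algo::IterativeSearch);
  const bool byte_type = (t == data_type::Int8) || (t == data_type::UInt8);
  switch (m) {
    case metric::L2: return true;
    case metric::CosineExpanded: return true;
    case metric::InnerProduct: return a == build_algo::IvfPq;
    case metric::L1: return nn_descent_or_iterative;
    case metric::BitwiseHamming: return nn_descent_or_iterative && byte_type;
  }
  return false;
}
```

A check like this is useful before launching a long build, since an unsupported combination can then be rejected up front.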
Usage example:
Parameters
Returns
cuvs::neighbors::cagra::index<uint8_t, uint32_t>
CAGRA extend functions
neighbors::cagra::extend
Add new vectors to a CAGRA index
Usage example:
Optionally, the caller can supply memory buffer views for the extended dataset and graph:
- Dataset buffer view: a memory buffer view for the dataset including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, the number of columns must be the dimension of the dataset, and the stride must be the same as in the original index dataset. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the dataset themselves.
- Graph buffer view: a memory buffer view for the graph including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, and the number of columns must be the graph degree. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the graph themselves.
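The shape constraints described above can be written down as a short calculation. This is a minimal sketch for sizing caller-managed buffers before calling extend. The struct and function names (`buffer_shape`, `required_dataset_shape`, `required_graph_shape`) are hypothetical, and setting the graph stride equal to the graph degree is an assumption of the sketch, since the text only constrains the dataset stride.

```cpp
#include <cstddef>

// Shape of a row-major 2-D buffer: `rows` x `cols`, with `stride` elements per row.
struct buffer_shape {
  std::size_t rows;
  std::size_t cols;
  std::size_t stride;
};

// Dataset buffer: rows grow by the number of added vectors, while cols (dimension)
// and stride must match the dataset already stored in the index.
constexpr buffer_shape required_dataset_shape(buffer_shape original, std::size_t n_additional)
{
  return buffer_shape{original.rows + n_additional, original.cols, original.stride};
}

// Graph buffer: one row per vector (original plus additional), graph_degree columns.
// Assumption of this sketch: the graph rows are densely packed (stride == cols).
constexpr buffer_shape required_graph_shape(std::size_t n_original,
                                            std::size_t n_additional,
                                            std::size_t graph_degree)
{
  return buffer_shape{n_original + n_additional, graph_degree, graph_degree};
}
```

For an index holding 1000 vectors of dimension 96 with graph degree 64, adding 200 vectors would call for a 1200 x 96 dataset buffer and a 1200 x 64 graph buffer.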
Parameters
Returns
void
Additional overload: neighbors::cagra::extend
Add new vectors to a CAGRA index
Usage example:
Optionally, the caller can supply memory buffer views for the extended dataset and graph:
- Dataset buffer view: a memory buffer view for the dataset including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, the number of columns must be the dimension of the dataset, and the stride must be the same as in the original index dataset. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the dataset themselves.
- Graph buffer view: a memory buffer view for the graph including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, and the number of columns must be the graph degree. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the graph themselves.
Parameters
Returns
void
Additional overload: neighbors::cagra::extend
Add new vectors to a CAGRA index
Usage example:
Optionally, the caller can supply memory buffer views for the extended dataset and graph:
- Dataset buffer view: a memory buffer view for the dataset including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, the number of columns must be the dimension of the dataset, and the stride must be the same as in the original index dataset. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the dataset themselves.
- Graph buffer view: a memory buffer view for the graph including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, and the number of columns must be the graph degree. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the graph themselves.
Parameters
Returns
void
Additional overload: neighbors::cagra::extend
Add new vectors to a CAGRA index
Usage example:
Optionally, the caller can supply memory buffer views for the extended dataset and graph:
- Dataset buffer view: a memory buffer view for the dataset including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, the number of columns must be the dimension of the dataset, and the stride must be the same as in the original index dataset. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the dataset themselves.
- Graph buffer view: a memory buffer view for the graph including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, and the number of columns must be the graph degree. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the graph themselves.
Parameters
Returns
void
Additional overload: neighbors::cagra::extend
Add new vectors to a CAGRA index
Usage example:
Optionally, the caller can supply memory buffer views for the extended dataset and graph:
- Dataset buffer view: a memory buffer view for the dataset including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, the number of columns must be the dimension of the dataset, and the stride must be the same as in the original index dataset. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the dataset themselves.
- Graph buffer view: a memory buffer view for the graph including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, and the number of columns must be the graph degree. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the graph themselves.
Parameters
Returns
void
Additional overload: neighbors::cagra::extend
Add new vectors to a CAGRA index
Usage example:
Optionally, the caller can supply memory buffer views for the extended dataset and graph:
- Dataset buffer view: a memory buffer view for the dataset including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, the number of columns must be the dimension of the dataset, and the stride must be the same as in the original index dataset. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the dataset themselves.
- Graph buffer view: a memory buffer view for the graph including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, and the number of columns must be the graph degree. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the graph themselves.
Parameters
Returns
void
Additional overload: neighbors::cagra::extend
Add new vectors to a CAGRA index
Usage example:
Optionally, the caller can supply memory buffer views for the extended dataset and graph:
- Dataset buffer view: a memory buffer view for the dataset including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, the number of columns must be the dimension of the dataset, and the stride must be the same as in the original index dataset. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the dataset themselves.
- Graph buffer view: a memory buffer view for the graph including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, and the number of columns must be the graph degree. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the graph themselves.
Parameters
Returns
void
Additional overload: neighbors::cagra::extend
Add new vectors to a CAGRA index
Usage example:
Optionally, the caller can supply memory buffer views for the extended dataset and graph:
- Dataset buffer view: a memory buffer view for the dataset including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, the number of columns must be the dimension of the dataset, and the stride must be the same as in the original index dataset. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the dataset themselves.
- Graph buffer view: a memory buffer view for the graph including the additional part. The data is copied from the current index in this function. The number of rows must be the sum of the original and additional datasets, and the number of columns must be the graph degree. This view is stored in the output index, so it is the caller's responsibility to ensure that the buffer stays alive as long as the index. This option is useful when users want to manage the memory space for the graph themselves.
Parameters
Returns
void
CAGRA search functions
none_sample_filter
Search ANN using the constructed index.
CAGRA serialize functions
neighbors::cagra::serialize
Save the index to file.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
neighbors::cagra::deserialize
Load index from file.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize
Write the index to an output stream
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::deserialize
Load index from input stream
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize
Save the index to file.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::deserialize
Load index from file.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize
Write the index to an output stream
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::deserialize
Load index from input stream
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize
Save the index to file.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::deserialize
Load index from file.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize
Write the index to an output stream
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::deserialize
Load index from input stream
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize
Save the index to file.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::deserialize
Load index from file.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize
Write the index to an output stream
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::deserialize
Load index from input stream
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
neighbors::cagra::serialize_to_hnswlib
Write the CAGRA built index as a base layer HNSW index to an output stream
NOTE: The saved index can only be read by the hnswlib wrapper in cuVS, as the serialization format is not compatible with the original hnswlib.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize_to_hnswlib
Save a CAGRA-built index in hnswlib base-layer-only serialized format
NOTE: The saved index can only be read by the hnswlib wrapper in cuVS, as the serialization format is not compatible with the original hnswlib.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize_to_hnswlib
Write the CAGRA built index as a base layer HNSW index to an output stream
NOTE: The saved index can only be read by the hnswlib wrapper in cuVS, as the serialization format is not compatible with the original hnswlib.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize_to_hnswlib
Save a CAGRA-built index in hnswlib base-layer-only serialized format
NOTE: The saved index can only be read by the hnswlib wrapper in cuVS, as the serialization format is not compatible with the original hnswlib.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize_to_hnswlib
Write the CAGRA built index as a base layer HNSW index to an output stream
NOTE: The saved index can only be read by the hnswlib wrapper in cuVS, as the serialization format is not compatible with the original hnswlib.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize_to_hnswlib
Save a CAGRA-built index in hnswlib base-layer-only serialized format
NOTE: The saved index can only be read by the hnswlib wrapper in cuVS, as the serialization format is not compatible with the original hnswlib.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize_to_hnswlib
Write the CAGRA built index as a base layer HNSW index to an output stream
NOTE: The saved index can only be read by the hnswlib wrapper in cuVS, as the serialization format is not compatible with the original hnswlib.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void
Additional overload: neighbors::cagra::serialize_to_hnswlib
Save a CAGRA-built index in hnswlib base-layer-only serialized format
NOTE: The saved index can only be read by the hnswlib wrapper in cuVS, as the serialization format is not compatible with the original hnswlib.
Experimental, both the API and the serialization format are subject to change.
Parameters
Returns
void