Cagra

View as Markdown

Python module: cuvs.neighbors.cagra

AceParams

1cdef class AceParams

Parameters for ACE (Augmented Core Extraction) graph building algorithm.

ACE enables building indexes for datasets too large to fit in GPU memory by partitioning the dataset using balanced k-means and building sub-indexes for each partition independently.

Parameters

NameTypeDescription
npartitionsint, default = 0Number of partitions for ACE partitioned build. When set to 0 (default), the number of partitions is automatically derived based on available host and GPU memory to maximize partition size while ensuring the build fits in memory.

Small values might improve recall but potentially degrade performance and increase memory usage. Partitions should not be too small to prevent issues in KNN graph construction. The partition size is on average 2 * (n_rows / npartitions) * dim * sizeof(T). 2 is because of the core and augmented vectors. Please account for imbalance in the partition sizes (up to 3x in our tests).

If the specified number of partitions results in partitions that exceed available memory, the value will be automatically increased to fit memory constraints and a warning will be issued.
ef_constructionint, default = 120The index quality for the ACE build. Bigger values increase the index quality. At some point, increasing this will no longer improve the quality.
build_dirstr, default = "/tmp/ace_build"Directory to store ACE build artifacts (e.g., KNN graph, optimized graph). Used when use_disk is true or when the graph does not fit in host and GPU memory. This should be the fastest disk in the system and hold enough space for twice the dataset, final graph, and label mapping.
use_diskbool, default = FalseWhether to use disk-based storage for ACE build. When true, enables disk-based operations for memory-efficient graph construction.
max_host_memory_gbfloat, default = 0Maximum host memory to use for ACE build in GiB. When set to 0 (default), uses available host memory. Useful for testing or when running alongside other memory-intensive processes.
max_gpu_memory_gbfloat, default = 0Maximum GPU memory to use for ACE build in GiB. When set to 0 (default), uses available GPU memory. Useful for testing or when running alongside other memory-intensive processes.

Constructor

1def __init__(self, *, npartitions=0, ef_construction=120, build_dir="/tmp/ace_build", use_disk=False, max_host_memory_gb=0, max_gpu_memory_gb=0)

Members

NameKind
npartitionsproperty
ef_constructionproperty
build_dirproperty
use_diskproperty
max_host_memory_gbproperty
max_gpu_memory_gbproperty
get_handlemethod

npartitions

1def npartitions(self)

ef_construction

1def ef_construction(self)

build_dir

1def build_dir(self)

use_disk

1def use_disk(self)

max_host_memory_gb

1def max_host_memory_gb(self)

max_gpu_memory_gb

1def max_gpu_memory_gb(self)

get_handle

1def get_handle(self)

CompressionParams

1cdef class CompressionParams

Parameters for VPQ Compression

Parameters

NameTypeDescription
pq_bitsintThe bit length of the vector element after compression by PQ. Possible values: [4, 5, 6, 7, 8]. The smaller the ‘pq_bits’, the smaller the index size and the better the search performance, but the lower the recall.
pq_dimintThe dimensionality of the vector after compression by PQ. When zero, an optimal value is selected using a heuristic.
vq_n_centersintVector Quantization (VQ) codebook size - number of “coarse cluster centers”. When zero, an optimal value is selected using a heuristic.
kmeans_n_itersintThe number of iterations searching for kmeans centers (both VQ & PQ phases).
vq_kmeans_trainset_fractionfloatThe fraction of data to use during iterative kmeans building (VQ phase). When zero, an optimal value is selected using a heuristic.
pq_kmeans_trainset_fractionfloatThe fraction of data to use during iterative kmeans building (PQ phase). When zero, an optimal value is selected using a heuristic.

Constructor

1def __init__(self, *, pq_bits=8, pq_dim=0, vq_n_centers=0, kmeans_n_iters=25, vq_kmeans_trainset_fraction=0.0, pq_kmeans_trainset_fraction=0.0)

Members

NameKind
pq_bitsproperty
pq_dimproperty
vq_n_centersproperty
kmeans_n_itersproperty
vq_kmeans_trainset_fractionproperty
pq_kmeans_trainset_fractionproperty
get_handlemethod

pq_bits

1def pq_bits(self)

pq_dim

1def pq_dim(self)

vq_n_centers

1def vq_n_centers(self)

kmeans_n_iters

1def kmeans_n_iters(self)

vq_kmeans_trainset_fraction

1def vq_kmeans_trainset_fraction(self)

pq_kmeans_trainset_fraction

1def pq_kmeans_trainset_fraction(self)

get_handle

1def get_handle(self)

ExtendParams

1cdef class ExtendParams

Supplemental parameters to extend CAGRA Index

Parameters

NameTypeDescription
max_chunk_sizeintThe additional dataset is divided into chunks and added to the graph. This is the knob to adjust the tradeoff between the recall and operation throughput. Large chunk sizes can result in high throughput, but use more working memory (O(max_chunk_size*degree^2)). This can also degrade recall because no edges are added between the nodes in the same chunk. Auto select when 0.

Constructor

1def __init__(self, *, max_chunk_size=None)

Members

NameKind
max_chunk_sizeproperty

max_chunk_size

1def max_chunk_size(self)

Index

1cdef class Index

Members

NameKind
trainedproperty
dimproperty
graph_degreeproperty
dtypeproperty
datasetproperty
graphproperty

trained

1def trained(self)

dim

1def dim(self)

graph_degree

1def graph_degree(self)

dtype

1def dtype(self)

dataset

1def dataset(self)

graph

1def graph(self)

IndexParams

1cdef class IndexParams

Parameters to build index for CAGRA nearest neighbor search

Parameters

NameTypeDescription
metricstr, default = "sqeuclidean"String denoting the metric type, valid values for metric are [“sqeuclidean”, “inner_product”, “cosine”], where:

- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \sum_i (a_i - b_i)^2
- inner_product distance is defined as distance(a, b) = \sum_i a_i * b_i.
- cosine distance is defined as distance(a, b) = 1 - \sum_i a_i * b_i / ( ||a||_2 * ||b||_2).
intermediate_graph_degreeint, default = 128
graph_degreeint, default = 64
build_algostr, default = "ivf_pq"string denoting the graph building algorithm to use. Valid values for algo: [“ivf_pq”, “nn_descent”, “iterative_cagra_search”, “ace”], where

- ivf_pq will use the IVF-PQ algorithm for building the knn graph
- nn_descent (experimental) will use the NN-Descent algorithm for building the knn graph. It is expected to be generally faster than ivf_pq.
- iterative_cagra_search will iteratively build the knn graph using CAGRA’s search() and optimize()
- ace will use ACE (Augmented Core Extraction) for building indices for datasets too large to fit in GPU memory
compressionCompressionParams, optionalIf compression is desired should be a CompressionParams object. If None compression will be disabled.
ivf_pq_build_paramscuvs.neighbors.ivf_pq.IndexParams, optionalParameters for IVF-PQ algorithm. If provided, it will be used for building the graph.
ivf_pq_search_paramscuvs.neighbors.ivf_pq.SearchParams, optionalParameters for IVF-PQ search. If provided, it will be used for searching the graph.
ace_paramsAceParams, optionalParameters for ACE algorithm. If provided, it will be used for building the graph with ACE partitioning.
refinement_ratefloat, default = 1.0

Constructor

1def __init__(self, *, metric="sqeuclidean", intermediate_graph_degree=128, graph_degree=64, build_algo="ivf_pq", nn_descent_niter=20, compression=None, ivf_pq_build_params: ivf_pq.IndexParams = None, ivf_pq_search_params: ivf_pq.SearchParams = None, ace_params: AceParams = None, refinement_rate: float = 1.0)

Members

NameKind
get_handlemethod
metricproperty
intermediate_graph_degreeproperty
graph_degreeproperty
build_algoproperty
nn_descent_niterproperty
refinement_rateproperty

get_handle

1def get_handle(self)

metric

1def metric(self)

intermediate_graph_degree

1def intermediate_graph_degree(self)

graph_degree

1def graph_degree(self)

build_algo

1def build_algo(self)

nn_descent_niter

1def nn_descent_niter(self)

refinement_rate

1def refinement_rate(self)

SearchParams

1cdef class SearchParams

CAGRA search parameters

Parameters

NameTypeDescription
max_queriesint, default = 0Maximum number of queries to search at the same time (batch size). Auto select when 0.
itopk_sizeint, default = 64Number of intermediate search results retained during the search. This is the main knob to adjust trade off between accuracy and search speed. Higher values improve the search accuracy.
max_iterationsint, default = 0Upper limit of search iterations. Auto select when 0.
algostr, default = "auto"String denoting the search algorithm to use Valid values for algo: [“auto”, “single_cta”, “multi_cta”], where:

- auto will automatically select the best value based on query size
- single_cta is better when query contains larger number of vectors (e.g >10)
- multi_cta is better when query contains only a few vectors
team_sizeint, default = 0Number of threads used to calculate a single distance. 4, 8, 16, or 32.
search_widthint, default = 1Number of graph nodes to select as the starting point for the search in each iteration.
min_iterationsint, default = 0Lower limit of search iterations.
thread_block_sizeint, default = 0Thread block size. 0, 64, 128, 256, 512, 1024. Auto selection when 0.
hashmap_modestr, default = "auto"String denoting the type of hash map to use. It’s usually better to allow the algorithm to select this value, Valid values for hashmap_mode: [“auto”, “small”, “hash”], where:

- auto will automatically select the best value based on algo
- small will use the small shared memory hash table with resetting.
- hash will use a single hash table in global memory.
hashmap_min_bitlenint, default = 0Upper limit of hashmap fill rate. More than 0.1, less than 0.9.
hashmap_max_fill_ratefloat, default = 0.5Upper limit of hashmap fill rate. More than 0.1, less than 0.9.
num_random_samplingsint, default = 1Number of iterations of initial random seed node selection. 1 or more.
rand_xor_maskint, default = 0x128394Bit mask used for initial random seed node selection.
persistentbool, default = falseWhether to use the persistent version of the kernel
persistent_lifetimefloatPersistent kernel: time in seconds before the kernel stops if no requests are received.
persistent_device_usagefloatSets the fraction of maximum grid size used by persistent kernel.

Constructor

1def __init__(self, *, max_queries=0, itopk_size=64, max_iterations=0, algo="auto", team_size=0, search_width=1, min_iterations=0, thread_block_size=0, hashmap_mode="auto", hashmap_min_bitlen=0, hashmap_max_fill_rate=0.5, num_random_samplings=1, rand_xor_mask=0x128394, persistent=False, persistent_lifetime=None, persistent_device_usage=None )

Members

NameKind
get_handlemethod
max_queriesproperty
itopk_sizeproperty
max_iterationsproperty
algoproperty
team_sizeproperty
search_widthproperty
min_iterationsproperty
thread_block_sizeproperty
hashmap_modeproperty
hashmap_min_bitlenproperty
hashmap_max_fill_rateproperty
num_random_samplingsproperty
rand_xor_maskproperty

get_handle

1def get_handle(self)

max_queries

1def max_queries(self)

itopk_size

1def itopk_size(self)

max_iterations

1def max_iterations(self)

algo

1def algo(self)

team_size

1def team_size(self)

search_width

1def search_width(self)

min_iterations

1def min_iterations(self)

thread_block_size

1def thread_block_size(self)

hashmap_mode

1def hashmap_mode(self)

hashmap_min_bitlen

1def hashmap_min_bitlen(self)

hashmap_max_fill_rate

1def hashmap_max_fill_rate(self)

num_random_samplings

1def num_random_samplings(self)

rand_xor_mask

1def rand_xor_mask(self)

build

@auto_sync_resources

1def build(IndexParams index_params, dataset, resources=None)

Build the CAGRA index from the dataset for efficient search.

The build performs two different steps- first an intermediate knn-graph is constructed, then it’s optimized it to create the final graph. The index_params object controls the node degree of these graphs.

It is required that both the dataset and the optimized graph fit the GPU memory.

Note: When using ACE (Augmented Core Extraction) build algorithm, the dataset must be in host memory (CPU). The ACE algorithm is designed for datasets too large to fit in GPU memory.

The following distance metrics are supported:

  • L2
  • InnerProduct
  • Cosine

Parameters

NameTypeDescription
index_paramsIndexParams object
datasetCUDA array interface compliant matrix shape (n_samples, dim)Supported dtype [float, half, int8, uint8] Note: For ACE build algorithm, the dataset MUST be in host memory. Use NumPy arrays or call .get() on CuPy arrays before passing.
resourcescuvs.common.Resources, optional

Returns

NameTypeDescription
indexcuvs.cagra.Index

Examples

1>>> import cupy as cp
2>>> from cuvs.neighbors import cagra
3>>> n_samples = 50000
4>>> n_features = 50
5>>> n_queries = 1000
6>>> k = 10
7>>> dataset = cp.random.random_sample((n_samples, n_features),
8... dtype=cp.float32)
9>>> build_params = cagra.IndexParams(metric="sqeuclidean")
10>>> index = cagra.build(build_params, dataset)
11>>> queries = cp.random.random_sample((n_queries, n_features),
12... dtype=cp.float32)
13>>> distances, neighbors = cagra.search(cagra.SearchParams(),
14... index, queries,
15... k)
16>>> distances = cp.asarray(distances)
17>>> neighbors = cp.asarray(neighbors)

extend

@auto_sync_resources

1def extend(ExtendParams params, Index index, additional_dataset, resources=None)

Extend a CAGRA index with additional vectors

Parameters

NameTypeDescription
paramsExtendParams object
indexIndexExisting cagra index to extend
additional_datasetCUDA array interface compliant matrix shapeSupported dtype [float, half, int8, uint8]
resourcescuvs.common.Resources, optional

from_graph

@auto_sync_resources

1def from_graph(graph, dataset, metric="sqeuclidean", resources=None)

Construct a cagra index from an existing graph and dataset

Parameters

NameTypeDescription
graphArray interface compliant matrix with shape (n_samples, graph_degree)
datasetArray interface compliant matrix shape (n_samples, dim)Supported dtype [float32, float16, int8, uint8]
metricstr
resourcescuvs.common.Resources, optional

Returns

NameTypeDescription
indexcuvs.cagra.Index

load

@auto_sync_resources

1def load(filename, resources=None)

Loads index from file.

Saving / loading the index is experimental. The serialization format is subject to change, therefore loading an index saved with a previous version of cuvs is not guaranteed to work.

Parameters

NameTypeDescription
filenamestringName of the file.
resourcescuvs.common.Resources, optional

Returns

NameTypeDescription
indexIndex

save

@auto_sync_resources

1def save(filename, Index index, bool include_dataset=True, resources=None)

Saves the index to a file.

Saving / loading the index is experimental. The serialization format is subject to change.

Parameters

NameTypeDescription
filenamestringName of the file.
indexIndexTrained CAGRA index.
include_datasetboolWhether or not to write out the dataset along with the index. Including the dataset in the serialized index will use extra disk space, and might not be desired if you already have a copy of the dataset on disk. If this option is set to false, you will have to call index.update_dataset(dataset) after loading the index.
resourcescuvs.common.Resources, optional

Examples

1>>> import cupy as cp
2>>> from cuvs.neighbors import cagra
3>>> n_samples = 50000
4>>> n_features = 50
5>>> dataset = cp.random.random_sample((n_samples, n_features),
6... dtype=cp.float32)
7>>> # Build index
8>>> index = cagra.build(cagra.IndexParams(), dataset)
9>>> # Serialize and deserialize the cagra index built
10>>> cagra.save("my_index.bin", index)
11>>> index_loaded = cagra.load("my_index.bin")

@auto_sync_resources @auto_convert_output

1def search(SearchParams search_params, Index index, queries, k, neighbors=None, distances=None, resources=None, filter=None)

Find the k nearest neighbors for each query.

Parameters

NameTypeDescription
search_paramsSearchParams
indexIndexTrained CAGRA index.
queriesCUDA array interface compliant matrix shape (n_samples, dim)Supported dtype [float, int8, uint8]
kintThe number of neighbors.
neighborsOptional CUDA array interface compliant matrix shape(n_queries, k), dtype int64_t. If supplied, neighbor indices will be written here in-place. (default None)
distancesOptional CUDA array interface compliant matrix shape(n_queries, k) If supplied, the distances to the neighbors will be written here in-place. (default None)
filterOptional cuvs.neighbors.cuvsFilter can be used to filterneighbors based on a given bitset. (default None)
resourcescuvs.common.Resources, optional

Examples

1>>> import cupy as cp
2>>> from cuvs.neighbors import cagra
3>>> n_samples = 50000
4>>> n_features = 50
5>>> n_queries = 1000
6>>> dataset = cp.random.random_sample((n_samples, n_features),
7... dtype=cp.float32)
8>>> # Build index
9>>> index = cagra.build(cagra.IndexParams(), dataset)
10>>> # Search using the built index
11>>> queries = cp.random.random_sample((n_queries, n_features),
12... dtype=cp.float32)
13>>> k = 10
14>>> search_params = cagra.SearchParams(
15... max_queries=100,
16... itopk_size=64
17... )
18>>> # Using a pooling allocator reduces overhead of temporary array
19>>> # creation during search. This is useful if multiple searches
20>>> # are performed with same query size.
21>>> distances, neighbors = cagra.search(search_params, index, queries,
22... k)
23>>> neighbors = cp.asarray(neighbors)
24>>> distances = cp.asarray(distances)