cuVS Bench Parameter Tuning Guide
This guide describes the parameters that can be set in cuVS Benchmarks YAML configuration files and explains how each one affects its algorithm, to help you choose settings when benchmarking at a desired level of recall.
Benchmark modes
When you run benchmarks with BenchmarkOrchestrator.run_benchmark(), you can choose how parameters are explored:
Sweep mode (default)
Pass mode="sweep" or omit mode. The orchestrator builds the full Cartesian product of all build and search parameter lists defined in the algorithm YAML (see Creating and customizing algorithm configurations). Every valid combination (after constraint filtering) is run. Use this for exhaustive comparison across the configured parameter grid.
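The expansion that sweep mode performs can be sketched in a few lines of Python. This is only an illustration, not the orchestrator's actual implementation: the parameter names (nlist, ratio, nprobe), the values, and the is_valid constraint are hypothetical stand-ins for what an algorithm YAML file might define.

```python
from itertools import product

# Hypothetical parameter lists standing in for the build and search
# sections of an algorithm YAML file. Names and values are illustrative.
build_params = {"nlist": [1024, 4096], "ratio": [1, 2]}
search_params = {"nprobe": [10, 50, 2048]}


def expand(param_lists):
    """Return every combination of the given parameter lists as dicts."""
    names = list(param_lists)
    return [dict(zip(names, values)) for values in product(*param_lists.values())]


def is_valid(build, search):
    """Hypothetical constraint: do not probe more lists than the index has."""
    return search["nprobe"] <= build["nlist"]


# Full Cartesian product of build and search combinations, then filtering.
combos = [
    (build, search)
    for build in expand(build_params)
    for search in expand(search_params)
    if is_valid(build, search)
]

print(len(expand(build_params)) * len(expand(search_params)))  # 12 before filtering
print(len(combos))  # 10 after filtering
```

The grid grows multiplicatively with every list that is added, which is the main reason to consider tune mode for large parameter spaces.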
Tune mode
Pass mode="tune" to perform hyperparameter optimization using Optuna instead of running every combination. You must pass:
- constraints (dict): The optimization target and optional bounds. One metric must be "maximize" or "minimize" (the goal). Others can set hard limits with {"min": X} or {"max": X}. Examples: {"recall": "maximize", "latency": {"max": 10}} or {"latency": "minimize", "recall": {"min": 0.95}}.
- n_trials (int, optional): Maximum number of Optuna trials (default 100). Ignored in sweep mode.
Example:
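The following is a minimal sketch of how the tune-mode arguments described above fit together. The validate_constraints helper is hypothetical and written only for this illustration; it is not part of cuVS Bench. The dataset and algorithm arguments that run_benchmark() also needs are not covered here, so the call itself is left as a comment.

```python
def validate_constraints(constraints):
    """Hypothetical helper: check the shape described above.

    Returns the name of the single goal metric.
    """
    goals = [m for m, v in constraints.items() if v in ("maximize", "minimize")]
    if len(goals) != 1:
        raise ValueError("exactly one metric must be 'maximize' or 'minimize'")
    for metric, value in constraints.items():
        if metric == goals[0]:
            continue
        # Every other metric must be a hard limit: {"min": X} or {"max": X}.
        if not (isinstance(value, dict) and value and set(value) <= {"min", "max"}):
            raise ValueError(f"{metric}: expected a {{'min': X}} or {{'max': X}} bound")
    return goals[0]


# Keyword arguments for tune mode, using the names given in this guide.
tune_kwargs = {
    "mode": "tune",
    "constraints": {"recall": "maximize", "latency": {"max": 10}},
    "n_trials": 50,
}

goal = validate_constraints(tune_kwargs["constraints"])
print(goal)  # recall

# In a real run these would be passed along with the dataset and algorithm
# arguments (not shown here), for example:
#   orchestrator.run_benchmark(..., **tune_kwargs)
```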
The parameter tables below describe the build and search knobs that sweep mode varies and that tune mode can optimize.
cuVS Indexes
cuvs_brute_force
Use cuVS brute-force index for exact search. Brute-force has no further build or search parameters.
cuvs_ivf_flat
IVF-flat uses an inverted-file index, which partitions the vectors into a series of clusters, or lists, and stores them in an interleaved format optimized for fast distance computation. Searching an IVF-flat index restricts the candidate vectors to those in a user-specified number of nearest clusters, called probes.
IVF-flat is a simple algorithm that does not compress the vectors and so won’t save any space, but it provides competitive search times even at higher levels of recall.
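The effect of the number of probes on recall can be seen in a small pure-Python sketch. It is a toy illustration of the inverted-file idea, not the cuVS implementation: the centroids are given directly rather than trained with k-means, the data is two-dimensional, and the function names are made up for this example.

```python
import math


def nearest(query, points, k):
    """Indices of the k points closest to query (Euclidean distance)."""
    return sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:k]


def build_ivf(vectors, centroids):
    """Assign each vector id to the list of its closest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        lists[nearest(vec, centroids, 1)[0]].append(vid)
    return lists


def search_ivf(query, vectors, centroids, lists, k, n_probes):
    """Scan only the n_probes closest lists, then rank those candidates."""
    candidates = [vid for c in nearest(query, centroids, n_probes) for vid in lists[c]]
    candidates.sort(key=lambda vid: math.dist(query, vectors[vid]))
    return candidates[:k]


vectors = [(1.0, 0.0), (4.0, 0.0), (6.0, 0.0), (9.0, 0.0)]
centroids = [(0.0, 0.0), (10.0, 0.0)]
lists = build_ivf(vectors, centroids)
query = (4.9, 0.0)

print(lists)  # [[0, 1], [2, 3]]
# One probe scans only the first list and misses vector 2, a true neighbor.
print(search_ivf(query, vectors, centroids, lists, k=2, n_probes=1))  # [1, 0]
# Two probes scan both lists and return the exact top 2.
print(search_ivf(query, vectors, centroids, lists, k=2, n_probes=2))  # [1, 2]
```

Raising the number of probes scans more vectors per query, which trades search speed for recall.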
cuvs_ivf_pq
IVF-PQ is an inverted-file index, which partitions the vectors into a series of clusters, or lists, in a similar way to IVF-flat above. The difference is that IVF-PQ also uses product quantization to compress the vectors, giving the index a smaller memory footprint. Unfortunately, higher levels of compression can also reduce recall, which a refinement step can improve when the original vectors are still available.
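A rough sense of the memory savings comes from simple arithmetic. The sketch below assumes each vector is stored as pq_dim codes of pq_bits bits each and compares that with uncompressed 32-bit floats. The names pq_dim and pq_bits are used here for illustration, and the estimate ignores codebooks, cluster metadata, and padding, so real index sizes will differ.

```python
def raw_bytes(n_vecs, dim, bytes_per_value=4):
    """Size of the uncompressed vectors (float32 by default)."""
    return n_vecs * dim * bytes_per_value


def pq_bytes(n_vecs, pq_dim, pq_bits):
    """Size of the PQ codes alone: pq_dim codes of pq_bits bits per vector."""
    return n_vecs * pq_dim * pq_bits // 8


n_vecs, dim = 1_000_000, 128

raw = raw_bytes(n_vecs, dim)
compressed = pq_bytes(n_vecs, pq_dim=64, pq_bits=8)

print(raw)               # 512000000
print(compressed)        # 64000000
print(raw / compressed)  # 8.0
```

Halving pq_dim to 32 in this estimate doubles the compression ratio to 16, at the cost of a coarser approximation of each vector and therefore lower recall before refinement.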
cuvs_cagra
CAGRA uses a graph-based index. It first builds an intermediate, approximate kNN graph using IVF-PQ, then refines and optimizes it to produce a final kNN graph. CAGRA uses this final kNN graph as its index for search.
The graph_memory_type or internal_dataset_memory_type options can be useful for large datasets that do not fit in device memory. Setting internal_dataset_memory_type to a value other than device has a negative impact on search speed. The host_huge_page option is only supported on systems with Heterogeneous Memory Management or on platforms that natively support GPU access to system-allocated memory, for example Grace Hopper.
To fine-tune CAGRA index building, we can customize the IVF-PQ index builder options using the following settings. These take effect only if graph_build_algo == "IVF_PQ". It is recommended to experiment with a separate IVF-PQ index to find the configuration that gives the highest QPS for large batches. Recall does not need to be very high, since CAGRA further optimizes the kNN graph. Some of the default values are derived from the dataset size, which is assumed to be [n_vecs, dim].
Alternatively, if graph_build_algo == "NN_DESCENT", then we can customize the following parameters:
cuvs_cagra_hnswlib
This benchmark enables interoperability between a CAGRA-built graph and HNSW search. It uses the CAGRA-built graph as the base layer of an hnswlib index and searches queries only within that base layer (this is enabled by a simple patch to hnswlib).
build : Same as build of CAGRA
search : Same as search of Hnswlib
cuvs_vamana
Benchmark for building an in-memory Vamana graph-based index on the GPU, with DiskANN interoperability for search.
FAISS Indexes
faiss_gpu_flat
Use FAISS flat index on the GPU, which performs an exact search using brute-force and doesn’t have any further build or search parameters.
faiss_gpu_ivf_flat
IVF-flat uses an inverted-file index, which partitions the vectors into a series of clusters, or lists, and stores them in an interleaved format optimized for fast distance computation. Searching an IVF-flat index restricts the candidate vectors to those in a user-specified number of nearest clusters, called probes.
IVF-flat is a simple algorithm that does not compress the vectors and so won’t save any space, but it provides competitive search times even at higher levels of recall.
faiss_gpu_ivf_pq
IVF-PQ is an inverted-file index, which partitions the vectors into a series of clusters, or lists, in a similar way to IVF-flat above. The difference is that IVF-PQ also uses product quantization to compress the vectors, giving the index a smaller memory footprint. Unfortunately, higher levels of compression can also reduce recall, which a refinement step can improve when the original vectors are still available.
faiss_cpu_flat
Use FAISS flat index on the CPU, which performs an exact search using brute-force and doesn’t have any further build or search parameters.
faiss_cpu_ivf_flat
Use the FAISS IVF-Flat index on the CPU.
faiss_cpu_ivf_pq
Use the FAISS IVF-PQ index on the CPU.
HNSW
cuvs_hnsw
cuVS HNSW builds an HNSW index using the ACE (Augmented Core Extraction) algorithm, which enables GPU-accelerated HNSW index construction for datasets too large to fit in GPU memory.
hnswlib
Please refer to the HNSW algorithm parameters guide from hnswlib to learn more about these arguments.
DiskANN
diskann_memory
Use DiskANN in-memory index for approximate search.