Usage | cuVS

This page shows how to configure algorithms, run cuVS Bench from the command line, use Docker containers, and read the build and search results. The command-line runner uses the cuVS Bench orchestrator and the default cpp_gbench backend, so the end-to-end workflow can stay fully CLI-based. For dataset formats, built-in datasets, custom dataset descriptors, and ground-truth generation, see Benchmark Datasets.

Creating and customizing algorithm configurations

Algorithm YAML files define the build and search parameter sweeps for each algorithm. Dataset YAML files are covered in Benchmark Datasets.

Algorithm configs live in ${CUVS_HOME}/python/cuvs_bench/cuvs_bench/config/algos. A cuvs_cagra config looks like this:

1 name: cuvs_cagra
2 constraints:
3   build: cuvs_bench.config.algos.constraints.cuvs_cagra_build
4   search: cuvs_bench.config.algos.constraints.cuvs_cagra_search
5 groups:
6   base:
7     build:
8       graph_degree: [32, 64, 96, 128]
9       intermediate_graph_degree: [32, 64, 96, 128]
10       graph_build_algo: ["NN_DESCENT"]
11     search:
12       itopk: [32, 64, 128, 256, 512]
13       search_width: [1, 2, 4, 8, 16, 32, 64]
14 
15   test:
16     build:
17       graph_degree: [32]
18       intermediate_graph_degree: [32]
19       graph_build_algo: ["NN_DESCENT"]
20     search:
21       itopk: [32]
22       search_width: [1, 2]
23 
24   persistent:
25     build:
26       graph_degree: [32, 64, 96]
27       intermediate_graph_degree: [128]
28       graph_build_algo: ["IVF_PQ"]
29     search:
30       persistent: [true]
31       persistent_device_usage: [0.95]
32       algo: ["single_cta"]
33       itopk: [32, 64, 128, 256, 512]
34       max_iterations: [0, 16]
35       search_width: [1, 2, 4, 8]

Each config has three main fields:

Field	Purpose
`name`	Algorithm name.
`constraints`	Optional Python functions that reject invalid build or search parameter combinations.
`groups`	Named parameter sweeps. Each group expands the cross product of its `build` and `search` values.

Create a custom YAML file with a base group to override the default benchmark parameters. For parameter guidance, see the ANN Algorithm Parameter Tuning Guide.

Library	Algorithms
FAISS_GPU	`faiss_gpu_flat`, `faiss_gpu_ivf_flat`, `faiss_gpu_ivf_pq`, `faiss_gpu_cagra`
FAISS_CPU	`faiss_cpu_flat`, `faiss_cpu_ivf_flat`, `faiss_cpu_ivf_pq`, `faiss_cpu_hnsw_flat`
GGNN	`ggnn`
HNSWLIB	`hnswlib`
DiskANN	`diskann_memory`, `diskann_ssd`
cuVS	`cuvs_brute_force`, `cuvs_cagra`, `cuvs_ivf_flat`, `cuvs_ivf_pq`, `cuvs_cagra_hnswlib`, `cuvs_vamana`

Multi-GPU algorithms

cuVS Bench includes single-node multi-GPU versions of IVF-Flat, IVF-PQ, and CAGRA.

Index type	Multi-GPU algo name
IVF-Flat	`cuvs_mg_ivf_flat`
IVF-PQ	`cuvs_mg_ivf_pq`
CAGRA	`cuvs_mg_cagra`

Smaller-scale benchmarks (<1M to 10M vectors)

Use cuvs_bench.get_dataset to prepare a built-in dataset. By default, datasets are stored under RAPIDS_DATASET_ROOT_DIR when that environment variable is set, or under a local datasets directory otherwise.

$ # (1) Prepare dataset.
$ python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize
$ 
$ # (2) Build and search index.
$ python -m cuvs_bench.run --dataset deep-image-96-inner --algorithms cuvs_cagra --batch-size 10 -k 10 --build --search
$ 
$ # (3) Export data.
$ python -m cuvs_bench.run --data-export --dataset deep-image-96-inner
$ 
$ # (4) Plot results.
$ python -m cuvs_bench.plot --dataset deep-image-96-inner

For available built-in datasets and ground-truth details, see Benchmark Datasets.

Large-scale benchmarks (>10M vectors)

cuvs_bench.get_dataset does not download billion-scale datasets. Prepare large datasets first using Benchmark Datasets, then run the same build, search, export, and plot workflow. Datasets at this scale are best suited to large-memory GPUs such as A100 or H100.

The following example prepares ground truth for the Yandex Deep-1B dataset and then runs benchmarks on a 100M-vector subset:

$ mkdir -p datasets/deep-1B
$ 
$ # (1) Prepare dataset.
$ # Download the Yandex DEEP "Ground Truth" file manually.
$ # Suppose the file name is deep_new_groundtruth.public.10K.bin.
$ python -m cuvs_bench.split_groundtruth --groundtruth datasets/deep-1B/deep_new_groundtruth.public.10K.bin
$ 
$ # The split step creates groundtruth.neighbors.ibin and groundtruth.distances.fbin.

$ # (2) Build and search index.
$ python -m cuvs_bench.run --dataset deep-1B --algorithms cuvs_cagra --batch-size 10 -k 10 --build --search
$ 
$ # (3) Export data.
$ python -m cuvs_bench.run --data-export --dataset deep-1B
$ 
$ # (4) Plot results.
$ python -m cuvs_bench.plot --dataset deep-1B

Use python -m cuvs_bench.split_groundtruth --help to see the full CLI help for the ground-truth split command.

Running with Docker containers

Docker images can run the full workflow or open a shell for manual commands. See Installation for image and tag guidance.

End-to-end run on GPU

Set DATA_FOLDER to a local directory. Datasets are stored in $DATA_FOLDER/datasets and results in $DATA_FOLDER/result.

$ export DATA_FOLDER=path/to/store/datasets/and/results
$ docker run --gpus all --rm -it -u $(id -u)                      \
>     -v $DATA_FOLDER:/data/benchmarks                            \
>     rapidsai/cuvs-bench:26.06a-cuda12-py3.13              \
>     "--dataset deep-image-96-angular"                           \
>     "--normalize"                                               \
>     "--algorithms cuvs_cagra,cuvs_ivf_pq --batch-size 10 -k 10" \
>     ""

Argument	Description
`rapidsai/cuvs-bench:26.06a-cuda12-py3.13`	Image to use. See Installation for available tags.
`"--dataset deep-image-96-angular"`	Dataset name.
`"--normalize"`	Normalizes the dataset before benchmarking.
`"--algorithms cuvs_cagra,hnswlib --batch-size 10 -k 10"`	Arguments passed to the benchmark run script.
`""`	Optional arguments passed to the plot script.

The -u $(id -u) flag lets the container user match the host user, so files written to the mounted volume keep usable permissions.

End-to-end run on CPU

Use the CPU image and omit --gpus all on systems without a GPU.

$ export DATA_FOLDER=path/to/store/datasets/and/results
$ docker run  --rm -it -u $(id -u)                  \
>     -v $DATA_FOLDER:/data/benchmarks              \
>     rapidsai/cuvs-bench-cpu:26.06a-py3.13     \
>      "--dataset deep-image-96-angular"            \
>      "--normalize"                                \
>      "--algorithms hnswlib --batch-size 10 -k 10" \
>      ""

Manual container workflow

All cuvs-bench images include the Conda packages. Start a shell when you want to run individual commands yourself:

$ export DATA_FOLDER=path/to/store/datasets/and/results
$ docker run --gpus all --rm -it -u $(id -u)          \
>     --entrypoint /bin/bash                          \
>     --workdir /data/benchmarks                      \
>     -v $DATA_FOLDER:/data/benchmarks                \
>     rapidsai/cuvs-bench:26.06a-cuda12-py3.13

Inside the container, run the same Python modules directly:

$ (base) root@00b068fbb862:/data/benchmarks# python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize

Containers can also run in detached mode.

Evaluating results

Build benchmarks report:

Name	Description
Benchmark	Name that identifies the benchmark instance.
Time	Wall time spent training the index.
CPU	CPU time spent training the index.
Iterations	Number of iterations, usually 1.
GPU	GPU time spent building.
index_size	Number of vectors used to train the index.

Search benchmarks report:

Name	Description
Benchmark	Name that identifies the benchmark instance.
Time	Wall time for a single batch divided by the number of threads.
CPU	Average CPU time, not including idle time while waiting for GPU synchronization.
Iterations	Total number of batches, equal to `total_queries / n_queries`.
GPU	GPU latency for a single batch. In throughput mode, this is averaged across threads.
Latency	Batch latency from wall-clock time. In throughput mode, this is averaged across threads.
Recall	Fraction of returned neighbors that match ground truth. Present only when ground truth is configured.
items_per_second	Total throughput, or queries per second, approximately `total_queries / end_to_end`.
k	Number of neighbors requested per query.
end_to_end	Total time to run all batches across all iterations.
n_queries	Number of query vectors in each batch.
total_queries	Total query count, equal to `iterations * n_queries`.

Time and end_to_end are measured differently, so end_to_end = Time * Iterations is only approximate. Output tables may also include the hyper-parameters for each benchmarked configuration. Recall can fluctuate when fewer queries are processed than the benchmark contains, because processed query count depends on iteration count.

Summary

cuVS Bench usage has three main steps: configure datasets and algorithm sweeps, run build and search through Python or Docker, and compare the reported build and search measurements. Start with built-in datasets for smaller tests, prepare large datasets separately for scale testing, and use the result tables to compare quality, latency, throughput, build time, and resource behavior across parameter settings.