Usage

This page shows how to configure algorithms, run cuVS Bench from the command line, use Docker containers, and read the build and search results. The command-line runner uses the cuVS Bench orchestrator and the default cpp_gbench backend, so the end-to-end workflow can stay fully CLI-based. For dataset formats, built-in datasets, custom dataset descriptors, and ground-truth generation, see Benchmark Datasets.

Creating and customizing algorithm configurations

Algorithm YAML files define the build and search parameter sweeps for each algorithm. Dataset YAML files are covered in Benchmark Datasets.

Algorithm configs live in ${CUVS_HOME}/python/cuvs_bench/cuvs_bench/config/algos. A cuvs_cagra config looks like this:

name: cuvs_cagra
constraints:
  build: cuvs_bench.config.algos.constraints.cuvs_cagra_build
  search: cuvs_bench.config.algos.constraints.cuvs_cagra_search
groups:
  base:
    build:
      graph_degree: [32, 64, 96, 128]
      intermediate_graph_degree: [32, 64, 96, 128]
      graph_build_algo: ["NN_DESCENT"]
    search:
      itopk: [32, 64, 128, 256, 512]
      search_width: [1, 2, 4, 8, 16, 32, 64]

  test:
    build:
      graph_degree: [32]
      intermediate_graph_degree: [32]
      graph_build_algo: ["NN_DESCENT"]
    search:
      itopk: [32]
      search_width: [1, 2]

  persistent:
    build:
      graph_degree: [32, 64, 96]
      intermediate_graph_degree: [128]
      graph_build_algo: ["IVF_PQ"]
    search:
      persistent: [true]
      persistent_device_usage: [0.95]
      algo: ["single_cta"]
      itopk: [32, 64, 128, 256, 512]
      max_iterations: [0, 16]
      search_width: [1, 2, 4, 8]

Each config has three main fields:

| Field | Purpose |
| --- | --- |
| name | Algorithm name. |
| constraints | Optional Python functions that reject invalid build or search parameter combinations. |
| groups | Named parameter sweeps. Each group expands the cross product of its build and search values. |
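As a rough illustration of how a group expands, the sketch below takes the parameter lists of the base group and enumerates every combination with itertools.product. The helper names expand and example_build_constraint are hypothetical, and the constraint shown (graph_degree must not exceed intermediate_graph_degree) is an assumed rule chosen for illustration, not necessarily what cuvs_cagra_build checks.

```python
from itertools import product

# Parameter lists copied from the "base" group of the cuvs_cagra config.
BASE_GROUP = {
    "build": {
        "graph_degree": [32, 64, 96, 128],
        "intermediate_graph_degree": [32, 64, 96, 128],
        "graph_build_algo": ["NN_DESCENT"],
    },
    "search": {
        "itopk": [32, 64, 128, 256, 512],
        "search_width": [1, 2, 4, 8, 16, 32, 64],
    },
}


def expand(params):
    """Return one dict per combination of the listed parameter values."""
    keys = list(params)
    value_lists = [params[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


def example_build_constraint(build_params):
    """Hypothetical constraint: graph_degree must not exceed intermediate_graph_degree."""
    return build_params["graph_degree"] <= build_params["intermediate_graph_degree"]


build_configs = expand(BASE_GROUP["build"])
search_configs = expand(BASE_GROUP["search"])
# Drop the build combinations that the (assumed) constraint rejects.
valid_build_configs = [p for p in build_configs if example_build_constraint(p)]

print(len(build_configs), len(search_configs), len(valid_build_configs))
# -> 16 35 10
```

Under these assumptions, the base group yields 16 build combinations and 35 search combinations, and the example constraint keeps 10 of the 16 build combinations.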

Create a custom YAML file with a base group to override the default benchmark parameters. For parameter guidance, see the ANN Algorithm Parameter Tuning Guide.

| Library | Algorithms |
| --- | --- |
| FAISS_GPU | faiss_gpu_flat, faiss_gpu_ivf_flat, faiss_gpu_ivf_pq, faiss_gpu_cagra |
| FAISS_CPU | faiss_cpu_flat, faiss_cpu_ivf_flat, faiss_cpu_ivf_pq, faiss_cpu_hnsw_flat |
| GGNN | ggnn |
| HNSWLIB | hnswlib |
| DiskANN | diskann_memory, diskann_ssd |
| cuVS | cuvs_brute_force, cuvs_cagra, cuvs_ivf_flat, cuvs_ivf_pq, cuvs_cagra_hnswlib, cuvs_vamana |

Multi-GPU algorithms

cuVS Bench includes single-node multi-GPU versions of IVF-Flat, IVF-PQ, and CAGRA.

| Index type | Multi-GPU algo name |
| --- | --- |
| IVF-Flat | cuvs_mg_ivf_flat |
| IVF-PQ | cuvs_mg_ivf_pq |
| CAGRA | cuvs_mg_cagra |

Smaller-scale benchmarks (<1M to 10M vectors)

Use cuvs_bench.get_dataset to prepare a built-in dataset. By default, datasets are stored under RAPIDS_DATASET_ROOT_DIR when that environment variable is set, or under a local datasets directory otherwise.

# (1) Prepare dataset.
$ python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize

# (2) Build and search index.
$ python -m cuvs_bench.run --dataset deep-image-96-inner --algorithms cuvs_cagra --batch-size 10 -k 10 --build --search

# (3) Export data.
$ python -m cuvs_bench.run --data-export --dataset deep-image-96-inner

# (4) Plot results.
$ python -m cuvs_bench.plot --dataset deep-image-96-inner

For available built-in datasets and ground-truth details, see Benchmark Datasets.
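When the four steps need to be repeated for several datasets or algorithms, they can be assembled in a small script. The sketch below only builds and prints the command lines, using the flags shown in the workflow. The helper name workflow_commands is hypothetical and is not part of cuvs_bench.

```python
import shlex


def workflow_commands(raw_dataset, run_dataset, algorithms, batch_size=10, k=10):
    """Return the four workflow steps as argument lists (hypothetical helper)."""
    return [
        # (1) Prepare dataset.
        ["python", "-m", "cuvs_bench.get_dataset",
         "--dataset", raw_dataset, "--normalize"],
        # (2) Build and search index.
        ["python", "-m", "cuvs_bench.run", "--dataset", run_dataset,
         "--algorithms", ",".join(algorithms),
         "--batch-size", str(batch_size), "-k", str(k),
         "--build", "--search"],
        # (3) Export data.
        ["python", "-m", "cuvs_bench.run", "--data-export", "--dataset", run_dataset],
        # (4) Plot results.
        ["python", "-m", "cuvs_bench.plot", "--dataset", run_dataset],
    ]


commands = workflow_commands(
    "deep-image-96-angular", "deep-image-96-inner", ["cuvs_cagra"]
)
for argv in commands:
    # Printing keeps the sketch side-effect free; to execute a step,
    # pass argv to subprocess.run(argv, check=True) instead.
    print(shlex.join(argv))
```

Keeping each step as an argument list avoids shell quoting problems if the commands are later executed with subprocess.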

Large-scale benchmarks (>10M vectors)

cuvs_bench.get_dataset does not download billion-scale datasets. Prepare large datasets first using Benchmark Datasets, then run the same build, search, export, and plot workflow. Datasets at this scale are best suited to large-memory GPUs such as A100 or H100.

The following example prepares ground truth for the Yandex Deep-1B dataset and then runs benchmarks on a 100M-vector subset:

$ mkdir -p datasets/deep-1B

# (1) Prepare dataset.
# Download the Yandex DEEP "Ground Truth" file manually.
# Suppose the file name is deep_new_groundtruth.public.10K.bin.
$ python -m cuvs_bench.split_groundtruth --groundtruth datasets/deep-1B/deep_new_groundtruth.public.10K.bin

# The split step creates groundtruth.neighbors.ibin and groundtruth.distances.fbin.
# (2) Build and search index.
$ python -m cuvs_bench.run --dataset deep-1B --algorithms cuvs_cagra --batch-size 10 -k 10 --build --search

# (3) Export data.
$ python -m cuvs_bench.run --data-export --dataset deep-1B

# (4) Plot results.
$ python -m cuvs_bench.plot --dataset deep-1B

Use python -m cuvs_bench.split_groundtruth --help to see the full CLI help for the ground-truth split command.
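A quick sanity check on the split output is to read the file header. The sketch below assumes the layout commonly used for .ibin and .fbin files: an 8-byte header holding the row count and the column count as little-endian 32-bit unsigned integers, followed by row-major 4-byte values. That layout is an assumption here, so confirm it against Benchmark Datasets. The helper read_bin_header is hypothetical, and the demo writes a tiny stand-in file instead of reading real ground truth.

```python
import os
import struct
import tempfile


def read_bin_header(path):
    """Return (rows, cols) from the assumed 8-byte header of a .ibin/.fbin file."""
    with open(path, "rb") as f:
        rows, cols = struct.unpack("<II", f.read(8))
    return rows, cols


# Self-contained demo: a stand-in file with 2 queries and 3 neighbors each.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "groundtruth.neighbors.ibin")
    with open(demo_path, "wb") as f:
        f.write(struct.pack("<II", 2, 3))               # header: rows, cols
        f.write(struct.pack("<6i", 5, 9, 1, 7, 2, 4))   # row-major 4-byte payload
    header = read_bin_header(demo_path)
    # Under the assumed layout, file size = 8 + rows * cols * 4 bytes.
    size_ok = os.path.getsize(demo_path) == 8 + header[0] * header[1] * 4

print(header, size_ok)
# -> (2, 3) True
```

For a real file, the same size check is a cheap way to spot a truncated download before starting a long benchmark run.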

Running with Docker containers

Docker images can run the full workflow or open a shell for manual commands. See Installation for image and tag guidance.

End-to-end run on GPU

Set DATA_FOLDER to a local directory. Datasets are stored in $DATA_FOLDER/datasets and results in $DATA_FOLDER/result.

$ export DATA_FOLDER=path/to/store/datasets/and/results
$ docker run --gpus all --rm -it -u $(id -u) \
    -v $DATA_FOLDER:/data/benchmarks \
    rapidsai/cuvs-bench:26.06a-cuda12-py3.13 \
    "--dataset deep-image-96-angular" \
    "--normalize" \
    "--algorithms cuvs_cagra,cuvs_ivf_pq --batch-size 10 -k 10" \
    ""

| Argument | Description |
| --- | --- |
| rapidsai/cuvs-bench:26.06a-cuda12-py3.13 | Image to use. See Installation for available tags. |
| "--dataset deep-image-96-angular" | Dataset name. |
| "--normalize" | Normalizes the dataset before benchmarking. |
| "--algorithms cuvs_cagra,cuvs_ivf_pq --batch-size 10 -k 10" | Arguments passed to the benchmark run script. |
| "" | Optional arguments passed to the plot script. |

The -u $(id -u) flag lets the container user match the host user, so files written to the mounted volume keep usable permissions.

End-to-end run on CPU

Use the CPU image and omit --gpus all on systems without a GPU.

$ export DATA_FOLDER=path/to/store/datasets/and/results
$ docker run --rm -it -u $(id -u) \
    -v $DATA_FOLDER:/data/benchmarks \
    rapidsai/cuvs-bench-cpu:26.06a-py3.13 \
    "--dataset deep-image-96-angular" \
    "--normalize" \
    "--algorithms hnswlib --batch-size 10 -k 10" \
    ""
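The GPU and CPU invocations differ only in the image name and the --gpus all flag, so both can come from one helper. The sketch below assembles the docker run argument list from the pieces shown in the commands above. The function docker_command is hypothetical, and the uid value 1000 is an illustrative stand-in for the output of id -u.

```python
import shlex


def docker_command(image, data_folder, container_args, use_gpu=True, uid=None):
    """Build the docker run argument list for an end-to-end run (hypothetical helper)."""
    argv = ["docker", "run"]
    if use_gpu:
        argv += ["--gpus", "all"]          # omitted for the CPU image
    argv += ["--rm", "-it"]
    if uid is not None:
        argv += ["-u", str(uid)]           # match the host user for file permissions
    argv += ["-v", f"{data_folder}:/data/benchmarks", image]
    # Four positional strings: dataset, normalize, run arguments, plot arguments.
    return argv + list(container_args)


cpu_args = [
    "--dataset deep-image-96-angular",
    "--normalize",
    "--algorithms hnswlib --batch-size 10 -k 10",
    "",
]
cpu_argv = docker_command(
    "rapidsai/cuvs-bench-cpu:26.06a-py3.13",
    "path/to/store/datasets/and/results",
    cpu_args,
    use_gpu=False,
    uid=1000,  # illustrative value; use os.getuid() on Linux or macOS
)
print(shlex.join(cpu_argv))
```

Each quoted group stays a single list element, which preserves the way the container entrypoint receives the dataset, run, and plot arguments as separate strings.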

Manual container workflow

All cuvs-bench images include the Conda packages. Start a shell when you want to run individual commands yourself:

$ export DATA_FOLDER=path/to/store/datasets/and/results
$ docker run --gpus all --rm -it -u $(id -u) \
    --entrypoint /bin/bash \
    --workdir /data/benchmarks \
    -v $DATA_FOLDER:/data/benchmarks \
    rapidsai/cuvs-bench:26.06a-cuda12-py3.13

Inside the container, run the same Python modules directly:

(base) root@00b068fbb862:/data/benchmarks# python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize

Containers can also run in detached mode by passing -d to docker run.

Evaluating results

Build benchmarks report:

| Name | Description |
| --- | --- |
| Benchmark | Name that identifies the benchmark instance. |
| Time | Wall time spent training the index. |
| CPU | CPU time spent training the index. |
| Iterations | Number of iterations, usually 1. |
| GPU | GPU time spent building. |
| index_size | Number of vectors used to train the index. |

Search benchmarks report:

| Name | Description |
| --- | --- |
| Benchmark | Name that identifies the benchmark instance. |
| Time | Wall time for a single batch divided by the number of threads. |
| CPU | Average CPU time, not including idle time while waiting for GPU synchronization. |
| Iterations | Total number of batches, equal to total_queries / n_queries. |
| GPU | GPU latency for a single batch. In throughput mode, this is averaged across threads. |
| Latency | Batch latency from wall-clock time. In throughput mode, this is averaged across threads. |
| Recall | Fraction of returned neighbors that match ground truth. Present only when ground truth is configured. |
| items_per_second | Total throughput, or queries per second, approximately total_queries / end_to_end. |
| k | Number of neighbors requested per query. |
| end_to_end | Total time to run all batches across all iterations. |
| n_queries | Number of query vectors in each batch. |
| total_queries | Total query count, equal to iterations * n_queries. |

Time and end_to_end are measured differently, so end_to_end = Time * Iterations is only approximate. Output tables may also include the hyper-parameters for each benchmarked configuration. Recall can fluctuate when fewer queries are processed than the benchmark contains, because processed query count depends on iteration count.
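The relationships between the search columns can be checked by hand. The sketch below uses made-up numbers, not measured results, to show how total_queries and items_per_second follow from iterations, n_queries, and end_to_end. It also includes a simple recall function that follows the description in the table. That function is an illustrative reading of the definition, and the benchmark's own computation may differ in detail.

```python
def recall(returned, ground_truth):
    """Fraction of returned neighbor IDs found in the ground truth of the same query."""
    hits = sum(len(set(r) & set(g)) for r, g in zip(returned, ground_truth))
    total = sum(len(r) for r in returned)
    return hits / total


# Made-up values, chosen only to illustrate the formulas.
iterations = 1000      # number of batches
n_queries = 10         # query vectors per batch
end_to_end = 2.5       # seconds for all batches

total_queries = iterations * n_queries
items_per_second = total_queries / end_to_end

# Two queries with k = 3; the first query misses one true neighbor.
returned = [[1, 2, 3], [4, 5, 6]]
ground_truth = [[1, 2, 9], [4, 5, 6]]
example_recall = recall(returned, ground_truth)

print(total_queries, items_per_second, round(example_recall, 4))
# -> 10000 4000.0 0.8333
```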

Summary

cuVS Bench usage has three main steps: configure datasets and algorithm sweeps, run build and search through Python or Docker, and compare the reported build and search measurements. Start with built-in datasets for smaller tests, prepare large datasets separately for scale testing, and use the result tables to compare quality, latency, throughput, build time, and resource behavior across parameter settings.