The design of NVIDIA cuVS was influenced by many state-of-the-art implementations of vector search, preprocessing, compression, and clustering algorithms. The papers listed on this page describe core algorithms and GPU primitives used throughout NVIDIA cuVS, from graph-based approximate nearest-neighbor search to clustering, sparse neighborhood methods, top-k selection, and filtered vector search.
Use this page when citing the research behind NVIDIA cuVS algorithms or when looking for deeper technical background on the methods implemented in the library.
CAGRA introduces a GPU-accelerated graph construction and approximate nearest-neighbor search algorithm. It is the main research foundation for NVIDIA cuVS CAGRA, a graph-based vector search index optimized for fast GPU index build and high-throughput GPU search.
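To illustrate the kind of search a graph index performs, the sketch below runs single-threaded best-first (beam) search over a proximity graph stored as a Python adjacency dict. It is a minimal CPU sketch under simplifying assumptions (squared L2 distance, one fixed entry point) and is not CAGRA's parallel GPU search. The names `greedy_graph_search`, `sq_l2`, `beam_width`, and `entry` are hypothetical.

```python
import heapq

def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_graph_search(graph, data, query, k, beam_width, entry=0):
    """Best-first search over a proximity graph (hypothetical helper).

    graph: dict mapping node id -> list of neighbor ids
    beam_width: number of best candidates kept during search (should be >= k)
    """
    visited = {entry}
    d0 = sq_l2(data[entry], query)
    frontier = [(d0, entry)]   # min-heap of nodes waiting to be expanded
    best = [(-d0, entry)]      # max-heap (negated distances) of the best nodes seen
    while frontier:
        dist, node = heapq.heappop(frontier)
        # Stop once the closest unexpanded node is farther than the worst kept result.
        if len(best) >= beam_width and dist > -best[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = sq_l2(data[nb], query)
            if len(best) < beam_width or d < -best[0][0]:
                heapq.heappush(frontier, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > beam_width:
                    heapq.heappop(best)  # evict the current worst result
    # Sort kept results by distance and return the k closest ids.
    return [i for _, i in sorted((-nd, i) for nd, i in best)][:k]

# Usage: ten points on a line, each linked to its immediate neighbors.
data = [[float(i)] for i in range(10)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(greedy_graph_search(graph, data, [7.2], k=2, beam_width=3))  # [7, 8]
```

Starting from node 0, the search walks the chain toward the query and returns nodes 7 and 8. A larger `beam_width` keeps more candidates alive, trading extra distance computations for better recall.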
This paper studies GPU top-k selection and introduces AIR Top-K and GridSelect. Efficient top-k selection is a core primitive for nearest-neighbor search because search algorithms often need to keep only the best candidate neighbors out of a much larger set.
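As a baseline for what a top-k primitive computes, the sketch below keeps the k smallest distances in a single pass with a bounded max-heap. It is a plain serial CPU sketch, not AIR Top-K or GridSelect, and the function name `topk_smallest` is hypothetical.

```python
import heapq

def topk_smallest(distances, k):
    """Return the k smallest (distance, index) pairs in ascending order (hypothetical helper)."""
    if k <= 0:
        return []
    heap = []  # max-heap of size <= k, stored as (-distance, index)
    for i, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            # d beats the current worst of the kept k: swap it in.
            heapq.heapreplace(heap, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in heap)

print(topk_smallest([0.9, 0.1, 0.5, 0.3, 0.7], 2))  # [(0.1, 1), (0.3, 3)]
```

Each element costs O(log k) serially; the GPU methods described in the paper address the same selection problem under massive parallelism.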
This paper adapts NN-Descent to GPU architectures for fast approximate k-nearest-neighbor graph construction. It provides background for NVIDIA cuVS NN-Descent and for workflows that use k-NN graphs as intermediate structures.
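NN-Descent rests on the observation that a neighbor of a neighbor is likely to be a neighbor. The sketch below is a simplified single-threaded version of the classic local join: starting from a random k-NN graph, it repeatedly compares pairs drawn from each node's forward and reverse neighbors and keeps any closer candidates. It omits the sampling and new/old flags of full NN-Descent, does not reflect the GPU-specific design from the paper, and uses hypothetical names throughout.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def try_insert(nbrs, cand, d):
    """Replace the worst entry of a sorted (dist, id) list if cand is closer and new.

    Returns 1 if the list changed, else 0.
    """
    if d >= nbrs[-1][0] or any(j == cand for _, j in nbrs):
        return 0
    nbrs[-1] = (d, cand)
    nbrs.sort()
    return 1

def nn_descent(data, k, max_iters=10, seed=0):
    """Approximate k-NN graph via simplified NN-Descent (hypothetical). Requires k < len(data)."""
    n = len(data)
    rng = random.Random(seed)
    # Random initial graph: each row is a sorted list of k (dist, id) pairs.
    graph = []
    for i in range(n):
        ids = rng.sample([j for j in range(n) if j != i], k)
        graph.append(sorted((sq_dist(data[i], data[j]), j) for j in ids))
    for _ in range(max_iters):
        # Reverse neighbors: reverse[j] lists every i whose row contains j.
        reverse = [[] for _ in range(n)]
        for i in range(n):
            for _, j in graph[i]:
                reverse[j].append(i)
        updates = 0
        for v in range(n):
            # Local join: every pair among v's forward and reverse neighbors
            # is a candidate edge for both endpoints.
            join = sorted({j for _, j in graph[v]} | set(reverse[v]))
            for a in range(len(join)):
                for b in range(a + 1, len(join)):
                    p, q = join[a], join[b]
                    d = sq_dist(data[p], data[q])
                    updates += try_insert(graph[p], q, d)
                    updates += try_insert(graph[q], p, d)
        if updates == 0:  # converged: no row changed during this pass
            break
    return [[j for _, j in row] for row in graph]
```

For example, `nn_descent([[float(i)] for i in range(30)], k=4)` builds a 4-NN graph for 30 points on a line. A row only changes when a strictly closer candidate appears, so the loop stops early once a full pass makes no changes.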
cuSLINK reformulates single-linkage agglomerative clustering for the GPU. It connects clustering with nearest-neighbor graph construction, spanning trees, and dendrogram extraction, which makes it relevant to NVIDIA cuVS clustering and graph-building routines.
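Single-linkage clustering has a well-known connection to minimum spanning trees: processing MST edges in increasing weight order and merging the two clusters each edge joins yields the single-linkage dendrogram. The sketch below shows that reduction on the CPU, using Prim's algorithm over the complete Euclidean graph (quadratic time, suitable only for toy inputs) and a union-find structure for the merges. It is not cuSLINK's GPU pipeline; the function names and the merge-tuple layout `(cluster_a, cluster_b, distance, size)` are assumptions.

```python
import math

def prim_mst(data):
    """MST edges (weight, u, v) of the complete Euclidean graph via O(n^2) Prim's algorithm."""
    n = len(data)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight connecting each node to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(data[u], data[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def single_linkage(data):
    """Merges (cluster_a, cluster_b, distance, size); the m-th merge (0-based) creates id n + m."""
    n = len(data)
    root = list(range(n))        # union-find parent pointers over points
    cluster_id = list(range(n))  # dendrogram id currently attached to each union-find root
    size = [1] * n

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    merges = []
    # Kruskal order over MST edges: every edge joins two different clusters.
    for w, u, v in sorted(prim_mst(data)):
        ru, rv = find(u), find(v)
        merges.append((cluster_id[ru], cluster_id[rv], w, size[ru] + size[rv]))
        root[rv] = ru
        size[ru] += size[rv]
        cluster_id[ru] = n + len(merges) - 1
    return merges

print(single_linkage([[0.0], [1.0], [5.0], [6.0], [20.0]]))
```

For points 0, 1, 5, 6, and 20 on a line, the merges are `(0, 1, 1.0, 2)`, `(2, 3, 1.0, 2)`, `(5, 6, 4.0, 4)`, and `(7, 4, 14.0, 5)`, where ids 5 and above denote clusters created by earlier merges.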
This paper presents GPU semiring primitives for sparse vector operations and neighborhood methods. These primitives provide background for sparse-distance and sparse-neighborhood workflows that can appear in vector search, preprocessing, and machine-learning pipelines.
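The semiring view generalizes a sparse dot product: paired values are combined with a "product" operation and accumulated with an "add" operation. Inner product uses (multiply, sum) and only needs indices where both vectors are nonzero, while Manhattan distance uses (|a - b|, sum) and must visit the union of nonzero indices, because |a - 0| is not zero. The sketch below is a minimal scalar illustration on dict-based sparse vectors; it is not the paper's GPU kernel or a cuVS API, and all names are hypothetical.

```python
from typing import Callable, Dict

SparseVec = Dict[int, float]  # maps index -> nonzero value

def semiring_reduce(a: SparseVec, b: SparseVec,
                    combine: Callable[[float, float], float],
                    accum: Callable[[float, float], float],
                    identity: float, use_union: bool) -> float:
    """Fold combine(a_i, b_i) over shared indices (intersection) or all nonzero indices (union)."""
    keys = (a.keys() | b.keys()) if use_union else (a.keys() & b.keys())
    acc = identity
    for i in sorted(keys):
        # Missing entries are implicit zeros in the sparse representation.
        acc = accum(acc, combine(a.get(i, 0.0), b.get(i, 0.0)))
    return acc

def inner_product(a, b):
    return semiring_reduce(a, b, lambda x, y: x * y, lambda s, t: s + t, 0.0, use_union=False)

def manhattan(a, b):
    return semiring_reduce(a, b, lambda x, y: abs(x - y), lambda s, t: s + t, 0.0, use_union=True)

def chebyshev(a, b):
    return semiring_reduce(a, b, lambda x, y: abs(x - y), max, 0.0, use_union=True)

a = {0: 1.0, 2: 3.0}
b = {2: 2.0, 5: 5.0}
print(inner_product(a, b), manhattan(a, b), chebyshev(a, b))  # 6.0 7.0 5.0
```

Computing Manhattan distance over the intersection alone would wrongly return 1.0 here, which is why union-based distances need different handling from dot-product-style ones.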
VecFlow studies filtered approximate nearest-neighbor search on GPUs. It is useful background for NVIDIA cuVS filtered-search work and for systems that combine vector indexes with structured metadata filters.
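One reason filtered search needs dedicated support is that naively filtering an unfiltered result list can lose results. The sketch below contrasts exact pre-filtered k-NN with post-filtering on a toy dataset. It uses brute-force search, with a Python list of booleans standing in for a metadata filter bitset; it is not VecFlow's approach or the cuVS filtering API, and the function names are hypothetical.

```python
import heapq
import math

def knn(data, query, k, allowed=None):
    """Exact brute-force k-NN ids; if allowed is given, only ids with allowed[i] True are scored."""
    candidates = (i for i in range(len(data)) if allowed is None or allowed[i])
    return heapq.nsmallest(k, candidates, key=lambda i: math.dist(data[i], query))

def post_filtered_knn(data, query, k, allowed):
    """Unfiltered top-k followed by filtering; may return fewer than k ids."""
    return [i for i in knn(data, query, k) if allowed[i]]

data = [[float(i)] for i in range(10)]
allowed = [i % 2 == 1 for i in range(10)]  # e.g. a metadata label held only by odd ids
print(knn(data, [0.0], 3, allowed))                # [1, 3, 5]
print(post_filtered_knn(data, [0.0], 3, allowed))  # [1]
```

Post-filtering the unfiltered top 3 (`[0, 1, 2]`) leaves a single result, while pre-filtering returns the three nearest allowed points.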