References

Many state-of-the-art implementations of vector search, vector preprocessing, vector compression, and vector clustering algorithms influenced the creation of NVIDIA cuVS. These papers describe core algorithms and GPU primitives used throughout NVIDIA cuVS, from graph-based approximate nearest-neighbor search to clustering, sparse neighborhood methods, top-k selection, and filtered vector search.

Use this page when citing the research behind NVIDIA cuVS algorithms or when looking for deeper technical background on the methods implemented in the library.

CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs

Paper

CAGRA introduces a GPU-accelerated graph construction and approximate nearest-neighbor search algorithm. It is the main research foundation for NVIDIA cuVS CAGRA, a graph-based vector search index optimized for fast GPU index build and high-throughput GPU search.

1 @inproceedings{ootomo2024cagra,
2   author    = {Ootomo, Hiroyuki and Naruse, Akira and Nolet, Corey and Wang, Ray and Feher, Tamas and Wang, Yong},
3   title     = {CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs},
4   booktitle = {2024 IEEE 40th International Conference on Data Engineering (ICDE)},
5   pages     = {4236--4247},
6   year      = {2024},
7   doi       = {10.1109/ICDE60146.2024.00323}
8 }

Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods

Paper

This paper studies GPU top-k selection and introduces AIR top-K and GridSelect. Efficient top-k selection is a core primitive for nearest-neighbor search because search algorithms often need to keep only the best candidate neighbors out of a much larger set.

1 @inproceedings{zhang2023parallelTopK,
2   author    = {Zhang, Jingrong and Naruse, Akira and Li, Xipeng and Wang, Yong},
3   title     = {Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods},
4   booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
5   series    = {SC '23},
6   pages     = {76:1--76:13},
7   year      = {2023},
8   publisher = {Association for Computing Machinery},
9   address   = {New York, NY, USA},
10   doi       = {10.1145/3581784.3607062}
11 }

Fast k-NN Graph Construction by GPU Based NN-Descent

Paper

This paper adapts NN-Descent to GPU architecture for fast approximate k-nearest-neighbor graph construction. It provides background for NVIDIA cuVS NN-Descent and for workflows that use k-NN graphs as intermediate structures.

1 @inproceedings{wang2021fastKnnGraph,
2   author    = {Wang, Hui and Zhao, Wan-Lei and Zeng, Xiangxiang and Yang, Jianye},
3   title     = {Fast K-NN Graph Construction by GPU Based NN-Descent},
4   booktitle = {Proceedings of the 30th ACM International Conference on Information and Knowledge Management},
5   series    = {CIKM '21},
6   pages     = {1929--1938},
7   year      = {2021},
8   publisher = {Association for Computing Machinery},
9   address   = {New York, NY, USA},
10   doi       = {10.1145/3459637.3482344}
11 }

cuSLINK: Single-linkage Agglomerative Clustering on the GPU

Paper

cuSLINK reformulates single-linkage agglomerative clustering for the GPU. It connects clustering with nearest-neighbor graph construction, spanning trees, and dendrogram extraction, which makes it relevant to NVIDIA cuVS clustering and graph-building routines.

1 @misc{nolet2023cuslink,
2   title         = {cuSLINK: Single-linkage Agglomerative Clustering on the GPU},
3   author        = {Corey J. Nolet and Divye Gala and Alex Fender and Mahesh Doijade and Joe Eaton and Edward Raff and John Zedlewski and Brad Rees and Tim Oates},
4   year          = {2023},
5   eprint        = {2306.16354},
6   archivePrefix = {arXiv},
7   primaryClass  = {cs.LG}
8 }

GPU Semiring Primitives for Sparse Neighborhood Methods

Paper

This paper presents GPU semiring primitives for sparse vector operations and neighborhood methods. These primitives provide background for sparse-distance and sparse-neighborhood workflows that can appear in vector search, preprocessing, and machine-learning pipelines.

1 @article{nolet2021semiring,
2   author  = {Nolet, Corey J. and Gala, Divye and Raff, Edward and Eaton, Joe and Rees, Brad and Zedlewski, John and Oates, Tim},
3   title   = {GPU Semiring Primitives for Sparse Neighborhood Methods},
4   journal = {arXiv preprint arXiv:2104.06357},
5   year    = {2021}
6 }

VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs

Paper

VecFlow studies filtered approximate nearest-neighbor search on GPUs. It is useful background for NVIDIA cuVS filtered-search work and for systems that combine vector indexes with structured metadata filters.

1 @article{vecflow2025,
2   author  = {Xi, Jingyi and Mo, Chenghao and Karsin, Ben and Chirkin, Artem and Li, Mingqin and Zhang, Minjia},
3   title   = {VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs},
4   journal = {arXiv preprint arXiv:2506.00812},
5   year    = {2025}
6 }