API Guide | cuVS

Use these pages to find task-focused NVIDIA cuVS API examples for clustering, vector indexing, preprocessing, and supporting routines.

The core of NVIDIA cuVS is written in C++ and wrapped by a stable C API layer. The Python, Java, Rust, and Go bindings are built on that C layer, so they all share the same ABI boundary; see Compatibility for why that matters. These API guides are written for general use and include examples in each supported programming language where possible. Some guides document C++ concepts explicitly, because every NVIDIA cuVS algorithm is ultimately implemented in C++.

Common Types

Array Types: choose between dense arrays and sparse arrays for NVIDIA cuVS APIs.
Dense Arrays: pass dense vectors, matrices, and outputs into NVIDIA cuVS APIs across supported languages.
Memory Management: configure RMM device, pool, pinned host, host, and managed memory resources for NVIDIA cuVS workflows.
Multi-GPU: initialize multi-GPU resources and understand RAFT/NCCL communication setup.
Resources: reuse CUDA streams, library handles, stream pools, and workspace resources across NVIDIA cuVS calls.
Sparse Arrays: use CSR and COO sparse matrix views with NVIDIA cuVS C++ APIs that accept sparse inputs.
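
To make the dense/sparse distinction concrete, the sketch below converts a small dense matrix into CSR and COO form using plain Python lists. It only illustrates the two layouts and is not the NVIDIA cuVS API; the function names `dense_to_csr` and `dense_to_coo` are hypothetical.

```python
def dense_to_csr(matrix):
    """Return (indptr, indices, data) for a dense row-major matrix."""
    indptr, indices, data = [0], [], []
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                indices.append(col)   # column of each stored value
                data.append(value)    # the stored value itself
        indptr.append(len(indices))   # where the next row starts
    return indptr, indices, data


def dense_to_coo(matrix):
    """Return (rows, cols, data): one coordinate triple per nonzero."""
    rows, cols, data = [], [], []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                rows.append(r)
                cols.append(c)
                data.append(value)
    return rows, cols, data


dense = [
    [1, 0, 0],
    [0, 0, 2],
    [0, 3, 4],
]
csr = dense_to_csr(dense)  # ([0, 1, 2, 4], [0, 2, 1, 2], [1, 2, 3, 4])
coo = dense_to_coo(dense)  # ([0, 1, 2, 2], [0, 2, 1, 2], [1, 2, 3, 4])
```

CSR compresses the row coordinates into offsets, so row `i` occupies positions `indptr[i]` to `indptr[i + 1]` of `indices` and `data`. COO keeps an explicit row and column for every stored value.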

Clustering Guide

K-Means: partition vectors into a fixed number of clusters, often as part of scalable vector-search systems.
Single-linkage: build hierarchical clusters from nearest-neighbor relationships.
Spectral Clustering: use graph structure and spectral methods to identify clusters with more complex shapes.
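
For readers new to K-Means, here is a minimal CPU sketch of the standard Lloyd iteration in pure Python. It is not the NVIDIA cuVS implementation or API; the function name `kmeans`, the caller-supplied initial centroids, and the fixed iteration count are assumptions made for brevity.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, initial_centroids, n_iters=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(n_iters):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for members, old in zip(clusters, centroids):
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
# labels == [0, 0, 1, 1]
```

The number of clusters is fixed by the number of initial centroids, which matches the "fixed number of clusters" in the K-Means description above.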

Indexing Guide

Brute-force: run exact nearest-neighbor search by comparing each query with every vector.
CAGRA: build and search GPU-optimized graph indexes for high-throughput ANN search.
NN-Descent: build approximate nearest-neighbor graphs with an iterative algorithm.
IVF-Flat: partition vectors into inverted-file lists while storing full-precision vectors.
IVF-PQ: combine inverted-file partitioning with product quantization for compact indexes.
ScaNN: combine partitioning, quantization, and refinement for high-quality approximate search.
Vamana: build graph indexes for large-scale and disk-backed search workflows.
All-neighbors: build a k-nearest-neighbor graph that connects every vector in a dataset to its nearest neighbors.
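
The inverted-file idea behind IVF-Flat can be sketched in a few lines of pure Python. This is a conceptual illustration only, not the NVIDIA cuVS API: the names `build_ivf`, `search_ivf`, and `n_probes` are hypothetical, and the centroids are passed in directly, whereas a real index would typically train them with a clustering step such as K-Means.

```python
import math
from collections import defaultdict


def closest_centroids(vector, centroids):
    """Centroid indices ordered from nearest to farthest."""
    return sorted(range(len(centroids)), key=lambda c: math.dist(vector, centroids[c]))


def build_ivf(dataset, centroids):
    """Place each full-precision vector in the list of its nearest centroid."""
    lists = defaultdict(list)
    for idx, vec in enumerate(dataset):
        lists[closest_centroids(vec, centroids)[0]].append((idx, vec))
    return lists


def search_ivf(lists, centroids, query, k, n_probes):
    """Scan only the n_probes lists whose centroids are closest to the query."""
    probed = closest_centroids(query, centroids)[:n_probes]
    candidates = [
        (math.dist(query, vec), idx)
        for c in probed
        for idx, vec in lists.get(c, [])
    ]
    return sorted(candidates)[:k]  # (distance, index) pairs, nearest first


dataset = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
ivf_lists = build_ivf(dataset, centroids)
hits = search_ivf(ivf_lists, centroids, query=(0.15, 0.1), k=2, n_probes=1)
```

Probing fewer lists reduces the number of distance computations but can miss true neighbors stored in lists that were not scanned. That trade-off is what separates this approximate search from the exact brute-force search.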

Preprocessing Guide

Binary Quantizer: compress vectors into binary representations for compact storage and fast comparisons.
PCA: reduce dimensionality with a linear projection while preserving as much variance as possible.
Product Quantization: split vectors into subvectors and encode each part with compact codebooks.
Scalar Quantizer: compress each vector dimension independently with scalar quantization.
Spectral Embedding: create lower-dimensional embeddings from graph structure.
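
As a rough picture of what the scalar and binary quantizers do, the following pure-Python sketch encodes small vectors both ways. It is not the NVIDIA cuVS API. The function names, the per-dimension min/max range, the 256 levels, and the zero threshold are all assumptions chosen for illustration.

```python
def fit_scalar_quantizer(vectors):
    """Learn a per-dimension [lo, hi] range from training vectors."""
    dims = range(len(vectors[0]))
    lo = [min(v[d] for v in vectors) for d in dims]
    hi = [max(v[d] for v in vectors) for d in dims]
    return lo, hi


def scalar_quantize(vector, lo, hi, levels=256):
    """Map each dimension independently onto integer codes 0..levels-1."""
    codes = []
    for x, a, b in zip(vector, lo, hi):
        span = (b - a) or 1.0                    # guard constant dimensions
        t = min(max((x - a) / span, 0.0), 1.0)   # clamp into the learned range
        codes.append(round(t * (levels - 1)))
    return codes


def binary_quantize(vector, threshold=0.0):
    """One bit per dimension: 1 if the value exceeds the threshold."""
    return [1 if x > threshold else 0 for x in vector]


training = [(0.0, -1.0), (1.0, 1.0), (0.5, 0.0)]
lo, hi = fit_scalar_quantizer(training)
low_codes = scalar_quantize((0.0, -1.0), lo, hi)   # [0, 0]
high_codes = scalar_quantize((1.0, 1.0), lo, hi)   # [255, 255]
bits = binary_quantize((0.5, -0.2, 0.0))           # [1, 0, 0]
```

Scalar codes here fit in one byte per dimension and binary codes in one bit per dimension, which is where the storage savings come from.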

Other APIs

Dynamic Batching: collect many concurrent small ANN searches into larger GPU search batches.
K-selection: select the top k values or nearest candidates from larger result sets.
Pairwise Distances: compute distances between vectors for analysis, validation, or algorithm building blocks.
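
Pairwise distances and k-selection are often used together. The sketch below shows both on the CPU with the Python standard library. It is not the NVIDIA cuVS API; `pairwise_distances` and `select_k` are hypothetical names, and Euclidean distance is assumed.

```python
import heapq
import math


def pairwise_distances(xs, ys):
    """Distance matrix: entry [i][j] is the distance from xs[i] to ys[j]."""
    return [[math.dist(x, y) for y in ys] for x in xs]


def select_k(values, k):
    """Return the k smallest values as (value, index) pairs, smallest first."""
    return heapq.nsmallest(k, ((v, i) for i, v in enumerate(values)))


queries = [(0.9, 0.1)]
dataset = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]

distances = pairwise_distances(queries, dataset)          # 1 x 4 matrix
nearest = [select_k(row, k=2) for row in distances]       # top 2 per query
```

Chaining the two steps, a full distance matrix followed by a per-row top-k, amounts to an exact nearest-neighbor search.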