API Guide
Use these pages to find task-focused NVIDIA cuVS API examples for clustering, vector indexing, preprocessing, and supporting routines.
NVIDIA cuVS is implemented in C++ and wrapped by a stable C API layer. The Python, Java, Rust, and Go bindings are built on that C layer, so they all share the same ABI boundary; see Compatibility for why that matters. These API guides are written for general use and include examples in each supported programming language where possible. Some guides document C++ concepts explicitly, because every NVIDIA cuVS algorithm is implemented in C++.
Common Types
- Array Types: choose between dense arrays and sparse arrays for NVIDIA cuVS APIs.
- Dense Arrays: pass dense vectors, matrices, and outputs into NVIDIA cuVS APIs across supported languages.
- Memory Management: configure RMM device, pool, pinned host, host, and managed memory resources for NVIDIA cuVS workflows.
- Multi-GPU: initialize multi-GPU resources and understand RAFT/NCCL communication setup.
- Resources: reuse CUDA streams, library handles, stream pools, and workspace resources across NVIDIA cuVS calls.
- Sparse Arrays: use CSR and COO sparse matrix views with NVIDIA cuVS C++ APIs that accept sparse inputs.
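To make the CSR and COO layouts mentioned above concrete, the following is a minimal pure-Python sketch that does not use cuVS at all. The helper names `dense_to_coo` and `coo_to_csr` are hypothetical and exist only for illustration; the actual cuVS sparse matrix views are described in the Sparse Arrays guide.

```python
# Illustrative only: plain-Python COO and CSR layouts, not cuVS types.

def dense_to_coo(matrix):
    """Return (rows, cols, values) for the nonzero entries, in row-major order."""
    rows, cols, values = [], [], []
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                values.append(v)
    return rows, cols, values


def coo_to_csr(rows, cols, values, n_rows):
    """Convert row-sorted COO triplets to CSR (indptr, indices, data)."""
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1          # count the nonzeros in each row
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]  # prefix sum turns counts into row offsets
    return indptr, list(cols), list(values)


dense = [
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
]
coo = dense_to_coo(dense)
csr = coo_to_csr(*coo, n_rows=len(dense))
```

COO stores one (row, column, value) triplet per nonzero, while CSR replaces the per-entry row index with one offset per row, which makes row slicing cheap.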
Clustering Guide
- K-Means: partition vectors into a fixed number of clusters, often as part of scalable vector-search systems.
- Single-linkage: build hierarchical clusters from nearest-neighbor relationships.
- Spectral Clustering: use graph structure and spectral methods to identify clusters with more complex shapes.
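As a rough illustration of what partitioning vectors into a fixed number of clusters means, here is a small pure-Python sketch of Lloyd's algorithm, the classic k-means iteration. It is not the cuVS implementation, and the function names `squared_l2` and `kmeans` are hypothetical; initialization, convergence checks, and GPU execution are all omitted.

```python
# Illustrative only: Lloyd's algorithm on a tiny dataset, not the cuVS implementation.

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, n_iters=10):
    """Run a fixed number of Lloyd iterations from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_l2(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

In vector-search systems the resulting centroids often serve as the coarse partitions that inverted-file indexes build on.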
Indexing Guide
- Brute-force: run exact nearest-neighbor search by comparing each query with every vector.
- CAGRA: build and search GPU-optimized graph indexes for high-throughput ANN search.
- NN-Descent: build approximate nearest-neighbor graphs with an iterative algorithm.
- IVF-Flat: partition vectors into inverted-file lists while storing full-precision vectors.
- IVF-PQ: combine inverted-file partitioning with product quantization for compact indexes.
- ScaNN: combine partitioning, quantization, and refinement for high-quality approximate search.
- Vamana: build graph indexes for large-scale and disk-backed search workflows.
- All-neighbors: compute all-neighbors graph structures.
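The difference between exact brute-force search and inverted-file partitioning can be sketched in a few lines of pure Python. This is a conceptual sketch only: `brute_force_knn`, `build_ivf_lists`, and `ivf_flat_search` are hypothetical helper names, not cuVS functions, and the centroids are supplied by hand rather than trained.

```python
# Illustrative only: exact search versus probing a subset of inverted lists.
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(dataset, query, k):
    """Compare the query with every vector and keep the k closest (distance, index) pairs."""
    scored = ((squared_l2(query, v), i) for i, v in enumerate(dataset))
    return heapq.nsmallest(k, scored)


def build_ivf_lists(dataset, centroids):
    """Assign each vector index to the inverted list of its closest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(dataset):
        closest = min(range(len(centroids)), key=lambda c: squared_l2(v, centroids[c]))
        lists[closest].append(i)
    return lists


def ivf_flat_search(dataset, centroids, lists, query, k, n_probes):
    """Scan only the n_probes lists whose centroids are closest to the query."""
    probe = heapq.nsmallest(
        n_probes, range(len(centroids)), key=lambda c: squared_l2(query, centroids[c])
    )
    candidates = (i for c in probe for i in lists[c])
    return heapq.nsmallest(k, ((squared_l2(query, dataset[i]), i) for i in candidates))


dataset = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
centroids = [(0.0, 1.0), (5.0, 5.0)]  # hand-picked for the example
query = (0.9, 0.1)

exact = brute_force_knn(dataset, query, k=2)
lists = build_ivf_lists(dataset, centroids)
approx = ivf_flat_search(dataset, centroids, lists, query, k=2, n_probes=1)
```

Probing fewer lists reduces the number of distance computations, at the cost of possibly missing true neighbors stored in lists that were not scanned. Here the single probed list happens to contain both true neighbors.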
Preprocessing Guide
- Binary Quantizer: compress vectors into binary representations for compact storage and fast comparisons.
- PCA: reduce dimensionality with a linear projection while preserving as much variance as possible.
- Product Quantization: split vectors into subvectors and encode each part with compact codebooks.
- Scalar Quantizer: compress each vector dimension independently with scalar quantization.
- Spectral Embedding: create lower-dimensional embeddings from graph structure.
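The quantizers listed above trade precision for compactness. The sketch below shows the general idea for scalar and binary quantization in pure Python. It assumes a simple global min/max range and a fixed zero threshold; the way cuVS actually trains and encodes may differ, and every function name here is hypothetical.

```python
# Illustrative only: simplified scalar and binary quantization, not cuVS APIs.

def train_scalar_quantizer(vectors):
    """Record the global min and max (an assumed, simplified training step)."""
    flat = [x for v in vectors for x in v]
    return min(flat), max(flat)


def scalar_quantize(vector, lo, hi):
    """Map each value, clamped to [lo, hi], to an integer code in [0, 255]."""
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vector]


def binary_quantize(vector, threshold=0.0):
    """Keep one bit per dimension: 1 if the value exceeds the threshold."""
    return [1 if x > threshold else 0 for x in vector]


def hamming(a, b):
    """Count the positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


vectors = [(-1.0, 0.0, 1.0), (0.5, -0.5, 1.0)]
lo, hi = train_scalar_quantizer(vectors)
codes = [scalar_quantize(v, lo, hi) for v in vectors]
bits = [binary_quantize(v) for v in vectors]
distance = hamming(bits[0], bits[1])
```

Scalar codes keep one byte per dimension, while binary codes keep one bit per dimension and can be compared with a Hamming distance.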
Other APIs
- Dynamic Batching: collect many concurrent small ANN searches into larger GPU search batches.
- K-selection: select the top k values or nearest candidates from larger result sets.
- Pairwise Distances: compute distances between vectors for analysis, validation, or algorithm building blocks.
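Pairwise distances and k-selection are often used together: compute a distance matrix, then keep the k smallest entries per row. The following pure-Python sketch shows that pairing under simple assumptions (Euclidean distance, small inputs). `pairwise_l2` and `select_k` are hypothetical names, not cuVS functions.

```python
# Illustrative only: a distance matrix followed by per-row top-k selection.
import heapq
import math


def pairwise_l2(xs, ys):
    """Return a len(xs) x len(ys) matrix of Euclidean distances."""
    return [[math.dist(x, y) for y in ys] for x in xs]


def select_k(row, k):
    """Return the k smallest (value, index) pairs of one row, in ascending order."""
    return heapq.nsmallest(k, ((v, i) for i, v in enumerate(row)))


xs = [(0.0, 0.0), (3.0, 4.0)]
ys = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

distances = pairwise_l2(xs, ys)
top2 = [select_k(row, 2) for row in distances]
```

Each entry of `top2` holds the two closest columns for one row of the distance matrix, which is the same shape of result a nearest-neighbor search returns.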