User Guide

Use these guides when you are ready to apply cuVS APIs, benchmark algorithms, or integrate cuVS into a larger product.

API Guide

  • API Guide: find task-focused cuVS API examples for clustering, vector indexing, preprocessing, and supporting routines.

Clustering Guide

  • K-Means: partition vectors into a fixed number of clusters, often as part of scalable vector-search systems.
  • Single-linkage: build hierarchical clusters from nearest-neighbor relationships.
  • Spectral Clustering: use graph structure and spectral methods to identify clusters with more complex shapes.
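
As a rough illustration of what "partition vectors into a fixed number of clusters" means, the sketch below runs Lloyd's algorithm, the classic K-Means iteration, in plain Python. It is not the cuVS API; the function name `kmeans`, its parameters, and the sample points are assumptions made for this example only.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial = [(0.0, 0.0), (10.0, 10.0)]
centroids, labels = kmeans(points, initial)
print(labels)
print(centroids)
```

The number of clusters is fixed by the length of the initial centroid list, which mirrors the "fixed number of clusters" wording above. A GPU implementation would batch the assignment step rather than loop over points.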

Indexing Guide

  • Brute-force: run exact nearest-neighbor search by comparing each query with every vector.
  • CAGRA: build and search GPU-optimized graph indexes for high-throughput ANN search.
  • NN-Descent: build approximate nearest-neighbor graphs with an iterative algorithm.
  • IVF-Flat: partition vectors into inverted-file lists while storing full-precision vectors.
  • IVF-PQ: combine inverted-file partitioning with product quantization for compact indexes.
  • ScaNN: combine partitioning, quantization, and refinement for high-quality approximate search.
  • Vamana: build graph indexes for large-scale and disk-backed search workflows.
  • All-neighbors: compute a nearest-neighbor graph that links every vector in a dataset to its neighbors.
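
To make the brute-force entry concrete, here is a minimal CPU sketch of exact nearest-neighbor search that compares each query with every vector. It uses only the Python standard library and is not cuVS code; `brute_force_knn` and the sample data are hypothetical names chosen for illustration.

```python
import heapq
import math


def brute_force_knn(dataset, queries, k):
    """Return, for each query, the k nearest dataset rows as (distance, index) pairs."""
    results = []
    for q in queries:
        # Compare the query with every vector: exact, but O(n * d) work per query.
        scored = ((math.dist(q, v), i) for i, v in enumerate(dataset))
        results.append(heapq.nsmallest(k, scored))
    return results


# Hypothetical toy data: four 2-D vectors and two queries.
dataset = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
queries = [(0.9, 0.1), (2.5, 2.5)]
neighbors = brute_force_knn(dataset, queries, k=2)
for q, hits in zip(queries, neighbors):
    print(q, [index for _, index in hits])
```

Because every query touches every vector, the results are exact, which is why this kind of search is commonly used as a recall baseline when evaluating the approximate indexes listed above.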

Preprocessing Guide

  • Binary Quantizer: compress vectors into binary representations for compact storage and fast comparisons.
  • PCA: reduce dimensionality with a linear projection while preserving as much variance as possible.
  • Product Quantization: split vectors into subvectors and encode each part with compact codebooks.
  • Scalar Quantizer: compress each vector dimension independently with scalar quantization.
  • Spectral Embedding: create lower-dimensional embeddings from graph structure.
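
The idea behind compressing "each vector dimension independently" can be sketched as follows. This is a simplified min/max scalar quantizer in plain Python, assuming 256 levels per dimension. The function names and the range-fitting strategy are assumptions for illustration and may differ from how the cuVS Scalar Quantizer chooses its ranges.

```python
def fit_scalar_quantizer(vectors):
    """Learn a per-dimension (min, max) range from training vectors."""
    return [(min(dim), max(dim)) for dim in zip(*vectors)]


def quantize(vector, ranges, levels=256):
    """Map each dimension independently to an integer code in [0, levels - 1]."""
    codes = []
    for x, (lo, hi) in zip(vector, ranges):
        span = (hi - lo) or 1.0  # avoid division by zero on constant dimensions
        clipped = min(max(x, lo), hi)  # values outside the trained range saturate
        codes.append(round((clipped - lo) / span * (levels - 1)))
    return codes


def dequantize(codes, ranges, levels=256):
    """Approximately reconstruct the original values from integer codes."""
    return [
        lo + code / (levels - 1) * ((hi - lo) or 1.0)
        for code, (lo, hi) in zip(codes, ranges)
    ]


# Hypothetical training vectors with two dimensions on different scales.
training = [(0.0, 10.0), (1.0, 20.0), (0.5, 15.0)]
ranges = fit_scalar_quantizer(training)
codes = quantize((0.25, 12.0), ranges)
restored = dequantize(codes, ranges)
print(codes, restored)
```

Each code fits in one byte instead of a 4-byte float, at the cost of a reconstruction error of at most half a quantization step per dimension.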

Other APIs

  • Pairwise Distances: compute distances between vectors for analysis, validation, or algorithm building blocks.
  • K-selection: select the top k values or nearest candidates from larger result sets.

Benchmarking Guide

  • Methodologies: compare vector indexes fairly with quality buckets, Pareto curves, and consistent reporting.
  • cuVS Bench Tool: start with the cuVS Bench guide for reproducible benchmark workflows.
  • cuVS Bench Installation: install cuVS Bench with packages or containers, or build it from source.
  • cuVS Bench Usage: configure algorithms, run benchmarks, and read build and search results.
  • cuVS Bench Datasets: prepare datasets, ground truth, binary files, and dataset descriptors.
  • cuVS Bench Backends: understand and extend backend integrations for benchmark execution.
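
For readers new to Pareto curves, the sketch below shows one way to extract the Pareto-optimal points from a set of (recall, queries-per-second) results. It is an illustrative helper, not part of cuVS Bench; `pareto_frontier` is a hypothetical name and the numbers are made-up placeholders rather than measurements.

```python
def pareto_frontier(results):
    """Keep the (recall, qps) points that no other point beats on both axes."""
    frontier = []
    best_qps = float("-inf")
    # Scan from highest recall down; a point survives only if it is faster
    # than every point with equal or higher recall seen so far.
    for recall, qps in sorted(results, key=lambda r: (-r[0], -r[1])):
        if qps > best_qps:
            frontier.append((recall, qps))
            best_qps = qps
    return sorted(frontier)


# Made-up (recall, qps) pairs for one index under different search settings.
results = [
    (0.80, 50000), (0.90, 42000), (0.90, 30000),
    (0.95, 20000), (0.94, 15000), (0.99, 5000),
]
frontier = pareto_frontier(results)
print(frontier)
```

Plotting only the frontier for each index makes the comparison fairer, because every algorithm is represented by its best trade-off at each quality level rather than by an arbitrary configuration.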

Compatibility and Integration

  • Compatibility: understand cuVS release compatibility, ABI windows, and stable binary boundaries.
  • Integration Patterns: compare direct, offloaded, and service-oriented ways to integrate cuVS into products.
  • References: cite the research papers behind cuVS vector search, preprocessing, clustering, and GPU primitives.