Vamana

Python module: cuvs.neighbors.vamana

Index

cdef class Index

Vamana index object. It stores the trained Vamana index state, which can be used to perform nearest-neighbor searches.

Members

| Name | Kind |
| --- | --- |
| trained | property |

trained

def trained(self)

IndexParams

cdef class IndexParams

Parameters for building a Vamana index

Parameters

| Name | Type | Description |
| --- | --- | --- |
| metric | str, default="sqeuclidean" | String denoting the metric type. Supported metrics include "sqeuclidean" and "l2". |
| graph_degree | int, default=32 | Maximum degree of the graph; corresponds to the R parameter of the Vamana algorithm in the literature. |
| visited_size | int, default=64 | Maximum number of visited nodes per search during the Vamana algorithm. Loosely corresponds to the L parameter in the literature. |
| vamana_iters | float, default=1 | Number of Vamana vector insertion iterations (each iteration inserts all vectors). |
| alpha | float, default=1.2 | Alpha parameter for pruning. Determines how aggressive the pruning will be. |
| max_fraction | float, default=0.06 | Maximum fraction of the dataset inserted per batch. A larger maximum batch decreases graph quality but improves speed. |
| batch_base | float, default=2.0 | Base of the growth rate of batch sizes. |
| queue_size | int, default=127 | Size of the candidate queue structure; should be (2^x)-1. |
| reverse_batchsize | int, default=1000000 | Maximum batch size of reverse edge processing (reduces memory footprint). |
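To build intuition for how max_fraction and batch_base interact, here is a minimal pure-Python sketch of one plausible batch schedule: batch sizes start at 1, grow geometrically by batch_base, and are capped at max_fraction of the dataset. The exact schedule used inside cuVS is not specified on this page, so the function `batch_schedule` and its growth rule are assumptions made for illustration only.

```python
def batch_schedule(n_vectors, max_fraction=0.06, batch_base=2.0):
    """Return a list of batch sizes that together cover n_vectors.

    Hypothetical helper: assumes batches start at 1 vector, grow by a
    factor of batch_base, and never exceed max_fraction * n_vectors.
    """
    max_batch = max(1, int(n_vectors * max_fraction))
    sizes = []
    inserted = 0
    growth = 1.0
    while inserted < n_vectors:
        # Cap by the maximum batch size and by the vectors still remaining.
        size = min(max(1, int(growth)), max_batch, n_vectors - inserted)
        sizes.append(size)
        inserted += size
        # Stop growing once the cap is reached (also avoids float overflow).
        if growth < max_batch:
            growth *= batch_base
    return sizes


sizes = batch_schedule(1000, max_fraction=0.25, batch_base=2.0)
print(sizes)  # [1, 2, 4, 8, 16, 32, 64, 128, 250, 250, 245]
```

Under this assumed schedule, a larger max_fraction means fewer, larger batches near the end of the build, which matches the speed versus graph-quality trade-off described in the table.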

Constructor

def __init__(self, *, metric="sqeuclidean", graph_degree=32, visited_size=64, vamana_iters=1, alpha=1.2, max_fraction=0.06, batch_base=2.0, queue_size=127, reverse_batchsize=1000000)
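Because queue_size is expected to have the form (2^x)-1, it can be useful to check a value before passing it to the constructor. The helpers below are a small sketch in plain Python; `is_valid_queue_size` and `next_valid_queue_size` are hypothetical names and are not part of cuVS.

```python
def is_valid_queue_size(queue_size):
    """True if queue_size is (2^x)-1 for some integer x >= 1."""
    n = queue_size + 1
    # n is a power of two exactly when it has a single bit set.
    return queue_size > 0 and (n & (n - 1)) == 0


def next_valid_queue_size(minimum):
    """Smallest value of the form (2^x)-1 that is >= minimum."""
    size = 1
    while size < minimum:
        size = size * 2 + 1  # 1, 3, 7, 15, 31, ...
    return size


print(is_valid_queue_size(127))    # True
print(is_valid_queue_size(128))    # False
print(next_valid_queue_size(100))  # 127
```

The result of `next_valid_queue_size` could then be passed as the `queue_size` keyword argument shown in the constructor signature above.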

Members

| Name | Kind |
| --- | --- |
| metric | property |
| graph_degree | property |
| visited_size | property |
| vamana_iters | property |
| alpha | property |
| max_fraction | property |
| batch_base | property |
| queue_size | property |
| reverse_batchsize | property |

metric

def metric(self)

graph_degree

def graph_degree(self)

visited_size

def visited_size(self)

vamana_iters

def vamana_iters(self)

alpha

def alpha(self)

max_fraction

def max_fraction(self)

batch_base

def batch_base(self)

queue_size

def queue_size(self)

reverse_batchsize

def reverse_batchsize(self)

build

@auto_sync_resources

def build(IndexParams index_params, dataset, resources=None)

Build the Vamana index from the dataset for efficient search.

The build utilizes the Vamana insertion-based algorithm to create the graph. The algorithm starts with an empty graph and iteratively inserts batches of nodes. Each batch involves performing a greedy search for each vector to be inserted, and inserting it with edges to all nodes traversed during the search. Reverse edges are also inserted, and robustPrune is applied to improve graph quality. The index_params struct controls the degree of the final graph.
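The pruning step mentioned above can be illustrated with a small CPU sketch. This follows the robust-prune rule as it is commonly described in the DiskANN literature, not the GPU implementation in cuVS: candidates are visited from closest to farthest, each selected neighbor is kept, and any remaining candidate that is sufficiently closer to that neighbor than to the inserted point (scaled by alpha) is discarded. The function name `robust_prune`, its arguments, and the use of plain Euclidean distance are assumptions for illustration.

```python
import math


def robust_prune(point, candidates, alpha=1.2, max_degree=32):
    """Pick up to max_degree out-neighbors for `point` (illustrative sketch)."""
    # Visit candidates from closest to farthest from the inserted point.
    remaining = sorted(candidates, key=lambda c: math.dist(point, c))
    neighbors = []
    while remaining and len(neighbors) < max_degree:
        closest = remaining.pop(0)
        neighbors.append(closest)
        # Keep only candidates that the new neighbor does not already "cover":
        # a candidate is dropped when alpha * d(closest, c) <= d(point, c).
        remaining = [
            c for c in remaining
            if alpha * math.dist(closest, c) > math.dist(point, c)
        ]
    return neighbors


candidates = [(1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
print(robust_prune((0.0, 0.0), candidates, alpha=1.2))
# [(1.0, 0.0), (0.0, 3.0)]
```

In this sketch a larger alpha discards fewer candidates, so more edges (including longer ones) survive; max_degree plays the role of graph_degree in IndexParams.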

The following distance metrics are supported:

  • L2Expanded

Parameters

| Name | Type | Description |
| --- | --- | --- |
| index_params | IndexParams object | |
| dataset | CUDA array interface compliant matrix shape (n_samples, dim) | Supported dtype [float, int8, uint8] |
| resources | cuvs.common.Resources, optional | |

Returns

| Name | Type | Description |
| --- | --- | --- |
| index | cuvs.neighbors.vamana.Index | |

Examples

>>> import cupy as cp
>>> from cuvs.neighbors import vamana
>>> n_samples = 50000
>>> n_features = 50
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> build_params = vamana.IndexParams(metric="sqeuclidean")
>>> index = vamana.build(build_params, dataset)
>>> # Serialize index to file for later use with CPU DiskANN
>>> vamana.save("my_index.bin", index)

save

@auto_sync_resources

def save(filename, Index index, bool include_dataset=True, resources=None)

Saves the index to a file.

Matches the file format used by the DiskANN open-source repository, allowing cross-compatibility.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| filename | string | Name of the file. |
| index | Index | Trained Vamana index. |
| include_dataset | bool | Whether or not to write out the dataset along with the index. Including the dataset in the serialized index will use extra disk space, and might not be desired if you already have a copy of the dataset on disk. |
| resources | cuvs.common.Resources, optional | |
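To judge how much extra disk space include_dataset=True might cost, the raw size of the vectors can be estimated as n_samples × dim × bytes per element. The sketch below is plain arithmetic; `dataset_bytes` is a hypothetical helper, "float" is assumed to mean 32-bit floats, and file headers and the graph itself are not counted.

```python
# Bytes per element for the dtypes listed as supported by build
# (assumption: "float" means float32).
ITEM_SIZE_BYTES = {"float32": 4, "int8": 1, "uint8": 1}


def dataset_bytes(n_samples, n_features, dtype="float32"):
    """Raw size in bytes of an (n_samples, n_features) matrix."""
    return n_samples * n_features * ITEM_SIZE_BYTES[dtype]


size = dataset_bytes(50000, 50, "float32")
print(f"{size / 1e6:.1f} MB")  # 10.0 MB
```

For the 50000 × 50 float32 dataset used in the examples on this page, that is roughly 10 MB of vector data that would be written in addition to the graph.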

Examples

>>> import cupy as cp
>>> from cuvs.neighbors import vamana
>>> n_samples = 50000
>>> n_features = 50
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = vamana.build(vamana.IndexParams(), dataset)
>>> # Serialize and save the vamana index
>>> vamana.save("my_index.bin", index)