IVF PQ

Source header: cuvs/neighbors/ivf_pq.h

IVF-PQ index build parameters

cuvsIvfPqCodebookGen

A type for specifying how PQ codebooks are created

enum cuvsIvfPqCodebookGen { ... };

Values

| Name | Value |
| --- | --- |
| CUVS_IVF_PQ_CODEBOOK_GEN_PER_SUBSPACE | 0 |
| CUVS_IVF_PQ_CODEBOOK_GEN_PER_CLUSTER | 1 |

cuvsIvfPqListLayout

A type for specifying the memory layout of IVF-PQ list data

enum cuvsIvfPqListLayout { ... };

Values

| Name | Value |
| --- | --- |
| CUVS_IVF_PQ_LIST_LAYOUT_FLAT | 0 |
| CUVS_IVF_PQ_LIST_LAYOUT_INTERLEAVED | 1 |

cuvsIvfPqIndexParams

Supplemental parameters to build IVF-PQ Index

struct cuvsIvfPqIndexParams { ... };

Fields

| Name | Type | Description |
| --- | --- | --- |
| metric | cuvsDistanceType | Distance type. |
| metric_arg | float | The argument used by some distance metrics. |
| add_data_on_build | bool | Whether to add the dataset content to the index. If true, the index is filled with the dataset vectors and is ready to search after calling build. If false, build only trains the underlying model (e.g. quantizer or clustering) and leaves the index empty; call extend on the index afterwards to populate it. |
| n_lists | uint32_t | The number of inverted lists (clusters). Hint: the number of vectors per cluster (n_rows / n_lists) should be approximately 1,000 to 10,000. |
| kmeans_n_iters | uint32_t | The number of iterations searching for kmeans centers (index building). |
| kmeans_trainset_fraction | double | The fraction of data to use during iterative kmeans building. |
| pq_bits | uint32_t | The bit length of the vector element after compression by PQ. Possible values: [4, 5, 6, 7, 8]. Hint: the smaller the ‘pq_bits’, the smaller the index size and the better the search performance, but the lower the recall. |
| pq_dim | uint32_t | The dimensionality of the vector after compression by PQ. When zero, an optimal value is selected using a heuristic. NB: pq_dim * pq_bits must be a multiple of 8. Hint: a smaller ‘pq_dim’ results in a smaller index size and better search performance, but lower recall. If ‘pq_bits’ is 8, ‘pq_dim’ can be set to any number, but multiples of 8 are desirable for good performance. If ‘pq_bits’ is not 8, ‘pq_dim’ should be a multiple of 8. For good performance, it is desirable that ‘pq_dim’ is a multiple of 32. Ideally, ‘pq_dim’ should also be a divisor of the dataset dim. |
| codebook_kind | enum cuvsIvfPqCodebookGen | How PQ codebooks are created. |
| force_random_rotation | bool | Apply a random rotation matrix on the input data and queries even if dim % pq_dim == 0. Note: if dim is not a multiple of pq_dim, a random rotation is always applied to the input data and queries to transform the working space from dim to rot_dim, which may be slightly larger than the original space and is a multiple of pq_dim (rot_dim % pq_dim == 0). However, this transform is not necessary when dim is a multiple of pq_dim (dim == rot_dim, hence no need to add “extra” data columns / features). By default, if dim == rot_dim, the rotation transform is initialized with the identity matrix. When force_random_rotation == true, a random orthogonal transform matrix is generated regardless of the values of dim and pq_dim. |
| conservative_memory_allocation | bool | By default, the algorithm allocates more space than necessary for individual clusters (list_data). This amortizes the cost of memory allocation and reduces the number of data copies during repeated calls to extend (extending the database). The alternative is the conservative allocation behavior; when enabled, the algorithm always allocates the minimum amount of memory required to store the given number of records. Set this flag to true if you prefer to use as little GPU memory for the database as possible. |
| max_train_points_per_pq_code | uint32_t | The maximum number of data points to use per PQ code during PQ codebook training. Using more data points per PQ code may increase the quality of the PQ codebook but may also increase the build time. The parameter applies to both PQ codebook generation methods, i.e., PER_SUBSPACE and PER_CLUSTER. In both cases, pq_book_size * max_train_points_per_pq_code training points are used to train each codebook. |
| codes_layout | enum cuvsIvfPqListLayout | Memory layout of the IVF-PQ list data. CUVS_IVF_PQ_LIST_LAYOUT_FLAT: codes are stored contiguously, one vector’s codes after another. CUVS_IVF_PQ_LIST_LAYOUT_INTERLEAVED: codes are interleaved for optimized search performance; this is the default and recommended for search workloads. |
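
The constraints and hints on n_lists, pq_bits, pq_dim and max_train_points_per_pq_code can be checked before calling build. The following is a minimal sketch, not part of cuVS: the helper names are hypothetical, and it assumes pq_book_size equals 2^pq_bits, which this page does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: checks the documented pq_bits / pq_dim constraints. */
static inline bool pq_params_valid(uint32_t pq_dim, uint32_t pq_bits)
{
  if (pq_bits < 4 || pq_bits > 8) { return false; } /* allowed values: 4..8 */
  if (pq_dim == 0) { return true; }                 /* 0 lets the library choose */
  return (pq_dim * pq_bits) % 8 == 0;               /* must be a multiple of 8 */
}

/* Range of n_lists that keeps n_rows / n_lists between 1,000 and 10,000. */
typedef struct {
  uint64_t lo;
  uint64_t hi;
} n_lists_range_t;

static inline n_lists_range_t n_lists_hint(uint64_t n_rows)
{
  n_lists_range_t r;
  r.lo = n_rows / 10000; /* about 10,000 vectors per list */
  r.hi = n_rows / 1000;  /* about 1,000 vectors per list */
  if (r.lo == 0) { r.lo = 1; }
  if (r.hi == 0) { r.hi = 1; }
  return r;
}

/* Training points per codebook: pq_book_size * max_train_points_per_pq_code.
   ASSUMPTION: pq_book_size == 2^pq_bits. */
static inline uint64_t pq_train_points(uint32_t pq_bits, uint32_t max_train_points_per_pq_code)
{
  assert(pq_bits >= 4 && pq_bits <= 8);
  return ((uint64_t)1 << pq_bits) * max_train_points_per_pq_code;
}
```

For example, a dataset of one million rows gives a suggested n_lists range of 100 to 1,000 under this hint.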

cuvsIvfPqIndexParamsCreate

Allocate IVF-PQ Index params, and populate with default values

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexParamsCreate(cuvsIvfPqIndexParams_t* index_params);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index_params | in | cuvsIvfPqIndexParams_t* | cuvsIvfPqIndexParams_t to allocate |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexParamsDestroy

De-allocate IVF-PQ Index params

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexParamsDestroy(cuvsIvfPqIndexParams_t index_params);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index_params | in | cuvsIvfPqIndexParams_t | |

Returns

CUVS_EXPORT cuvsError_t

IVF-PQ index search parameters

cuvsIvfPqSearchParams

Supplemental parameters to search IVF-PQ index

struct cuvsIvfPqSearchParams { ... };

Fields

| Name | Type | Description |
| --- | --- | --- |
| n_probes | uint32_t | The number of clusters to search. |
| lut_dtype | cudaDataType_t | Data type of the look-up table created dynamically at search time. Possible values: [CUDA_R_32F, CUDA_R_16F, CUDA_R_8U]. Low-precision types reduce the amount of shared memory required at search time, so fast shared-memory kernels can be used even for datasets with large dimensionality. Note that recall is slightly degraded when a low-precision type is selected. |
| internal_distance_dtype | cudaDataType_t | Storage data type for distance/similarity computed at search time. Possible values: [CUDA_R_16F, CUDA_R_32F]. If the performance limiter at search time is device memory access, selecting FP16 will improve performance slightly. |
| coarse_search_dtype | cudaDataType_t | The data type to use as the GEMM element type when searching the clusters to probe. Possible values: [CUDA_R_8I, CUDA_R_16F, CUDA_R_32F]. Legacy default: CUDA_R_32F (float). Recommended for performance: CUDA_R_16F (half). Experimental/low-precision: CUDA_R_8I (int8_t); WARNING: the int8_t variant degrades recall unless data is normalized and low-dimensional. |
| max_internal_batch_size | uint32_t | Set the internal batch size to improve GPU utilization at the cost of a larger memory footprint. |
| preferred_shmem_carveout | double | Preferred fraction of the SM’s unified memory / L1 cache to be used as shared memory. Possible values: [0.0 - 1.0] as a fraction of sharedMemPerMultiprocessor. Increase the carveout to ensure good GPU occupancy for the main search kernel, but not so far that too little memory is left for the L1 cache. This value is interpreted only as a hint. Moreover, a GPU usually allows only a fixed set of cache configurations, so the provided value is rounded up to the nearest configuration. Refer to the NVIDIA tuning guide for the target GPU architecture. This is a low-level tuning parameter that can have drastic negative effects on search performance if tweaked incorrectly. |
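
To get a feel for why a low-precision lut_dtype reduces shared memory use, the sketch below estimates the look-up table footprint for each choice. It rests on an assumption not stated on this page: that the table holds one entry per (subspace, code) pair, i.e. pq_dim * 2^pq_bits entries. The enum and function names are stand-ins for illustration; the real field is a cudaDataType_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the three lut_dtype choices (CUDA_R_32F, CUDA_R_16F, CUDA_R_8U). */
typedef enum { LUT_F32, LUT_F16, LUT_U8 } lut_kind_t;

/* Size in bytes of one table entry for the given choice. */
static inline size_t lut_elem_bytes(lut_kind_t kind)
{
  switch (kind) {
    case LUT_F32: return 4;
    case LUT_F16: return 2;
    case LUT_U8: return 1;
  }
  return 0;
}

/* ASSUMPTION: the table has pq_dim * 2^pq_bits entries. */
static inline size_t lut_bytes(uint32_t pq_dim, uint32_t pq_bits, lut_kind_t kind)
{
  assert(pq_bits >= 4 && pq_bits <= 8);
  return (size_t)pq_dim * ((size_t)1 << pq_bits) * lut_elem_bytes(kind);
}
```

Under that assumption, pq_dim = 128 with pq_bits = 8 needs 128 KiB in FP32, 64 KiB in FP16, and 32 KiB in 8-bit form.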

cuvsIvfPqSearchParamsCreate

Allocate IVF-PQ search params, and populate with default values

CUVS_EXPORT cuvsError_t cuvsIvfPqSearchParamsCreate(cuvsIvfPqSearchParams_t* params);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| params | in | cuvsIvfPqSearchParams_t* | cuvsIvfPqSearchParams_t to allocate |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqSearchParamsDestroy

De-allocate IVF-PQ search params

CUVS_EXPORT cuvsError_t cuvsIvfPqSearchParamsDestroy(cuvsIvfPqSearchParams_t params);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| params | in | cuvsIvfPqSearchParams_t | |

Returns

CUVS_EXPORT cuvsError_t

IVF-PQ index

cuvsIvfPqIndex

Struct to hold address of cuvs::neighbors::ivf_pq::index and its active trained dtype

typedef struct { ... } cuvsIvfPqIndex;

Fields

| Name | Type | Description |
| --- | --- | --- |
| addr | uintptr_t | |
| dtype | DLDataType | |

cuvsIvfPqIndexCreate

Allocate IVF-PQ index

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexCreate(cuvsIvfPqIndex_t* index);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t* | cuvsIvfPqIndex_t to allocate |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexDestroy

De-allocate IVF-PQ index

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexDestroy(cuvsIvfPqIndex_t index);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t to de-allocate |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetNLists

Get the number of clusters/inverted lists

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetNLists(cuvsIvfPqIndex_t index, int64_t* n_lists);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | | cuvsIvfPqIndex_t | |
| n_lists | | int64_t* | |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetDim

Get the dimensionality

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetDim(cuvsIvfPqIndex_t index, int64_t* dim);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | | cuvsIvfPqIndex_t | |
| dim | | int64_t* | |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetSize

Get the size of the index

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetSize(cuvsIvfPqIndex_t index, int64_t* size);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | | cuvsIvfPqIndex_t | |
| size | | int64_t* | |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetPqDim

Get the dimensionality of an encoded vector after compression by PQ.

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetPqDim(cuvsIvfPqIndex_t index, int64_t* pq_dim);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | | cuvsIvfPqIndex_t | |
| pq_dim | | int64_t* | |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetPqBits

Get the bit length of an encoded vector element after compression by PQ.

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetPqBits(cuvsIvfPqIndex_t index, int64_t* pq_bits);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | | cuvsIvfPqIndex_t | |
| pq_bits | | int64_t* | |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetPqLen

Get the dimensionality of a subspace, i.e. the number of vector components mapped to a subspace.

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetPqLen(cuvsIvfPqIndex_t index, int64_t* pq_len);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | | cuvsIvfPqIndex_t | |
| pq_len | | int64_t* | |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetCenters

Get the cluster centers corresponding to the lists in the original space

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetCenters(cuvsIvfPqIndex_t index, DLManagedTensor* centers);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Built IVF-PQ index |
| centers | out | DLManagedTensor* | Output tensor that will be populated with a non-owning view of the data |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetCentersPadded

Get the padded cluster centers [n_lists, dim_ext]

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetCentersPadded(cuvsIvfPqIndex_t index, DLManagedTensor* centers);

where dim_ext = round_up(dim + 1, 8)

This returns the full padded centers as a contiguous array, suitable for use with cuvsIvfPqBuildPrecomputed.

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Built IVF-PQ index |
| centers | out | DLManagedTensor* | Output tensor that will be populated with a non-owning view of the data |

Returns

CUVS_EXPORT cuvsError_t
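
The padded width dim_ext = round_up(dim + 1, 8) can be worked out by hand as in the sketch below. The helper names are hypothetical and only illustrate the formula given above.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of `multiple`. */
static inline uint32_t round_up_u32(uint32_t value, uint32_t multiple)
{
  assert(multiple > 0);
  return ((value + multiple - 1) / multiple) * multiple;
}

/* dim_ext = round_up(dim + 1, 8): the row width of the padded centers. */
static inline uint32_t padded_center_dim(uint32_t dim)
{
  return round_up_u32(dim + 1, 8);
}
```

For dim = 128 this gives dim_ext = 136, so a tensor holding the padded centers has shape [n_lists, 136].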

cuvsIvfPqIndexGetPqCenters

Get the PQ cluster centers

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetPqCenters(cuvsIvfPqIndex_t index, DLManagedTensor* pq_centers);

Shape by codebook kind:

  • CUVS_IVF_PQ_CODEBOOK_GEN_PER_SUBSPACE: [pq_dim, pq_len, pq_book_size]
  • CUVS_IVF_PQ_CODEBOOK_GEN_PER_CLUSTER: [n_lists, pq_len, pq_book_size]

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Built IVF-PQ index |
| pq_centers | out | DLManagedTensor* | Output tensor that will be populated with a non-owning view of the data |

Returns

CUVS_EXPORT cuvsError_t
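
The two shapes listed above can be written as a small lookup. This is a sketch with hypothetical type and function names; the enum values mirror cuvsIvfPqCodebookGen, and pq_book_size = 2^pq_bits is an assumption not spelled out on this page.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the cuvsIvfPqCodebookGen values (0 and 1). */
typedef enum { SKETCH_PER_SUBSPACE = 0, SKETCH_PER_CLUSTER = 1 } sketch_codebook_kind_t;

/* The three extents of the PQ centers tensor. */
typedef struct {
  uint32_t dim0; /* pq_dim or n_lists, depending on the codebook kind */
  uint32_t dim1; /* pq_len */
  uint32_t dim2; /* pq_book_size */
} pq_centers_shape_t;

static inline pq_centers_shape_t pq_centers_shape(sketch_codebook_kind_t kind,
                                                  uint32_t n_lists,
                                                  uint32_t pq_dim,
                                                  uint32_t pq_len,
                                                  uint32_t pq_bits)
{
  pq_centers_shape_t s;
  assert(pq_bits >= 4 && pq_bits <= 8);
  s.dim0 = (kind == SKETCH_PER_SUBSPACE) ? pq_dim : n_lists;
  s.dim1 = pq_len;
  s.dim2 = 1u << pq_bits; /* ASSUMPTION: pq_book_size == 2^pq_bits */
  return s;
}
```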

cuvsIvfPqIndexGetCentersRot

Get the rotated cluster centers [n_lists, rot_dim]

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetCentersRot(cuvsIvfPqIndex_t index, DLManagedTensor* centers_rot);

where rot_dim = pq_len * pq_dim

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Built IVF-PQ index |
| centers_rot | out | DLManagedTensor* | Output tensor that will be populated with a non-owning view of the data |

Returns

CUVS_EXPORT cuvsError_t
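
As a worked example of rot_dim = pq_len * pq_dim, the sketch below derives both values from dim and pq_dim. It assumes pq_len = ceildiv(dim, pq_dim), which is consistent with rot_dim being a multiple of pq_dim that is at least dim, but is not stated on this page; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: pq_len is dim / pq_dim rounded up. */
static inline uint32_t pq_len_for(uint32_t dim, uint32_t pq_dim)
{
  assert(pq_dim > 0);
  return (dim + pq_dim - 1) / pq_dim;
}

/* rot_dim = pq_len * pq_dim, as documented above. */
static inline uint32_t rot_dim_for(uint32_t dim, uint32_t pq_dim)
{
  return pq_len_for(dim, pq_dim) * pq_dim;
}
```

With dim = 100 and pq_dim = 32 this yields pq_len = 4 and rot_dim = 128, so the rotated space is larger than the original; with dim = 128 and pq_dim = 64, rot_dim equals dim.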

cuvsIvfPqIndexGetRotationMatrix

Get the rotation matrix [rot_dim, dim]

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetRotationMatrix(cuvsIvfPqIndex_t index,
    DLManagedTensor* rotation_matrix);

Transform matrix (original space -> rotated padded space)

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Built IVF-PQ index |
| rotation_matrix | out | DLManagedTensor* | Output tensor that will be populated with a non-owning view of the data |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexGetListSizes

Get the sizes of each list

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetListSizes(cuvsIvfPqIndex_t index, DLManagedTensor* list_sizes);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Built IVF-PQ index |
| list_sizes | out | DLManagedTensor* | Output tensor that will be populated with a non-owning view of the data |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqIndexUnpackContiguousListData

Unpack n_rows consecutive PQ encoded vectors of a single list (cluster) in the compressed index starting at given offset, not expanded to one code per byte. Each code in the output buffer occupies ceildiv(index.pq_dim() * index.pq_bits(), 8) bytes.

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexUnpackContiguousListData(cuvsResources_t res,
    cuvsIvfPqIndex_t index,
    DLManagedTensor* out_codes,
    uint32_t label,
    uint32_t offset);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| res | in | cuvsResources_t | raft resource |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Built IVF-PQ index |
| out_codes | out | DLManagedTensor* | The destination buffer [n_rows, ceildiv(index.pq_dim() * index.pq_bits(), 8)]. The length n_rows defines how many records to unpack; offset + n_rows must be smaller than or equal to the list size. This DLManagedTensor must already point to allocated device memory. |
| label | in | uint32_t | The id of the list (cluster) to decode. |
| offset | in | uint32_t | How many records in the list to skip. |

Returns

CUVS_EXPORT cuvsError_t
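
Sizing the out_codes buffer follows directly from the formula above. The sketch below uses hypothetical helper names to compute the bytes per packed code, the total buffer size for n_rows records, and the offset + n_rows bound check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Integer division rounded up. */
static inline uint32_t ceildiv_u32(uint32_t a, uint32_t b)
{
  assert(b > 0);
  return (a + b - 1) / b;
}

/* Bytes occupied by one packed code: ceildiv(pq_dim * pq_bits, 8). */
static inline uint32_t packed_code_bytes(uint32_t pq_dim, uint32_t pq_bits)
{
  return ceildiv_u32(pq_dim * pq_bits, 8);
}

/* Total bytes for a destination buffer of shape [n_rows, packed_code_bytes]. */
static inline size_t unpack_buffer_bytes(uint32_t n_rows, uint32_t pq_dim, uint32_t pq_bits)
{
  return (size_t)n_rows * packed_code_bytes(pq_dim, pq_bits);
}

/* offset + n_rows must not exceed the list size (64-bit sum avoids overflow). */
static inline bool unpack_range_ok(uint32_t offset, uint32_t n_rows, uint32_t list_size)
{
  return (uint64_t)offset + n_rows <= list_size;
}
```

For instance, pq_dim = 64 with pq_bits = 5 packs each vector into 40 bytes, so unpacking 100 records needs a 4,000-byte device buffer.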

cuvsIvfPqIndexGetListIndices

Get the indices of each vector in an IVF-PQ list

CUVS_EXPORT cuvsError_t cuvsIvfPqIndexGetListIndices(cuvsIvfPqIndex_t index,
    uint32_t label,
    DLManagedTensor* out_labels);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Built IVF-PQ index |
| label | in | uint32_t | The id of the list (cluster) to decode. |
| out_labels | out | DLManagedTensor* | Output tensor that will be populated with a non-owning view of the data |

Returns

CUVS_EXPORT cuvsError_t

IVF-PQ index build

cuvsIvfPqBuild

Build an IVF-PQ index with a DLManagedTensor which has underlying DLDeviceType equal to kDLCUDA, kDLCUDAHost, kDLCUDAManaged, or kDLCPU.

CUVS_EXPORT cuvsError_t cuvsIvfPqBuild(cuvsResources_t res,
    cuvsIvfPqIndexParams_t params,
    DLManagedTensor* dataset,
    cuvsIvfPqIndex_t index);

Acceptable underlying types are:

  1. kDLDataType.code == kDLFloat and kDLDataType.bits = 32
  2. kDLDataType.code == kDLFloat and kDLDataType.bits = 16
  3. kDLDataType.code == kDLInt and kDLDataType.bits = 8
  4. kDLDataType.code == kDLUInt and kDLDataType.bits = 8

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| res | in | cuvsResources_t | cuvsResources_t opaque C handle |
| params | in | cuvsIvfPqIndexParams_t | cuvsIvfPqIndexParams_t used to build IVF-PQ index |
| dataset | in | DLManagedTensor* | DLManagedTensor* training dataset |
| index | out | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Newly built IVF-PQ index |

Returns

CUVS_EXPORT cuvsError_t
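
The four accepted dataset types can be expressed as a single predicate on the (code, bits) pair. The sketch below avoids including dlpack.h and instead defines a local mirror of the DLPack type codes (kDLInt = 0, kDLUInt = 1, kDLFloat = 2); the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the DLPack type codes referenced above. */
typedef enum {
  SKETCH_DL_INT   = 0, /* kDLInt */
  SKETCH_DL_UINT  = 1, /* kDLUInt */
  SKETCH_DL_FLOAT = 2  /* kDLFloat */
} sketch_dl_code_t;

/* True if (code, bits) is one of the four dataset types listed for build. */
static inline bool build_dtype_supported(sketch_dl_code_t code, uint8_t bits)
{
  assert(bits > 0);
  switch (code) {
    case SKETCH_DL_FLOAT: return bits == 32 || bits == 16;
    case SKETCH_DL_INT:
    case SKETCH_DL_UINT: return bits == 8;
  }
  return false;
}
```

In real code the same check would read dataset->dl_tensor.dtype.code and dataset->dl_tensor.dtype.bits before calling cuvsIvfPqBuild.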

cuvsIvfPqBuildPrecomputed

Build a view-type IVF-PQ index from device memory precomputed centroids and codebook.

CUVS_EXPORT cuvsError_t cuvsIvfPqBuildPrecomputed(cuvsResources_t res,
    cuvsIvfPqIndexParams_t params,
    uint32_t dim,
    DLManagedTensor* pq_centers,
    DLManagedTensor* centers,
    DLManagedTensor* centers_rot,
    DLManagedTensor* rotation_matrix,
    cuvsIvfPqIndex_t index);

This function creates a non-owning index that stores a reference to the provided device data. All parameters must be provided with correct extents. The caller is responsible for ensuring the lifetime of the input data exceeds the lifetime of the returned index.

The index_params must be consistent with the provided matrices. Specifically:

  • index_params.codebook_kind determines the expected shape of pq_centers
  • index_params.metric will be stored in the index
  • index_params.conservative_memory_allocation will be stored in the index

The function will verify consistency between index_params, dim, and the matrix extents.

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| res | in | cuvsResources_t | cuvsResources_t opaque C handle |
| params | in | cuvsIvfPqIndexParams_t | cuvsIvfPqIndexParams_t used to configure the index (must be consistent with matrices) |
| dim | in | uint32_t | Dimensionality of the input data |
| pq_centers | in | DLManagedTensor* | PQ codebook on device memory with required shape: [pq_dim, pq_len, pq_book_size] for codebook_kind CUVS_IVF_PQ_CODEBOOK_GEN_PER_SUBSPACE; [n_lists, pq_len, pq_book_size] for codebook_kind CUVS_IVF_PQ_CODEBOOK_GEN_PER_CLUSTER |
| centers | in | DLManagedTensor* | Cluster centers in the original space [n_lists, dim_ext] where dim_ext = round_up(dim + 1, 8) |
| centers_rot | in | DLManagedTensor* | Rotated cluster centers [n_lists, rot_dim] where rot_dim = pq_len * pq_dim |
| rotation_matrix | in | DLManagedTensor* | Transform matrix (original space -> rotated padded space) [rot_dim, dim] |
| index | out | cuvsIvfPqIndex_t | cuvsIvfPqIndex_t Newly built view-type IVF-PQ index |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqSearch

Search an IVF-PQ index with a DLManagedTensor which has underlying DLDeviceType equal to kDLCUDA, kDLCUDAHost, kDLCUDAManaged.

CUVS_EXPORT cuvsError_t cuvsIvfPqSearch(cuvsResources_t res,
    cuvsIvfPqSearchParams_t search_params,
    cuvsIvfPqIndex_t index,
    DLManagedTensor* queries,
    DLManagedTensor* neighbors,
    DLManagedTensor* distances);

It is also important to note that the IVF-PQ index must have been built with the same type as the queries, such that index.dtype.code == queries.dl_tensor.dtype.code. Types for input are:

  1. queries: kDLDataType.code == kDLFloat and kDLDataType.bits = 32 or kDLDataType.bits = 16
  2. neighbors: kDLDataType.code == kDLUInt and kDLDataType.bits = 32
  3. distances: kDLDataType.code == kDLFloat and kDLDataType.bits = 32

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| res | in | cuvsResources_t | cuvsResources_t opaque C handle |
| search_params | in | cuvsIvfPqSearchParams_t | cuvsIvfPqSearchParams_t used to search IVF-PQ index |
| index | in | cuvsIvfPqIndex_t | cuvsIvfPqIndex which has been returned by cuvsIvfPqBuild |
| queries | in | DLManagedTensor* | DLManagedTensor* queries dataset to search |
| neighbors | out | DLManagedTensor* | DLManagedTensor* output k neighbors for queries |
| distances | out | DLManagedTensor* | DLManagedTensor* output k distances for queries |

Returns

CUVS_EXPORT cuvsError_t

IVF-PQ C-API serialize functions

cuvsIvfPqSerialize

Save the index to file.

CUVS_EXPORT cuvsError_t cuvsIvfPqSerialize(cuvsResources_t res, const char* filename, cuvsIvfPqIndex_t index);

Experimental, both the API and the serialization format are subject to change.

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| res | in | cuvsResources_t | cuvsResources_t opaque C handle |
| filename | in | const char* | The file name for saving the index |
| index | in | cuvsIvfPqIndex_t | IVF-PQ index |

Returns

CUVS_EXPORT cuvsError_t

cuvsIvfPqDeserialize

Load index from file.

CUVS_EXPORT cuvsError_t cuvsIvfPqDeserialize(cuvsResources_t res, const char* filename, cuvsIvfPqIndex_t index);

Experimental, both the API and the serialization format are subject to change.

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| res | in | cuvsResources_t | cuvsResources_t opaque C handle |
| filename | in | const char* | The name of the file that stores the index |
| index | out | cuvsIvfPqIndex_t | IVF-PQ index loaded from disk |

Returns

CUVS_EXPORT cuvsError_t

IVF-PQ index extend

cuvsIvfPqExtend

Extend the index with the new data.

CUVS_EXPORT cuvsError_t cuvsIvfPqExtend(cuvsResources_t res,
    DLManagedTensor* new_vectors,
    DLManagedTensor* new_indices,
    cuvsIvfPqIndex_t index);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| res | in | cuvsResources_t | cuvsResources_t opaque C handle |
| new_vectors | in | DLManagedTensor* | DLManagedTensor* the new vectors to add to the index |
| new_indices | in | DLManagedTensor* | DLManagedTensor* vector of new indices for the new vectors |
| index | inout | cuvsIvfPqIndex_t | IVF-PQ index to be extended |

Returns

CUVS_EXPORT cuvsError_t

IVF-PQ index transform

cuvsIvfPqTransform

Transform the input data by applying pq-encoding

CUVS_EXPORT cuvsError_t cuvsIvfPqTransform(cuvsResources_t res,
    cuvsIvfPqIndex_t index,
    DLManagedTensor* input_dataset,
    DLManagedTensor* output_labels,
    DLManagedTensor* output_dataset);

Parameters

| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| res | in | cuvsResources_t | cuvsResources_t opaque C handle |
| index | in | cuvsIvfPqIndex_t | IVF-PQ index |
| input_dataset | in | DLManagedTensor* | DLManagedTensor* vectors to transform |
| output_labels | out | DLManagedTensor* | DLManagedTensor* Vector of cluster labels for each vector in the input |
| output_dataset | out | DLManagedTensor* | DLManagedTensor* input vectors after pq-encoding |

Returns

CUVS_EXPORT cuvsError_t