PQ

Python module: cuvs.preprocessing.quantize.pq

Quantizer

cdef class Quantizer

Defines and stores a product quantizer after training.

Product quantization splits each input vector into pq_dim sub-vectors and replaces each sub-vector with the pq_bits-wide index of its nearest entry in the PQ codebook.
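As a rough illustration of what a trained product quantizer does with its codebook, the sketch below encodes one vector in plain Python. It is not cuVS code: `squared_distance`, `pq_encode`, and the hand-written `codebooks` are hypothetical names for illustration, the codebooks are made up rather than trained, and only 4 centroids per subspace are used to keep the toy small (the documented pq_bits range starts at 4, that is, 16 centroids).

```python
# Toy product-quantization encoder (illustrative only, not the cuVS implementation).

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Return one codebook index per subspace.

    codebooks[s] is the list of centroids for subspace s, so
    len(codebooks) plays the role of pq_dim in this toy.
    """
    n_subspaces = len(codebooks)
    sub_len, remainder = divmod(len(vector), n_subspaces)
    if remainder:
        raise ValueError("vector length must be divisible by the number of subspaces")
    codes = []
    for s, codebook in enumerate(codebooks):
        sub = vector[s * sub_len:(s + 1) * sub_len]
        # Index of the nearest centroid in this subspace's codebook.
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(sub, codebook[k]))
        codes.append(nearest)
    return codes


# Two subspaces, four hand-picked centroids each (assumed values, not trained).
codebooks = [
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [2.0, 2.0]],
]
codes = pq_encode([0.9, 0.1, 0.6, 0.4], codebooks)
print(codes)  # [2, 1]
```

The four floats are reduced to two small integers; the real quantizer learns its centroids with k-means during build rather than taking them as input.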

Members

| Name | Kind |
| --- | --- |
| pq_bits | property |
| pq_dim | property |
| pq_codebook | property |
| vq_codebook | property |
| encoded_dim | property |
| use_vq | property |

pq_bits

def pq_bits(self)

pq_dim

def pq_dim(self)

pq_codebook

def pq_codebook(self)

Returns the PQ codebook

vq_codebook

def vq_codebook(self)

Returns the VQ codebook

encoded_dim

def encoded_dim(self)

Returns the encoded dimension of the quantized dataset

use_vq

def use_vq(self)

QuantizerParams

cdef class QuantizerParams

Parameters for product quantization

Parameters

| Name | Type | Description |
| --- | --- | --- |
| pq_bits | int | Bit length of each vector element after PQ compression. Possible values: within [4, 16]. |
| pq_dim | int | Dimensionality of the vector after PQ compression. |
| use_subspaces | bool | Whether to use subspaces for product quantization (PQ). When true, one PQ codebook is used for each subspace. Otherwise, a single PQ codebook is used. |
| use_vq | bool | Whether to apply vector quantization (KMeans) before product quantization (PQ). |
| vq_n_centers | int | Number of centers for the vector quantizer. When zero, an optimal value is selected using a heuristic. When one, only product quantization is used. |
| kmeans_n_iters | int | Number of iterations used when searching for k-means centers. |
| pq_kmeans_type | str | Type of k-means algorithm used for PQ training. Possible values: "kmeans", "kmeans_balanced". |
| max_train_points_per_pq_code | int | Maximum number of data points to use per PQ code during PQ codebook training. Using more data points per PQ code may improve the quality of the PQ codebook but may also increase the build time. |
| max_train_points_per_vq_cluster | int | Maximum number of data points to use per VQ cluster. |
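To get a feel for how pq_bits, pq_dim, and max_train_points_per_pq_code interact, the following back-of-the-envelope sketch may help. It rests on two assumptions not stated in this reference: that codes are bit-packed so one vector takes pq_dim * pq_bits bits rounded up to whole bytes, and that PQ training draws at most max_train_points_per_pq_code points for each of the 2**pq_bits codes. The helper names are hypothetical; the `encoded_dim` property of a built `Quantizer` is the authoritative value.

```python
# Rough size arithmetic for PQ parameters (assumptions labeled inline).

def encoded_bytes_per_vector(pq_dim, pq_bits):
    # Assumption: pq_dim codes of pq_bits bits each, packed and rounded up to bytes.
    return (pq_dim * pq_bits + 7) // 8


def max_pq_training_points(pq_bits, max_train_points_per_pq_code):
    # Assumption: the per-code cap applies to each of the 2**pq_bits codes.
    return (2 ** pq_bits) * max_train_points_per_pq_code


n_features = 64          # FP32 input dimensionality, as in the examples on this page
pq_dim, pq_bits = 16, 8  # values used in the examples on this page

raw_bytes = n_features * 4                         # 4 bytes per FP32 value
packed_bytes = encoded_bytes_per_vector(pq_dim, pq_bits)
compression_ratio = raw_bytes / packed_bytes
training_cap = max_pq_training_points(pq_bits, 256)  # 256 is the documented default

print(raw_bytes, packed_bytes, compression_ratio, training_cap)
# 256 16 16.0 65536
```

Under these assumptions, lowering pq_bits or pq_dim shrinks the encoded vectors, while raising pq_bits doubles both the codebook size and the training cap with each extra bit.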

Constructor

def __init__(self, *, pq_bits=8, pq_dim=0, use_subspaces=True, use_vq=False, vq_n_centers=0, kmeans_n_iters=25, pq_kmeans_type="kmeans_balanced", max_train_points_per_pq_code=256, max_train_points_per_vq_cluster=1024)
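The bare `*` in the signature makes every argument keyword-only. The sketch below mimics that calling convention with a hypothetical stand-in, `check_pq_params`, that applies the value ranges documented in the parameter table before anything is handed to the library. It is not part of cuVS, covers only three of the parameters, and whether the real constructor raises on out-of-range values is not stated in this reference.

```python
# Hypothetical pre-flight check mirroring the documented ranges (not a cuVS function).

ALLOWED_PQ_KMEANS_TYPES = ("kmeans", "kmeans_balanced")


def check_pq_params(*, pq_bits=8, pq_dim=0, pq_kmeans_type="kmeans_balanced"):
    """Validate a subset of QuantizerParams arguments and return them as a dict."""
    if not 4 <= pq_bits <= 16:
        raise ValueError(f"pq_bits must be within [4, 16], got {pq_bits}")
    if pq_dim < 0:
        raise ValueError(f"pq_dim must be non-negative, got {pq_dim}")
    if pq_kmeans_type not in ALLOWED_PQ_KMEANS_TYPES:
        raise ValueError(
            f"pq_kmeans_type must be one of {ALLOWED_PQ_KMEANS_TYPES}, "
            f"got {pq_kmeans_type!r}"
        )
    return {"pq_bits": pq_bits, "pq_dim": pq_dim, "pq_kmeans_type": pq_kmeans_type}


# Keyword arguments are required, just as with the documented constructor.
params = check_pq_params(pq_bits=8, pq_dim=16)
print(params)

# Positional use is rejected by Python itself because of the bare `*`.
try:
    check_pq_params(8, 16)
except TypeError as err:
    print("positional call rejected:", err)
```

The returned dict could then be unpacked into the real constructor, for example `pq.QuantizerParams(**params)`.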

Members

| Name | Kind |
| --- | --- |
| pq_bits | property |
| pq_dim | property |
| vq_n_centers | property |
| kmeans_n_iters | property |
| pq_kmeans_type | property |
| max_train_points_per_pq_code | property |
| max_train_points_per_vq_cluster | property |
| use_vq | property |
| use_subspaces | property |

pq_bits

def pq_bits(self)

pq_dim

def pq_dim(self)

vq_n_centers

def vq_n_centers(self)

kmeans_n_iters

def kmeans_n_iters(self)

pq_kmeans_type

def pq_kmeans_type(self)

max_train_points_per_pq_code

def max_train_points_per_pq_code(self)

max_train_points_per_vq_cluster

def max_train_points_per_vq_cluster(self)

use_vq

def use_vq(self)

use_subspaces

def use_subspaces(self)

build

@auto_sync_resources

def build(QuantizerParams params, dataset, resources=None)

Builds a Product Quantizer to be used later for quantizing the dataset.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| params | QuantizerParams object | |
| dataset | | Row-major dataset in host or device memory. FP32. |
| resources | cuvs.common.Resources, optional | |

Returns

| Name | Type | Description |
| --- | --- | --- |
| quantizer | cuvs.preprocessing.quantize.pq.Quantizer | |

Examples

>>> import cupy as cp
>>> from cuvs.preprocessing.quantize import pq
>>> n_samples = 5000
>>> n_features = 64
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> params = pq.QuantizerParams(pq_bits=8, pq_dim=16)
>>> quantizer = pq.build(params, dataset)
>>> transformed, _ = pq.transform(quantizer, dataset)

transform

@auto_sync_resources @auto_convert_output

def transform(Quantizer quantizer, dataset, codes_output=None, vq_labels=None, resources=None)

Applies the Product Quantization transform to the given dataset.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| quantizer | | Trained Quantizer object. |
| dataset | | Row-major dataset in host or device memory. FP32. |
| codes_output | | Optional preallocated output memory, in device memory. |
| vq_labels | | Optional preallocated output memory for VQ labels, in device memory. |
| resources | cuvs.common.Resources, optional | |

Returns

| Name | Type | Description |
| --- | --- | --- |
| codes_output | | Transformed dataset, quantized into uint8 codes. |
| vq_labels | | VQ labels when VQ is used, None otherwise. |

Examples

>>> import cupy as cp
>>> from cuvs.preprocessing.quantize import pq
>>> n_samples = 5000
>>> n_features = 64
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> params = pq.QuantizerParams(pq_bits=8, pq_dim=16)
>>> quantizer = pq.build(params, dataset)
>>> transformed, _ = pq.transform(quantizer, dataset)

inverse_transform

@auto_sync_resources @auto_convert_output

def inverse_transform(Quantizer quantizer, codes, output=None, vq_labels=None, resources=None)

Applies the Product Quantization inverse transform to the given codes.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| quantizer | | Trained Quantizer object. |
| codes | | Row-major device codes to inverse transform. uint8. |
| output | | Optional preallocated output memory, in device memory. |
| vq_labels | | Optional VQ labels when VQ is used, in device memory. |
| resources | cuvs.common.Resources, optional | |

Returns

| Name | Type | Description |
| --- | --- | --- |
| output | | Approximation of the original dataset, reconstructed from the quantized codes. |

Examples

>>> import cupy as cp
>>> from cuvs.preprocessing.quantize import pq
>>> n_samples = 5000
>>> n_features = 64
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> params = pq.QuantizerParams(pq_bits=8, pq_dim=16, use_vq=True)
>>> quantizer = pq.build(params, dataset)
>>> transformed, vq_labels = pq.transform(quantizer, dataset)
>>> reconstructed = pq.inverse_transform(quantizer, transformed, vq_labels=vq_labels)
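Because PQ is lossy, the reconstruction only approximates the input. The plain-Python sketch below shows one conventional way decoding can work and how the error might be measured. It assumes, without confirmation from this reference, that when VQ is used the PQ codes describe the residual relative to the VQ center selected by the label. `pq_decode`, `mean_squared_error`, and the toy codebooks are hypothetical and not part of cuVS.

```python
# Toy PQ decoder with an optional VQ center (illustrative only, not cuVS code).

def pq_decode(codes, pq_codebooks, vq_center=None):
    """Concatenate the selected centroid of each subspace.

    If vq_center is given, it is added back element-wise (assumed residual scheme).
    """
    reconstructed = []
    for code, codebook in zip(codes, pq_codebooks):
        reconstructed.extend(codebook[code])
    if vq_center is not None:
        reconstructed = [r + c for r, c in zip(reconstructed, vq_center)]
    return reconstructed


def mean_squared_error(original, reconstructed):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


# Hand-picked codebooks: two subspaces, four centroids each (assumed values).
pq_codebooks = [
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [2.0, 2.0]],
]

original = [0.9, 0.1, 0.6, 0.4]
approx = pq_decode([2, 1], pq_codebooks)
print(approx)                                # [1.0, 0.0, 0.5, 0.5]
print(mean_squared_error(original, approx))  # close to 0.01

# With VQ, the center chosen by the VQ label is added back after PQ decoding.
shifted = pq_decode([2, 1], pq_codebooks, vq_center=[10.0, 10.0, 10.0, 10.0])
print(shifted)                               # [11.0, 10.0, 10.5, 10.5]
```

The same error measure can be applied to `dataset` and `reconstructed` from the example above (for instance via `cp.mean((dataset - reconstructed) ** 2)`) to compare parameter choices.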