Binary Quantizer
Binary quantization compresses each vector into a packed bit vector. Each input value becomes one bit, and each bit records whether the value is greater than a threshold.
Use binary quantization when you want very compact vectors for workflows based on bitwise distances such as Hamming distance. It is a strong compression step, so it trades away most magnitude information for much smaller storage and faster bitwise comparisons.
Example API Usage
C API | C++ API | Python API
Training a quantizer
Training is needed when thresholds are learned from data, such as per-dimension means or sampled medians. The Python API currently exposes the direct zero-threshold transform path.
C
C++
Python
Transforming data
Transforming packs every group of eight input dimensions into one byte. In C and C++, use the trained quantizer when using learned thresholds. Python uses a zero threshold and sets a bit when the corresponding input value is positive.
C
C++
Python
How Binary Quantization works
Binary quantization compares each value with a threshold. If the value is greater than the threshold, the output bit is set to 1; otherwise it is set to 0.
The threshold can be fixed at zero, learned as a per-dimension mean, or learned as a sampled per-dimension median. The output is packed into bytes, so a vector with D features needs ceil(D / 8) bytes.
When to use Binary Quantization
Use binary quantization when compact storage and bitwise comparison speed matter more than preserving floating-point magnitude. It is especially useful for binary vector search with bitwise Hamming distance.
Avoid binary quantization when vector magnitude or fine-grained distance information is important. Because each value becomes only one bit, this is the most aggressive quantizer in the preprocessing guide.
Configuration parameters
Train parameters
Tuning
Use zero when the data is already centered or when you want the fastest path. Use mean when dimensions have different offsets. Use sampling_median when outliers skew the mean and a more robust threshold is needed.
Increase sampling_ratio only when the sampled median is unstable. It does not affect the compressed size, only threshold quality and training cost.
Memory footprint
Binary quantization is usually dominated by the input matrix and the packed output matrix.
Variables:
N: Number of vectors.D: Number of features per vector.B_x: Bytes per input element.B_t: Bytes per threshold value.
Scratch and maximum rows
The scratch term covers temporary packing buffers, threshold-training buffers, allocator padding, CUDA library workspaces, and memory held by the active memory resource. Use H = 0.10 for direct transform and H = 0.20 when training learned thresholds. If you can measure a representative run, use:
Then set:
The capacity variables in this subsection are:
M_free: Free memory in the relevant memory space before the operation starts. Use device memory for GPU-resident formulas and host memory for formulas explicitly marked as host memory.M_other: Memory reserved for arrays, memory pools, concurrent work, or application buffers that are not included in the formula.H: Scratch headroom fraction reserved for temporary buffers and allocator overhead.M_usable: Memory budget left for the formula after subtractingM_otherand reserving headroom.observed_peak: Peak memory observed during a smaller representative run.formula_without_scratch: Value of the selected peak formula with explicitscratchterms removed and without applying headroom.peak_without_scratch(count): The selected peak formula rewritten as a function of the count being estimated, excluding scratch and headroom. The count is usuallyNfor rows or vectors andBfor K-selection batch rows.B_per_row/B_per_vector: Bytes added by one more row or vector in the selected formula. For linear formulas, add the coefficients of the count being estimated after fixed values such asD,K,Q, andLare substituted.B_fixed: Bytes in the selected formula that do not change with the estimated count, such as codebooks, centroids, fixed query batches, capped training buffers, or metadata.N_max/B_max: Estimated largest row, vector, or batch-row count that fits inM_usable.
The packed output is still linear in N. Rewrite the selected peak as:
and solve:
Packed vectors
Each row stores one bit per input feature:
Thresholds
Learned thresholds store one threshold per input dimension:
The zero-threshold path does not need a learned threshold vector.
Transform peak
The transform peak is approximately: