K-Means
Source header: cuvs/cluster/kmeans.hpp
Types
cluster::kmeans::base_params
Base structure for parameters that are common to all k-means algorithms
Fields
k-means hyperparameters
cluster::kmeans::params
Simple object to specify hyperparameters for the k-means algorithm.
Fields
cluster::kmeans::balanced_params
Simple object to specify hyperparameters for the balanced k-means algorithm.
The following metrics are currently supported in k-means balanced:
- CosineExpanded
- InnerProduct
- L2Expanded
- L2SqrtExpanded
Fields
cluster::kmeans::kmeans_type
Type of k-means algorithm.
Values
k-means clustering APIs
cluster::kmeans::fit
Find clusters with the k-means algorithm using batched processing of host data.
TODO: Evaluate replacing the extent type with int64_t. Reference issue: https://github.com/rapidsai/cuvs/issues/1961
This overload supports out-of-core computation where the dataset resides on the host. Data is processed in GPU-sized batches, streaming from host to device. The batch size is controlled by params.streaming_batch_size. In multi-GPU mode, this is a per-rank batch size.
Multi-GPU dispatch is selected automatically based on the handle state:
- If raft::resource::is_multi_gpu(handle) (cuVS SNMG): the full dataset X is split across GPUs internally with an OpenMP parallel region and NCCL.
- If raft::resource::comms_initialized(handle) (Dask/Ray/MPI): X is treated as this worker’s partition, and RAFT communicators are used for collectives.
- Otherwise: single-GPU batched k-means.
With params.init == InitMethod::KMeansPlusPlus in multi-GPU mode, the effective initialization sample must fit in GPU memory on every rank because it is materialized on every device. Rank 0 must also have enough GPU memory for the seeding workspace before centroids are broadcast.
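The batched processing described above can be pictured with a small CPU-only sketch. It is not the cuVS implementation: `batched_lloyd_step` and all of its parameters are hypothetical names, squared Euclidean distance is assumed, and the inner loop stands in for the work that would run on the GPU after each batch is copied from host to device. The point it illustrates is that per-cluster sums and counts are accumulated across batches, and centroids are updated only once the whole dataset has been seen, so the result does not depend on the batch size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One Lloyd iteration over row-major host data, visiting `batch_rows` rows at a time.
// Hypothetical CPU stand-in for illustration; this is not a cuVS function.
// Returns the inertia measured against the incoming centroids and updates them in place.
double batched_lloyd_step(const std::vector<float>& X, std::size_t n_rows, std::size_t dim,
                          std::vector<float>& centroids, std::size_t k,
                          std::size_t batch_rows)
{
  assert(batch_rows > 0 && X.size() == n_rows * dim && centroids.size() == k * dim);
  std::vector<double> sums(k * dim, 0.0);  // running per-cluster coordinate sums
  std::vector<std::size_t> counts(k, 0);   // running per-cluster sample counts
  double inertia = 0.0;

  for (std::size_t start = 0; start < n_rows; start += batch_rows) {
    const std::size_t stop = std::min(n_rows, start + batch_rows);  // last batch may be short
    for (std::size_t i = start; i < stop; ++i) {
      std::size_t best = 0;
      double best_d    = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < k; ++c) {
        double d = 0.0;
        for (std::size_t j = 0; j < dim; ++j) {
          const double diff = double(X[i * dim + j]) - double(centroids[c * dim + j]);
          d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = c; }
      }
      inertia += best_d;
      counts[best] += 1;
      for (std::size_t j = 0; j < dim; ++j) { sums[best * dim + j] += X[i * dim + j]; }
    }
  }

  // Centroids change only after every batch has contributed.
  for (std::size_t c = 0; c < k; ++c) {
    if (counts[c] == 0) { continue; }  // empty cluster: this sketch keeps the old centroid
    for (std::size_t j = 0; j < dim; ++j) {
      centroids[c * dim + j] = float(sums[c * dim + j] / double(counts[c]));
    }
  }
  return inertia;
}
```

Calling the function repeatedly until the returned inertia stops decreasing gives a plain batched k-means loop; the handling of empty clusters here is a simplification.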
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with the k-means algorithm using batched processing of host data.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with the k-means algorithm. Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
Returns
void
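For readers unfamiliar with k-means++ seeding, the following is a minimal CPU sketch of the textbook procedure, not the cuVS code path. The name `kmeans_plus_plus_init` and its parameters are hypothetical, and squared Euclidean distance is assumed. The first centroid is drawn uniformly at random; each later centroid is drawn with probability proportional to the squared distance from a sample to its nearest already-chosen centroid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

// Textbook k-means++ seeding on row-major data (hypothetical helper, not a cuVS function).
// Returns k centroids, each copied from a row of X.
std::vector<float> kmeans_plus_plus_init(const std::vector<float>& X, std::size_t n_rows,
                                         std::size_t dim, std::size_t k, unsigned seed)
{
  assert(n_rows > 0 && k > 0 && k <= n_rows && X.size() == n_rows * dim);
  std::mt19937 rng(seed);
  std::vector<float> centroids;
  centroids.reserve(k * dim);
  // Squared distance from each sample to its nearest already-chosen centroid.
  std::vector<double> min_d(n_rows, std::numeric_limits<double>::max());

  // The first centroid is a uniformly random sample.
  std::size_t pick = std::uniform_int_distribution<std::size_t>(0, n_rows - 1)(rng);
  for (std::size_t c = 0; c < k; ++c) {
    const auto first = X.begin() + static_cast<std::ptrdiff_t>(pick * dim);
    centroids.insert(centroids.end(), first, first + static_cast<std::ptrdiff_t>(dim));
    if (c + 1 == k) { break; }

    // Refresh nearest-centroid distances using the centroid just added.
    for (std::size_t i = 0; i < n_rows; ++i) {
      double d = 0.0;
      for (std::size_t j = 0; j < dim; ++j) {
        const double diff = double(X[i * dim + j]) - double(X[pick * dim + j]);
        d += diff * diff;
      }
      min_d[i] = std::min(min_d[i], d);
    }

    // Sample the next centroid with probability proportional to min_d.
    const double total = std::accumulate(min_d.begin(), min_d.end(), 0.0);
    if (total > 0.0) {
      pick = std::discrete_distribution<std::size_t>(min_d.begin(), min_d.end())(rng);
    } else {
      // All samples coincide with a chosen centroid: fall back to a uniform draw.
      pick = std::uniform_int_distribution<std::size_t>(0, n_rows - 1)(rng);
    }
  }
  return centroids;
}
```

Because far-away samples are favored, the seeds tend to spread across the data, which usually reduces the number of Lloyd iterations needed afterwards.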
Additional overload: cluster::kmeans::fit
Find clusters with the k-means algorithm. Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with the k-means algorithm. Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with the k-means algorithm. Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with the k-means algorithm. Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find balanced clusters with the k-means algorithm.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find balanced clusters with the k-means algorithm.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find balanced clusters with the k-means algorithm.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit
Find balanced clusters with the k-means algorithm.
Parameters
Returns
void
cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
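As a rough illustration of what prediction computes, the sketch below assigns each row of X to its nearest centroid on the CPU. It is not the cuVS API: `nearest_centroid_labels` and its parameters are hypothetical, and squared Euclidean distance is assumed, whereas the library uses the metric configured in its parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Label each row of X with the index of its nearest centroid
// (hypothetical CPU helper for illustration, not a cuVS function).
std::vector<int> nearest_centroid_labels(const std::vector<float>& X, std::size_t n_rows,
                                         std::size_t dim,
                                         const std::vector<float>& centroids, std::size_t k)
{
  assert(k > 0 && X.size() == n_rows * dim && centroids.size() == k * dim);
  std::vector<int> labels(n_rows, 0);
  for (std::size_t i = 0; i < n_rows; ++i) {
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < k; ++c) {
      double d = 0.0;
      for (std::size_t j = 0; j < dim; ++j) {
        const double diff = double(X[i * dim + j]) - double(centroids[c * dim + j]);
        d += diff * diff;
      }
      // Strict comparison: ties go to the lowest centroid index.
      if (d < best_d) { best_d = d; labels[i] = static_cast<int>(c); }
    }
  }
  return labels;
}
```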
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
Parameters
Returns
void
cluster::kmeans::fit_predict
Compute k-means clustering and predict the cluster index for each sample in the input.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit_predict
Compute k-means clustering and predict the cluster index for each sample in the input.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit_predict
Compute k-means clustering and predict the cluster index for each sample in the input.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit_predict
Compute k-means clustering and predict the cluster index for each sample in the input.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit_predict
Compute balanced k-means clustering and predict the cluster index for each sample in the input.
Parameters
Returns
void
Additional overload: cluster::kmeans::fit_predict
Compute balanced k-means clustering and predict the cluster index for each sample in the input.
Parameters
Returns
void
cluster::kmeans::transform
Transform X to a cluster-distance space.
Parameters
Returns
void
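"Cluster-distance space" means that each sample is re-expressed as its distances to the k centroids, so an n_rows x dim input becomes an n_rows x k output. The CPU sketch below shows that shape change under the assumption of squared Euclidean distance; `to_cluster_distance_space` and its parameters are hypothetical names, not part of cuVS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Map row-major X (n_rows x dim) to a row-major (n_rows x k) matrix whose entry
// (i, c) is the squared distance from sample i to centroid c.
// Hypothetical CPU helper for illustration, not a cuVS function.
std::vector<float> to_cluster_distance_space(const std::vector<float>& X, std::size_t n_rows,
                                             std::size_t dim,
                                             const std::vector<float>& centroids,
                                             std::size_t k)
{
  assert(X.size() == n_rows * dim && centroids.size() == k * dim);
  std::vector<float> out(n_rows * k, 0.0f);
  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t c = 0; c < k; ++c) {
      double d = 0.0;
      for (std::size_t j = 0; j < dim; ++j) {
        const double diff = double(X[i * dim + j]) - double(centroids[c * dim + j]);
        d += diff * diff;
      }
      out[i * k + c] = float(d);
    }
  }
  return out;
}
```

Taking the arg-min of each output row recovers the label that prediction would assign to that sample.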
Additional overload: cluster::kmeans::transform
Transform X to a cluster-distance space.
Parameters
Returns
void
cluster::kmeans::cluster_cost
Compute (optionally weighted) cluster cost
Parameters
Returns
void
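A common definition of cluster cost (inertia) is the sum, over all samples, of the distance from each sample to its nearest centroid, optionally scaled by a per-sample weight. The sketch below computes that quantity on the CPU under the assumption of squared Euclidean distance. `weighted_cluster_cost` and its parameters are hypothetical; an empty weight vector is treated as "all weights equal to 1", which is a convention of this sketch rather than of cuVS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Sum of (optionally weighted) squared distances from each sample to its nearest
// centroid. Hypothetical CPU helper for illustration, not a cuVS function.
double weighted_cluster_cost(const std::vector<float>& X, std::size_t n_rows, std::size_t dim,
                             const std::vector<float>& centroids, std::size_t k,
                             const std::vector<float>& weights)
{
  assert(k > 0 && X.size() == n_rows * dim && centroids.size() == k * dim);
  assert(weights.empty() || weights.size() == n_rows);
  double cost = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < k; ++c) {
      double d = 0.0;
      for (std::size_t j = 0; j < dim; ++j) {
        const double diff = double(X[i * dim + j]) - double(centroids[c * dim + j]);
        d += diff * diff;
      }
      best_d = std::min(best_d, d);
    }
    const double w = weights.empty() ? 1.0 : double(weights[i]);
    cost += w * best_d;
  }
  return cost;
}
```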
Additional overload: cluster::kmeans::cluster_cost
Compute cluster cost
Parameters
Returns
void
Additional overload: cluster::kmeans::cluster_cost
Compute (optionally weighted) cluster cost
Parameters
Returns
void
Additional overload: cluster::kmeans::cluster_cost
Compute (optionally weighted) cluster cost
Parameters
Returns
void
k-means API helpers
cluster::kmeans::helpers::find_k
Automatically find the optimal value of k using a binary search. This method maximizes the Calinski-Harabasz Index while minimizing the per-cluster inertia.
Parameters
Returns
void
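To make the selection criterion concrete, here is a CPU sketch of the standard Calinski-Harabasz index for a given labeling: the between-cluster dispersion divided by (k - 1), over the within-cluster dispersion divided by (n - k). It shows only the score, not the binary search over k, and the exact formula used inside cuVS may differ. `calinski_harabasz` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standard Calinski-Harabasz index for row-major X and integer labels in [0, k).
// Hypothetical CPU helper for illustration, not a cuVS function.
// Higher is better; the result is infinite if every sample sits on its centroid.
double calinski_harabasz(const std::vector<float>& X, std::size_t n_rows, std::size_t dim,
                         const std::vector<int>& labels, std::size_t k)
{
  assert(k >= 2 && n_rows > k && X.size() == n_rows * dim && labels.size() == n_rows);
  std::vector<double> mean(dim, 0.0);      // global mean
  std::vector<double> cent(k * dim, 0.0);  // per-cluster means
  std::vector<std::size_t> counts(k, 0);

  for (std::size_t i = 0; i < n_rows; ++i) {
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    assert(c < k);
    counts[c] += 1;
    for (std::size_t j = 0; j < dim; ++j) {
      mean[j] += X[i * dim + j];
      cent[c * dim + j] += X[i * dim + j];
    }
  }
  for (std::size_t j = 0; j < dim; ++j) { mean[j] /= double(n_rows); }
  for (std::size_t c = 0; c < k; ++c) {
    if (counts[c] == 0) { continue; }
    for (std::size_t j = 0; j < dim; ++j) { cent[c * dim + j] /= double(counts[c]); }
  }

  double between = 0.0;  // sum over clusters of n_c * ||centroid_c - mean||^2
  for (std::size_t c = 0; c < k; ++c) {
    for (std::size_t j = 0; j < dim; ++j) {
      const double diff = cent[c * dim + j] - mean[j];
      between += double(counts[c]) * diff * diff;
    }
  }
  double within = 0.0;  // sum over samples of ||x_i - centroid_of(x_i)||^2
  for (std::size_t i = 0; i < n_rows; ++i) {
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    for (std::size_t j = 0; j < dim; ++j) {
      const double diff = double(X[i * dim + j]) - cent[c * dim + j];
      within += diff * diff;
    }
  }
  return (between / double(k - 1)) / (within / double(n_rows - k));
}
```

A search over k would evaluate a score like this for each candidate clustering and keep the k with the best value.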