K-Means


Source header: cuvs/cluster/kmeans.hpp

Types

cluster::kmeans::base_params

Base structure for parameters that are common to all k-means algorithms.

struct base_params { ... };

Fields

Name | Type | Description
metric | cuvs::distance::DistanceType | Metric to use for distance computation. The supported metrics can vary per algorithm.

k-means hyperparameters

cluster::kmeans::params

Simple object to specify hyper-parameters for the k-means algorithm.

struct params : base_params { ... };

Fields

Name | Type | Description
n_clusters | int | The number of clusters to form, which is also the number of centroids to generate (default: 8).
init | InitMethod | Method for initialization; defaults to k-means++:
- InitMethod::KMeansPlusPlus (k-means++): Use the scalable k-means++ algorithm to select the initial cluster centers.
- InitMethod::Random (random): Choose ‘n_clusters’ observations (rows) at random from the input data as the initial centroids.
- InitMethod::Array (ndarray): Use ‘centroids’ as the initial cluster centers.
max_iter | int | Maximum number of iterations of the k-means algorithm for a single run.
tol | double | Relative tolerance with regard to inertia used to declare convergence.
verbosity | rapids_logger::level_enum | Verbosity level.
rng_state | raft::random::RngState | Seed for the random number generator.
n_init | int | Number of times the k-means algorithm is run with different seeds.
oversampling_factor | double | Oversampling factor for use in the k-means|| algorithm.
batch_samples | int | batch_samples and batch_centroids are used to tile the 1NN computation, which is useful for optimizing and controlling the memory footprint. The default tile is [batch_samples x n_clusters], i.e. when batch_centroids is 0 the centroids are not tiled. Note: these parameters are unrelated to streaming_batch_size, which controls how many samples are transferred from host to device per batch when processing out-of-core data.
batch_centroids | int | If 0, batch_centroids = n_clusters.
init_size | int64_t | Number of samples to randomly draw for the KMeansPlusPlus initialization step; a random subset of this size is used for centroid seeding. Only applies when the dataset is on the host; for device data the full dataset is always used for seeding and this parameter is ignored. When set to 0 (the default) with host data, min(3 * n_clusters, n_samples) is used. Default: 0.
streaming_batch_size | int64_t | Number of samples to process per GPU batch when fitting with host data. When set to 0, it defaults to n_samples (all data is processed at once). Only used by the batched (host-data) code path; ignored by device-data overloads. Default: 0.
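The tiling controlled by batch_samples and batch_centroids can be pictured with a small CPU reference sketch: squared L2 distances are evaluated one [batch_samples x batch_centroids] tile at a time while a running minimum is kept per sample, so the tile size bounds how much work is in flight. The function name `tiled_nearest_centroid` and the row-major `std::vector<float>` layout are assumptions for illustration; this is not the cuVS kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical CPU reference for a tiled 1-nearest-centroid search.
// X is [n_samples x n_features] and centroids is [n_clusters x n_features], both row-major.
// batch_centroids == 0 means "do not tile the centroids" (use n_clusters), as documented.
std::vector<int64_t> tiled_nearest_centroid(const std::vector<float>& X,
                                            const std::vector<float>& centroids,
                                            int64_t n_samples, int64_t n_clusters,
                                            int64_t n_features, int64_t batch_samples,
                                            int64_t batch_centroids) {
  assert(batch_samples > 0);
  if (batch_centroids == 0) batch_centroids = n_clusters;
  std::vector<int64_t> labels(n_samples, -1);
  std::vector<float> best(n_samples, std::numeric_limits<float>::max());
  for (int64_t s0 = 0; s0 < n_samples; s0 += batch_samples) {
    const int64_t s1 = std::min(s0 + batch_samples, n_samples);
    for (int64_t c0 = 0; c0 < n_clusters; c0 += batch_centroids) {
      const int64_t c1 = std::min(c0 + batch_centroids, n_clusters);
      // Visit one tile: samples [s0, s1) against centroids [c0, c1).
      for (int64_t i = s0; i < s1; ++i) {
        for (int64_t j = c0; j < c1; ++j) {
          float d = 0.0f;
          for (int64_t k = 0; k < n_features; ++k) {
            const float diff = X[i * n_features + k] - centroids[j * n_features + k];
            d += diff * diff;
          }
          // Running minimum carried across centroid tiles.
          if (d < best[i]) {
            best[i] = d;
            labels[i] = j;
          }
        }
      }
    }
  }
  return labels;
}
```

Because the running minimum survives across centroid tiles, the labels are the same for any tile shape; only the amount of intermediate work per step changes.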

cluster::kmeans::balanced_params

Simple object to specify hyper-parameters for the balanced k-means algorithm.

The following metrics are currently supported in balanced k-means:

  • CosineExpanded
  • InnerProduct
  • L2Expanded
  • L2SqrtExpanded

struct balanced_params : base_params { ... };

Fields

Name | Type | Description
n_iters | uint32_t | Number of training iterations.
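As a reference for the four metrics listed above, the sketch below computes each one for a single pair of vectors. It assumes the usual meaning of the "Expanded" variants, i.e. L2 computed through the expansion |x|^2 + |y|^2 - 2<x, y>, and treats InnerProduct as a similarity; the enum `BalancedMetric` and the function names are hypothetical stand-ins, so check the exact conventions against cuvs::distance::DistanceType.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the four cuvs::distance::DistanceType values listed above.
enum class BalancedMetric { CosineExpanded, InnerProduct, L2Expanded, L2SqrtExpanded };

double dot(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// InnerProduct is a similarity (larger means closer); the other three are distances
// (smaller means closer). The L2 variants use |x|^2 + |y|^2 - 2<x, y>.
double metric_value(BalancedMetric m, const std::vector<double>& x, const std::vector<double>& y) {
  const double xy = dot(x, y);
  const double xx = dot(x, x);
  const double yy = dot(y, y);
  switch (m) {
    case BalancedMetric::InnerProduct:
      return xy;
    case BalancedMetric::L2Expanded:  // squared Euclidean distance
      return xx + yy - 2.0 * xy;
    case BalancedMetric::L2SqrtExpanded:  // Euclidean distance; clamp tiny negative rounding
      return std::sqrt(std::max(0.0, xx + yy - 2.0 * xy));
    case BalancedMetric::CosineExpanded:  // cosine distance; assumes non-zero vectors
      return 1.0 - xy / std::sqrt(xx * yy);
  }
  return 0.0;  // unreachable for valid enum values
}
```

The expanded form lets the norms be precomputed once per row, so only the dot products need to be evaluated per (sample, centroid) pair.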

cluster::kmeans::kmeans_type

Type of k-means algorithm.

enum class kmeans_type { ... };

Values

Name | Value
KMeans | 0
KMeansBalanced | 1

k-means clustering APIs

cluster::kmeans::fit

Find clusters with the k-means algorithm using batched processing of host data.

void fit(raft::resources const& handle,
         const cuvs::cluster::kmeans::params& params,
         raft::host_matrix_view<const float, int64_t> X,
         std::optional<raft::host_vector_view<const float, int64_t>> sample_weight,
         raft::device_matrix_view<float, int64_t> centroids,
         raft::host_scalar_view<float> inertia,
         raft::host_scalar_view<int64_t> n_iter);

TODO: Evaluate replacing the extent type with int64_t. Reference issue: https://github.com/rapidsai/cuvs/issues/1961

This overload supports out-of-core computation where the dataset resides on the host. Data is processed in GPU-sized batches, streaming from host to device. The batch size is controlled by params.streaming_batch_size.
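The out-of-core path can be understood as accumulating per-cluster sums batch by batch, so only streaming_batch_size rows need to be resident at a time. The CPU sketch below performs one weighted Lloyd update over host data in batches and resolves the documented defaults for streaming_batch_size and init_size; the helper names and the treatment of a "batch" as a range of rows (rather than an actual host-to-device copy) are assumptions, not the cuVS implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Documented default: streaming_batch_size == 0 means "process all rows at once".
int64_t resolve_streaming_batch_size(int64_t streaming_batch_size, int64_t n_samples) {
  return streaming_batch_size == 0 ? n_samples : streaming_batch_size;
}

// Documented default for host data: init_size == 0 means min(3 * n_clusters, n_samples).
int64_t resolve_init_size(int64_t init_size, int64_t n_clusters, int64_t n_samples) {
  return init_size == 0 ? std::min(3 * n_clusters, n_samples) : init_size;
}

// One weighted Lloyd update over row-major "host" data, visited in batches of rows.
// Per-cluster weighted sums are accumulated across batches and divided once at the end,
// so the result does not depend on the batch size. Empty clusters simply keep their
// previous centroid here (a simplification of this sketch).
std::vector<double> batched_lloyd_step(const std::vector<double>& X,
                                       const std::optional<std::vector<double>>& sample_weight,
                                       const std::vector<double>& centroids,
                                       int64_t n_samples, int64_t n_clusters,
                                       int64_t n_features, int64_t streaming_batch_size) {
  const int64_t batch = resolve_streaming_batch_size(streaming_batch_size, n_samples);
  assert(batch > 0);
  std::vector<double> sums(n_clusters * n_features, 0.0);
  std::vector<double> wsum(n_clusters, 0.0);
  for (int64_t b0 = 0; b0 < n_samples; b0 += batch) {
    const int64_t b1 = std::min(b0 + batch, n_samples);  // rows [b0, b1) form one batch
    for (int64_t i = b0; i < b1; ++i) {
      int64_t best = 0;
      double best_d = std::numeric_limits<double>::max();
      for (int64_t j = 0; j < n_clusters; ++j) {
        double d = 0.0;
        for (int64_t k = 0; k < n_features; ++k) {
          const double diff = X[i * n_features + k] - centroids[j * n_features + k];
          d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = j; }
      }
      const double w = sample_weight ? (*sample_weight)[i] : 1.0;
      wsum[best] += w;
      for (int64_t k = 0; k < n_features; ++k) sums[best * n_features + k] += w * X[i * n_features + k];
    }
  }
  std::vector<double> updated = centroids;
  for (int64_t j = 0; j < n_clusters; ++j) {
    if (wsum[j] <= 0.0) continue;
    for (int64_t k = 0; k < n_features; ++k) updated[j * n_features + k] = sums[j * n_features + k] / wsum[j];
  }
  return updated;
}
```

Deferring the division until every batch has been seen is what makes the batched update match a single full pass over the data.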

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const cuvs::cluster::kmeans::params& | Parameters for the KMeans model. The batch size is read from params.streaming_batch_size.
X | in | raft::host_matrix_view<const float, int64_t> | Training instances in HOST memory. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::host_vector_view<const float, int64_t>> | Optional weights for each observation in X (on host). [len = n_samples]
centroids | inout | raft::device_matrix_view<float, int64_t> | [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run.

Returns

void

Additional overload: cluster::kmeans::fit

Find clusters with the k-means algorithm using batched processing of host data.

void fit(raft::resources const& handle,
         const cuvs::cluster::kmeans::params& params,
         raft::host_matrix_view<const double, int64_t> X,
         std::optional<raft::host_vector_view<const double, int64_t>> sample_weight,
         raft::device_matrix_view<double, int64_t> centroids,
         raft::host_scalar_view<double> inertia,
         raft::host_scalar_view<int64_t> n_iter);

Parameters

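Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const cuvs::cluster::kmeans::params& | Parameters for the KMeans model. The batch size is read from params.streaming_batch_size.
X | in | raft::host_matrix_view<const double, int64_t> | Training instances in HOST memory. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::host_vector_view<const double, int64_t>> | Optional weights for each observation in X (on host). [len = n_samples]
centroids | inout | raft::device_matrix_view<double, int64_t> | [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run.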

Returns

void

Additional overload: cluster::kmeans::fit

Find clusters with the k-means algorithm.

void fit(raft::resources const& handle,
         const cuvs::cluster::kmeans::params& params,
         raft::device_matrix_view<const float, int> X,
         std::optional<raft::device_vector_view<const float, int>> sample_weight,
         raft::device_matrix_view<float, int> centroids,
         raft::host_scalar_view<float> inertia,
         raft::host_scalar_view<int> n_iter);

By default (InitMethod::KMeansPlusPlus), initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
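The idea behind k-means++ seeding is: pick the first centroid uniformly at random, then pick each further centroid with probability proportional to its squared distance to the nearest centroid chosen so far (D^2 sampling). The sketch below is the classic sequential k-means++ on the CPU using `std::mt19937_64` and `std::discrete_distribution`. The parameter table above says InitMethod::KMeansPlusPlus uses scalable k-means++ (k-means||, tuned by oversampling_factor), which samples many candidates per round, so treat this only as an illustration of the underlying idea; `kmeanspp_seed` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Classic (sequential) k-means++ seeding on row-major data [n_samples x n_features].
// Returns the row indices chosen as initial centroids.
// Assumes the data contains at least n_clusters distinct rows.
std::vector<int64_t> kmeanspp_seed(const std::vector<double>& X, int64_t n_samples,
                                   int64_t n_features, int64_t n_clusters, uint64_t seed) {
  assert(n_clusters > 0 && n_clusters <= n_samples);
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<int64_t> first(0, n_samples - 1);
  std::vector<int64_t> chosen{first(rng)};
  // min_d2[i]: squared distance from row i to its nearest chosen centroid so far.
  std::vector<double> min_d2(n_samples, std::numeric_limits<double>::max());
  while (static_cast<int64_t>(chosen.size()) < n_clusters) {
    const int64_t c = chosen.back();
    for (int64_t i = 0; i < n_samples; ++i) {
      double d = 0.0;
      for (int64_t k = 0; k < n_features; ++k) {
        const double diff = X[i * n_features + k] - X[c * n_features + k];
        d += diff * diff;
      }
      if (d < min_d2[i]) min_d2[i] = d;
    }
    // D^2 sampling: rows already chosen (or identical to a chosen row) get weight 0.
    std::discrete_distribution<int64_t> pick(min_d2.begin(), min_d2.end());
    chosen.push_back(pick(rng));
  }
  return chosen;
}
```

Points far from every existing centroid are the most likely to be picked next, which tends to spread the initial centers across the data.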

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const cuvs::cluster::kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const float, int>> | Optional weights for each observation in X. [len = n_samples]
centroids | inout | raft::device_matrix_view<float, int> | [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int> | Number of iterations run.

Returns

void

Additional overload: cluster::kmeans::fit

Find clusters with the k-means algorithm.

void fit(raft::resources const& handle,
         const cuvs::cluster::kmeans::params& params,
         raft::device_matrix_view<const float, int64_t> X,
         std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
         raft::device_matrix_view<float, int64_t> centroids,
         raft::host_scalar_view<float> inertia,
         raft::host_scalar_view<int64_t> n_iter);

By default (InitMethod::KMeansPlusPlus), initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
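After seeding, k-means alternates an assignment step and an update step until max_iter is reached or the inertia stops improving by more than tol in relative terms. The CPU sketch below assumes the stopping rule |previous - current| <= tol * previous on inertia; the exact criterion cuVS applies may differ, and `lloyd` and `LloydResult` are hypothetical names. It returns centroids, inertia and the iteration count, mirroring the outputs of the fit overloads.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

struct LloydResult {
  std::vector<double> centroids;  // [n_clusters x n_features], row-major
  double inertia;                 // sum of squared distances measured in the last assignment step
  int64_t n_iter;                 // number of iterations run
};

// Plain Lloyd iterations starting from the given centroids (e.g. from k-means++).
LloydResult lloyd(const std::vector<double>& X, std::vector<double> centroids,
                  int64_t n_samples, int64_t n_features, int max_iter, double tol) {
  assert(n_features > 0 && !centroids.empty());
  const int64_t n_clusters = static_cast<int64_t>(centroids.size()) / n_features;
  double prev = std::numeric_limits<double>::max();
  double inertia = 0.0;
  int64_t it = 0;
  while (it < max_iter) {
    ++it;
    std::vector<double> sums(centroids.size(), 0.0);
    std::vector<int64_t> counts(n_clusters, 0);
    inertia = 0.0;
    // Assignment step: each row goes to its closest centroid.
    for (int64_t i = 0; i < n_samples; ++i) {
      int64_t best = 0;
      double best_d = std::numeric_limits<double>::max();
      for (int64_t j = 0; j < n_clusters; ++j) {
        double d = 0.0;
        for (int64_t k = 0; k < n_features; ++k) {
          const double diff = X[i * n_features + k] - centroids[j * n_features + k];
          d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = j; }
      }
      inertia += best_d;
      ++counts[best];
      for (int64_t k = 0; k < n_features; ++k) sums[best * n_features + k] += X[i * n_features + k];
    }
    // Update step. Empty clusters keep their centroid here; per the text above,
    // cuVS instead reinitializes them with k-means++.
    for (int64_t j = 0; j < n_clusters; ++j)
      if (counts[j] > 0)
        for (int64_t k = 0; k < n_features; ++k)
          centroids[j * n_features + k] = sums[j * n_features + k] / counts[j];
    // Assumed stopping rule: relative change of inertia within tol.
    if (std::abs(prev - inertia) <= tol * prev) break;
    prev = inertia;
  }
  return {centroids, inertia, it};
}
```

On the 1-D data {0, 1, 10, 11} started from centroids {0, 1}, this converges to centroids {0.5, 10.5} with inertia 1.0 after four iterations.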

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const cuvs::cluster::kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const float, int64_t>> | Optional weights for each observation in X. [len = n_samples]
centroids | inout | raft::device_matrix_view<float, int64_t> | [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run.

Returns

void

Additional overload: cluster::kmeans::fit

Find clusters with the k-means algorithm.

void fit(raft::resources const& handle,
         const cuvs::cluster::kmeans::params& params,
         raft::device_matrix_view<const double, int> X,
         std::optional<raft::device_vector_view<const double, int>> sample_weight,
         raft::device_matrix_view<double, int> centroids,
         raft::host_scalar_view<double> inertia,
         raft::host_scalar_view<int> n_iter);

By default (InitMethod::KMeansPlusPlus), initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const cuvs::cluster::kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const double, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const double, int>> | Optional weights for each observation in X. [len = n_samples]
centroids | inout | raft::device_matrix_view<double, int> | [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int> | Number of iterations run.

Returns

void

Additional overload: cluster::kmeans::fit

Find clusters with the k-means algorithm.

void fit(raft::resources const& handle,
         const cuvs::cluster::kmeans::params& params,
         raft::device_matrix_view<const double, int64_t> X,
         std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
         raft::device_matrix_view<double, int64_t> centroids,
         raft::host_scalar_view<double> inertia,
         raft::host_scalar_view<int64_t> n_iter);

By default (InitMethod::KMeansPlusPlus), initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const cuvs::cluster::kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const double, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const double, int64_t>> | Optional weights for each observation in X. [len = n_samples]
centroids | inout | raft::device_matrix_view<double, int64_t> | [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run.

Returns

void

Additional overload: cluster::kmeans::fit

Find clusters with the k-means algorithm.

void fit(raft::resources const& handle,
         const cuvs::cluster::kmeans::params& params,
         raft::device_matrix_view<const int8_t, int> X,
         std::optional<raft::device_vector_view<const int8_t, int>> sample_weight,
         raft::device_matrix_view<int8_t, int> centroids,
         raft::host_scalar_view<int8_t> inertia,
         raft::host_scalar_view<int> n_iter);

By default (InitMethod::KMeansPlusPlus), initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const cuvs::cluster::kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const int8_t, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const int8_t, int>> | Optional weights for each observation in X. [len = n_samples]
centroids | inout | raft::device_matrix_view<int8_t, int> | [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | raft::host_scalar_view<int8_t> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int> | Number of iterations run.

Returns

void

Additional overload: cluster::kmeans::fit

Find balanced clusters with the k-means algorithm.

void fit(const raft::resources& handle,
         cuvs::cluster::kmeans::balanced_params const& params,
         raft::device_matrix_view<const float, int64_t> X,
         raft::device_matrix_view<float, int64_t> centroids,
         std::optional<raft::host_scalar_view<float>> inertia = std::nullopt);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids | out | raft::device_matrix_view<float, int64_t> | [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | std::optional<raft::host_scalar_view<float>> | Sum of squared distances of samples to their closest cluster center. Default: std::nullopt.

Returns

void

Additional overload: cluster::kmeans::fit

Find balanced clusters with the k-means algorithm.

void fit(const raft::resources& handle,
         cuvs::cluster::kmeans::balanced_params const& params,
         raft::device_matrix_view<const int8_t, int64_t> X,
         raft::device_matrix_view<float, int64_t> centroids,
         std::optional<raft::host_scalar_view<float>> inertia = std::nullopt);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const int8_t, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids | inout | raft::device_matrix_view<float, int64_t> | [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | std::optional<raft::host_scalar_view<float>> | Sum of squared distances of samples to their closest cluster center. Default: std::nullopt.

Returns

void

Additional overload: cluster::kmeans::fit

Find balanced clusters with the k-means algorithm.

void fit(const raft::resources& handle,
         cuvs::cluster::kmeans::balanced_params const& params,
         raft::device_matrix_view<const half, int64_t> X,
         raft::device_matrix_view<float, int64_t> centroids,
         std::optional<raft::host_scalar_view<float>> inertia = std::nullopt);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const half, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids | inout | raft::device_matrix_view<float, int64_t> | [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | std::optional<raft::host_scalar_view<float>> | Sum of squared distances of samples to their closest cluster center. Default: std::nullopt.

Returns

void

Additional overload: cluster::kmeans::fit

Find balanced clusters with the k-means algorithm.

void fit(const raft::resources& handle,
         cuvs::cluster::kmeans::balanced_params const& params,
         raft::device_matrix_view<const uint8_t, int64_t> X,
         raft::device_matrix_view<float, int64_t> centroids,
         std::optional<raft::host_scalar_view<float>> inertia = std::nullopt);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const uint8_t, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
centroids | inout | raft::device_matrix_view<float, int64_t> | [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
inertia | out | std::optional<raft::host_scalar_view<float>> | Sum of squared distances of samples to their closest cluster center. Default: std::nullopt.

Returns

void

cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(raft::resources const& handle,
             const kmeans::params& params,
             raft::device_matrix_view<const float, int> X,
             std::optional<raft::device_vector_view<const float, int>> sample_weight,
             raft::device_matrix_view<const float, int> centroids,
             raft::device_vector_view<int, int> labels,
             bool normalize_weight,
             raft::host_scalar_view<float> inertia);

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int> | New data to predict. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const float, int>> | Optional weights for each observation in X. [len = n_samples]
centroids | in | raft::device_matrix_view<const float, int> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int, int> | Index of the cluster each sample in X belongs to. [len = n_samples]
normalize_weight | in | bool | True if the weights should be normalized.
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center.

Returns

void
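Conceptually, the predict overloads assign each sample to its nearest centroid and report the inertia. The CPU sketch below assumes squared L2 distance, assumes that each squared distance is multiplied by the sample's weight, and assumes that normalize_weight rescales the weights so they sum to n_samples (a convention used in RAFT-derived k-means code); when sample_weight is absent every weight is 1. `predict_sketch` and `PredictSketchResult` are hypothetical names, not the cuVS implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <numeric>
#include <optional>
#include <vector>

struct PredictSketchResult {
  std::vector<int64_t> labels;  // [n_samples] index of the closest centroid
  double inertia;               // sum over samples of weight * squared distance (assumption)
};

// Nearest-centroid assignment with squared L2 distance on row-major data.
PredictSketchResult predict_sketch(const std::vector<double>& X,
                                   const std::optional<std::vector<double>>& sample_weight,
                                   const std::vector<double>& centroids, int64_t n_samples,
                                   int64_t n_features, bool normalize_weight) {
  assert(n_features > 0);
  const int64_t n_clusters = static_cast<int64_t>(centroids.size()) / n_features;
  // Missing weights behave like a weight of 1 for every sample.
  std::vector<double> w = sample_weight ? *sample_weight : std::vector<double>(n_samples, 1.0);
  if (normalize_weight) {
    // Assumed convention: rescale so that the weights sum to n_samples.
    const double total = std::accumulate(w.begin(), w.end(), 0.0);
    assert(total > 0.0);
    for (double& wi : w) wi *= static_cast<double>(n_samples) / total;
  }
  PredictSketchResult out{std::vector<int64_t>(n_samples, 0), 0.0};
  for (int64_t i = 0; i < n_samples; ++i) {
    double best_d = std::numeric_limits<double>::max();
    for (int64_t j = 0; j < n_clusters; ++j) {
      double d = 0.0;
      for (int64_t k = 0; k < n_features; ++k) {
        const double diff = X[i * n_features + k] - centroids[j * n_features + k];
        d += diff * diff;
      }
      if (d < best_d) {
        best_d = d;
        out.labels[i] = j;
      }
    }
    out.inertia += w[i] * best_d;
  }
  return out;
}
```

With X = {0, 1, 10}, centroids {0, 10} and weights {1, 3, 4}, the labels are {0, 0, 1}; the inertia is 3.0 without normalization and 1.125 with it, since the weights are rescaled by 3/8.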

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(raft::resources const& handle,
             const kmeans::params& params,
             raft::device_matrix_view<const float, int64_t> X,
             std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
             raft::device_matrix_view<const float, int64_t> centroids,
             raft::device_vector_view<int64_t, int64_t> labels,
             bool normalize_weight,
             raft::host_scalar_view<float> inertia);

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int64_t> | New data to predict. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const float, int64_t>> | Optional weights for each observation in X. [len = n_samples]
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int64_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]
normalize_weight | in | bool | True if the weights should be normalized.
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center.

Returns

void

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(raft::resources const& handle,
             const kmeans::params& params,
             raft::device_matrix_view<const double, int> X,
             std::optional<raft::device_vector_view<const double, int>> sample_weight,
             raft::device_matrix_view<const double, int> centroids,
             raft::device_vector_view<int, int> labels,
             bool normalize_weight,
             raft::host_scalar_view<double> inertia);

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const double, int> | New data to predict. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const double, int>> | Optional weights for each observation in X. [len = n_samples]
centroids | in | raft::device_matrix_view<const double, int> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int, int> | Index of the cluster each sample in X belongs to. [len = n_samples]
normalize_weight | in | bool | True if the weights should be normalized.
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center.

Returns

void

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(raft::resources const& handle,
             const kmeans::params& params,
             raft::device_matrix_view<const double, int64_t> X,
             std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
             raft::device_matrix_view<const double, int64_t> centroids,
             raft::device_vector_view<int64_t, int64_t> labels,
             bool normalize_weight,
             raft::host_scalar_view<double> inertia);

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const double, int64_t> | New data to predict. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const double, int64_t>> | Optional weights for each observation in X. [len = n_samples]
centroids | in | raft::device_matrix_view<const double, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int64_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]
normalize_weight | in | bool | True if the weights should be normalized.
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center.

Returns

void

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(const raft::resources& handle,
             cuvs::cluster::kmeans::balanced_params const& params,
             raft::device_matrix_view<const int8_t, int64_t> X,
             raft::device_matrix_view<const float, int64_t> centroids,
             raft::device_vector_view<uint32_t, int64_t> labels);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const int8_t, int64_t> | New data to predict. [dim = n_samples x n_features]
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]

Returns

void

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(const raft::resources& handle,
             cuvs::cluster::kmeans::balanced_params const& params,
             raft::device_matrix_view<const int8_t, int64_t> X,
             raft::device_matrix_view<const float, int64_t> centroids,
             raft::device_vector_view<int, int64_t> labels);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const int8_t, int64_t> | New data to predict. [dim = n_samples x n_features]
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]

Returns

void

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(const raft::resources& handle,
             cuvs::cluster::kmeans::balanced_params const& params,
             raft::device_matrix_view<const float, int64_t> X,
             raft::device_matrix_view<const float, int64_t> centroids,
             raft::device_vector_view<int, int64_t> labels);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int64_t> | New data to predict. [dim = n_samples x n_features]
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]

Returns

void

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(const raft::resources& handle,
             cuvs::cluster::kmeans::balanced_params const& params,
             raft::device_matrix_view<const float, int64_t> X,
             raft::device_matrix_view<const float, int64_t> centroids,
             raft::device_vector_view<uint32_t, int64_t> labels);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int64_t> | New data to predict. [dim = n_samples x n_features]
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]

Returns

void

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(const raft::resources& handle,
             cuvs::cluster::kmeans::balanced_params const& params,
             raft::device_matrix_view<const half, int64_t> X,
             raft::device_matrix_view<const float, int64_t> centroids,
             raft::device_vector_view<uint32_t, int64_t> labels);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const half, int64_t> | New data to predict. [dim = n_samples x n_features]
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]

Returns

void

Additional overload: cluster::kmeans::predict

Predict the closest cluster each sample in X belongs to.

void predict(const raft::resources& handle,
             cuvs::cluster::kmeans::balanced_params const& params,
             raft::device_matrix_view<const uint8_t, int64_t> X,
             raft::device_matrix_view<const float, int64_t> centroids,
             raft::device_vector_view<uint32_t, int64_t> labels);

Parameters

Name | Direction | Type | Description
handle | in | const raft::resources& | The raft handle.
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const uint8_t, int64_t> | New data to predict. [dim = n_samples x n_features]
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]

Returns

void

cluster::kmeans::fit_predict

Compute k-means clustering and predict the cluster index for each sample in the input.

void fit_predict(raft::resources const& handle,
                 const kmeans::params& params,
                 raft::device_matrix_view<const float, int> X,
                 std::optional<raft::device_vector_view<const float, int>> sample_weight,
                 std::optional<raft::device_matrix_view<float, int>> centroids,
                 raft::device_vector_view<int, int> labels,
                 raft::host_scalar_view<float> inertia,
                 raft::host_scalar_view<int> n_iter);

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const float, int>> | Optional weights for each observation in X. [len = n_samples]
centroids | inout | std::optional<raft::device_matrix_view<float, int>> | Optional. [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int, int> | Index of the cluster each sample in X belongs to. [len = n_samples]
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int> | Number of iterations run.

Returns

void

Additional overload: cluster::kmeans::fit_predict

Compute k-means clustering and predict the cluster index for each sample in the input.

void fit_predict(raft::resources const& handle,
                 const kmeans::params& params,
                 raft::device_matrix_view<const float, int64_t> X,
                 std::optional<raft::device_vector_view<const float, int64_t>> sample_weight,
                 std::optional<raft::device_matrix_view<float, int64_t>> centroids,
                 raft::device_vector_view<int64_t, int64_t> labels,
                 raft::host_scalar_view<float> inertia,
                 raft::host_scalar_view<int64_t> n_iter);

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const float, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const float, int64_t>> | Optional weights for each observation in X. [len = n_samples]
centroids | inout | std::optional<raft::device_matrix_view<float, int64_t>> | Optional. [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int64_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples]
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run.

Returns

void

Additional overload: cluster::kmeans::fit_predict

Compute k-means clustering and predict the cluster index for each sample in the input.

void fit_predict(raft::resources const& handle,
                 const kmeans::params& params,
                 raft::device_matrix_view<const double, int> X,
                 std::optional<raft::device_vector_view<const double, int>> sample_weight,
                 std::optional<raft::device_matrix_view<double, int>> centroids,
                 raft::device_vector_view<int, int> labels,
                 raft::host_scalar_view<double> inertia,
                 raft::host_scalar_view<int> n_iter);

Parameters

Name | Direction | Type | Description
handle | in | raft::resources const& | The raft handle.
params | in | const kmeans::params& | Parameters for the KMeans model.
X | in | raft::device_matrix_view<const double, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features]
sample_weight | in | std::optional<raft::device_vector_view<const double, int>> | Optional weights for each observation in X. [len = n_samples]
centroids | inout | std::optional<raft::device_matrix_view<double, int>> | Optional. [in] When init is InitMethod::Array, these centroids are used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features]
labels | out | raft::device_vector_view<int, int> | Index of the cluster each sample in X belongs to. [len = n_samples]
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center.
n_iter | out | raft::host_scalar_view<int> | Number of iterations run.

Returns

void
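`sample_weight` changes how the centroids are updated. Under the usual weighted k-means formulation (an assumption here, not a statement about cuVS internals), each centroid becomes the weighted mean of the samples assigned to it. The hypothetical `weighted_centroids` below sketches that update step on the CPU, using the `double`/`int` types of this overload.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch (not the cuVS API): the k-means update step with
// per-sample weights. Each centroid becomes the weighted mean of its members.
std::vector<double> weighted_centroids(const std::vector<double>& X,       // row-major [n_samples x n_features]
                                       const std::vector<double>& weight,  // [n_samples]
                                       const std::vector<int>& labels,     // [n_samples], values in [0, n_clusters)
                                       int n_features, int n_clusters) {
  const int n_samples = static_cast<int>(labels.size());
  assert(static_cast<int>(X.size()) == n_samples * n_features);
  assert(static_cast<int>(weight.size()) == n_samples);

  std::vector<double> centroids(static_cast<std::size_t>(n_clusters) * n_features, 0.0);
  std::vector<double> total_weight(n_clusters, 0.0);
  for (int i = 0; i < n_samples; ++i) {
    const int c = labels[i];
    assert(c >= 0 && c < n_clusters);
    total_weight[c] += weight[i];
    for (int f = 0; f < n_features; ++f) centroids[c * n_features + f] += weight[i] * X[i * n_features + f];
  }
  // Normalize by each cluster's total weight; a cluster with zero total
  // weight is simply left at 0 in this sketch.
  for (int c = 0; c < n_clusters; ++c)
    if (total_weight[c] > 0.0)
      for (int f = 0; f < n_features; ++f) centroids[c * n_features + f] /= total_weight[c];
  return centroids;
}
```

With the points 0 and 4 in one cluster and weights 3 and 1, that centroid lands at 1 instead of the unweighted mean 2.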

Additional overload: cluster::kmeans::fit_predict

Compute k-means clustering and predict the cluster index for each sample in the input.

```cpp
void fit_predict(raft::resources const& handle,
                 const kmeans::params& params,
                 raft::device_matrix_view<const double, int64_t> X,
                 std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
                 std::optional<raft::device_matrix_view<double, int64_t>> centroids,
                 raft::device_vector_view<int64_t, int64_t> labels,
                 raft::host_scalar_view<double> inertia,
                 raft::host_scalar_view<int64_t> n_iter);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `raft::resources const&` | The raft handle. |
| params | in | `const kmeans::params&` | Parameters for KMeans model. |
| X | in | `raft::device_matrix_view<const double, int64_t>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| sample_weight | in | `std::optional<raft::device_vector_view<const double, int64_t>>` | Optional weights for each observation in X. [len = n_samples] |
| centroids | inout | `std::optional<raft::device_matrix_view<double, int64_t>>` | Optional. [in] When `params.init` is `InitMethod::Array`, these are used as the initial cluster centers. [out] The centroids computed by k-means are written here. [dim = n_clusters x n_features] |
| labels | out | `raft::device_vector_view<int64_t, int64_t>` | Index of the cluster each sample in X belongs to. [len = n_samples] |
| inertia | out | `raft::host_scalar_view<double>` | Sum of squared distances of samples to their closest cluster center. |
| n_iter | out | `raft::host_scalar_view<int64_t>` | Number of iterations run. |

Returns

void
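The `centroids` argument can also carry caller-chosen starting centers via `InitMethod::Array`. One way to produce such centers is classic k-means++ seeding, sketched below: pick the first center uniformly, then pick each further center with probability proportional to its squared distance from the nearest center chosen so far. This is the simple serial variant, not the scalable k-means++ (k-means||) that `InitMethod::KMeansPlusPlus` is documented to use, and `kmeanspp_seed` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical sketch: classic serial k-means++ seeding (D^2 sampling).
// The returned row-major [n_clusters x n_features] matrix could be used as
// starting centroids together with InitMethod::Array.
std::vector<double> kmeanspp_seed(const std::vector<double>& X,  // row-major [n_samples x n_features]
                                  int64_t n_features, int64_t n_clusters, uint64_t seed) {
  assert(n_features > 0 && X.size() % n_features == 0);
  const int64_t n_samples = static_cast<int64_t>(X.size()) / n_features;
  assert(n_clusters >= 1 && n_clusters <= n_samples);
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<int64_t> uniform(0, n_samples - 1);

  std::vector<double> centroids;
  auto add_center = [&](int64_t row) {
    centroids.insert(centroids.end(), X.begin() + row * n_features, X.begin() + (row + 1) * n_features);
  };
  add_center(uniform(rng));  // first center: a uniformly random sample

  // min_d2[i]: squared distance from sample i to its nearest chosen center so far.
  std::vector<double> min_d2(n_samples, std::numeric_limits<double>::max());
  for (int64_t c = 1; c < n_clusters; ++c) {
    // Copy the newest center so later inserts cannot invalidate it.
    const std::vector<double> last(centroids.end() - n_features, centroids.end());
    double total = 0.0;
    for (int64_t i = 0; i < n_samples; ++i) {
      double d = 0.0;
      for (int64_t f = 0; f < n_features; ++f) {
        const double diff = X[i * n_features + f] - last[f];
        d += diff * diff;
      }
      if (d < min_d2[i]) min_d2[i] = d;
      total += min_d2[i];
    }
    if (total == 0.0) {  // every sample coincides with a chosen center
      add_center(uniform(rng));
      continue;
    }
    // Next center: sample i with probability proportional to min_d2[i].
    std::discrete_distribution<int64_t> pick(min_d2.begin(), min_d2.end());
    add_center(pick(rng));
  }
  return centroids;
}
```

Because an already-chosen sample has weight 0, it cannot be drawn again unless every remaining distance is zero.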

Additional overload: cluster::kmeans::fit_predict

Compute balanced k-means clustering and predict the cluster index for each sample in the input.

```cpp
void fit_predict(const raft::resources& handle,
                 cuvs::cluster::kmeans::balanced_params const& params,
                 raft::device_matrix_view<const float, int64_t> X,
                 raft::device_matrix_view<float, int64_t> centroids,
                 raft::device_vector_view<uint32_t, int64_t> labels);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `const raft::resources&` | The raft handle. |
| params | in | `cuvs::cluster::kmeans::balanced_params const&` | Parameters for KMeans model. |
| X | in | `raft::device_matrix_view<const float, int64_t>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| centroids | inout | `raft::device_matrix_view<float, int64_t>` | Required (not a `std::optional`). On return, holds the cluster centers computed by balanced k-means. `balanced_params` has no `init` field, so the `InitMethod::Array` option of the `kmeans::params` overloads does not apply. [dim = n_clusters x n_features] |
| labels | out | `raft::device_vector_view<uint32_t, int64_t>` | Index of the cluster each sample in X belongs to. [len = n_samples] |

Returns

void
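Balanced k-means aims for clusters of similar size, but this page does not state a hard guarantee. The hypothetical helpers below count cluster sizes from the returned `uint32_t` labels and report the largest-to-smallest ratio, one simple way to check how even a particular result is. They are not part of cuVS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not part of cuVS) for inspecting the uint32_t labels
// written by the balanced fit_predict overloads.

// Number of samples assigned to each cluster.
std::vector<int64_t> cluster_sizes(const std::vector<uint32_t>& labels, uint32_t n_clusters) {
  std::vector<int64_t> sizes(n_clusters, 0);
  for (uint32_t label : labels) {
    assert(label < n_clusters);
    ++sizes[label];
  }
  return sizes;
}

// Largest cluster size divided by the smallest: 1.0 means perfectly even sizes.
// Returns -1.0 when some cluster is empty, since the ratio is then undefined.
double imbalance_ratio(const std::vector<int64_t>& sizes) {
  assert(!sizes.empty());
  const auto mm = std::minmax_element(sizes.begin(), sizes.end());
  if (*mm.first == 0) return -1.0;
  return static_cast<double>(*mm.second) / static_cast<double>(*mm.first);
}
```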

Additional overload: cluster::kmeans::fit_predict

Compute balanced k-means clustering and predict the cluster index for each sample in the input.

```cpp
void fit_predict(const raft::resources& handle,
                 cuvs::cluster::kmeans::balanced_params const& params,
                 raft::device_matrix_view<const int8_t, int64_t> X,
                 raft::device_matrix_view<float, int64_t> centroids,
                 raft::device_vector_view<uint32_t, int64_t> labels);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `const raft::resources&` | The raft handle. |
| params | in | `cuvs::cluster::kmeans::balanced_params const&` | Parameters for KMeans model. |
| X | in | `raft::device_matrix_view<const int8_t, int64_t>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| centroids | inout | `raft::device_matrix_view<float, int64_t>` | Required (not a `std::optional`). On return, holds the cluster centers computed by balanced k-means. `balanced_params` has no `init` field, so the `InitMethod::Array` option of the `kmeans::params` overloads does not apply. [dim = n_clusters x n_features] |
| labels | out | `raft::device_vector_view<uint32_t, int64_t>` | Index of the cluster each sample in X belongs to. [len = n_samples] |

Returns

void
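In this overload the samples are `int8_t` while the centroids are `float`. The sketch below shows one straightforward way to compare the two: promote each `int8_t` component to `float`, then take the squared Euclidean distance. `assign_int8` is a hypothetical CPU helper that illustrates the mixed types only. It says nothing about how cuVS performs the conversion internally.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical CPU sketch (not cuVS internals): assign int8 samples to the
// nearest float centroid, promoting each int8 component to float before
// computing the squared Euclidean distance.
std::vector<uint32_t> assign_int8(const std::vector<int8_t>& X,         // row-major [n_samples x n_features]
                                  const std::vector<float>& centroids,  // row-major [n_clusters x n_features]
                                  int64_t n_features) {
  assert(n_features > 0 && X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const int64_t n_samples = static_cast<int64_t>(X.size()) / n_features;
  const int64_t n_clusters = static_cast<int64_t>(centroids.size()) / n_features;
  std::vector<uint32_t> labels(n_samples, 0);
  for (int64_t i = 0; i < n_samples; ++i) {
    float best = std::numeric_limits<float>::max();
    for (int64_t c = 0; c < n_clusters; ++c) {
      float d = 0.0f;
      for (int64_t f = 0; f < n_features; ++f) {
        const float diff = static_cast<float>(X[i * n_features + f]) - centroids[c * n_features + f];
        d += diff * diff;
      }
      if (d < best) { best = d; labels[i] = static_cast<uint32_t>(c); }
    }
  }
  return labels;
}
```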

cluster::kmeans::transform

Transform X to a cluster-distance space.

```cpp
void transform(raft::resources const& handle,
               const kmeans::params& params,
               raft::device_matrix_view<const float, int> X,
               raft::device_matrix_view<const float, int> centroids,
               raft::device_matrix_view<float, int> X_new);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `raft::resources const&` | The raft handle. |
| params | in | `const kmeans::params&` | Parameters for KMeans model. |
| X | in | `raft::device_matrix_view<const float, int>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| centroids | in | `raft::device_matrix_view<const float, int>` | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
| X_new | out | `raft::device_matrix_view<float, int>` | X transformed into cluster-distance space: entry (i, j) is the distance, under `params.metric`, from sample i to centroid j. [dim = n_samples x n_clusters] |

Returns

void
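The shape of `X_new` is easiest to see from a CPU reference: one row per sample, one column per centroid. The hypothetical `transform_ref` below assumes squared Euclidean distance for illustration. cuVS takes the metric from `params.metric`, so its values may differ for other metrics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not the cuVS API): entry (i, j) of the
// row-major result is the squared Euclidean distance from sample i to centroid j.
std::vector<float> transform_ref(const std::vector<float>& X,          // row-major [n_samples x n_features]
                                 const std::vector<float>& centroids,  // row-major [n_clusters x n_features]
                                 int n_features) {
  assert(n_features > 0 && X.size() % n_features == 0 && centroids.size() % n_features == 0);
  const int n_samples = static_cast<int>(X.size()) / n_features;
  const int n_clusters = static_cast<int>(centroids.size()) / n_features;
  std::vector<float> X_new(static_cast<std::size_t>(n_samples) * n_clusters);
  for (int i = 0; i < n_samples; ++i)
    for (int j = 0; j < n_clusters; ++j) {
      float d = 0.0f;
      for (int f = 0; f < n_features; ++f) {
        const float diff = X[i * n_features + f] - centroids[j * n_features + f];
        d += diff * diff;
      }
      X_new[i * n_clusters + j] = d;  // row stride is n_clusters, not n_features
    }
  return X_new;
}
```

For two 2-D samples and three centroids, the output holds 2 x 3 = 6 values, independent of the feature count.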

Additional overload: cluster::kmeans::transform

Transform X to a cluster-distance space.

```cpp
void transform(raft::resources const& handle,
               const kmeans::params& params,
               raft::device_matrix_view<const double, int> X,
               raft::device_matrix_view<const double, int> centroids,
               raft::device_matrix_view<double, int> X_new);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `raft::resources const&` | The raft handle. |
| params | in | `const kmeans::params&` | Parameters for KMeans model. |
| X | in | `raft::device_matrix_view<const double, int>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| centroids | in | `raft::device_matrix_view<const double, int>` | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
| X_new | out | `raft::device_matrix_view<double, int>` | X transformed into cluster-distance space: entry (i, j) is the distance, under `params.metric`, from sample i to centroid j. [dim = n_samples x n_clusters] |

Returns

void
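Each row of `X_new` holds the distances from one sample to every centroid, so the nearest centroid is the column with the smallest entry. The hypothetical helper below recovers labels that way, assuming a metric where a smaller value means closer (as with the Euclidean family). It is not a cuVS function.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of cuVS): labels from a row-major distance
// matrix X_new [n_samples x n_clusters], taking the smallest entry per row.
std::vector<int> labels_from_distances(const std::vector<double>& X_new, int n_clusters) {
  assert(n_clusters > 0 && X_new.size() % n_clusters == 0);
  const int n_samples = static_cast<int>(X_new.size()) / n_clusters;
  std::vector<int> labels(n_samples);
  for (int i = 0; i < n_samples; ++i) {
    const auto row = X_new.begin() + i * n_clusters;
    // min_element returns the first minimum, so ties go to the lower cluster index.
    labels[i] = static_cast<int>(std::min_element(row, row + n_clusters) - row);
  }
  return labels;
}
```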

cluster::kmeans::cluster_cost

Compute (optionally weighted) cluster cost

```cpp
void cluster_cost(
  const raft::resources& handle,
  raft::device_matrix_view<const float, int> X,
  raft::device_matrix_view<const float, int> centroids,
  raft::host_scalar_view<float> cost,
  std::optional<raft::device_vector_view<const float, int>> sample_weight = std::nullopt);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `const raft::resources&` | The raft handle. |
| X | in | `raft::device_matrix_view<const float, int>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| centroids | in | `raft::device_matrix_view<const float, int>` | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
| cost | out | `raft::host_scalar_view<float>` | Resulting cluster cost. |
| sample_weight | in | `std::optional<raft::device_vector_view<const float, int>>` | Optional per-sample weights. [len = n_samples] Default: `std::nullopt`. |

Returns

void
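A CPU reference helps pin down what "optionally weighted" means. The sketch below assumes the cost is the sum, over samples, of the squared Euclidean distance to the nearest centroid, with each term multiplied by the sample's weight when weights are given, so omitting them is equivalent to weights of 1. `cluster_cost_ref` is hypothetical and is not the cuVS implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical CPU reference (not the cuVS implementation): optionally
// weighted sum of squared Euclidean distances to the nearest centroid.
float cluster_cost_ref(const std::vector<float>& X,          // row-major [n_samples x n_features]
                       const std::vector<float>& centroids,  // row-major [n_clusters x n_features]
                       int n_features,
                       const std::optional<std::vector<float>>& sample_weight = std::nullopt) {
  assert(n_features > 0 && X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const int n_samples = static_cast<int>(X.size()) / n_features;
  const int n_clusters = static_cast<int>(centroids.size()) / n_features;
  assert(!sample_weight || static_cast<int>(sample_weight->size()) == n_samples);

  float cost = 0.0f;
  for (int i = 0; i < n_samples; ++i) {
    float nearest = std::numeric_limits<float>::max();
    for (int c = 0; c < n_clusters; ++c) {
      float d = 0.0f;
      for (int f = 0; f < n_features; ++f) {
        const float diff = X[i * n_features + f] - centroids[c * n_features + f];
        d += diff * diff;
      }
      nearest = std::min(nearest, d);
    }
    cost += (sample_weight ? (*sample_weight)[i] : 1.0f) * nearest;  // unweighted == weight 1
  }
  return cost;
}
```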

Additional overload: cluster::kmeans::cluster_cost

Compute (optionally weighted) cluster cost

```cpp
void cluster_cost(
  const raft::resources& handle,
  raft::device_matrix_view<const double, int> X,
  raft::device_matrix_view<const double, int> centroids,
  raft::host_scalar_view<double> cost,
  std::optional<raft::device_vector_view<const double, int>> sample_weight = std::nullopt);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `const raft::resources&` | The raft handle. |
| X | in | `raft::device_matrix_view<const double, int>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| centroids | in | `raft::device_matrix_view<const double, int>` | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
| cost | out | `raft::host_scalar_view<double>` | Resulting cluster cost. |
| sample_weight | in | `std::optional<raft::device_vector_view<const double, int>>` | Optional per-sample weights. [len = n_samples] Default: `std::nullopt`. |

Returns

void

Additional overload: cluster::kmeans::cluster_cost

Compute (optionally weighted) cluster cost

```cpp
void cluster_cost(
  const raft::resources& handle,
  raft::device_matrix_view<const float, int64_t> X,
  raft::device_matrix_view<const float, int64_t> centroids,
  raft::host_scalar_view<float> cost,
  std::optional<raft::device_vector_view<const float, int64_t>> sample_weight = std::nullopt);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `const raft::resources&` | The raft handle. |
| X | in | `raft::device_matrix_view<const float, int64_t>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| centroids | in | `raft::device_matrix_view<const float, int64_t>` | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
| cost | out | `raft::host_scalar_view<float>` | Resulting cluster cost. |
| sample_weight | in | `std::optional<raft::device_vector_view<const float, int64_t>>` | Optional per-sample weights. [len = n_samples] Default: `std::nullopt`. |

Returns

void
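Most functions on this page come in both `int` and `int64_t` index-type overloads. A conservative rule of thumb, which is our assumption rather than a documented cuVS requirement, is to use the `int64_t` overloads whenever the element count of `X` (n_samples x n_features) does not fit in a 32-bit `int`, so that no flat offset into the row-major matrix can overflow. The hypothetical check below performs that test without overflowing itself.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical rule of thumb (an assumption, not a documented cuVS rule):
// returns true when n_samples * n_features fits in a 32-bit int.
bool fits_int_index(int64_t n_samples, int64_t n_features) {
  assert(n_samples >= 0 && n_features >= 0);
  constexpr int64_t kIntMax = std::numeric_limits<int>::max();
  if (n_samples > kIntMax || n_features > kIntMax) return false;
  if (n_features == 0) return true;
  // Equivalent to n_samples * n_features <= kIntMax, using division to avoid overflow.
  return n_samples <= kIntMax / n_features;
}
```

For instance, 50 million 64-dimensional rows make 3.2 billion elements, which exceeds the 2,147,483,647 limit of `int`.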

Additional overload: cluster::kmeans::cluster_cost

Compute (optionally weighted) cluster cost

```cpp
void cluster_cost(
  const raft::resources& handle,
  raft::device_matrix_view<const double, int64_t> X,
  raft::device_matrix_view<const double, int64_t> centroids,
  raft::host_scalar_view<double> cost,
  std::optional<raft::device_vector_view<const double, int64_t>> sample_weight = std::nullopt);
```

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `const raft::resources&` | The raft handle. |
| X | in | `raft::device_matrix_view<const double, int64_t>` | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
| centroids | in | `raft::device_matrix_view<const double, int64_t>` | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
| cost | out | `raft::host_scalar_view<double>` | Resulting cluster cost. |
| sample_weight | in | `std::optional<raft::device_vector_view<const double, int64_t>>` | Optional per-sample weights. [len = n_samples] Default: `std::nullopt`. |

Returns

void

k-means API helpers

cluster::kmeans::helpers::find_k

Automatically find the optimal value of k using a binary search.

```cpp
void find_k(raft::resources const& handle,
            raft::device_matrix_view<const float, int> X,
            raft::host_scalar_view<int> best_k,
            raft::host_scalar_view<float> inertia,
            raft::host_scalar_view<int> n_iter,
            int kmax,
            int kmin = 1,
            int maxiter = 100,
            float tol = 1e-3);
```

This method maximizes the Calinski-Harabasz Index while minimizing the per-cluster inertia.

Parameters

| Name | Direction | Type | Description |
|---|---|---|---|
| handle | in | `raft::resources const&` | The raft handle. |
| X | in | `raft::device_matrix_view<const float, int>` | Input observations. [dim = n_samples x n_dims] |
| best_k | out | `raft::host_scalar_view<int>` | Best k found by the binary search. |
| inertia | out | `raft::host_scalar_view<float>` | Inertia of the best k found. |
| n_iter | out | `raft::host_scalar_view<int>` | Number of iterations used to find the best k. |
| kmax | in | `int` | Maximum k to try in the search. |
| kmin | in | `int` | Minimum k to try in the search (should be >= 1). Default: 1. |
| maxiter | in | `int` | Maximum number of iterations to run. Default: 100. |
| tol | in | `float` | Tolerance for early-stopping convergence. Default: 1e-3. |

Returns

void
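`find_k` is described as maximizing the Calinski-Harabasz index. For reference, that index is CH = (B / (k - 1)) / (W / (n - k)), where W is the within-cluster sum of squares (the inertia) and B is the between-cluster sum of squares, the sum over clusters of n_c times the squared distance from the cluster mean to the global mean. The hypothetical `calinski_harabasz` below computes it on the CPU for a given labeling. It is a sketch of the formula, not the cuVS implementation or its search procedure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch (not the cuVS implementation) of the
// Calinski-Harabasz index: CH = (B / (k - 1)) / (W / (n - k)).
double calinski_harabasz(const std::vector<double>& X,    // row-major [n x n_features]
                         const std::vector<int>& labels,  // [n], values in [0, k)
                         int n_features, int k) {
  const int n = static_cast<int>(labels.size());
  assert(k >= 2 && n > k && static_cast<int>(X.size()) == n * n_features);

  // Global mean mu and per-cluster means mu_c.
  std::vector<double> mu(n_features, 0.0);
  std::vector<double> mu_c(static_cast<std::size_t>(k) * n_features, 0.0);
  std::vector<int> count(k, 0);
  for (int i = 0; i < n; ++i) {
    ++count[labels[i]];
    for (int f = 0; f < n_features; ++f) {
      mu[f] += X[i * n_features + f];
      mu_c[labels[i] * n_features + f] += X[i * n_features + f];
    }
  }
  for (int f = 0; f < n_features; ++f) mu[f] /= n;
  for (int c = 0; c < k; ++c)
    if (count[c] > 0)
      for (int f = 0; f < n_features; ++f) mu_c[c * n_features + f] /= count[c];

  double W = 0.0;  // within-cluster sum of squares (inertia)
  for (int i = 0; i < n; ++i)
    for (int f = 0; f < n_features; ++f) {
      const double d = X[i * n_features + f] - mu_c[labels[i] * n_features + f];
      W += d * d;
    }
  double B = 0.0;  // between-cluster sum of squares
  for (int c = 0; c < k; ++c)
    for (int f = 0; f < n_features; ++f) {
      const double d = mu_c[c * n_features + f] - mu[f];
      B += count[c] * d * d;
    }
  // W == 0 (every sample sits on its cluster mean) makes the index infinite.
  return (B / (k - 1)) / (W / (n - k));
}
```

Tighter clusters at the same separation score higher: for k = 2, the points {0, 2 | 10, 12} give CH = 50, while {0, 1 | 10, 11} give CH = 200.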