K-Means
Source header: cuvs/cluster/kmeans.hpp
Types
cluster::kmeans::base_params
Base structure for parameters that are common to all k-means algorithms
struct base_params { ... };
Fields
| Name | Type | Description |
|---|---|---|
metric | cuvs::distance::DistanceType | Metric to use for distance computation. The supported metrics can vary per algorithm. |
k-means hyperparameters
cluster::kmeans::params
Simple object to specify hyper-parameters to the kmeans algorithm.
struct params : base_params { ... };
Fields
| Name | Type | Description |
|---|---|---|
n_clusters | int | The number of clusters to form as well as the number of centroids to generate (default:8). |
init | InitMethod | Initialization method (default: k-means++). InitMethod::KMeansPlusPlus (k-means++): use the scalable k-means++ algorithm to select the initial cluster centers. InitMethod::Random (random): choose n_clusters observations (rows) at random from the input data as the initial centroids. InitMethod::Array (ndarray): use ‘centroids’ as the initial cluster centers. |
max_iter | int | Maximum number of iterations of the k-means algorithm for a single run. |
tol | double | Relative tolerance with regard to inertia used to declare convergence. |
verbosity | rapids_logger::level_enum | Verbosity level. |
rng_state | raft::random::RngState | Seed to the random number generator. |
n_init | int | Number of times the k-means algorithm is run with different seeds. |
oversampling_factor | double | Oversampling factor for use in the k-means\|\| algorithm. |
batch_samples | int | batch_samples and batch_centroids tile the 1NN computation, which helps control the memory footprint. The default tile is [batch_samples x n_clusters]; that is, when batch_centroids is 0 the centroids are not tiled. Note: these parameters are unrelated to streaming_batch_size, which controls how many samples are transferred from host to device per batch when processing out-of-core data. |
batch_centroids | int | If 0, batch_centroids is set to n_clusters. |
init_size | int64_t | Number of samples to randomly draw for the KMeansPlusPlus initialization step. A random subset of this size is used for centroid seeding. Only applies when the dataset is on the host; for device data the full dataset is always used for seeding and this parameter is ignored. When set to 0 with host data, min(3 * n_clusters, n_samples) is used. Default: 0. |
streaming_batch_size | int64_t | Number of samples to process per GPU batch when fitting with host data. When set to 0, n_samples is used (all data is processed at once). Only used by the batched (host-data) code path; ignored by the device-data overloads. Default: 0. |
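The zero defaults described in the table above can be made concrete with a small sketch. `SketchParams` and the `effective_*` helpers are hypothetical names introduced only for illustration; they are not part of cuVS, and the library's internal handling may differ in detail.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the few params fields discussed above;
// this is NOT the real cuvs::cluster::kmeans::params type.
struct SketchParams {
  int n_clusters = 8;
  int batch_centroids = 0;
  std::int64_t init_size = 0;
  std::int64_t streaming_batch_size = 0;
};

// 0 means "do not tile the centroids": use all n_clusters at once.
int effective_batch_centroids(const SketchParams& p) {
  return p.batch_centroids != 0 ? p.batch_centroids : p.n_clusters;
}

// Host-data path only: 0 means min(3 * n_clusters, n_samples).
std::int64_t effective_init_size(const SketchParams& p, std::int64_t n_samples) {
  if (p.init_size != 0) return p.init_size;
  return std::min<std::int64_t>(3 * static_cast<std::int64_t>(p.n_clusters), n_samples);
}

// Host-data path only: 0 means "process all n_samples in one batch".
std::int64_t effective_streaming_batch_size(const SketchParams& p, std::int64_t n_samples) {
  return p.streaming_batch_size != 0 ? p.streaming_batch_size : n_samples;
}
```

With the default n_clusters of 8 and 1000 host samples, this sketch resolves init_size to 24 and streaming_batch_size to 1000.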
cluster::kmeans::balanced_params
Simple object to specify hyper-parameters to the balanced k-means algorithm.
The following metrics are currently supported in k-means balanced:
- CosineExpanded
- InnerProduct
- L2Expanded
- L2SqrtExpanded
struct balanced_params : base_params { ... };
Fields
| Name | Type | Description |
|---|---|---|
n_iters | uint32_t | Number of training iterations |
cluster::kmeans::kmeans_type
Type of k-means algorithm.
enum class kmeans_type { ... };
Values
| Name | Value |
|---|---|
KMeans | 0 |
KMeansBalanced | 1 |
k-means clustering APIs
cluster::kmeans::fit
Find clusters with k-means algorithm using batched processing of host data.
void fit(raft::resources const& handle, const cuvs::cluster::kmeans::params& params, raft::host_matrix_view<const float, int64_t> X, std::optional<raft::host_vector_view<const float, int64_t>> sample_weight, raft::device_matrix_view<float, int64_t> centroids, raft::host_scalar_view<float> inertia, raft::host_scalar_view<int64_t> n_iter);
TODO: Evaluate replacing the extent type with int64_t. Reference issue: https://github.com/rapidsai/cuvs/issues/1961
This overload supports out-of-core computation where the dataset resides on the host. Data is processed in GPU-sized batches, streaming from host to device. The batch size is controlled by params.streaming_batch_size.
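As a rough illustration of the batching described above, the sketch below splits n_samples host rows into consecutive half-open ranges of at most streaming_batch_size rows. `batch_ranges` is a hypothetical helper written for this page, not a cuVS function, and the exact chunking used by the library is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Returns half-open row ranges [begin, end) that cover [0, n_samples).
// A streaming_batch_size of 0 is treated as "one batch with all rows".
std::vector<std::pair<std::int64_t, std::int64_t>> batch_ranges(
    std::int64_t n_samples, std::int64_t streaming_batch_size) {
  const std::int64_t batch = streaming_batch_size > 0 ? streaming_batch_size : n_samples;
  std::vector<std::pair<std::int64_t, std::int64_t>> ranges;
  for (std::int64_t begin = 0; begin < n_samples; begin += batch) {
    // The last batch may be shorter than the others.
    ranges.emplace_back(begin, std::min(begin + batch, n_samples));
  }
  return ranges;
}
```

Each range corresponds to one host-to-device transfer in this sketch; a smaller batch size lowers peak device memory at the cost of more transfers.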
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const cuvs::cluster::kmeans::params& | Parameters for KMeans model. Batch size is read from params.streaming_batch_size. |
X | in | raft::host_matrix_view<const float, int64_t> | Training instances on HOST memory. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::host_vector_view<const float, int64_t>> | Optional weights for each observation in X (on host). [len = n_samples] |
centroids | inout | raft::device_matrix_view<float, int64_t> | [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run. |
Returns
void
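The inertia output can be read as the following computation. This is a host-side sketch that assumes squared Euclidean distance and ignores sample weights; `inertia_sketch` is a hypothetical name, and the library computes this value on the device with the configured metric.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// X is row-major [n_samples x n_features]; centroids is row-major
// [n_clusters x n_features]. Returns the sum over samples of the squared
// distance to the nearest centroid.
double inertia_sketch(const std::vector<double>& X,
                      const std::vector<double>& centroids,
                      std::size_t n_features) {
  const std::size_t n_samples = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  double total = 0.0;
  for (std::size_t i = 0; i < n_samples; ++i) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < n_clusters; ++c) {
      double d2 = 0.0;
      for (std::size_t f = 0; f < n_features; ++f) {
        const double diff = X[i * n_features + f] - centroids[c * n_features + f];
        d2 += diff * diff;
      }
      if (d2 < best) best = d2;  // keep the closest centroid's distance
    }
    total += best;
  }
  return total;
}
```

For four 2-D samples (0,0), (1,0), (10,0), (11,0) and centroids (0.5,0) and (10.5,0), every sample is 0.5 away from its nearest centroid, so the result is 4 x 0.25 = 1.0.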
Additional overload: cluster::kmeans::fit
Find clusters with k-means algorithm using batched processing of host data.
void fit(raft::resources const& handle, const cuvs::cluster::kmeans::params& params, raft::host_matrix_view<const double, int64_t> X, std::optional<raft::host_vector_view<const double, int64_t>> sample_weight, raft::device_matrix_view<double, int64_t> centroids, raft::host_scalar_view<double> inertia, raft::host_scalar_view<int64_t> n_iter);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
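handle | in | raft::resources const& | The raft handle. |
params | in | const cuvs::cluster::kmeans::params& | Parameters for KMeans model. Batch size is read from params.streaming_batch_size. |
X | in | raft::host_matrix_view<const double, int64_t> | Training instances on HOST memory. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::host_vector_view<const double, int64_t>> | Optional weights for each observation in X (on host). [len = n_samples] |
centroids | inout | raft::device_matrix_view<double, int64_t> | [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run. |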
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with k-means algorithm.
void fit(raft::resources const& handle, const cuvs::cluster::kmeans::params& params, raft::device_matrix_view<const float, int> X, std::optional<raft::device_vector_view<const float, int>> sample_weight, raft::device_matrix_view<float, int> centroids, raft::host_scalar_view<float> inertia, raft::host_scalar_view<int> n_iter);
Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
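For readers unfamiliar with k-means++ seeding, here is a minimal host-side sketch of the classic sequential variant (D² sampling). The params table describes the library's initializer as the scalable k-means++ algorithm, so this sketch is a different, simpler technique and is not the cuVS implementation. `kmeanspp_seed_sketch` and `row_sq_dist` are hypothetical names, and the sketch assumes X holds at least n_clusters distinct rows.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Squared Euclidean distance between rows a and b of a row-major matrix.
double row_sq_dist(const std::vector<double>& X, std::size_t n_features,
                   std::size_t a, std::size_t b) {
  double d2 = 0.0;
  for (std::size_t f = 0; f < n_features; ++f) {
    const double diff = X[a * n_features + f] - X[b * n_features + f];
    d2 += diff * diff;
  }
  return d2;
}

// Classic sequential k-means++ seeding. Returns the chosen row indices.
// Assumes n_clusters >= 1 and at least n_clusters distinct rows in X.
std::vector<std::size_t> kmeanspp_seed_sketch(const std::vector<double>& X,
                                              std::size_t n_features,
                                              std::size_t n_clusters,
                                              unsigned seed) {
  const std::size_t n_samples = X.size() / n_features;
  std::mt19937 rng(seed);
  std::vector<std::size_t> chosen;
  // The first center is drawn uniformly at random.
  std::uniform_int_distribution<std::size_t> first(0, n_samples - 1);
  chosen.push_back(first(rng));
  // d2[i] is the squared distance from row i to its nearest chosen center.
  std::vector<double> d2(n_samples, std::numeric_limits<double>::infinity());
  while (chosen.size() < n_clusters) {
    double total = 0.0;
    for (std::size_t i = 0; i < n_samples; ++i) {
      d2[i] = std::min(d2[i], row_sq_dist(X, n_features, i, chosen.back()));
      total += d2[i];
    }
    // Draw r in [0, total) and walk the cumulative weights, so a row is
    // picked with probability proportional to d2[i]. Rows with zero weight
    // (already chosen or duplicates of a center) are skipped.
    const double r = std::uniform_real_distribution<double>(0.0, total)(rng);
    std::size_t pick = 0;
    double cumulative = 0.0;
    for (std::size_t i = 0; i < n_samples; ++i) {
      if (d2[i] <= 0.0) continue;
      pick = i;
      cumulative += d2[i];
      if (cumulative > r) break;
    }
    chosen.push_back(pick);
  }
  return chosen;
}
```

Because already-chosen rows have zero weight, the returned indices are distinct whenever the input rows are distinct; which rows are chosen depends on the seed.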
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const cuvs::cluster::kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const float, int>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | raft::device_matrix_view<float, int> | [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with k-means algorithm.
void fit(raft::resources const& handle, const cuvs::cluster::kmeans::params& params, raft::device_matrix_view<const float, int64_t> X, std::optional<raft::device_vector_view<const float, int64_t>> sample_weight, raft::device_matrix_view<float, int64_t> centroids, raft::host_scalar_view<float> inertia, raft::host_scalar_view<int64_t> n_iter);
Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const cuvs::cluster::kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const float, int64_t>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | raft::device_matrix_view<float, int64_t> | [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with k-means algorithm.
void fit(raft::resources const& handle, const cuvs::cluster::kmeans::params& params, raft::device_matrix_view<const double, int> X, std::optional<raft::device_vector_view<const double, int>> sample_weight, raft::device_matrix_view<double, int> centroids, raft::host_scalar_view<double> inertia, raft::host_scalar_view<int> n_iter);
Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const cuvs::cluster::kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const double, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const double, int>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | raft::device_matrix_view<double, int> | [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with k-means algorithm.
void fit(raft::resources const& handle, const cuvs::cluster::kmeans::params& params, raft::device_matrix_view<const double, int64_t> X, std::optional<raft::device_vector_view<const double, int64_t>> sample_weight, raft::device_matrix_view<double, int64_t> centroids, raft::host_scalar_view<double> inertia, raft::host_scalar_view<int64_t> n_iter);
Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const cuvs::cluster::kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const double, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const double, int64_t>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | raft::device_matrix_view<double, int64_t> | [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit
Find clusters with k-means algorithm.
void fit(raft::resources const& handle, const cuvs::cluster::kmeans::params& params, raft::device_matrix_view<const int8_t, int> X, std::optional<raft::device_vector_view<const int8_t, int>> sample_weight, raft::device_matrix_view<int8_t, int> centroids, raft::host_scalar_view<int8_t> inertia, raft::host_scalar_view<int> n_iter);
Initial centroids are chosen with the k-means++ algorithm. Empty clusters are reinitialized by choosing new centroids with the k-means++ algorithm.
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const cuvs::cluster::kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const int8_t, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const int8_t, int>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | raft::device_matrix_view<int8_t, int> | [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | raft::host_scalar_view<int8_t> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit
Find balanced clusters with k-means algorithm.
void fit(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const float, int64_t> X, raft::device_matrix_view<float, int64_t> centroids, std::optional<raft::host_scalar_view<float>> inertia = std::nullopt);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | out | raft::device_matrix_view<float, int64_t> | [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | std::optional<raft::host_scalar_view<float>> | Sum of squared distances of samples to their closest cluster center. Default: std::nullopt. |
Returns
void
Additional overload: cluster::kmeans::fit
Find balanced clusters with k-means algorithm.
void fit(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const int8_t, int64_t> X, raft::device_matrix_view<float, int64_t> centroids, std::optional<raft::host_scalar_view<float>> inertia = std::nullopt);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const int8_t, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | inout | raft::device_matrix_view<float, int64_t> | [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | std::optional<raft::host_scalar_view<float>> | Sum of squared distances of samples to their closest cluster center. Default: std::nullopt. |
Returns
void
Additional overload: cluster::kmeans::fit
Find balanced clusters with k-means algorithm.
void fit(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const half, int64_t> X, raft::device_matrix_view<float, int64_t> centroids, std::optional<raft::host_scalar_view<float>> inertia = std::nullopt);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const half, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | inout | raft::device_matrix_view<float, int64_t> | [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | std::optional<raft::host_scalar_view<float>> | Sum of squared distances of samples to their closest cluster center. Default: std::nullopt. |
Returns
void
Additional overload: cluster::kmeans::fit
Find balanced clusters with k-means algorithm.
void fit(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const uint8_t, int64_t> X, raft::device_matrix_view<float, int64_t> centroids, std::optional<raft::host_scalar_view<float>> inertia = std::nullopt);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const uint8_t, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | inout | raft::device_matrix_view<float, int64_t> | [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
inertia | out | std::optional<raft::host_scalar_view<float>> | Sum of squared distances of samples to their closest cluster center. Default: std::nullopt. |
Returns
void
cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(raft::resources const& handle, const kmeans::params& params, raft::device_matrix_view<const float, int> X, std::optional<raft::device_vector_view<const float, int>> sample_weight, raft::device_matrix_view<const float, int> centroids, raft::device_vector_view<int, int> labels, bool normalize_weight, raft::host_scalar_view<float> inertia);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int> | New data to predict. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const float, int>> | Optional weights for each observation in X. [len = n_samples] |
centroids | in | raft::device_matrix_view<const float, int> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int, int> | Index of the cluster each sample in X belongs to. [len = n_samples] |
normalize_weight | in | bool | True if the weights should be normalized |
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center. |
Returns
void
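Conceptually, the labels output assigns each row of X to its nearest centroid. The host-side sketch below illustrates that assignment under simplifying assumptions: squared Euclidean distance, no sample weights, and plain std::vector storage instead of device views. `nearest_centroid_labels` is a hypothetical name, not a cuVS function.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// X is row-major [n_samples x n_features]; centroids is row-major
// [n_clusters x n_features]. Returns one centroid index per sample.
// Ties are resolved in favor of the lowest centroid index.
std::vector<int> nearest_centroid_labels(const std::vector<float>& X,
                                         const std::vector<float>& centroids,
                                         std::size_t n_features) {
  const std::size_t n_samples = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  std::vector<int> labels(n_samples, -1);
  for (std::size_t i = 0; i < n_samples; ++i) {
    float best = std::numeric_limits<float>::infinity();
    for (std::size_t c = 0; c < n_clusters; ++c) {
      float d2 = 0.0f;
      for (std::size_t f = 0; f < n_features; ++f) {
        const float diff = X[i * n_features + f] - centroids[c * n_features + f];
        d2 += diff * diff;
      }
      if (d2 < best) {
        best = d2;
        labels[i] = static_cast<int>(c);
      }
    }
  }
  return labels;
}
```

For samples (0,0), (1,0), (10,0), (11,0) and centroids (0.5,0) and (10.5,0), the first two samples receive label 0 and the last two receive label 1.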
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(raft::resources const& handle, const kmeans::params& params, raft::device_matrix_view<const float, int64_t> X, std::optional<raft::device_vector_view<const float, int64_t>> sample_weight, raft::device_matrix_view<const float, int64_t> centroids, raft::device_vector_view<int64_t, int64_t> labels, bool normalize_weight, raft::host_scalar_view<float> inertia);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int64_t> | New data to predict. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const float, int64_t>> | Optional weights for each observation in X. [len = n_samples] |
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int64_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
normalize_weight | in | bool | True if the weights should be normalized |
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center. |
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(raft::resources const& handle, const kmeans::params& params, raft::device_matrix_view<const double, int> X, std::optional<raft::device_vector_view<const double, int>> sample_weight, raft::device_matrix_view<const double, int> centroids, raft::device_vector_view<int, int> labels, bool normalize_weight, raft::host_scalar_view<double> inertia);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const double, int> | New data to predict. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const double, int>> | Optional weights for each observation in X. [len = n_samples] |
centroids | in | raft::device_matrix_view<const double, int> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int, int> | Index of the cluster each sample in X belongs to. [len = n_samples] |
normalize_weight | in | bool | True if the weights should be normalized |
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center. |
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(raft::resources const& handle, const kmeans::params& params, raft::device_matrix_view<const double, int64_t> X, std::optional<raft::device_vector_view<const double, int64_t>> sample_weight, raft::device_matrix_view<const double, int64_t> centroids, raft::device_vector_view<int64_t, int64_t> labels, bool normalize_weight, raft::host_scalar_view<double> inertia);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const double, int64_t> | New data to predict. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const double, int64_t>> | Optional weights for each observation in X. [len = n_samples] |
centroids | in | raft::device_matrix_view<const double, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int64_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
normalize_weight | in | bool | True if the weights should be normalized |
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center. |
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const int8_t, int64_t> X, raft::device_matrix_view<const float, int64_t> centroids, raft::device_vector_view<uint32_t, int64_t> labels);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const int8_t, int64_t> | New data to predict. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const int8_t, int64_t> X, raft::device_matrix_view<const float, int64_t> centroids, raft::device_vector_view<int, int64_t> labels);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const int8_t, int64_t> | New data to predict. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const float, int64_t> X, raft::device_matrix_view<const float, int64_t> centroids, raft::device_vector_view<int, int64_t> labels);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int64_t> | New data to predict. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const float, int64_t> X, raft::device_matrix_view<const float, int64_t> centroids, raft::device_vector_view<uint32_t, int64_t> labels);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int64_t> | New data to predict. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const half, int64_t> X, raft::device_matrix_view<const float, int64_t> centroids, raft::device_vector_view<uint32_t, int64_t> labels);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const half, int64_t> | New data to predict. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
Returns
void
Additional overload: cluster::kmeans::predict
Predict the closest cluster each sample in X belongs to.
void predict(const raft::resources& handle, cuvs::cluster::kmeans::balanced_params const& params, raft::device_matrix_view<const uint8_t, int64_t> X, raft::device_matrix_view<const float, int64_t> centroids, raft::device_vector_view<uint32_t, int64_t> labels);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const uint8_t, int64_t> | New data to predict. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
Returns
void
cluster::kmeans::fit_predict
Compute k-means clustering and predict the cluster index for each sample in the input.
void fit_predict(raft::resources const& handle, const kmeans::params& params, raft::device_matrix_view<const float, int> X, std::optional<raft::device_vector_view<const float, int>> sample_weight, std::optional<raft::device_matrix_view<float, int>> centroids, raft::device_vector_view<int, int> labels, raft::host_scalar_view<float> inertia, raft::host_scalar_view<int> n_iter);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const float, int>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | std::optional<raft::device_matrix_view<float, int>> | Optional. [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed to by ‘centroids’. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int, int> | Index of the cluster each sample in X belongs to. [len = n_samples] |
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit_predict
Compute k-means clustering and predict the cluster index for each sample in the input.
void fit_predict(raft::resources const& handle, const kmeans::params& params, raft::device_matrix_view<const float, int64_t> X, std::optional<raft::device_vector_view<const float, int64_t>> sample_weight, std::optional<raft::device_matrix_view<float, int64_t>> centroids, raft::device_vector_view<int64_t, int64_t> labels, raft::host_scalar_view<float> inertia, raft::host_scalar_view<int64_t> n_iter);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const float, int64_t>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | std::optional<raft::device_matrix_view<float, int64_t>> | Optional. [in] When init is InitMethod::Array, used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are written to this view. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int64_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
inertia | out | raft::host_scalar_view<float> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit_predict
Computes k-means clustering and predicts the cluster index for each sample in the input.
void fit_predict(raft::resources const& handle,
                 const kmeans::params& params,
                 raft::device_matrix_view<const double, int> X,
                 std::optional<raft::device_vector_view<const double, int>> sample_weight,
                 std::optional<raft::device_matrix_view<double, int>> centroids,
                 raft::device_vector_view<int, int> labels,
                 raft::host_scalar_view<double> inertia,
                 raft::host_scalar_view<int> n_iter);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const double, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const double, int>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | std::optional<raft::device_matrix_view<double, int>> | Optional. [in] When init is InitMethod::Array, used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are written to this view. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int, int> | Index of the cluster each sample in X belongs to. [len = n_samples] |
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit_predict
Computes k-means clustering and predicts the cluster index for each sample in the input.
void fit_predict(raft::resources const& handle,
                 const kmeans::params& params,
                 raft::device_matrix_view<const double, int64_t> X,
                 std::optional<raft::device_vector_view<const double, int64_t>> sample_weight,
                 std::optional<raft::device_matrix_view<double, int64_t>> centroids,
                 raft::device_vector_view<int64_t, int64_t> labels,
                 raft::host_scalar_view<double> inertia,
                 raft::host_scalar_view<int64_t> n_iter);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const double, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
sample_weight | in | std::optional<raft::device_vector_view<const double, int64_t>> | Optional weights for each observation in X. [len = n_samples] |
centroids | inout | std::optional<raft::device_matrix_view<double, int64_t>> | Optional. [in] When init is InitMethod::Array, used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are written to this view. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<int64_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
inertia | out | raft::host_scalar_view<double> | Sum of squared distances of samples to their closest cluster center. |
n_iter | out | raft::host_scalar_view<int64_t> | Number of iterations run. |
Returns
void
Additional overload: cluster::kmeans::fit_predict
Computes balanced k-means clustering and predicts the cluster index for each sample in the input.
void fit_predict(const raft::resources& handle,
                 cuvs::cluster::kmeans::balanced_params const& params,
                 raft::device_matrix_view<const float, int64_t> X,
                 raft::device_matrix_view<float, int64_t> centroids,
                 raft::device_vector_view<uint32_t, int64_t> labels);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | inout | raft::device_matrix_view<float, int64_t> | [in] When init is InitMethod::Array, used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are written to this view. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
Returns
void
Additional overload: cluster::kmeans::fit_predict
Computes balanced k-means clustering and predicts the cluster index for each sample in the input.
void fit_predict(const raft::resources& handle,
                 cuvs::cluster::kmeans::balanced_params const& params,
                 raft::device_matrix_view<const int8_t, int64_t> X,
                 raft::device_matrix_view<float, int64_t> centroids,
                 raft::device_vector_view<uint32_t, int64_t> labels);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle. |
params | in | cuvs::cluster::kmeans::balanced_params const& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const int8_t, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | inout | raft::device_matrix_view<float, int64_t> | [in] When init is InitMethod::Array, used as the initial cluster centers. [out] The centroids generated by the k-means algorithm are written to this view. [dim = n_clusters x n_features] |
labels | out | raft::device_vector_view<uint32_t, int64_t> | Index of the cluster each sample in X belongs to. [len = n_samples] |
Returns
void
cluster::kmeans::transform
Transform X to a cluster-distance space.
void transform(raft::resources const& handle,
               const kmeans::params& params,
               raft::device_matrix_view<const float, int> X,
               raft::device_matrix_view<const float, int> centroids,
               raft::device_matrix_view<float, int> X_new);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const float, int> | Instances to transform. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
X_new | out | raft::device_matrix_view<float, int> | X transformed into the cluster-distance space, one distance per centroid. [dim = n_samples x n_clusters] |
Returns
void
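A cluster-distance space represents each sample by its distances to the centroids, so every output row holds one value per centroid. The following CPU sketch shows that mapping. It is not cuVS code: transform_reference is a hypothetical name, and plain Euclidean distance is an assumption made here for illustration (the library call takes its metric from params).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of cuVS.
// X is row-major [n_samples x n_features];
// centroids is row-major [n_clusters x n_features].
// Returns a row-major [n_samples x n_clusters] matrix whose entry (i, c)
// is the Euclidean distance from sample i to centroid c (assumed metric).
inline std::vector<float> transform_reference(const std::vector<float>& X,
                                              const std::vector<float>& centroids,
                                              std::size_t n_features)
{
  assert(n_features > 0);
  assert(X.size() % n_features == 0);
  assert(centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;

  std::vector<float> X_new(n_samples * n_clusters, 0.0f);
  for (std::size_t i = 0; i < n_samples; ++i) {
    for (std::size_t c = 0; c < n_clusters; ++c) {
      double d2 = 0.0;
      for (std::size_t f = 0; f < n_features; ++f) {
        const double diff = static_cast<double>(X[i * n_features + f]) -
                            static_cast<double>(centroids[c * n_features + f]);
        d2 += diff * diff;
      }
      X_new[i * n_clusters + c] = static_cast<float>(std::sqrt(d2));
    }
  }
  return X_new;
}
```

With two 2-D samples and three centroids, the result has 2 x 3 entries, independent of the number of features.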
Additional overload: cluster::kmeans::transform
Transform X to a cluster-distance space.
void transform(raft::resources const& handle,
               const kmeans::params& params,
               raft::device_matrix_view<const double, int> X,
               raft::device_matrix_view<const double, int> centroids,
               raft::device_matrix_view<double, int> X_new);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
params | in | const kmeans::params& | Parameters for KMeans model. |
X | in | raft::device_matrix_view<const double, int> | Instances to transform. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const double, int> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
X_new | out | raft::device_matrix_view<double, int> | X transformed into the cluster-distance space, one distance per centroid. [dim = n_samples x n_clusters] |
Returns
void
cluster::kmeans::cluster_cost
Compute (optionally weighted) cluster cost
void cluster_cost(
  const raft::resources& handle,
  raft::device_matrix_view<const float, int> X,
  raft::device_matrix_view<const float, int> centroids,
  raft::host_scalar_view<float> cost,
  std::optional<raft::device_vector_view<const float, int>> sample_weight = std::nullopt);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle |
X | in | raft::device_matrix_view<const float, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
cost | out | raft::host_scalar_view<float> | Resulting cluster cost |
sample_weight | in | std::optional<raft::device_vector_view<const float, int>> | Optional per-sample weights. [len = n_samples] Default: std::nullopt. |
Returns
void
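The effect of sample_weight on the cost can be sketched on the CPU. This is an illustration under stated assumptions, not the cuVS implementation: cluster_cost_reference is a hypothetical name, the cost is assumed to be the sum over samples of the squared Euclidean distance to the nearest centroid, and each weight is assumed to scale its sample's term. An empty weight vector stands in for std::nullopt.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference, not part of cuVS.
// X is row-major [n_samples x n_features];
// centroids is row-major [n_clusters x n_features].
// sample_weight is either empty (unweighted) or has one entry per sample.
inline double cluster_cost_reference(const std::vector<float>& X,
                                     const std::vector<float>& centroids,
                                     std::size_t n_features,
                                     const std::vector<float>& sample_weight = {})
{
  assert(n_features > 0);
  assert(X.size() % n_features == 0);
  assert(!centroids.empty() && centroids.size() % n_features == 0);
  const std::size_t n_samples  = X.size() / n_features;
  const std::size_t n_clusters = centroids.size() / n_features;
  assert(sample_weight.empty() || sample_weight.size() == n_samples);

  double cost = 0.0;
  for (std::size_t i = 0; i < n_samples; ++i) {
    // Squared distance from sample i to its closest centroid.
    double best = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < n_clusters; ++c) {
      double d2 = 0.0;
      for (std::size_t f = 0; f < n_features; ++f) {
        const double diff = static_cast<double>(X[i * n_features + f]) -
                            static_cast<double>(centroids[c * n_features + f]);
        d2 += diff * diff;
      }
      if (d2 < best) { best = d2; }
    }
    // Assumed weighting: each sample's term is scaled by its weight.
    const double w = sample_weight.empty() ? 1.0 : static_cast<double>(sample_weight[i]);
    cost += w * best;
  }
  return cost;
}
```

Under these assumptions, omitting the weights is equivalent to passing a weight of 1 for every sample.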
Additional overload: cluster::kmeans::cluster_cost
Compute (optionally weighted) cluster cost
void cluster_cost(
  const raft::resources& handle,
  raft::device_matrix_view<const double, int> X,
  raft::device_matrix_view<const double, int> centroids,
  raft::host_scalar_view<double> cost,
  std::optional<raft::device_vector_view<const double, int>> sample_weight = std::nullopt);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle |
X | in | raft::device_matrix_view<const double, int> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const double, int> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
cost | out | raft::host_scalar_view<double> | Resulting cluster cost |
sample_weight | in | std::optional<raft::device_vector_view<const double, int>> | Optional per-sample weights. [len = n_samples] Default: std::nullopt. |
Returns
void
Additional overload: cluster::kmeans::cluster_cost
Compute (optionally weighted) cluster cost
void cluster_cost(
  const raft::resources& handle,
  raft::device_matrix_view<const float, int64_t> X,
  raft::device_matrix_view<const float, int64_t> centroids,
  raft::host_scalar_view<float> cost,
  std::optional<raft::device_vector_view<const float, int64_t>> sample_weight = std::nullopt);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle |
X | in | raft::device_matrix_view<const float, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const float, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
cost | out | raft::host_scalar_view<float> | Resulting cluster cost |
sample_weight | in | std::optional<raft::device_vector_view<const float, int64_t>> | Optional per-sample weights. [len = n_samples] Default: std::nullopt. |
Returns
void
Additional overload: cluster::kmeans::cluster_cost
Compute (optionally weighted) cluster cost
void cluster_cost(
  const raft::resources& handle,
  raft::device_matrix_view<const double, int64_t> X,
  raft::device_matrix_view<const double, int64_t> centroids,
  raft::host_scalar_view<double> cost,
  std::optional<raft::device_vector_view<const double, int64_t>> sample_weight = std::nullopt);
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | const raft::resources& | The raft handle |
X | in | raft::device_matrix_view<const double, int64_t> | Training instances to cluster. The data must be in row-major format. [dim = n_samples x n_features] |
centroids | in | raft::device_matrix_view<const double, int64_t> | Cluster centroids. The data must be in row-major format. [dim = n_clusters x n_features] |
cost | out | raft::host_scalar_view<double> | Resulting cluster cost |
sample_weight | in | std::optional<raft::device_vector_view<const double, int64_t>> | Optional per-sample weights. [len = n_samples] Default: std::nullopt. |
Returns
void
k-means API helpers
cluster::kmeans::helpers::find_k
Automatically find the optimal value of k using a binary search.
void find_k(raft::resources const& handle,
            raft::device_matrix_view<const float, int> X,
            raft::host_scalar_view<int> best_k,
            raft::host_scalar_view<float> inertia,
            raft::host_scalar_view<int> n_iter,
            int kmax,
            int kmin = 1,
            int maxiter = 100,
            float tol = 1e-3);
This method maximizes the Calinski-Harabasz Index while minimizing the per-cluster inertia.
Parameters
| Name | Direction | Type | Description |
|---|---|---|---|
handle | in | raft::resources const& | The raft handle. |
X | in | raft::device_matrix_view<const float, int> | Input observations. [dim = n_samples x n_dims] |
best_k | out | raft::host_scalar_view<int> | Best k found by the binary search. |
inertia | out | raft::host_scalar_view<float> | Inertia of the best k found. |
n_iter | out | raft::host_scalar_view<int> | Number of iterations used to find the best k. |
kmax | in | int | Maximum k to try in the search. |
kmin | in | int | Minimum k to try in the search (should be >= 1). Default: 1. |
maxiter | in | int | Maximum number of iterations to run. Default: 100. |
tol | in | float | Tolerance for early-stopping convergence. Default: 1e-3. |
Returns
void
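The Calinski-Harabasz Index that find_k maximizes can be computed from a labeled dataset as the ratio of between-cluster to within-cluster dispersion, each normalized by its degrees of freedom. The sketch below uses the standard textbook definition on the CPU. The name calinski_harabasz is hypothetical, and the code illustrates only the index, not the binary search or how cuVS evaluates it internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of cuVS.
// X is row-major [n_samples x n_features]; labels[i] is in [0, k).
// Returns (B / (k - 1)) / (W / (n - k)), where B is the between-cluster
// dispersion and W is the within-cluster dispersion (the inertia).
// The result is infinite when W is zero.
inline double calinski_harabasz(const std::vector<double>& X,
                                const std::vector<int>& labels,
                                std::size_t n_features,
                                std::size_t k)
{
  const std::size_t n = labels.size();
  assert(n_features > 0 && X.size() == n * n_features);
  assert(k >= 2 && n > k);

  std::vector<double> mean(n_features, 0.0);        // global mean
  std::vector<double> cmean(k * n_features, 0.0);   // per-cluster means
  std::vector<std::size_t> count(k, 0);             // per-cluster sizes
  for (std::size_t i = 0; i < n; ++i) {
    assert(labels[i] >= 0 && static_cast<std::size_t>(labels[i]) < k);
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    ++count[c];
    for (std::size_t f = 0; f < n_features; ++f) {
      mean[f] += X[i * n_features + f];
      cmean[c * n_features + f] += X[i * n_features + f];
    }
  }
  for (std::size_t f = 0; f < n_features; ++f) { mean[f] /= static_cast<double>(n); }
  for (std::size_t c = 0; c < k; ++c) {
    if (count[c] == 0) { continue; }  // empty clusters contribute nothing
    for (std::size_t f = 0; f < n_features; ++f) {
      cmean[c * n_features + f] /= static_cast<double>(count[c]);
    }
  }

  double between = 0.0;
  for (std::size_t c = 0; c < k; ++c) {
    for (std::size_t f = 0; f < n_features; ++f) {
      const double d = cmean[c * n_features + f] - mean[f];
      between += static_cast<double>(count[c]) * d * d;
    }
  }
  double within = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t c = static_cast<std::size_t>(labels[i]);
    for (std::size_t f = 0; f < n_features; ++f) {
      const double d = X[i * n_features + f] - cmean[c * n_features + f];
      within += d * d;
    }
  }
  return (between / static_cast<double>(k - 1)) / (within / static_cast<double>(n - k));
}
```

A higher index indicates tighter, better-separated clusters, which is why a search over k can use it as the quantity to maximize.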