Is it possible to specify your own distance function using scikit-learn K-Means clustering?

Current answer

Spectral Python's k-means allows the use of L1 (Manhattan) distance.

2013-03-31 00:14:34

Other answers

The AffinityPropagation algorithm in scikit-learn lets you pass a similarity matrix instead of the samples. So you can use your own metric to compute the similarity matrix (not the dissimilarity matrix) and pass it to fit by setting the affinity parameter to "precomputed": https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation.fit As for K-Means, I think it is also possible, but I have not tried it. However, as the other answers stated, finding the mean under a different metric will be the issue. Instead, you can use the PAM (K-Medoids) algorithm: it works from the change in Total Deviation (TD) computed from pairwise dissimilarities, so it does not need a mean and is not tied to a particular distance metric. https://python-kmedoids.readthedocs.io/en/latest/#fasterpam
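
A minimal sketch of the precomputed-similarity route, assuming negative Manhattan distance as the similarity (that choice, the toy data, and the variable names are illustrative assumptions, not part of the answer). Any metric can be substituted as long as larger values mean "more similar".

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import pairwise_distances

# Toy data (assumption): two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(20, 2)),
])

# AffinityPropagation expects similarities, so negate the distances.
S = -pairwise_distances(X, metric="manhattan")

# With affinity="precomputed", fit() receives the (n, n) similarity matrix.
ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
labels = ap.labels_
print(len(ap.cluster_centers_indices_), "exemplars found")
```

Note that the number of clusters is not set directly here; it is driven by the preference parameter, which defaults to the median of the input similarities.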

2022-12-14 07:08:24

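sklearn's KMeans uses Euclidean distance. It has no metric parameter. That said, if you are clustering time series, you can use the tslearn Python package, where you can specify a metric (dtw, softdtw, euclidean).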

2019-05-27 10:45:28

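Yes, in the current stable version of sklearn (scikit-learn 1.1.3), you can easily use your own distance metric. All you have to do is create a class that inherits from sklearn.cluster.KMeans and override its _transform method.

The example below uses the IoU-based distance from the YOLOv2 paper.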

import sklearn.cluster
import numpy as np

def anchor_iou(box_dims, centroid_box_dims):
    box_w, box_h = box_dims[..., 0], box_dims[..., 1]
    centroid_w, centroid_h = centroid_box_dims[..., 0], centroid_box_dims[..., 1]
    inter_w = np.minimum(box_w[..., np.newaxis], centroid_w[np.newaxis, ...])
    inter_h = np.minimum(box_h[..., np.newaxis], centroid_h[np.newaxis, ...])
    inter_area = inter_w * inter_h
    centroid_area = centroid_w * centroid_h
    box_area = box_w * box_h
    return inter_area / (
        centroid_area[np.newaxis, ...] + box_area[..., np.newaxis] - inter_area
    )

class IOUKMeans(sklearn.cluster.KMeans):
    def __init__(
        self,
        n_clusters=8,
        *,
        init="k-means++",
        n_init=10,
        max_iter=300,
        tol=1e-4,
        verbose=0,
        random_state=None,
        copy_x=True,
        algorithm="lloyd",
    ):
        super().__init__(
            n_clusters=n_clusters,
            init=init,
            n_init=n_init,
            max_iter=max_iter,
            tol=tol,
            verbose=verbose,
            random_state=random_state,
            copy_x=copy_x,
            algorithm=algorithm
        )

    def _transform(self, X):
        # transform() is expected to return distances; YOLOv2 uses d = 1 - IoU
        return 1.0 - anchor_iou(X, self.cluster_centers_)

rng = np.random.default_rng(12345)
num_boxes = 10
bboxes = rng.integers(low=0, high=100, size=(num_boxes, 2))

num_clusters = 3  # was undefined in the original snippet; value chosen for illustration
kmeans = IOUKMeans(num_clusters).fit(bboxes)
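
For comparison, here is a minimal hand-written sketch of YOLOv2-style anchor clustering that applies d = 1 - IoU directly in the assignment step of a Lloyd-style loop, independent of scikit-learn internals. The names iou_wh and iou_kmeans, the iteration cap, and the random boxes are assumptions made for illustration.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """Pairwise IoU of (w, h) boxes sharing a common corner; shape (n, k)."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def iou_kmeans(boxes, k, n_iter=100, seed=0):
    """Lloyd-style loop: assign by smallest 1 - IoU, update by the mean."""
    rng = np.random.default_rng(seed)
    boxes = np.asarray(boxes, dtype=float)
    # Start from k distinct boxes picked at random.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = boxes[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(12345)
boxes = rng.integers(low=1, high=100, size=(50, 2))  # low=1 avoids zero-area boxes
centroids, labels = iou_kmeans(boxes, k=3)
print(centroids)
```

Because the mean is not guaranteed to minimise the summed 1 - IoU within a cluster, this loop is a heuristic; the iteration cap guards against the case where it does not settle.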

2022-10-28 08:38:19

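Unfortunately no: scikit-learn's current implementation of k-means only uses Euclidean distance.

Extending k-means to other distances is not trivial, and denis's answer above is not the correct way to implement k-means for other metrics.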
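
Part of the difficulty is the update step: the mean is the minimiser of squared Euclidean distance, so it is not generally the right centre for another metric. One way around this, mentioned elsewhere in this thread, is k-medoids, which only needs pairwise dissimilarities. The following is a minimal sketch of a simple alternating k-medoids loop (not the FasterPAM algorithm of the kmedoids package); manhattan, pairwise and k_medoids are hypothetical helper names, and the data is synthetic.

```python
import numpy as np

def manhattan(a, b):
    """Example user-defined dissimilarity; swap in any function of two points."""
    return np.abs(a - b).sum()

def pairwise(X, dist):
    """Symmetric (n, n) dissimilarity matrix for an arbitrary dist function."""
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(X[i], X[j])
    return D

def k_medoids(D, k, n_iter=100, seed=0):
    """Alternate between assigning points and re-picking each cluster's medoid."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members):
                # New medoid: member with the smallest total dissimilarity
                # to all other members of the same cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(15, 2)),
               rng.normal(6.0, 0.5, size=(15, 2))])
D = pairwise(X, manhattan)
medoids, labels = k_medoids(D, k=2)
print(medoids, labels)
```

The cluster centres returned here are indices of actual data points, so no averaging under the custom metric is ever required.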

2011-04-03 17:17:02
