是否可以使用scikit-learn K-Means聚类来指定自己的距离函数?
当前回答
Spectral Python的k-means允许使用L1 (Manhattan)距离。
其他回答
The Affinity propagation algorithm from the sklearn library allows you to pass the similarity matrix instead of the samples. So, you can use your metric to compute the similarity matrix (not the dissimilarity matrix) and pass it to the function by setting the "affinity" term to "precomputed".https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation.fit In terms of the K-Mean, I think it is also possible but I have not tried it. However, as the other answers stated, finding the mean using a different metric will be the issue. Instead, you can use PAM (K-Medoids) algorthim as it calculates the change in Total Deviation (TD), thus it does not rely on the distance metric. https://python-kmedoids.readthedocs.io/en/latest/#fasterpam
Spectral Python的k-means允许使用L1 (Manhattan)距离。
Sklearn Kmeans使用欧几里得距离。它没有度量参数。也就是说,如果你在聚类时间序列,你可以使用tslearn python包,当你可以指定一个度量(dtw, softdtw,欧几里得)。
是的,在当前稳定版本的sklearn (scikit-learn 1.1.3)中,您可以轻松地使用自己的距离度量。你所要做的就是创建一个继承自sklearn.cluster.KMeans的类,并覆盖它的_transform方法。
下面的例子是IOU与Yolov2论文的距离。
import sklearn.cluster
import numpy as np
def anchor_iou(box_dims, centroid_box_dims):
box_w, box_h = box_dims[..., 0], box_dims[..., 1]
centroid_w, centroid_h = centroid_box_dims[..., 0], centroid_box_dims[..., 1]
inter_w = np.minimum(box_w[..., np.newaxis], centroid_w[np.newaxis, ...])
inter_h = np.minimum(box_h[..., np.newaxis], centroid_h[np.newaxis, ...])
inter_area = inter_w * inter_h
centroid_area = centroid_w * centroid_h
box_area = box_w * box_h
return inter_area / (
centroid_area[np.newaxis, ...] + box_area[..., np.newaxis] - inter_area
)
class IOUKMeans(sklearn.cluster.KMeans):
def __init__(
self,
n_clusters=8,
*,
init="k-means++",
n_init=10,
max_iter=300,
tol=1e-4,
verbose=0,
random_state=None,
copy_x=True,
algorithm="lloyd",
):
super().__init__(
n_clusters=n_clusters,
init=init,
n_init=n_init,
max_iter=max_iter,
tol=tol,
verbose=verbose,
random_state=random_state,
copy_x=copy_x,
algorithm=algorithm
)
def _transform(self, X):
return anchor_iou(X, self.cluster_centers_)
rng = np.random.default_rng(12345)
num_boxes = 10
bboxes = rng.integers(low=0, high=100, size=(num_boxes, 2))
kmeans = IOUKMeans(num_clusters).fit(bboxes)
不幸的是没有:scikit-learn目前实现的k-means只使用欧几里得距离。
将k-means扩展到其他距离并不是一件简单的事情,denis上面的回答并不是对其他度量实现k-means的正确方法。
推荐文章
- 证书验证失败:无法获得本地颁发者证书
- 当使用pip3安装包时,“Python中的ssl模块不可用”
- 无法切换Python与pyenv
- Python if not == vs if !=
- 如何从scikit-learn决策树中提取决策规则?
- 为什么在Mac OS X v10.9 (Mavericks)的终端中apt-get功能不起作用?
- 将旋转的xtick标签与各自的xtick对齐
- 为什么元组可以包含可变项?
- 如何合并字典的字典?
- 如何创建类属性?
- 数据挖掘中分类和聚类的区别?
- 不区分大小写的“in”
- 在Python中获取迭代器中的元素个数
- 解析日期字符串并更改格式
- 使用try和。Python中的if