What is the difference between classification and clustering in data mining?

Data mining distinguishes two kinds of learning: "supervised" and "unsupervised". When someone tells the computer (the algorithm, the code, ...) that this thing is an apple and that thing is an orange, that is supervised learning; using supervised learning (for example, a tag on each sample in the data set) to assign data to classes gives you classification. If, on the other hand, you let the computer work out what is what by differentiating between the features of the given data set, it is learning unsupervised, and grouping the data set this way is called clustering. In this case the data fed to the algorithm have no tags, and the algorithm has to discover the different classes itself.
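To make the apple/orange contrast concrete, here is a minimal sketch in plain Python. The feature values, the nearest-centroid classifier and the simple k-means loop are all illustrative assumptions, not something the answer prescribes; they are just one small way to show "labels given" versus "labels absent".

```python
from math import dist  # Euclidean distance, Python 3.8+

def centroid(points):
    """Mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def train_nearest_centroid(samples, labels):
    """Supervised: use the given tags to compute one centroid per class."""
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def classify(model, x):
    """Assign x the tag of the closest class centroid."""
    return min(model, key=lambda y: dist(model[y], x))

def kmeans(samples, k, iterations=10):
    """Unsupervised: group samples into k clusters without any tags.
    Centroids start at the first k samples (a simplification that keeps
    the sketch deterministic)."""
    centroids = list(samples[:k])
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda i: dist(centroids[i], x))
            groups[nearest].append(x)
        # An empty cluster keeps its previous centroid
        centroids = [centroid(g) if g else c for g, c in zip(groups, centroids)]
    return groups

# Hypothetical features: (weight in grams, redness score from 0 to 10)
fruit = [(150, 9), (160, 8), (155, 9), (140, 2), (130, 3), (135, 2)]
tags = ["apple", "apple", "apple", "orange", "orange", "orange"]

# Classification: the tags are part of the training input
model = train_nearest_centroid(fruit, tags)
prediction = classify(model, (152, 8))

# Clustering: only the features are given; the groups come back unnamed
clusters = kmeans(fruit, k=2)
```

The classifier returns a named class because names were supplied during training. The clustering call can only return anonymous groups, and it is up to a person to decide afterwards what each group means.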

2017-02-27 21:19:44

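Classification: predicts categorical class labels; classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute; uses that model to classify new data.

Cluster: a collection of data objects; similar to one another within the same cluster; dissimilar to the objects in other clusters.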

2012-01-11 14:15:21

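With clustering, you can group data by the properties you want, such as the number, the shape, and other properties of the extracted clusters. In classification, by contrast, the number and shape of the groups are fixed. Most clustering algorithms take the number of clusters as a parameter. However, there are some approaches for finding an appropriate number of clusters.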
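As a rough illustration of one such approach, the sketch below uses the "elbow" heuristic: run k-means for several values of k and watch the within-cluster sum of squares (WCSS). The heuristic, the 1-D toy data and the helper names `kmeans_1d` and `wcss` are assumptions chosen for this example; the answer does not name a specific method.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain k-means on 1-D data; returns the final clusters as lists."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids evenly over the sorted data
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid
        centroids = [sum(c) / len(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return clusters

def wcss(clusters):
    """Within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for c in clusters:
        if c:
            mean = sum(c) / len(c)
            total += sum((v - mean) ** 2 for v in c)
    return total

# Toy data with three visually obvious groups (around 1, 5 and 9)
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

# Here k is a parameter we vary, rather than something fixed in advance
wcss_by_k = {k: wcss(kmeans_1d(data, k)) for k in range(1, 6)}
for k, score in wcss_by_k.items():
    print(k, round(score, 2))
```

On this data the WCSS falls steeply up to k = 3 and barely changes afterwards, which is the "elbow" that suggests three clusters. On real data the bend is often less clear, so the result is a hint rather than a definitive answer.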

2017-09-02 05:34:27

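+Classification: you are given some new data, and you have to assign labels to them.

For example, a company wants to classify its prospective customers. When a new customer comes in, the company has to determine whether this customer is going to buy its products or not.

+Clustering: you are given a set of historical transactions that record who bought what.

By using clustering techniques, you can identify the segments your customers fall into.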

2011-11-19 07:40:32

First, as many answers here state: classification is supervised learning, and clustering is unsupervised. This means:

Classification needs labeled data so that the classifier can be trained on it and then classify new, unseen data based on what it has learned. Unsupervised learning such as clustering does not use labeled data; what it actually does is discover intrinsic structure in the data, such as groups.

Another difference between the two techniques (related to the previous one) is that classification is a kind of discrete regression problem, where the output is a categorical dependent variable, whereas the output of clustering is a set of subsets called groups.

The way to evaluate the two kinds of model also differs, for the same reason. In classification you often check precision and recall, and watch for things like overfitting and underfitting; these tell you how good the model is. In clustering you usually need the eye of an expert to interpret what you find, because you don't know in advance what type of structure (what type of group or cluster) you have. That is why clustering belongs to exploratory data analysis.

Finally, I would say that applications are the main difference between the two. Classification, as the word says, is used to discriminate between instances that belong to one class or another, for example a man or a woman, a cat or a dog. Clustering is often used in the diagnosis of medical illness, the discovery of patterns, and so on.
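Precision and recall, mentioned above as typical checks for a classifier, can be computed directly from true and predicted labels. The following is a small sketch; the cat/dog labels are made-up illustration data and `precision_recall` is a hypothetical helper, not a library function.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the 'positive' class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)  # correct hits
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # false alarms
    fn = sum(1 for t, p in pairs if p != positive and t == positive)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels: what each instance really is vs. what a classifier said
y_true = ["cat", "cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "dog", "cat", "cat", "cat"]

precision, recall = precision_recall(y_true, y_pred, positive="cat")
print(precision, recall)
```

Here the classifier said "cat" four times and was right twice, so precision is 0.5; it found two of the three real cats, so recall is 2/3. This kind of check is only possible because the true labels exist, which is exactly what clustering lacks.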

2018-11-29 13:43:37
