中文说明:k-均值聚类是一种矢量量化,最初从信号处理,这是流行的数据挖掘中的聚类分析的方法。k-均值聚类分区 n 个观测到 k 集群每个观察值属于最近均值集群目标,作为该群集的一个原型。这会导致 Voronoi 单元格数据空间的划分。问题是计算困难 (np) ;然而,有高效的启发式算法,并普遍采用和快速收敛到局部最优解。这些是通常类似于混合物通过这两种算法的迭代加细方法的高斯分布的期望最大化算法。此外,它们都使用聚类中心来模型数据 ;然而,k-均值聚类倾向找到集群的可比性的空间范围,而期望最大化机制允许群集,有不同的形状。
English Description:
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.