An Improved K-Means Clustering Algorithm for Data Mining

Book by Aggarwal Neha

Data clustering is an unsupervised classification method that aims to create groups of objects, or clusters, such that objects in the same cluster are very similar and objects in different clusters are quite distinct. K-means is an iterative algorithm in which the number of clusters must be fixed before execution. In this book, an efficient k-means algorithm is proposed. In each iteration, the standard k-means algorithm computes the distance between every data point and every cluster center, which is computationally very expensive, especially for huge data sets. The proposed algorithm instead keeps, for each data point, the distance to its nearest cluster. At the next iteration, it first computes only the distance to that previous nearest cluster. If the new distance is less than or equal to the previous distance, the point stays in its cluster, and there is no need to compute its distances to the other cluster centers. This saves the time required to compute distances to the other k − 1 clusters. Experimental results show the accuracy and effectiveness of the proposed method.
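The distance-caching idea in the description above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the book's actual implementation. The function names (`nearest_center`, `update_centers`, `improved_kmeans`), the choice of Euclidean distance, the use of cluster means as centers, the caller-supplied initial centers, and the refreshing of the stored distance when a point stays put are all assumptions made for the sketch.

```python
import math


def nearest_center(p, centers):
    """Return (index, distance) of the center closest to point p."""
    dists = [math.dist(p, c) for c in centers]
    j = min(range(len(centers)), key=dists.__getitem__)
    return j, dists[j]


def update_centers(points, labels, centers):
    """Recompute each center as the mean of its members (keep it if empty)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def improved_kmeans(points, init_centers, max_iter=100):
    """Hypothetical sketch: k-means that skips the full distance scan when a
    point is no farther from its own (updated) center than it was before."""
    centers = [tuple(c) for c in init_centers]

    # First pass: full assignment, remembering each point's nearest distance.
    labels, kept = [], []
    for p in points:
        j, d = nearest_center(p, centers)
        labels.append(j)
        kept.append(d)

    skipped = 0  # number of full scans (k - 1 extra distances each) avoided
    for _ in range(max_iter):
        centers = update_centers(points, labels, centers)
        changed = False
        for i, p in enumerate(points):
            d_same = math.dist(p, centers[labels[i]])
            if d_same <= kept[i]:
                # Assumption: refresh the stored distance and keep the label.
                kept[i] = d_same
                skipped += 1
                continue
            # Otherwise fall back to comparing against every center.
            j, d = nearest_center(p, centers)
            changed = changed or j != labels[i]
            labels[i], kept[i] = j, d
        if not changed:
            break
    return centers, labels, skipped


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, skipped = improved_kmeans(points, [(0, 0), (0, 1)])
print(labels, skipped)
```

Note that the shortcut is a heuristic: a point whose own center has moved closer is kept in place even if some other center has moved closer still, so the result may differ from that of standard k-means on some inputs.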