Abstract
Validating clustering results is an important topic in pattern recognition. We review approaches and systems in this area. The first part of this paper presented clustering validity checking approaches based on internal and external criteria. This second part reviews clustering validity approaches based on relative criteria. We also discuss the results of an experimental study based on widely known validity indices. Finally, the paper identifies issues that recent approaches leave under-addressed and proposes research directions in the field.