Published in: Journal of Medical Systems 5/2014

01-05-2014 | Transactional Processing Systems

A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases

Authors: Nihat Yilmaz, Onur Inan, Mustafa Serter Uzer

Abstract

The most important factors that prevent pattern recognition from working quickly and effectively are noisy and inconsistent data in databases. This article presents a new data preparation method based on clustering algorithms for the diagnosis of heart disease and diabetes. In this method, a modified K-means algorithm is used in a clustering-based data preparation stage to eliminate noisy and inconsistent data, and Support Vector Machines are used for classification. The approach was tested on the diagnosis of heart disease and diabetes, which are prevalent in society and are among the leading causes of death. The data sets used are the Statlog (Heart), SPECT images, and Pima Indians Diabetes data sets obtained from the UCI database. The proposed system achieved classification success rates of 97.87 %, 98.18 %, and 96.71 % on these data sets, respectively. Classification accuracies were obtained using 10-fold cross-validation. According to the results, the proposed method performs very well compared with other reported results and appears promising for pattern recognition applications.
Metadata
Title
A New Data Preparation Method Based on Clustering Algorithms for Diagnosis Systems of Heart and Diabetes Diseases
Authors
Nihat Yilmaz
Onur Inan
Mustafa Serter Uzer
Publication date
01-05-2014
Publisher
Springer US
Published in
Journal of Medical Systems / Issue 5/2014
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-014-0048-7
