Published in: Journal of Medical Systems 1/2019

01-01-2019 | Systems-Level Quality Improvement

A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery

Authors: H. Benhar, A. Idri, J. L. Fernández-Alemán



Abstract

The increasing amount of data produced by biomedical and healthcare systems has created a need for knowledge data discovery (KDD) methodologies. Data mining (DM) offers a set of powerful techniques for identifying and extracting relevant information from medical datasets, from which doctors and patients can benefit greatly, particularly in the case of diseases with high mortality and morbidity rates, such as heart disease (HD). Nonetheless, raw medical data pose several challenges, such as missing data, noise, redundancy and high dimensionality, which make the extraction of useful and relevant information difficult. Intensive research has therefore recently begun on preparing raw healthcare data before knowledge extraction. In any KDD process, data preparation is the step prior to DM that deals with data imperfections in order to improve data quality, so as to satisfy the requirements and improve the performance of DM techniques. The objective of this paper is to perform a systematic mapping study (SMS) of data preparation for KDD in cardiology so as to provide an overview of the quantity and type of research carried out in this respect. The SMS covered a set of 58 selected papers published between January 2000 and December 2017. The selected studies were analyzed according to six criteria: year and channel of publication, preparation task, medical task, DM objective, research type and empirical type. The results show that a large amount of data preparation research was carried out in order to improve the performance of DM-based decision support systems in cardiology. Researchers were mainly interested in the data reduction preparation task, and particularly in feature selection. Moreover, the majority of the selected studies focused on classification for the diagnosis of HD. Two main research types were identified in the selected studies: solution proposal and evaluation research, and the most frequently used empirical type was historical-based evaluation.
Metadata
Title
A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery
Authors
H. Benhar
A. Idri
J. L. Fernández-Alemán
Publication date
01-01-2019
Publisher
Springer US
Published in
Journal of Medical Systems / Issue 1/2019
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-018-1134-z
