Published in: Journal of Medical Systems 4/2012

01-08-2012 | ORIGINAL PAPER

Data Mining in Healthcare and Biomedicine: A Survey of the Literature

Authors: Illhoi Yoo, Patricia Alafaireet, Miroslav Marinov, Keila Pena-Hernandez, Rajitha Gopidi, Jia-Fu Chang, Lei Hua

Abstract

Data mining, a concept that emerged in the mid-1990s, can help researchers gain novel and deep insights into large biomedical datasets and understand them in unprecedented ways. It can uncover new biomedical and healthcare knowledge for clinical and administrative decision making, and it can generate scientific hypotheses from large experimental data, clinical databases, and/or biomedical literature. This review first introduces data mining in general (its background, definition, and process), discusses the major differences between statistics and data mining, and then addresses the uniqueness of data mining in the biomedical and healthcare fields. It also briefly summarizes various data mining algorithms used for classification, clustering, and association, together with their respective advantages and drawbacks. Suggested guidelines on how to use data mining algorithms in each of these three areas are offered, along with three examples of how data mining has been used in the healthcare industry. Health-related organizations have successfully applied data mining to predict health insurance fraud and under-diagnosed patients, and to identify and classify at-risk people with the goal of reducing healthcare cost; building on this, we describe how data mining technologies (in each area of classification, clustering, and association) have been used for a multitude of purposes, including research in the biomedical and healthcare fields.
We then discuss the technologies available for predicting healthcare costs (including length of hospital stay), for disease diagnosis and prognosis, and for discovering hidden biomedical and healthcare patterns in related databases, as well as the use of data mining to discover relationships between health conditions and a disease, relationships among diseases, and relationships among drugs. The article concludes with a discussion of the problems that hamper the clinical use of data mining by health professionals.
Footnotes
1
MeSH (Medical Subject Headings) is the National Library of Medicine (NLM)'s controlled vocabulary used for indexing MEDLINE articles.
 
2
For example, if a hierarchical algorithm takes 60 s to cluster 1000 objects (records), clustering 3000 objects takes 1620 s (= (3000/1000)^3 × 60), provided there is enough system memory.
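The arithmetic in this footnote can be reproduced with a short sketch. It assumes, as the footnote does, that running time grows with the cube of the number of objects; the function name `scaled_runtime` and its parameters are illustrative and do not come from the article.

```python
def scaled_runtime(base_seconds: float, base_n: int, new_n: int, exponent: int = 3) -> float:
    """Extrapolate a running time, assuming time grows as n ** exponent.

    The cubic default mirrors the footnote's assumption for a hierarchical
    clustering algorithm; real implementations may scale differently.
    """
    if base_n <= 0 or new_n <= 0:
        raise ValueError("object counts must be positive")
    return base_seconds * (new_n / base_n) ** exponent


if __name__ == "__main__":
    # 60 s for 1000 objects, extrapolated to 3000 objects.
    print(scaled_runtime(60, 1000, 3000))
```

Tripling the number of objects multiplies the estimated time by 3^3 = 27, which is where the footnote's 1620 s comes from.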
 
3
Some classification algorithms can mine only numeric data or only categorical data.
 
4
Clustering accuracy can be measured only if a class (i.e., a dependent variable) is available.
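As an illustration of why a class variable is needed, the sketch below computes cluster purity, one common label-based measure. Purity is chosen here as an assumption for illustration; the footnote does not say which accuracy measure the authors have in mind, and the function name and sample data are hypothetical.

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of objects that carry the majority class of their cluster.

    Requires a known class label for every object, which is why this kind
    of accuracy cannot be computed for unlabeled data.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one object is required")

    # Count class labels separately within each cluster.
    counts_by_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1

    # Sum the size of the majority class in every cluster.
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(cluster_ids)


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]               # hypothetical cluster assignments
    labels = ["a", "a", "b", "b", "b", "a"]     # hypothetical known classes
    print(purity(clusters, labels))
```

Without the `class_labels` argument there is nothing to compare the clusters against, which is the point the footnote makes.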
 
Literature
1.
go back to reference The Technology Review Ten, MIT Technology Review (January/February 2001). The Technology Review Ten, MIT Technology Review (January/February 2001).
2.
go back to reference Larose, D. T., Discovering knowledge in data: an introduction to data mining. Wiley, 2004. Larose, D. T., Discovering knowledge in data: an introduction to data mining. Wiley, 2004.
3.
go back to reference Hand, D., Mannila, H., Smyth, P., Principles of data mining. MIT, 2001. Hand, D., Mannila, H., Smyth, P., Principles of data mining. MIT, 2001.
5.
go back to reference Richards, G., Rayward-Smith, V. J., Sönksen, P. H., Carey, S., and Weng, C., Data mining for indicators of early mortality in a database of clinical records. Artif. Intell. Med. 22:215–231, 2001.CrossRef Richards, G., Rayward-Smith, V. J., Sönksen, P. H., Carey, S., and Weng, C., Data mining for indicators of early mortality in a database of clinical records. Artif. Intell. Med. 22:215–231, 2001.CrossRef
6.
go back to reference Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., The KDD process of extracting useful knowledge from volumes of data. Commun. ACM 39(11):27–34, 1996.CrossRef Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., The KDD process of extracting useful knowledge from volumes of data. Commun. ACM 39(11):27–34, 1996.CrossRef
7.
go back to reference Berger, A., and Berger, C., Data mining as a tool for research and knowledge development in nursing. Comput. Inform. Nurs. 22(3):123–131, 2004.CrossRef Berger, A., and Berger, C., Data mining as a tool for research and knowledge development in nursing. Comput. Inform. Nurs. 22(3):123–131, 2004.CrossRef
8.
go back to reference Shearer, C., The CRISP-DM model: the new blueprint for data mining. J Data Warehous 5(4):13–22, 2000. Shearer, C., The CRISP-DM model: the new blueprint for data mining. J Data Warehous 5(4):13–22, 2000.
9.
go back to reference Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., From data mining to knowledge discovery in databases. Commun. ACM 39(11):24–26, 1996.CrossRef Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., From data mining to knowledge discovery in databases. Commun. ACM 39(11):24–26, 1996.CrossRef
10.
go back to reference Han, J., Kamber, M., Data mining: concepts and techniques. 2nd ed. The Morgan Kaufmann Series, 2006. Han, J., Kamber, M., Data mining: concepts and techniques. 2nd ed. The Morgan Kaufmann Series, 2006.
11.
go back to reference Silver, M., Sakara, T., Su, H. C., Herman, C., Dolins, S. B., and O’shea, M. J., Case study: how to apply data mining techniques in a healthcare data warehouse. J. Healthc. Inf. Manage. 15(2):155–164, 2001. Silver, M., Sakara, T., Su, H. C., Herman, C., Dolins, S. B., and O’shea, M. J., Case study: how to apply data mining techniques in a healthcare data warehouse. J. Healthc. Inf. Manage. 15(2):155–164, 2001.
12.
go back to reference Harper, P. R., A review and comparison of classification algorithms for medical decision making. Health Policy 71:315–331, 2005.CrossRef Harper, P. R., A review and comparison of classification algorithms for medical decision making. Health Policy 71:315–331, 2005.CrossRef
13.
go back to reference Sierra, B., and Larranaga, P., Predicting survival in malignant skin melanoma using Bayesian networks automatically induced by genetic algorithms. An empirical comparison between different approaches. Artif. Intell. Med. 14:215–230, 1998.CrossRef Sierra, B., and Larranaga, P., Predicting survival in malignant skin melanoma using Bayesian networks automatically induced by genetic algorithms. An empirical comparison between different approaches. Artif. Intell. Med. 14:215–230, 1998.CrossRef
14.
go back to reference Eastwood, E. A., Magaziner, J., Wang, J., Silberzweig, S. B., Hannan, E. L., Strauss, E., et al., Patients with hip fracture: subgroups and their outcomes. J. Am. Geriatr. Soc. 50:1240–1249, 2002.CrossRef Eastwood, E. A., Magaziner, J., Wang, J., Silberzweig, S. B., Hannan, E. L., Strauss, E., et al., Patients with hip fracture: subgroups and their outcomes. J. Am. Geriatr. Soc. 50:1240–1249, 2002.CrossRef
15.
go back to reference Stel, V. S., Pluijm, S. M., Deeg, D. J., Smit, J. H., Bouter, L. M., and Lips, P., A classification tree for predicting recurrent falling in community-dwelling older persons. J. Am. Geriatr. Soc. 51:1356–1364, 2003.CrossRef Stel, V. S., Pluijm, S. M., Deeg, D. J., Smit, J. H., Bouter, L. M., and Lips, P., A classification tree for predicting recurrent falling in community-dwelling older persons. J. Am. Geriatr. Soc. 51:1356–1364, 2003.CrossRef
16.
go back to reference Yu, J. S., Ongarello, S., Fiedler, R., Chen, X. W., Toffolo, G., Cobelli, C., and Trajanoski, Z., Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data. Bioinformatics 21:2200–2209, 2005.CrossRef Yu, J. S., Ongarello, S., Fiedler, R., Chen, X. W., Toffolo, G., Cobelli, C., and Trajanoski, Z., Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data. Bioinformatics 21:2200–2209, 2005.CrossRef
17.
go back to reference Adam, B. L., Qu, Y., Davis, J. W., Ward, M. D., Clements, M. A., Cazares, L. H., et al., Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res. 62:3609–3614, 2002. Adam, B. L., Qu, Y., Davis, J. W., Ward, M. D., Clements, M. A., Cazares, L. H., et al., Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res. 62:3609–3614, 2002.
18.
go back to reference Petricoin, E. F., Ardekani, A. M., Hitt, B. A., Levine, P. J., Fusaro, V. A., Steinberg, S. M., et al., Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359:572–577, 2002.CrossRef Petricoin, E. F., Ardekani, A. M., Hitt, B. A., Levine, P. J., Fusaro, V. A., Steinberg, S. M., et al., Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359:572–577, 2002.CrossRef
19.
go back to reference Bellazzi, R., and Zupan, B., Predictive data mining in clinical medicine: current issues and guidelines. Int. J. Med. Inform. 77:81–97, 2008.CrossRef Bellazzi, R., and Zupan, B., Predictive data mining in clinical medicine: current issues and guidelines. Int. J. Med. Inform. 77:81–97, 2008.CrossRef
20.
21.
go back to reference Seifert, J. W., Data mining: An overview. CRS Report for Congress, The Library of Congress, Dec 2004. Seifert, J. W., Data mining: An overview. CRS Report for Congress, The Library of Congress, Dec 2004.
22.
go back to reference Hand, D., Statistics and data mining: intersecting disciplines. ACM SIGKDD 1(1):16–19, 1999.CrossRef Hand, D., Statistics and data mining: intersecting disciplines. ACM SIGKDD 1(1):16–19, 1999.CrossRef
23.
go back to reference Ichise, R., and Numao Learning, M., First-order rules to handle medical data. NII Journal 2:9–14, 2001. Ichise, R., and Numao Learning, M., First-order rules to handle medical data. NII Journal 2:9–14, 2001.
24.
go back to reference Jolins, J., Ancukiewicz, M., DeLong, E., Pryor, D., Muhlbaier, L., and Mark, D., Discordance of databases designed for claims payment versus clinical information systems: implications for outcomes research. Ann. Intern. Med. 119:844–850, 1993. Jolins, J., Ancukiewicz, M., DeLong, E., Pryor, D., Muhlbaier, L., and Mark, D., Discordance of databases designed for claims payment versus clinical information systems: implications for outcomes research. Ann. Intern. Med. 119:844–850, 1993.
25.
go back to reference Dans, P., Looking for answers in all the wrong places. Ann. Intern. Med. 119:855–857, 1993. Dans, P., Looking for answers in all the wrong places. Ann. Intern. Med. 119:855–857, 1993.
26.
go back to reference Prather, J. C., Lobach, D. F., Goodwin, L. F., Hales, J. W., Hage, M. L., and Hammond, W. E., Medical data mining knowledge discovery in a clinical data warehouse. AMIA 1091–8280:101–105, 1997. Prather, J. C., Lobach, D. F., Goodwin, L. F., Hales, J. W., Hage, M. L., and Hammond, W. E., Medical data mining knowledge discovery in a clinical data warehouse. AMIA 1091–8280:101–105, 1997.
27.
go back to reference Berman, J. J., Confidentiality issues for medical data miners. Artif. Intell. Med. 26:25–36, 2002.CrossRef Berman, J. J., Confidentiality issues for medical data miners. Artif. Intell. Med. 26:25–36, 2002.CrossRef
28.
go back to reference Cios, K., and Moore, G. W., Uniqueness of medical data mining. Artif. Intell. Med. 26(1–2):1–24, 2002.CrossRef Cios, K., and Moore, G. W., Uniqueness of medical data mining. Artif. Intell. Med. 26(1–2):1–24, 2002.CrossRef
29.
go back to reference Brachman, R. J., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., and Simoudis, E., Mining business databases. Commun. ACM 39(11):42–48, 1996.CrossRef Brachman, R. J., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., and Simoudis, E., Mining business databases. Commun. ACM 39(11):42–48, 1996.CrossRef
30.
go back to reference Velickov, S., Solomatine, D., Predictive data mining: practical examples. 2nd Joint Workshop on Applied AI in Civil Engineering, Cottbus, Germany, March 2000. Velickov, S., Solomatine, D., Predictive data mining: practical examples. 2nd Joint Workshop on Applied AI in Civil Engineering, Cottbus, Germany, March 2000.
31.
go back to reference Dunham, M., Data mining—Introductory and advanced topics. Pearson Education, 2003. Dunham, M., Data mining—Introductory and advanced topics. Pearson Education, 2003.
32.
go back to reference Kononenko, I., Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 23:89–109, 2001.CrossRef Kononenko, I., Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 23:89–109, 2001.CrossRef
33.
go back to reference Delen, D., Walker, G., and Kadam, A., Predicting breast cancer survivability: a comparison of three data mining methods. Artif. Intell. Med. 34:113–127, 2005.CrossRef Delen, D., Walker, G., and Kadam, A., Predicting breast cancer survivability: a comparison of three data mining methods. Artif. Intell. Med. 34:113–127, 2005.CrossRef
34.
go back to reference Anderson, J. A., and Davis, J., An introduction to neural networks. MIT, Cambride, 1995.MATH Anderson, J. A., and Davis, J., An introduction to neural networks. MIT, Cambride, 1995.MATH
35.
go back to reference Obenshain, M. K., Application of data mining techniques to healthcare data. Infect. Control Hosp. Epidemiol. 25(8):690–695, 2004.CrossRef Obenshain, M. K., Application of data mining techniques to healthcare data. Infect. Control Hosp. Epidemiol. 25(8):690–695, 2004.CrossRef
36.
go back to reference Übeyli, E. D., Comparison of different classification algorithms in clinical decision making. Expert syst 24(1):17–31, 2007.CrossRef Übeyli, E. D., Comparison of different classification algorithms in clinical decision making. Expert syst 24(1):17–31, 2007.CrossRef
37.
go back to reference Kaur, H., and Wasan, S. K., Empirical study on applications of data mining techniques in healthcare. J. Comput. Sci. 2(2):194–200, 2006.CrossRef Kaur, H., and Wasan, S. K., Empirical study on applications of data mining techniques in healthcare. J. Comput. Sci. 2(2):194–200, 2006.CrossRef
38.
go back to reference Romeo, M., Burden, F., Quinn, M., Wood, B., and McNaughton, D., Infrared microspectroscopy and artificial neural networks in the diagnosis of cervical cancer. Cell. Mol. Biol. (Noisy-le-Grand, France) 44(1):179, 1998. Romeo, M., Burden, F., Quinn, M., Wood, B., and McNaughton, D., Infrared microspectroscopy and artificial neural networks in the diagnosis of cervical cancer. Cell. Mol. Biol. (Noisy-le-Grand, France) 44(1):179, 1998.
39.
go back to reference Ball, G., Mian, S., Holding, F., Allibone, R., Lowe, J., Ali, S., et al., An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers. Bioinformatics 18(3):395–404, 2002.CrossRef Ball, G., Mian, S., Holding, F., Allibone, R., Lowe, J., Ali, S., et al., An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers. Bioinformatics 18(3):395–404, 2002.CrossRef
40.
go back to reference Aleynikov, S., and Micheli-Tzanakou, E., Classification of retinal damage by a neural network based system. J. Med. Syst. 22(3):129–136, 1998.CrossRef Aleynikov, S., and Micheli-Tzanakou, E., Classification of retinal damage by a neural network based system. J. Med. Syst. 22(3):129–136, 1998.CrossRef
41.
go back to reference Potter, R., Comparison of classification algorithms applied to breast cancer diagnosis and prognosis, advances in data mining, 7th Industrial Conference, ICDM 2007, Leipzig, Germany, July 2007, pp.40–49. Potter, R., Comparison of classification algorithms applied to breast cancer diagnosis and prognosis, advances in data mining, 7th Industrial Conference, ICDM 2007, Leipzig, Germany, July 2007, pp.40–49.
42.
go back to reference Kononenko, I., Bratko, I., and Kukar, M., Application of machine learning to medical diagnosis. Machine Learning and Data Mining: Methods and Applications 389:408, 1997. Kononenko, I., Bratko, I., and Kukar, M., Application of machine learning to medical diagnosis. Machine Learning and Data Mining: Methods and Applications 389:408, 1997.
43.
go back to reference Sharma, A., and Roy, R. J., Design of a recognition system to predict movement during anesthesia. IEEE Trans. Biomed. Eng. 44(6):505–511, 1997.CrossRef Sharma, A., and Roy, R. J., Design of a recognition system to predict movement during anesthesia. IEEE Trans. Biomed. Eng. 44(6):505–511, 1997.CrossRef
44.
go back to reference Einstein, A. J., Wu, H. S., Sanchez, M., and Gil, J., Fractal characterization of chromatin appearance for diagnosis in breast cytology. J. Pathol. 185(4):366–381, 1998.CrossRef Einstein, A. J., Wu, H. S., Sanchez, M., and Gil, J., Fractal characterization of chromatin appearance for diagnosis in breast cytology. J. Pathol. 185(4):366–381, 1998.CrossRef
45.
go back to reference Brickley, M., Shepherd, J. P., and Armstrong, R. A., Neural networks: a new technique for development of decision support systems in dentistry. J. Dent. 26(4):305–309, 1998.CrossRef Brickley, M., Shepherd, J. P., and Armstrong, R. A., Neural networks: a new technique for development of decision support systems in dentistry. J. Dent. 26(4):305–309, 1998.CrossRef
46.
go back to reference Schwarzer, G., Vach, W., and Schumacher, M., On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Stat. Med. 19:541–561, 2000.CrossRef Schwarzer, G., Vach, W., and Schumacher, M., On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Stat. Med. 19:541–561, 2000.CrossRef
47.
go back to reference Craven, M. W., Shavlik, J. W., Learning symbolic rules using artificial neural networks. Proc. 10th International Conference on Machine Learning. Amherst, MA, 1993. Craven, M. W., Shavlik, J. W., Learning symbolic rules using artificial neural networks. Proc. 10th International Conference on Machine Learning. Amherst, MA, 1993.
48.
go back to reference Quinlan, J. R., Discovering rules by induction from large collections of examples. In: Michie, D., (Ed.), Expert Systems in the Micro Electronic Age. Edinburgh University Press, 1979. Quinlan, J. R., Discovering rules by induction from large collections of examples. In: Michie, D., (Ed.), Expert Systems in the Micro Electronic Age. Edinburgh University Press, 1979.
49.
go back to reference Quinlan, J. R., Learning efficient classification procedures and their application to chess endgames. In: Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. (Eds.), Machine learning: an artificial intelligence approach. Tioga Publishing Company, Palo Alto, 1983. Quinlan, J. R., Learning efficient classification procedures and their application to chess endgames. In: Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. (Eds.), Machine learning: an artificial intelligence approach. Tioga Publishing Company, Palo Alto, 1983.
50.
go back to reference Quinlan, J. R., C4.5: programs for machine learning. Morgan Kaufmann, Amsterdam, 1993. Quinlan, J. R., C4.5: programs for machine learning. Morgan Kaufmann, Amsterdam, 1993.
51.
go back to reference Boser, B. E., Guyon, I. M., and Vapnik, V. N., A training algorithm for optimal margin classifiers, Fifth Annual Workshop on Computational Learning Theory. ACM, Pittsburgh, pp. 144–152, 1992. Boser, B. E., Guyon, I. M., and Vapnik, V. N., A training algorithm for optimal margin classifiers, Fifth Annual Workshop on Computational Learning Theory. ACM, Pittsburgh, pp. 144–152, 1992.
52.
go back to reference Vapnik, V. N., The nature of statistical learning theory. Springer, NY, 1995.MATH Vapnik, V. N., The nature of statistical learning theory. Springer, NY, 1995.MATH
53.
go back to reference Vapnik, V. N., and Lerner, A., Pattern recognition using generalized portrait method. Autom. Remote Control 24:774–780, 1963. Vapnik, V. N., and Lerner, A., Pattern recognition using generalized portrait method. Autom. Remote Control 24:774–780, 1963.
54.
go back to reference Vapnik, V. N., and Chervonenkis, Y., On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16:264–280, 1971.MATHCrossRef Vapnik, V. N., and Chervonenkis, Y., On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16:264–280, 1971.MATHCrossRef
55.
go back to reference Meyer, D., Leischa, F., and Hornikb, K., The support vector machine under test. Neurocomputing 55(1–2):169–186, 2003.CrossRef Meyer, D., Leischa, F., and Hornikb, K., The support vector machine under test. Neurocomputing 55(1–2):169–186, 2003.CrossRef
56.
go back to reference Liu, B., Hsu, W., Ma, Y., Integrating classification and association rule mining, KDD’98. New York, NY, Aug. 1998. Liu, B., Hsu, W., Ma, Y., Integrating classification and association rule mining, KDD’98. New York, NY, Aug. 1998.
57.
go back to reference Cho, S. B., and Won, H. H., Cancer classification using ensemble of neural networks with multiple significant gene subsets. Appl. Intell. 26:243–250, 2007.MATHCrossRef Cho, S. B., and Won, H. H., Cancer classification using ensemble of neural networks with multiple significant gene subsets. Appl. Intell. 26:243–250, 2007.MATHCrossRef
58.
go back to reference Whitehead, M., and Yaeger, L., Sentiment mining using ensemble classification models. In: Sobh, T. (Ed.), Innovations and advances in computer sciences and engineering. Springer, Netherlands, pp. 509–514, 2010.CrossRef Whitehead, M., and Yaeger, L., Sentiment mining using ensemble classification models. In: Sobh, T. (Ed.), Innovations and advances in computer sciences and engineering. Springer, Netherlands, pp. 509–514, 2010.CrossRef
59.
go back to reference Moon, H., Ahn, H., Kodell, R. L., Baek, S., Lin, C. J., and Chen, J. J., Ensemble methods for classification of patients for personalized medicine with high-dimensional data. Artif. Intell. Med. 41(3):197–207, 2007.CrossRef Moon, H., Ahn, H., Kodell, R. L., Baek, S., Lin, C. J., and Chen, J. J., Ensemble methods for classification of patients for personalized medicine with high-dimensional data. Artif. Intell. Med. 41(3):197–207, 2007.CrossRef
60.
go back to reference Schapire, R. E., The strength of weak learnability. Mach. Learn. 5(2):197–227, 1990. Schapire, R. E., The strength of weak learnability. Mach. Learn. 5(2):197–227, 1990.
62.
go back to reference Ho, T. K., The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8):832–844, 1998.CrossRef Ho, T. K., The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8):832–844, 1998.CrossRef
63.
go back to reference Ahn, H., Moon, H., Fazzari, M. J., Lim, N., Chen, J. J., and Kodell, R. L., Classification by ensembles from random partitions of high-dimensional data. Comput. Stat. Data Anal. 51:6166–6179, 2007.MathSciNetMATHCrossRef Ahn, H., Moon, H., Fazzari, M. J., Lim, N., Chen, J. J., and Kodell, R. L., Classification by ensembles from random partitions of high-dimensional data. Comput. Stat. Data Anal. 51:6166–6179, 2007.MathSciNetMATHCrossRef
64.
go back to reference Zhou, Z. H., et al., Lung cancer cell identification based on artificial neural network ensembles. Artif. Intell. Med. 24(1):25–36, 2002.MATHCrossRef Zhou, Z. H., et al., Lung cancer cell identification based on artificial neural network ensembles. Artif. Intell. Med. 24(1):25–36, 2002.MATHCrossRef
65.
go back to reference Santos-Garcia, G., Varela, G., Novoa, N., and Jiménez, M. F., Prediction of postoperative morbidity after lung resection using an artificial neural network ensemble. Artif. Intell. Med. 30(1):61–69, 2004.CrossRef Santos-Garcia, G., Varela, G., Novoa, N., and Jiménez, M. F., Prediction of postoperative morbidity after lung resection using an artificial neural network ensemble. Artif. Intell. Med. 30(1):61–69, 2004.CrossRef
66.
go back to reference Freund, Y., and Schapire, R., A desicion-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55:119–139, 1997.MathSciNetMATHCrossRef Freund, Y., and Schapire, R., A desicion-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55:119–139, 1997.MathSciNetMATHCrossRef
67.
go back to reference Morra, J. H., Tu, Z., Apostolova, L. G., Green, A. E., Toga, A. W., and Thompson, P. M., Comparison of Adaboost and support vector machines for detecting Alzheimer’s disease through automated hippocampal segmentation. IEEE Trans. Med. Imag. 29(1):30–43, 2010.CrossRef Morra, J. H., Tu, Z., Apostolova, L. G., Green, A. E., Toga, A. W., and Thompson, P. M., Comparison of Adaboost and support vector machines for detecting Alzheimer’s disease through automated hippocampal segmentation. IEEE Trans. Med. Imag. 29(1):30–43, 2010.CrossRef
68.
go back to reference Situ, N., Yuan, X., Zouridakis, G., Boosting instance prototypes to detect local dermoscopic features, 32nd Annual International Conference of the IEEE EMBS (Buenos Aires, Argentina, 2010, Aug 31–Sep 4), pp. 5561–5564. Situ, N., Yuan, X., Zouridakis, G., Boosting instance prototypes to detect local dermoscopic features, 32nd Annual International Conference of the IEEE EMBS (Buenos Aires, Argentina, 2010, Aug 31–Sep 4), pp. 5561–5564.
69.
go back to reference Douglas, P. K., Harris, S., Yuille, A., Cohen, M. S., Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief. Neuroimage, 2010. doi:10.1016/j.neuroimage.2010.11.002. Douglas, P. K., Harris, S., Yuille, A., Cohen, M. S., Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief. Neuroimage, 2010. doi:10.​1016/​j.​neuroimage.​2010.​11.​002.
70.
go back to reference Lopes, R., Ayache, A., Makni, N., Puech, P., Villers, A., Mordon, S., et al., Prostate cancer characterization on MR images using fractal features. Med. Phys. 38:83–95, 2011.CrossRef Lopes, R., Ayache, A., Makni, N., Puech, P., Villers, A., Mordon, S., et al., Prostate cancer characterization on MR images using fractal features. Med. Phys. 38:83–95, 2011.CrossRef
71.
go back to reference Kaufman, L., Rousseeuw, P. J., Finding groups in data: an introduction to cluster analysis. Wiley, 1990. Kaufman, L., Rousseeuw, P. J., Finding groups in data: an introduction to cluster analysis. Wiley, 1990.
72.
go back to reference Yoo, I., and Hu, X., A comprehensive comparison study of document clustering for a biomedical digital library MDELINE. ACM/IEEE Joint Conference on Digital Libraries 11–15:220–229, 2006. Chapel Hill, NC, June 11–15, 2006. Yoo, I., and Hu, X., A comprehensive comparison study of document clustering for a biomedical digital library MDELINE. ACM/IEEE Joint Conference on Digital Libraries 11–15:220–229, 2006. Chapel Hill, NC, June 11–15, 2006.
73.
go back to reference Yoo, I., Hu, X., and Song, I.-Y., Biomedical ontology improves biomedical literature clustering performance: a comparison study. Int. J. Bioinform. Res. Appl. 3(3):414–428, 2007.CrossRef Yoo, I., Hu, X., and Song, I.-Y., Biomedical ontology improves biomedical literature clustering performance: a comparison study. Int. J. Bioinform. Res. Appl. 3(3):414–428, 2007.CrossRef
74.
go back to reference Piatetsky-Shapiro, G., Discovery, analysis, and presentation of strong rules. In: Piatetsky-Shapiro, G., (Ed.), Knowledge Discovery in Databases. AAAI/MIT Press, 1991, pp. 229–248. Piatetsky-Shapiro, G., Discovery, analysis, and presentation of strong rules. In: Piatetsky-Shapiro, G., (Ed.), Knowledge Discovery in Databases. AAAI/MIT Press, 1991, pp. 229–248.
75.
go back to reference Agrawal, R., Imielinski, T., and Swami, A., Mining association rules between sets of items in large databases, Proceedings of the ACM SIGMOD International Conference on the Management of Data. ACM, Washington DC, pp. 207–216, 1993. Agrawal, R., Imielinski, T., and Swami, A., Mining association rules between sets of items in large databases, Proceedings of the ACM SIGMOD International Conference on the Management of Data. ACM, Washington DC, pp. 207–216, 1993.
76.
go back to reference Agrawal, R., and Srikant, R., Fast algorithms for mining association rules, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94). Morgan Kaufmann, Santiago, pp. 487–499, 1994. Agrawal, R., and Srikant, R., Fast algorithms for mining association rules, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94). Morgan Kaufmann, Santiago, pp. 487–499, 1994.
77.
go back to reference Park, J. S., Chen, M. S., Yu, P. S., An effective hash-based algorithm for mining association rules, Proceedings 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD’95), San Jose, CA (May 1995), pp. 175–186. Park, J. S., Chen, M. S., Yu, P. S., An effective hash-based algorithm for mining association rules, Proceedings 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD’95), San Jose, CA (May 1995), pp. 175–186.
78.
go back to reference Toivonen, H., Sampling large databases for association rules, Proceedings 1996 International Conference on Very Large Databases (VLDB’96), Bombay, India (Sept. 1996), pp.134–145. Toivonen, H., Sampling large databases for association rules, Proceedings 1996 International Conference on Very Large Databases (VLDB’96), Bombay, India (Sept. 1996), pp.134–145.
79.
go back to reference Steinbach, M., Karypis, G., Kumar, V., A comparison of document clustering techniques, Technical Report #00-034. Department of Computer Science and Engineering, University of Minnesota, 2000. Steinbach, M., Karypis, G., Kumar, V., A comparison of document clustering techniques, Technical Report #00-034. Department of Computer Science and Engineering, University of Minnesota, 2000.
83.
go back to reference Golub, T. R., et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537, 1999.CrossRef Golub, T. R., et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537, 1999.CrossRef
84.
go back to reference Hu, H., Li, J., Plank, A., Wang, H., Daggard, G., A comparative study of classification methods for microarray data analysis. CRPIT Volume 61, Proceedings Fifth Australasian Data Mining Conference. 2006. p. 33–37. Hu, H., Li, J., Plank, A., Wang, H., Daggard, G., A comparative study of classification methods for microarray data analysis. CRPIT Volume 61, Proceedings Fifth Australasian Data Mining Conference. 2006. p. 33–37.
85.
go back to reference Ries, L. A. G., Harkins, D., Krapcho, M., et al., SEER Cancer Statistics Review, 1975–2003. National Cancer Institute, Bethesda, 2006. Ries, L. A. G., Harkins, D., Krapcho, M., et al., SEER Cancer Statistics Review, 1975–2003. National Cancer Institute, Bethesda, 2006.
86.
go back to reference Van’t Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse, H. L., et al., Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536, 2002.CrossRef Van’t Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse, H. L., et al., Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536, 2002.CrossRef
88.
Cox, D. R., Analysis of survival data. Chapman & Hall, London, 1984.
89.
Shah, S., Kusiak, A., and Dixon, B., Data Mining in predicting survival of kidney dialysis patients, Proceedings of Photonics West—Bios 2003. In: Bass, L. S., et al. (Eds.), Lasers in surgery: advanced characterization, therapeutics, and systems XIII, 4949. SPIE, Bellingham, 2003.
90.
Beller, G., The rising cost of health care in the United States: is it making the United States globally noncompetitive? J. Nucl. Cardiol. 15(4):481–482, 2008.
91.
Bertsimas, D., Bjarnadóttir, M. V., Kane, M. A., Kryder, J. C., Pandey, R., Vempala, S., and Wang, G., Algorithmic prediction of health-care costs. Oper. Res. 56(6):1382–1392, 2008.
92.
Kerr, G., Ruskin, H. J., Crane, M., and Doolan, P., Techniques for clustering gene expression data. Comput. Biol. Med. 38(3):283–293, 2008.
93.
Do, J. H., and Choi, D. K., Clustering approaches to identifying gene expression patterns from DNA microarray data. Mol. Cells 25(2):279–288, 2008.
94.
Chae, Y. M., Ho, S. H., Cho, K. W., Lee, D. H., and Ji, S. H., Data mining approach to policy analysis in a health insurance domain. Int. J. Med. Inform. 62:103–111, 2001.
95.
Adler, L. D., and Nierenberg, A. A., Review of medication adherence in children and adults with ADHD. Postgrad. Med. 122(1):184–191, 2010.
96.
Tsai, M. H., and Huang, Y. S., Attention-deficit/hyperactivity disorder and sleep disorders in children. Med. Clin. North Am. 94(3):615–632, 2010.
97.
Kessler, R. C., Adler, L. A., Barkley, R., et al., The prevalence and correlates of adult ADHD in the United States: results from the National Comorbidity Survey Replication. Am. J. Psychiatry 163(4):716–723, 2006.
98.
Gau, S., Chong, M., Chen, T., and Cheng, A., A 3-year panel study of mental disorders among adolescents in Taiwan. Am. J. Psychiatry 162(7):1344–1350, 2005.
99.
Tai, Y. M., and Chiu, H. W., Comorbidity study of ADHD: applying association rule mining (ARM) to National Health Insurance Database of Taiwan. Int. J. Med. Inform. 78:75–83, 2009.
100.
Chen, T. J., Chou, L. F., and Hwang, S. J., Application of a data-mining technique to analyze coprescription patterns for antacids in Taiwan. Clin. Ther. 25(9):2453–2463, 2003.
101.
Breault, J. L., Data mining diabetic databases: are rough sets a useful addition? Proceedings of the 33rd Symposium on the Interface. Computing Science and Statistics, Fairfax, 2001.
102.
Goodwin, L., and Iannacchione, M. A., Data mining methods for improving birth outcomes prediction. Outcomes Manage. 6(2):80–85, 2002.
103.
Breault, J. L., Goodall, C. R., and Fos, P. J., Data mining a diabetic data warehouse. Artif. Intell. Med. 26:37–54, 2002.
104.
Andrews, P. J., Sleeman, D. H., Statham, P. F. X., Mcquatt, A., Corruble, V., Jones, P. A., et al., Predicting recovery in patients suffering from traumatic brain injury by using admission variables and physiological data: a comparison between decision tree analysis and logistic regression. J. Neurosurg. 97:326–336, 2002.
105.
Goodwin, L., VanDyne, M., Lin, S., and Talbert, S., Data mining issues and opportunities for building nursing knowledge. J. Biomed. Inform. 36:379–388, 2003.
106.
Nevins, J. R., Huang, E. S., Dressman, H., Pittman, J., Huang, A. T., and West, M., Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction, Human Molecular Genetics 12. Review Issue 2:R153–R157, 2003.
107.
Sigurdardottir, A. K., Jonsdottir, H., and Benediktsson, R., Outcomes of educational interventions in type 2 diabetes: WEKA data-mining analysis. Patient Educ. Couns. 67:21–31, 2007.
108.
Huang, L., Hsu, S., Lin, E., A comparison of classification methods for predicting Chronic Fatigue Syndrome based on genetic data. Journal of Translational Medicine 7:81, 2009.
109.
Toussi, M., Lamy, J., Le Toumelin, P., Venot, A., Using data mining techniques to explore physicians’ therapeutic decisions when clinical guidelines do not provide recommendations: methods and example for type 2 diabetes. BMC Med. Informat. Decis. Making 9:28, 2009.
110.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I. H., The WEKA data mining software: an update. SIGKDD Explorations 11(1), 2009.
Metadata
Title
Data Mining in Healthcare and Biomedicine: A Survey of the Literature
Authors
Illhoi Yoo
Patricia Alafaireet
Miroslav Marinov
Keila Pena-Hernandez
Rajitha Gopidi
Jia-Fu Chang
Lei Hua
Publication date
01-08-2012
Publisher
Springer US
Published in
Journal of Medical Systems / Issue 4/2012
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-011-9710-5
