Published in: Journal of Medical Systems 3/2015

01-03-2015 | Transactional Processing Systems

Design and Development of a Medical Big Data Processing System Based on Hadoop

Authors: Qin Yao, Yu Tian, Peng-Fei Li, Li-Li Tian, Yang-Ming Qian, Jing-Song Li


Abstract

Secondary use of medical big data is increasingly common in healthcare services and clinical research. Understanding the logic behind medical big data reveals trends in hospital information technology and is of great significance for hospital information systems that are designing and expanding services. Big data has four characteristics, Volume, Variety, Velocity and Value (the 4 Vs), that make traditional standalone systems incapable of processing these data. Apache Hadoop MapReduce is a promising software framework for developing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. With the Hadoop framework and the MapReduce application programming interface (API), we can more easily develop our own MapReduce applications that scale from a single node to thousands of machines. This paper investigates a practical case of a Hadoop-based medical big data processing system. We developed this system to intelligently process medical big data and uncover features of hospital information system user behaviors, studying those behaviors through the various data produced by different hospital information systems in daily work. We also built a five-node Hadoop cluster to execute distributed MapReduce algorithms. Compared with a single node, our distributed algorithms show promise in facilitating efficient processing of medical big data in healthcare services and clinical research. Additionally, with medical big data analytics, we can design our hospital information systems to be much more intelligent and easier to use by making personalized recommendations.
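As a rough illustration of the MapReduce programming model mentioned in the abstract, the sketch below simulates the map, shuffle and reduce phases in a single Python process. It is not the authors' implementation and does not use the Hadoop API. The log format (`user_id,system,action`), the sample records and all function names are hypothetical, chosen only to show how user activity per hospital information system might be counted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]

# Hypothetical log lines (an assumption, not data from the paper):
# "user_id,system,action"
SAMPLE_LOGS = [
    "u01,EMR,open_record",
    "u02,LIS,view_result",
    "u01,EMR,order_drug",
    "u01,PACS,view_image",
    "u02,LIS,view_result",
]


def map_phase(line: str) -> Iterator[Tuple[Key, int]]:
    """Emit ((user_id, system), 1) for each log line."""
    user_id, system, _action = line.split(",")
    yield (user_id, system), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Sum the counts collected for one (user_id, system) key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[Key, int]:
    """Run map, shuffle and reduce sequentially in one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = run_job(SAMPLE_LOGS)
for (user, system), n in sorted(counts.items()):
    print(user, system, n)
```

On a real cluster the map and reduce functions would run on different nodes and the framework would perform the shuffle; here everything runs sequentially so that the data flow is easy to follow.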
Metadata
Title
Design and Development of a Medical Big Data Processing System Based on Hadoop
Authors
Qin Yao
Yu Tian
Peng-Fei Li
Li-Li Tian
Yang-Ming Qian
Jing-Song Li
Publication date
01-03-2015
Publisher
Springer US
Published in
Journal of Medical Systems / Issue 3/2015
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-015-0220-8
