Published in: BMC Medical Informatics and Decision Making 1/2020

Open Access 01-12-2020 | Technical Advance

Sparse multi-output Gaussian processes for online medical time series prediction

Authors: Li-Fang Cheng, Bianca Dumitrascu, Gregory Darnell, Corey Chivers, Michael Draugelis, Kai Li, Barbara E Engelhardt

Abstract

Background

For real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab test results is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from large-scale observational electronic health records (EHRs) and make accurate real-time predictions is a critical step toward this goal. In this work, we develop and explore a Bayesian nonparametric model based on multi-output Gaussian process (GP) regression for hospital patient monitoring.
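The standard GP regression equations are recalled below as general background. They are textbook results and not MedGP-specific. Consider a zero-mean GP with kernel $k$ and Gaussian observation noise of variance $\sigma^2$, conditioned on inputs $X$ and targets $\mathbf{y}$. Write $K = k(X, X)$ and $\mathbf{k}_* = k(X, x_*)$. The predictive mean and latent-function variance at a new input $x_*$ are then:

```latex
\begin{aligned}
\mu_*      &= \mathbf{k}_*^{\top}\left(K + \sigma^{2} I\right)^{-1}\mathbf{y}, \\
\sigma_*^2 &= k(x_*, x_*) - \mathbf{k}_*^{\top}\left(K + \sigma^{2} I\right)^{-1}\mathbf{k}_*.
\end{aligned}
```

In a multi-output GP, each input is a (time, covariate) pair and $k$ is defined over such pairs. The same two equations therefore give a prediction and a confidence region for every covariate. Adding $\sigma^2$ to $\sigma_*^2$ gives the variance of a new noisy observation rather than of the latent function.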

Methods

We propose MedGP, a statistical framework that incorporates 24 clinical covariates and draws on a rich reference data set from which relationships between observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel that enables tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, patients, and periodicity in patient observations. MedGP has several benefits over current methods, including (i) not requiring the time series to be aligned, (ii) quantifying confidence regions in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) inferring interpretable relationships among clinical covariates.
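The sketch below illustrates properties (i), (ii) and (iv) with a generic multi-output kernel. It is a minimal sketch, not the MedGP kernel, which is more structured and whose exact form is not given here. It assumes a two-component linear model of coregionalization (LMC): a squared-exponential trend plus a 24-hour periodic component, each scaled by a covariate-by-covariate matrix `B_q`. Every function name, hyperparameter and data value is hypothetical, and the hyperparameters are fixed by hand rather than learned.

```python
import numpy as np

def se_kernel(ta, tb, lengthscale):
    """Squared-exponential kernel over time (hours)."""
    d = ta[:, None] - tb[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic_kernel(ta, tb, period, lengthscale):
    """Exp-sine-squared kernel; period=24.0 models a daily rhythm."""
    d = ta[:, None] - tb[None, :]
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def lmc_kernel(ta, ca, tb, cb, B_list, kernels):
    """K[(t, c), (t', c')] = sum_q B_q[c, c'] * k_q(t, t').

    Each observation is a (time, covariate-index) pair, so covariates do
    not need to share a common time grid.
    """
    K = np.zeros((len(ta), len(tb)))
    for B, k in zip(B_list, kernels):
        K += B[np.ix_(ca, cb)] * k(ta, tb)
    return K

def gp_predict(t_obs, c_obs, y_obs, t_new, c_new, B_list, kernels, noise_var):
    """Posterior mean and latent variance at (t_new, c_new) via Cholesky."""
    K = lmc_kernel(t_obs, c_obs, t_obs, c_obs, B_list, kernels)
    K += noise_var * np.eye(len(t_obs))
    K_s = lmc_kernel(t_obs, c_obs, t_new, c_new, B_list, kernels)
    prior_var = np.diag(lmc_kernel(t_new, c_new, t_new, c_new, B_list, kernels))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    return K_s.T @ alpha, prior_var - np.sum(v ** 2, axis=0)

# Hypothetical coregionalization matrices for covariates 0 and 1:
# a shared smooth trend in which the two covariates are positively
# correlated, plus an independent daily rhythm for each covariate.
W = np.array([[1.0], [0.8]])
B_list = [W @ W.T + np.diag([0.1, 0.1]), np.diag([0.3, 0.3])]
kernels = [lambda a, b: se_kernel(a, b, lengthscale=6.0),
           lambda a, b: periodic_kernel(a, b, period=24.0, lengthscale=1.0)]

# Synthetic, unaligned observations: covariate 0 every 2 h over 0..46 h,
# covariate 1 only at 1, 5 and 9 h.
t_obs = np.concatenate([np.arange(0.0, 48.0, 2.0), np.array([1.0, 5.0, 9.0])])
c_obs = np.array([0] * 24 + [1] * 3)
y_obs = np.sin(t_obs / 8.0) + 0.5 * c_obs

# Predict covariate 1 at 30 h, where only covariate 0 was observed nearby,
# and form an approximate 95% interval for the latent value.
t_new, c_new = np.array([30.0]), np.array([1])
mean, var = gp_predict(t_obs, c_obs, y_obs, t_new, c_new, B_list, kernels, 0.05)
lower, upper = mean - 1.96 * np.sqrt(var), mean + 1.96 * np.sqrt(var)
```

The off-diagonal entry `B_list[0][0, 1]` is what lets observations of covariate 0 tighten the prediction for covariate 1. In a fitted model, entries of this kind are one way to read off relationships among covariates.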

Results

We evaluate MedGP on online prediction for three patient subgroups drawn from two medical data sets comprising 8,043 patients. We find that MedGP improves online prediction over baseline and state-of-the-art methods for nearly all covariates across different disease subgroups and hospitals.
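The sketch below illustrates online prediction under a simple assumed one-step-ahead protocol: each new observation is predicted using only the observations that precede it in time. It uses a single-output, zero-mean GP with a fixed squared-exponential kernel on synthetic data. It therefore shows the shape of the evaluation loop, not MedGP's multi-output model or its reported results. The function names, hyperparameters and data are hypothetical.

```python
import numpy as np

def se_kernel(a, b, lengthscale=5.0, variance=1.0):
    """Squared-exponential kernel over time (hours)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(t_obs, y_obs, t_new, noise_var=0.1):
    """Zero-mean GP posterior mean and latent variance at t_new."""
    K = se_kernel(t_obs, t_obs) + noise_var * np.eye(len(t_obs))
    K_s = se_kernel(t_obs, t_new)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = np.diag(se_kernel(t_new, t_new)) - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, var

def online_predict(t, y, min_history=3):
    """Predict y[i] from observations before t[i] only (t must be sorted)."""
    means, variances = [], []
    for i in range(min_history, len(t)):
        m, v = gp_posterior(t[:i], y[:i], t[i:i + 1])
        means.append(m[0])
        variances.append(v[0])
    return np.array(means), np.array(variances)

# Synthetic, irregularly sampled signal with a 24-hour cycle (not clinical data).
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 72.0, 40))
y = np.sin(2 * np.pi * t / 24.0) + 0.1 * rng.normal(size=t.size)

means, variances = online_predict(t, y)
mae = np.mean(np.abs(means - y[3:]))  # mean absolute one-step-ahead error
```

Refitting from scratch at every step costs a full matrix solve each time. This is acceptable for a toy series but is the kind of cost a structured kernel is meant to keep tractable on long records.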

Conclusions

The MedGP framework is robust and efficient in estimating temporal dependencies from sparse and irregularly sampled medical time series data for online prediction. The code is publicly available at https://github.com/bee-hive/MedGP.
Metadata
Publication date: 01-12-2020
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making, Issue 1/2020
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-020-1069-4
