Published in: BMC Medical Research Methodology 1/2021

Open Access 01-12-2021 | Technical advance

High performance implementation of the hierarchical likelihood for generalized linear mixed models: an application to estimate the potassium reference range in massive electronic health records datasets

Authors: Cristian G. Bologa, Vernon Shane Pankratz, Mark L. Unruh, Maria Eleni Roumelioti, Vallabh Shah, Saeed Kamran Shaffi, Soraya Arzhan, John Cook, Christos Argyropoulos

Abstract

Background

Converting electronic health record (EHR) entries into useful clinical inferences requires addressing the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMMs) for repeated measures. The major computational bottleneck is the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach is a methodologically rigorous framework for estimating GLMMs that is based on the Laplace Approximation (LA), which replaces integration with numerical optimization and thus scales very well with dimensionality.
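To make the idea of replacing integration with optimization concrete, here is a minimal sketch for a single patient. It assumes a toy random-intercept logistic model, not the potassium model analyzed in the paper, and it is not TMB code; the function names, the data vector, and the values of `beta` and `sigma` are all hypothetical. The Laplace Approximation finds the mode of the joint log-likelihood in the random effect and uses the curvature there, and a brute-force grid integral serves as a check.

```python
import numpy as np


def joint_loglik(u, y, beta, sigma):
    """log f(y | u) + log f(u) for one patient with random intercept u."""
    eta = beta + u
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # Bernoulli, logit link
    logprior = -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * u**2 / sigma**2
    return loglik + logprior


def laplace_log_marginal(y, beta, sigma, max_iter=100, tol=1e-12):
    """Laplace approximation: optimize over u instead of integrating over it."""
    u = 0.0
    for _ in range(max_iter):  # Newton iterations for the mode of the joint log-lik
        p = 1.0 / (1.0 + np.exp(-(beta + u)))
        grad = np.sum(y - p) - u / sigma**2
        hess = -y.size * p * (1.0 - p) - 1.0 / sigma**2
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    # Curvature at the mode (always negative for this model)
    p = 1.0 / (1.0 + np.exp(-(beta + u)))
    hess = -y.size * p * (1.0 - p) - 1.0 / sigma**2
    return joint_loglik(u, y, beta, sigma) + 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(-hess)


def brute_force_log_marginal(y, beta, sigma, n_grid=20001):
    """Reference value: trapezoid rule on a dense grid (only feasible in 1-D)."""
    grid = np.linspace(-10.0 * sigma, 10.0 * sigma, n_grid)
    vals = np.array([joint_loglik(u, y, beta, sigma) for u in grid])
    m = vals.max()  # subtract the maximum for numerical stability
    integrand = np.exp(vals - m)
    dx = grid[1] - grid[0]
    integral = dx * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return m + np.log(integral)


# Hypothetical data: eight binary outcomes for one patient
y = np.array([1, 1, 0, 1, 1, 1, 0, 1], dtype=float)
la = laplace_log_marginal(y, beta=0.2, sigma=1.0)
bf = brute_force_log_marginal(y, beta=0.2, sigma=1.0)
print(f"Laplace: {la:.4f}   grid integral: {bf:.4f}")
```

The grid integral is only workable here because the toy problem is one-dimensional. An optimization step of this kind is what remains tractable when the number of random effects runs into the millions.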

Methods

We present a high-performance, direct implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relation between repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing these data requires the evaluation of an integral in over 3 million dimensions, putting the problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of the LA in smaller samples, 1% and 10% of the full dataset, which were analyzed via a) the original interconnected Generalized Linear Models (iGLM) approach to h-lik, b) Adaptive Gauss-Hermite (AGH) quadrature, and c) Markov Chain Monte Carlo (MCMC), the gold standard for multivariate integration.
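The relationship between the LA and AGH quadrature can be illustrated with a short sketch. It again assumes a hypothetical one-patient, random-intercept logistic model with made-up data, not the study's model or software. AGH centers and scales Gauss-Hermite nodes at the mode of the joint log-likelihood. With a single node it reproduces the Laplace Approximation exactly, and additional nodes refine it at extra computational cost.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def h(u, y, beta, sigma):
    """Joint log-likelihood log f(y | u) + log f(u) for one random intercept u."""
    eta = beta + u
    return (np.sum(y * eta - np.logaddexp(0.0, eta))
            - 0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * u**2 / sigma**2)


def mode_and_scale(y, beta, sigma, max_iter=100, tol=1e-12):
    """Newton search for the mode of h and the curvature-based scale at the mode."""
    u = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(beta + u)))
        grad = np.sum(y - p) - u / sigma**2
        hess = -y.size * p * (1.0 - p) - 1.0 / sigma**2
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(beta + u)))
    hess = -y.size * p * (1.0 - p) - 1.0 / sigma**2
    return u, 1.0 / np.sqrt(-hess)


def agh_log_marginal(y, beta, sigma, n_nodes):
    """Adaptive Gauss-Hermite estimate of log of the integral of exp(h(u)) du."""
    u_hat, s = mode_and_scale(y, beta, sigma)
    x, w = hermgauss(n_nodes)  # nodes and weights for the weight function exp(-x^2)
    # Change of variable u = u_hat + sqrt(2) * s * x, then re-weight by exp(x^2)
    log_terms = np.array([
        np.log(wk) + xk**2 + h(u_hat + np.sqrt(2.0) * s * xk, y, beta, sigma)
        for xk, wk in zip(x, w)
    ])
    m = log_terms.max()  # log-sum-exp for numerical stability
    return np.log(np.sqrt(2.0) * s) + m + np.log(np.sum(np.exp(log_terms - m)))


# Hypothetical data: eight binary outcomes for one patient
y = np.array([1, 1, 0, 1, 1, 1, 0, 1], dtype=float)
u_hat, s = mode_and_scale(y, beta=0.2, sigma=1.0)
laplace = h(u_hat, y, 0.2, 1.0) + 0.5 * np.log(2.0 * np.pi) + np.log(s)
agh_1 = agh_log_marginal(y, beta=0.2, sigma=1.0, n_nodes=1)
agh_15 = agh_log_marginal(y, beta=0.2, sigma=1.0, n_nodes=15)
print(f"Laplace: {laplace:.6f}   AGH(1): {agh_1:.6f}   AGH(15): {agh_15:.6f}")
```

Each extra node adds one evaluation of the joint log-likelihood per random effect and per optimizer iteration, which is one plausible reason quadrature becomes expensive as the number of random effects grows.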

Results

Random effects estimates generated by the LA were within 10% of the values obtained by the iGLM, AGH and MCMC techniques. The h-lik approach was 4–30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the non-linear relationship between the potassium level and the risk of mortality, and estimates of the individual and health care facility sources of variation in mortality risk in CRWD.

Conclusions

We found that the direct implementation of the h-lik offers a computationally efficient, numerically accurate approach to fitting GLMMs to extremely large, real-world repeated measures data. The clinical inferences from our analysis may guide the choice of treatment thresholds for potassium disorders in the clinic.
Metadata
Publication date
01-12-2021
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2021
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-021-01318-6