Open Access 01-12-2021 | Technical advance

High performance implementation of the hierarchical likelihood for generalized linear mixed models: an application to estimate the potassium reference range in massive electronic health records datasets

Authors: Cristian G. Bologa, Vernon Shane Pankratz, Mark L. Unruh, Maria Eleni Roumelioti, Vallabh Shah, Saeed Kamran Shaffi, Soraya Arzhan, John Cook, Christos Argyropoulos

Published in: BMC Medical Research Methodology | Issue 1/2021

Abstract

Background

Converting electronic health record (EHR) entries to useful clinical inferences requires one to address the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMM) for repeated measures. The major computational bottleneck concerns the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach to GLMMs is a methodologically rigorous framework for the estimation of GLMMs that is based on the Laplace Approximation (LA), which replaces integration with numerical optimization, and thus scales very well with dimensionality.
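To make the "integration replaced by optimization" idea concrete, the following is a minimal sketch of the Laplace approximation for a single random intercept in a logistic model. It is not the authors' TMB implementation: the function names, the toy outcomes, and the parameter values (`beta0`, `sigma`) are assumptions chosen for illustration, and the one-dimensional integral is small enough that it can be checked against brute-force numerical integration.

```python
import math


def log_joint(u, y, beta0, sigma):
    """h(u): Bernoulli-logit log-likelihood of one cluster plus the log N(0, sigma^2) density of u."""
    eta = beta0 + u
    loglik = sum(yi * eta - math.log1p(math.exp(eta)) for yi in y)
    logprior = -0.5 * math.log(2.0 * math.pi * sigma ** 2) - u ** 2 / (2.0 * sigma ** 2)
    return loglik + logprior


def curvature(u, n, beta0, sigma):
    """Second derivative of h(u); always negative, so h is concave."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + u)))
    return -n * p * (1.0 - p) - 1.0 / sigma ** 2


def laplace_log_marginal(y, beta0, sigma, max_iter=50, tol=1e-12):
    """Laplace approximation to log of the integral of exp(h(u)) over u."""
    n, s = len(y), sum(y)
    u = 0.0
    # Newton iterations locate the mode of h(u): optimization instead of integration.
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-(beta0 + u)))
        grad = s - n * p - u / sigma ** 2
        step = grad / curvature(u, n, beta0, sigma)
        u -= step
        if abs(step) < tol:
            break
    # log integral ~= h(u_hat) + 0.5*log(2*pi) - 0.5*log(-h''(u_hat))
    hess = curvature(u, n, beta0, sigma)
    return log_joint(u, y, beta0, sigma) + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(-hess)


def trapezoid_log_marginal(y, beta0, sigma, lo=-10.0, hi=10.0, steps=20000):
    """Brute-force reference: trapezoid rule on a fine grid (feasible only in low dimensions)."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(log_joint(lo + k * width, y, beta0, sigma))
    return math.log(total * width)


# Hypothetical toy cluster: eight binary outcomes for one patient.
y_toy = [1, 1, 0, 1, 0, 1, 1, 1]
approx = laplace_log_marginal(y_toy, beta0=0.0, sigma=1.0)
exact = trapezoid_log_marginal(y_toy, beta0=0.0, sigma=1.0)
print(f"Laplace: {approx:.4f}  numerical integration: {exact:.4f}")
```

The grid-based reference needs tens of thousands of function evaluations for one dimension and becomes infeasible as dimensions are added, whereas the Laplace step needs only a mode and a curvature, which is the property that lets the approach scale.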

Methods

We present a high-performance, direct implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relation between repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing these data requires the evaluation of an integral in over 3 million dimensions, which puts the problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of the LA in smaller samples, of 1% and 10% of the full dataset, that were analyzed via (a) the original interconnected Generalized Linear Models (iGLM) approach to h-lik, (b) Adaptive Gauss-Hermite (AGH) quadrature, and (c) Markov Chain Monte Carlo (MCMC), the gold standard for multivariate integration.
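As background for the AGH comparator, the sketch below shows how adaptive Gauss-Hermite quadrature relates to the Laplace approximation in one dimension: the quadrature nodes are recentred at the mode of the integrand and rescaled by its curvature, and with a single node the rule reduces to the Laplace approximation. This is an illustrative toy, not the software used in the study; the function names, toy outcomes, and parameter values are assumptions, and the 5-point node and weight values are the standard tabulated Gauss-Hermite rule.

```python
import math

# Gauss-Hermite nodes and weights for the weight function exp(-x^2).
GH_RULES = {
    1: ([0.0], [math.sqrt(math.pi)]),
    5: ([-2.020182870456086, -0.958572464613819, 0.0,
         0.958572464613819, 2.020182870456086],
        [0.019953242059046, 0.393619323152241, 0.945308720482942,
         0.393619323152241, 0.019953242059046]),
}


def log_joint(u, y, beta0, sigma):
    """h(u): Bernoulli-logit log-likelihood plus the log N(0, sigma^2) density of u."""
    eta = beta0 + u
    loglik = sum(yi * eta - math.log1p(math.exp(eta)) for yi in y)
    return loglik - 0.5 * math.log(2.0 * math.pi * sigma ** 2) - u ** 2 / (2.0 * sigma ** 2)


def find_mode(y, beta0, sigma, max_iter=50, tol=1e-12):
    """Newton search for the mode of h(u); returns the mode and h'' at the mode."""
    n, s = len(y), sum(y)
    u = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-(beta0 + u)))
        step = (s - n * p - u / sigma ** 2) / (-n * p * (1.0 - p) - 1.0 / sigma ** 2)
        u -= step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + math.exp(-(beta0 + u)))
    return u, -n * p * (1.0 - p) - 1.0 / sigma ** 2


def agh_log_marginal(y, beta0, sigma, n_nodes):
    """Adaptive Gauss-Hermite estimate of log of the integral of exp(h(u)) over u."""
    u_hat, hess = find_mode(y, beta0, sigma)
    scale = 1.0 / math.sqrt(-hess)          # curvature-based standard deviation
    nodes, weights = GH_RULES[n_nodes]
    total = 0.0
    for x, w in zip(nodes, weights):
        u = u_hat + math.sqrt(2.0) * scale * x   # recentred, rescaled node
        total += w * math.exp(x * x + log_joint(u, y, beta0, sigma))
    return math.log(math.sqrt(2.0) * scale * total)


def reference_log_marginal(y, beta0, sigma, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoid-rule reference value on a fine grid."""
    width = (hi - lo) / steps
    total = sum((0.5 if k in (0, steps) else 1.0)
                * math.exp(log_joint(lo + k * width, y, beta0, sigma))
                for k in range(steps + 1))
    return math.log(total * width)


# Hypothetical toy cluster: eight binary outcomes for one patient.
y_toy = [1, 1, 0, 1, 0, 1, 1, 1]
agh1 = agh_log_marginal(y_toy, 0.0, 1.0, n_nodes=1)   # identical to Laplace
agh5 = agh_log_marginal(y_toy, 0.0, 1.0, n_nodes=5)
reference = reference_log_marginal(y_toy, 0.0, 1.0)
print(f"1 node: {agh1:.5f}  5 nodes: {agh5:.5f}  reference: {reference:.5f}")
```

The trade-off is that a tensor-product rule with k nodes per dimension needs k^d evaluations for d-dimensional random effects, which is one reason quadrature is harder to scale than the single-node (Laplace) case.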

Results

Random effects estimates generated by the LA were within 10% of the values obtained by the iGLM, AGH and MCMC techniques. The h-lik approach was 4–30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the establishment of the non-linear relationship between the potassium level and the risk of mortality, as well as estimates of the individual and health care facility sources of variation in mortality risk in CRWD.

Conclusions

We found that the direct implementation of the h-lik offers a computationally efficient, numerically accurate way to fit GLMMs to extremely large, real-world repeated measures data. The clinical inference from our analysis may guide the choice of treatment thresholds for potassium disorders in the clinic.

Title: High performance implementation of the hierarchical likelihood for generalized linear mixed models: an application to estimate the potassium reference range in massive electronic health records datasets
Authors: Cristian G. Bologa, Vernon Shane Pankratz, Mark L. Unruh, Maria Eleni Roumelioti, Vallabh Shah, Saeed Kamran Shaffi, Soraya Arzhan, John Cook, Christos Argyropoulos
Publication date: 01-12-2021
Publisher: BioMed Central
Published in: BMC Medical Research Methodology / Issue 1/2021
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/s12874-021-01318-6
