Published in: BMC Medical Research Methodology 1/2021

Open Access 01-12-2021 | Research article

Modelling hospital outcome: problems with endogeneity

Authors: John L. Moran, John D. Santamaria, Graeme J. Duke, The Australian & New Zealand Intensive Care Society (ANZICS) Centre for Outcomes & Resource Evaluation (CORE)


Abstract

Background

Mortality modelling in critical care traditionally uses logistic regression, despite the availability of estimators commonly used in other disciplines. Little attention has been paid to covariate endogeneity and the status of non-randomized treatment assignment. Using a large registry database, various binary outcome modelling strategies and methods to account for covariate endogeneity were explored.

Methods

Patient mortality data were sourced from the Australian & New Zealand Intensive Care Society Adult Patient Database for 2016. Hospital mortality was modelled using logistic, probit and linear probability (LPM) models with intensive care unit (ICU) providers as fixed (FE) and random (RE) effects. Model comparison entailed indices of discrimination and calibration, information criteria (AIC and BIC) and binned residual analysis. Suspected endogeneity of covariates and of ventilation treatment assignment was identified by correlation between the error terms of the predictor variable and of hospital mortality, using the Stata™ “eprobit” estimator. Marginal effects were used to demonstrate effect estimate differences between probit and “eprobit” models.
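As an illustration of the comparison indices named above, the following minimal Python sketch computes the Bernoulli log-likelihood, AIC, BIC and the c-statistic (a common index of discrimination) from a vector of predicted probabilities. It is not the authors' Stata code: the function names, the toy outcomes `y`, the toy predictions `p` and the parameter count `k` are all hypothetical and chosen only to show the arithmetic.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y given predicted probabilities p."""
    eps = 1e-12  # guard against log(0) for predictions at 0 or 1
    return sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    )

def aic(ll, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * ll

def bic(ll, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * ll

def c_statistic(y, p):
    """Share of (death, survivor) pairs in which the death has the higher
    predicted probability; tied predictions count as one half."""
    deaths = [pi for yi, pi in zip(y, p) if yi == 1]
    survivors = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in deaths for b in survivors)
    return wins / (len(deaths) * len(survivors))

# Hypothetical toy data: 1 = died in hospital, 0 = survived
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.5, 0.4, 0.2, 0.8, 0.7]
k = 2  # assumed: intercept plus one covariate

ll = log_likelihood(y, p)
model_aic = aic(ll, k)
model_bic = bic(ll, k, len(y))
model_c = c_statistic(y, p)
```

Lower AIC and BIC indicate the preferred model among candidates fitted to the same data, while a c-statistic closer to 1 indicates better discrimination. Calibration and binned residual analysis need separate checks that this sketch does not cover.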

Results

The cohort comprised 92,693 patients from 124 intensive care units (ICUs) in calendar year 2016. Mean patient age was 61.8 (SD 17.5) years, 41.6% were female and the APACHE III severity of illness score was 54.5 (SD 25.6); 43.7% were ventilated. Of the models considered in predicting hospital mortality, logistic regression (with or without ICU FE) and RE logistic regression dominated, more so the latter using information criteria indices. The LPM suffered from many predictions outside the unit interval [0,1] and from both poor discrimination and calibration. Error terms of hospital length of stay, an independent risk of death score and ventilation status were correlated with the mortality error term. Marked, scenario-dependent differences in the ventilation mortality marginal effect were demonstrated between the probit and the “eprobit” models. Endogeneity was not demonstrated for the APACHE III score.
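The out-of-range behaviour of the LPM can be reproduced on a small scale. The sketch below fits ordinary least squares of a binary outcome on a single covariate and counts fitted values outside [0,1]. The data are hypothetical (ten invented severity scores and outcomes, not registry data), and `fit_lpm` is an illustrative helper rather than anything used in the study.

```python
def fit_lpm(x, y):
    """Ordinary least squares of binary y on one covariate x.
    Returns (intercept, slope) of the linear probability model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical severity scores and hospital death indicator
x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = fit_lpm(x, y)
predictions = [intercept + slope * xi for xi in x]

# A linear fit is unbounded, so extreme covariate values can yield
# "probabilities" below 0 or above 1.
out_of_range = [pr for pr in predictions if pr < 0 or pr > 1]
print(len(out_of_range), "of", len(predictions), "predictions fall outside [0, 1]")
```

In this toy example the lowest and highest scores produce a negative fitted value and a fitted value above 1 respectively, which is the mechanism behind the problem reported for the LPM.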

Conclusions

Logistic regression accounting for provider effects was the preferred estimator for hospital mortality modelling. Endogeneity of covariates and treatment variables may be identified using appropriate modelling, but failure to do so yields problematic effect estimates.
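The idea of identifying endogeneity through correlated error terms can be written in a generic textbook form for a binary outcome with a binary endogenous treatment. The notation below is an assumption for illustration only (the symbols $\mathbf{x}_i$, $\mathbf{z}_i$, $\delta$ and $\rho$ are not taken from the article) and is not a statement of the authors' exact specification.

```latex
\begin{aligned}
y_i^{*} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \delta\, t_i + \varepsilon_i,
  & y_i &= \mathbf{1}\,[\,y_i^{*} > 0\,],\\
t_i^{*} &= \mathbf{z}_i^{\top}\boldsymbol{\gamma} + u_i,
  & t_i &= \mathbf{1}\,[\,t_i^{*} > 0\,],\\
(\varepsilon_i,\, u_i) &\sim \mathcal{N}\!\left(\mathbf{0},\;
  \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).
\end{aligned}
```

Here $y_i$ is hospital death and $t_i$ is treatment (for example, ventilation). If $\rho = 0$ the treatment is exogenous and an ordinary probit of $y_i$ on $\mathbf{x}_i$ and $t_i$ suffices. A non-zero $\rho$ signals endogeneity, in which case the ordinary probit estimate of $\delta$, and any marginal effect derived from it, is biased.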
Metadata
Title
Modelling hospital outcome: problems with endogeneity
Authors
John L. Moran
John D. Santamaria
Graeme J. Duke
The Australian & New Zealand Intensive Care Society (ANZICS) Centre for Outcomes & Resource Evaluation (CORE)
Publication date
01-12-2021
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2021
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-021-01251-8