Skip to main content
Top
Published in: Trials 1/2023

Open Access 01-12-2023 | Telemedicine | Research

Evaluation of negative binomial and zero-inflated negative binomial models for the analysis of zero-inflated count data: application to the telemedicine for children with medical complexity trial

Authors: Kyung Hyun Lee, Claudia Pedroza, Elenir B. C. Avritscher, Ricardo A. Mosquera, Jon E. Tyson

Published in: Trials | Issue 1/2023

Login to get access

Abstract

Background

Two characteristics of commonly used outcomes in medical research are zero inflation and non-negative integers; examples include the number of hospital admissions or emergency department visits, where the majority of patients will have zero counts. Zero-inflated regression models were devised to analyze this type of data. However, the performance of zero-inflated regression models or the properties of data best suited for these analyses have not been thoroughly investigated.

Methods

We conducted a simulation study to evaluate the performance of two generalized linear models, negative binomial and zero-inflated negative binomial, for analyzing zero-inflated count data. Simulation scenarios assumed a randomized controlled trial design and varied the true underlying distribution, sample size, and rate of zero inflation. We compared the models in terms of bias, mean squared error, and coverage. Additionally, we used logistic regression to determine which data properties are most important for predicting the best-fitting model.

Results

We first found that, regardless of the rate of zero inflation, there was little difference between the conventional negative binomial and its zero-inflated counterpart in terms of bias of the marginal treatment group coefficient. Second, even when the outcome was simulated from a zero-inflated distribution, a negative binomial model was favored above its ZI counterpart in terms of the Akaike Information Criterion. Third, the mean and skewness of the non-zero part of the data were stronger predictors of model preference than the percentage of zero counts. These results were not affected by the sample size, which ranged from 60 to 800.

Conclusions

We recommend that the rate of zero inflation and overdispersion in the outcome should not be the sole and main justification for choosing zero-inflated regression models. Investigators should also consider other data characteristics when choosing a model for count data. In addition, if the performance of the NB and ZINB regression models is reasonably comparable even with ZI outcomes, we advocate the use of the NB regression model due to its clear and straightforward interpretation of the results.
Appendix
Available only for authorised users
Literature
1.
go back to reference Hilbe JM. Modeling count data. Cambridge University Press, 2014. Hilbe JM. Modeling count data. Cambridge University Press, 2014.
2.
go back to reference Yang Z, Hardin JW, Addy CL. Score tests for zero-inflation in overdispersed count data. Commun Stat Methods. 2010;39:2008–30.CrossRef Yang Z, Hardin JW, Addy CL. Score tests for zero-inflation in overdispersed count data. Commun Stat Methods. 2010;39:2008–30.CrossRef
3.
go back to reference Prost V, Gazut S, Brüls T. A zero inflated log-normal model for inference of sparse microbial association networks. PLOS Comput Biol. 2021;17:e1009089.CrossRefPubMedPubMedCentral Prost V, Gazut S, Brüls T. A zero inflated log-normal model for inference of sparse microbial association networks. PLOS Comput Biol. 2021;17:e1009089.CrossRefPubMedPubMedCentral
4.
go back to reference Yano K, Kaneko R, Komaki F. Minimax predictive density for sparse count data. Yano K, Kaneko R, Komaki F. Minimax predictive density for sparse count data.
5.
go back to reference Van Der Heijden PG, Cruyff M, Van Houwelingen HC. Estimating the size of a criminal population from police records using the truncated Poisson regression model. Stat Neerlandica. 2003;57:289–304.CrossRef Van Der Heijden PG, Cruyff M, Van Houwelingen HC. Estimating the size of a criminal population from police records using the truncated Poisson regression model. Stat Neerlandica. 2003;57:289–304.CrossRef
6.
go back to reference Daraghmi Y-A, Yi C-W, Chiang T-C. Negative binomial additive models for short-term traffic flow forecasting in urban areas. IEEE Trans Intell Transp Syst. 2013;15:784–93.CrossRef Daraghmi Y-A, Yi C-W, Chiang T-C. Negative binomial additive models for short-term traffic flow forecasting in urban areas. IEEE Trans Intell Transp Syst. 2013;15:784–93.CrossRef
7.
go back to reference Lloyd-Smith JO. Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases. PLoS ONE. 2007;2:e180.CrossRefPubMedPubMedCentral Lloyd-Smith JO. Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases. PLoS ONE. 2007;2:e180.CrossRefPubMedPubMedCentral
8.
go back to reference Ver Hoef JM, Boveng PL. Quasi‐Poisson vs. negative binomial regression: how should we model overdispersed count data? Ecology. 2007;88:2766–72.CrossRefPubMed Ver Hoef JM, Boveng PL. Quasi‐Poisson vs. negative binomial regression: how should we model overdispersed count data? Ecology. 2007;88:2766–72.CrossRefPubMed
9.
go back to reference Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34:1–14.CrossRef Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34:1–14.CrossRef
10.
go back to reference Greene WH. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Greene WH. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models.
11.
go back to reference Mullahy J. Specification and testing of some modified count data models. J Econom. 1986;33:341–65.CrossRef Mullahy J. Specification and testing of some modified count data models. J Econom. 1986;33:341–65.CrossRef
12.
go back to reference Du J, Park Y-T, Theera-Ampornpunt N, et al. The use of count data models in biomedical informatics evaluation research. J Am Med Inform Assoc. 2012;19:39–44.CrossRefPubMed Du J, Park Y-T, Theera-Ampornpunt N, et al. The use of count data models in biomedical informatics evaluation research. J Am Med Inform Assoc. 2012;19:39–44.CrossRefPubMed
13.
go back to reference Connelly DP, Park Y-T, Du J, et al. The impact of electronic health records on care of heart failure patients in the emergency room. J Am Med Inform Assoc. 2012;19:334–40.CrossRefPubMed Connelly DP, Park Y-T, Du J, et al. The impact of electronic health records on care of heart failure patients in the emergency room. J Am Med Inform Assoc. 2012;19:334–40.CrossRefPubMed
14.
go back to reference Speedie SM, Park Y-T, Du J, et al. The impact of electronic health records on people with diabetes in three different emergency departments. J Am Med Inform Assoc. 2014;21:e71–7.CrossRefPubMed Speedie SM, Park Y-T, Du J, et al. The impact of electronic health records on people with diabetes in three different emergency departments. J Am Med Inform Assoc. 2014;21:e71–7.CrossRefPubMed
15.
go back to reference Choi K, Chen Y, Skelly DA, et al. Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics. Genome Biol. 2020;21:1–16. Choi K, Chen Y, Skelly DA, et al. Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics. Genome Biol. 2020;21:1–16.
16.
go back to reference Naya H, Urioste JI, Chang Y-M, et al. A comparison between Poisson and zero-inflated Poisson regression models with an application to number of black spots in Corriedale sheep. Genet Sel Evol. 2008;40:1–16. Naya H, Urioste JI, Chang Y-M, et al. A comparison between Poisson and zero-inflated Poisson regression models with an application to number of black spots in Corriedale sheep. Genet Sel Evol. 2008;40:1–16.
17.
go back to reference Newton MA, Raftery AE. Approximate Bayesian inference with the weighted likelihood bootstrap. J R Stat Soc Ser B Methodol. 1994;56:3–26. Newton MA, Raftery AE. Approximate Bayesian inference with the weighted likelihood bootstrap. J R Stat Soc Ser B Methodol. 1994;56:3–26.
18.
go back to reference Mosquera RA, Avritscher EBC, Pedroza C, et al. Telemedicine for children with medical complexity: a randomized clinical trial. Pediatrics; 148. Mosquera RA, Avritscher EBC, Pedroza C, et al. Telemedicine for children with medical complexity: a randomized clinical trial. Pediatrics; 148.
19.
go back to reference Bjorndal KA, Bolten AB, Chaloupka M. Green turtle somatic growth dynamics: distributional regression reveals effects of differential emigration. Mar Ecol Prog Ser. 2019;616:185–95.CrossRef Bjorndal KA, Bolten AB, Chaloupka M. Green turtle somatic growth dynamics: distributional regression reveals effects of differential emigration. Mar Ecol Prog Ser. 2019;616:185–95.CrossRef
20.
go back to reference Greenwood M, Yule GU. An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. J R Stat Soc. 1920;83:255–79.CrossRef Greenwood M, Yule GU. An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. J R Stat Soc. 1920;83:255–79.CrossRef
21.
go back to reference Moghimbeigi A, Eshraghian MR, Mohammad K, et al. Multilevel zero-inflated negative binomial regression modeling for over-dispersed count data with extra zeros. J Appl Stat. 2008;35:1193–202.CrossRef Moghimbeigi A, Eshraghian MR, Mohammad K, et al. Multilevel zero-inflated negative binomial regression modeling for over-dispersed count data with extra zeros. J Appl Stat. 2008;35:1193–202.CrossRef
22.
go back to reference Preisser JS, Stamm JW, Long DL, et al. Review and recommendations for zero-inflated count regression modeling of dental caries indices in epidemiological studies. Caries Res. 2012;46:413–23.CrossRefPubMed Preisser JS, Stamm JW, Long DL, et al. Review and recommendations for zero-inflated count regression modeling of dental caries indices in epidemiological studies. Caries Res. 2012;46:413–23.CrossRefPubMed
23.
go back to reference Brooks ME, Kristensen K, van Benthem KJ, et al. Modeling zero-inflated count data with glmmTMB. BioRxiv 2017; 132753. Brooks ME, Kristensen K, van Benthem KJ, et al. Modeling zero-inflated count data with glmmTMB. BioRxiv 2017; 132753.
24.
go back to reference Tang W-X, Zhang L-F, Ai Y-Q, et al. Efficacy of internet-delivered cognitive-behavioral therapy for the management of chronic pain in children and adolescents: a systematic review and meta-analysis. Medicine (Baltimore); 97. Tang W-X, Zhang L-F, Ai Y-Q, et al. Efficacy of internet-delivered cognitive-behavioral therapy for the management of chronic pain in children and adolescents: a systematic review and meta-analysis. Medicine (Baltimore); 97.
25.
go back to reference Negi R, Sharma SK, Gaur R, et al. Efficacy of ginger in the treatment of primary dysmenorrhea: a systematic review and meta-analysis. Cureus; 13. Negi R, Sharma SK, Gaur R, et al. Efficacy of ginger in the treatment of primary dysmenorrhea: a systematic review and meta-analysis. Cureus; 13.
26.
go back to reference Nkhoma DE, Soko CJ, Bowrin P, et al. Digital interventions self-management education for type 1 and 2 diabetes: a systematic review and meta-analysis. Comput Methods Programs Biomed. 2021;210: 106370.CrossRefPubMed Nkhoma DE, Soko CJ, Bowrin P, et al. Digital interventions self-management education for type 1 and 2 diabetes: a systematic review and meta-analysis. Comput Methods Programs Biomed. 2021;210: 106370.CrossRefPubMed
27.
go back to reference Janicke DM, Mitchell TB, Basch MC, et al. Meta-analysis of lifestyle modification interventions addressing overweight and obesity in preschool-age children. Health Psychol. 2021;40:631.CrossRefPubMed Janicke DM, Mitchell TB, Basch MC, et al. Meta-analysis of lifestyle modification interventions addressing overweight and obesity in preschool-age children. Health Psychol. 2021;40:631.CrossRefPubMed
28.
go back to reference Li J, Liu Y, Jiang J, et al. Effect of telehealth interventions on quality of life in cancer survivors: a systematic review and meta-analysis of randomized controlled trials. Int J Nurs Stud. 2021;122: 103970.CrossRefPubMed Li J, Liu Y, Jiang J, et al. Effect of telehealth interventions on quality of life in cancer survivors: a systematic review and meta-analysis of randomized controlled trials. Int J Nurs Stud. 2021;122: 103970.CrossRefPubMed
29.
go back to reference Stryhn H, Christensen J. Confidence intervals by the profile likelihood method, with applications in veterinary epidemiology. 2003. Stryhn H, Christensen J. Confidence intervals by the profile likelihood method, with applications in veterinary epidemiology. 2003.
30.
go back to reference Jones RH. Bayesian information criterion for longitudinal and clustered data. Stat Med. 2011;30:3050–6.CrossRefPubMed Jones RH. Bayesian information criterion for longitudinal and clustered data. Stat Med. 2011;30:3050–6.CrossRefPubMed
31.
go back to reference Vuong QH. Likelihood ratio tests for model selection and non-nested hypotheses. Econom J Econom Soc 1989; 307–333. Vuong QH. Likelihood ratio tests for model selection and non-nested hypotheses. Econom J Econom Soc 1989; 307–333.
32.
go back to reference Tang Y, Tang W. Testing modified zeros for Poisson regression models. Stat Methods Med Res. 2019;28:3123–41.CrossRefPubMed Tang Y, Tang W. Testing modified zeros for Poisson regression models. Stat Methods Med Res. 2019;28:3123–41.CrossRefPubMed
33.
go back to reference He H, Zhang H, Ye P, et al. A test of inflated zeros for Poisson regression models. Stat Methods Med Res. 2019;28:1157–69.CrossRefPubMed He H, Zhang H, Ye P, et al. A test of inflated zeros for Poisson regression models. Stat Methods Med Res. 2019;28:1157–69.CrossRefPubMed
34.
go back to reference Wilson P. The misuse of the Vuong test for non-nested models to test for zero-inflation. Econ Lett. 2015;127:51–3.CrossRef Wilson P. The misuse of the Vuong test for non-nested models to test for zero-inflation. Econ Lett. 2015;127:51–3.CrossRef
35.
go back to reference Desmarais BA, Harden JJ. Testing for zero inflation in count models: bias correction for the Vuong test. Stata J. 2013;13:810–35.CrossRef Desmarais BA, Harden JJ. Testing for zero inflation in count models: bias correction for the Vuong test. Stata J. 2013;13:810–35.CrossRef
36.
go back to reference Merkle E, You D, Schneider L, et al. Package ‘nonnest2.’ Psychol Methods. 2018;21:151–63.CrossRef Merkle E, You D, Schneider L, et al. Package ‘nonnest2.’ Psychol Methods. 2018;21:151–63.CrossRef
37.
go back to reference Hand D. Solving the right problem. Bull Inst Math Stat; 43. Hand D. Solving the right problem. Bull Inst Math Stat; 43.
38.
go back to reference Hu M-C, Pavlicova M, Nunes EV. Zero-inflated and hurdle models of count data with extra zeros: examples from an HIV-risk reduction intervention trial. Am J Drug Alcohol Abuse. 2011;37:367–75.CrossRefPubMedPubMedCentral Hu M-C, Pavlicova M, Nunes EV. Zero-inflated and hurdle models of count data with extra zeros: examples from an HIV-risk reduction intervention trial. Am J Drug Alcohol Abuse. 2011;37:367–75.CrossRefPubMedPubMedCentral
39.
go back to reference Baughman A. Mixture model framework facilitates understanding of zero-inflated and hurdle models for count data. J Biopharm Stat. 2007;17:943–6.CrossRefPubMed Baughman A. Mixture model framework facilitates understanding of zero-inflated and hurdle models for count data. J Biopharm Stat. 2007;17:943–6.CrossRefPubMed
Metadata
Title
Evaluation of negative binomial and zero-inflated negative binomial models for the analysis of zero-inflated count data: application to the telemedicine for children with medical complexity trial
Authors
Kyung Hyun Lee
Claudia Pedroza
Elenir B. C. Avritscher
Ricardo A. Mosquera
Jon E. Tyson
Publication date
01-12-2023
Publisher
BioMed Central
Keyword
Telemedicine
Published in
Trials / Issue 1/2023
Electronic ISSN: 1745-6215
DOI
https://doi.org/10.1186/s13063-023-07648-8

Other articles of this Issue 1/2023

Trials 1/2023 Go to the issue