Published in: Health Services and Outcomes Research Methodology 3-4/2017

01-12-2017

Two parts are better than one: modeling marginal means of semicontinuous data

Authors: Valerie A. Smith, Brian Neelon, Matthew L. Maciejewski, John S. Preisser

Abstract

In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero followed by a continuous distribution with positive support. These are often analyzed using two-part mixtures that separately model the probability of use to account for the portion of the sample with zero values. Commonly, but not always, the second component models the continuous values conditional on them being positive. Prior work examining whether such two-part models are needed to appropriately draw inference from semicontinuous data compared to standard one-part regression models has found mixed results. However, prior studies have generally used only measures of model fit on a single dataset, leaving a definitive conclusion uncertain. This paper provides a detailed evaluation using simulations of the appropriateness of standard one-part generalized linear models (GLMs) compared to a recently developed marginalized two-part (MTP) model. The MTP model, unlike the one-part GLMs, explicitly accounts for the point mass at zero, yet takes the same form for the marginal mean as the commonly used GLM with log link, making the covariate effects directly comparable. We simulate data scenarios with varying sample sizes and percentages of zeros. One-part GLMs resulted in increased bias, lower than nominal coverage of confidence intervals, and inflated type I error rates, rendering them inappropriate for use with semicontinuous data. Even when distributional assumptions were violated, estimates of covariate effects and type I error rates under the MTP model remained robust.
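The marginal-mean idea described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: it assumes a logit model for the probability of a positive value, a log-normal distribution for the positive values, one binary covariate, and arbitrary illustrative parameter values. The function name `simulate_mtp_lognormal` and all variable names are hypothetical.

```python
import numpy as np


def simulate_mtp_lognormal(x, alpha, beta, sigma, rng):
    """Draw semicontinuous outcomes with marginal mean exp(beta[0] + beta[1] * x).

    Assumptions (illustrative, not taken from the paper):
    logit P(Y > 0) = alpha[0] + alpha[1] * x, and Y | Y > 0 is log-normal.
    """
    pi = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * x)))  # P(Y > 0)
    nu = np.exp(beta[0] + beta[1] * x)                      # marginal mean E[Y]
    # Choose the log-scale location so that pi * E[Y | Y > 0] equals nu,
    # using E[log-normal] = exp(mu_log + sigma^2 / 2).
    mu_log = np.log(nu) - np.log(pi) - 0.5 * sigma**2
    positive = rng.random(x.shape[0]) < pi
    y = np.where(positive, rng.lognormal(mean=mu_log, sigma=sigma), 0.0)
    return y, pi, nu


rng = np.random.default_rng(2017)
n = 200_000
x = rng.integers(0, 2, size=n)   # one binary covariate
alpha = (0.0, 0.5)               # zero-part coefficients (illustrative)
beta = (1.0, 0.3)                # marginal-mean coefficients (illustrative)
y, pi, nu = simulate_mtp_lognormal(x, alpha, beta, sigma=0.5, rng=rng)

# The log ratio of the two group means, zeros included, should be close to beta[1].
log_ratio = np.log(y[x == 1].mean() / y[x == 0].mean())
print("share of zeros:", (y == 0).mean())
print("log ratio of overall means:", log_ratio, "target:", beta[1])
```

Because the zeros are included in the group means, the log ratio estimates the covariate effect on the overall mean, which is the same quantity a one-part GLM with log link targets. That shared target is what makes the covariate effects of the two approaches directly comparable.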
Metadata

Title: Two parts are better than one: modeling marginal means of semicontinuous data
Authors: Valerie A. Smith, Brian Neelon, Matthew L. Maciejewski, John S. Preisser
Publication date: 01-12-2017
Publisher: Springer US
Published in: Health Services and Outcomes Research Methodology, Issue 3-4/2017
Print ISSN: 1387-3741
Electronic ISSN: 1572-9400
DOI: https://doi.org/10.1007/s10742-017-0169-9
