Published in: BMC Medical Research Methodology 1/2018

Open Access 01-12-2018 | Research article

Heckman imputation models for binary or continuous MNAR outcomes and MAR predictors

Authors: Jacques-Emmanuel Galimard, Sylvie Chevret, Emmanuel Curis, Matthieu Resche-Rigon

Abstract

Background

Multiple imputation by chained equations (MICE) requires specifying a suitable conditional imputation model for each incomplete variable and then iteratively imputes the missing values. In the presence of missing not at random (MNAR) outcomes, valid statistical inference often requires joint models for missing observations and their indicators of missingness. In this study, we derived an imputation model for missing binary data with MNAR mechanism from Heckman’s model using a one-step maximum likelihood estimator. We applied this approach to improve a previously developed approach for MNAR continuous outcomes using Heckman’s model and a two-step estimator. These models allow us to use a MICE process and can thus also handle missing at random (MAR) predictors in the same MICE process.
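As a reminder of the structure involved, a Heckman-type selection model is usually written along the following lines. This is a sketch in standard textbook notation; the symbols (X_i, X_i^s, beta, beta^s, sigma, rho) are assumptions introduced here and are not taken from the abstract.

```latex
\begin{aligned}
Y_i^{*} &= X_i \beta + \varepsilon_i, \\
R_i^{*} &= X_i^{s} \beta^{s} + \varepsilon_i^{s}, \qquad R_i = \mathbb{1}(R_i^{*} > 0), \\
\begin{pmatrix} \varepsilon_i \\ \varepsilon_i^{s} \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma^{2} & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix}
\right), \\
\mathbb{E}\left[ Y_i^{*} \mid X_i, X_i^{s}, R_i = 1 \right]
&= X_i \beta + \rho\sigma \,
\frac{\phi(X_i^{s} \beta^{s})}{\Phi(X_i^{s} \beta^{s})}.
\end{aligned}
```

The outcome is observed only when R_i = 1. For a continuous outcome, Y_i = Y_i^*; for a binary outcome, Y_i = 1(Y_i^* > 0) with sigma fixed to 1 (a probit model with sample selection). When rho = 0, the missingness indicator carries no information about the outcome beyond the covariates; a nonzero rho corresponds to an MNAR mechanism. The last line is the conditional mean that motivates two-step estimation, where the ratio phi/Phi (the inverse Mills ratio) enters the outcome regression as an additional covariate.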

Methods

We simulated 1000 datasets of 500 cases each. Missing data were generated on 30% of the outcomes under three mechanisms: MAR, weak MNAR, and strong MNAR. We then repeated these three scenarios with an additional 30% of MAR missing data on a predictor, leaving 50% complete cases. We compared the performance of the developed approach with that of a complete-case analysis and of classical Heckman’s model estimates.
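A simulation design of this kind can be sketched as follows. This is a minimal illustration only: the function name `simulate_dataset`, the coefficients, the probit-type missingness model for the predictor, and the use of the error correlation `rho` to set the strength of the MNAR mechanism are all assumptions made here, not the settings used in the study.

```python
import math
import random
from statistics import NormalDist


def simulate_dataset(n=500, rho=0.0, mar_predictor=False, seed=0):
    """Simulate one dataset with a Heckman-type selection mechanism on y.

    rho is the correlation between the outcome error and the selection
    error: with rho = 0 the missingness of y depends on x1 only, while a
    larger |rho| gives a stronger MNAR mechanism.
    All numeric values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Standard deviation of 0.5 * N(0, 1) + N(0, 1), used to calibrate
    # the intercepts so that the target missingness rates are reached.
    scale = math.sqrt(1 + 0.5 ** 2)
    a_obs = NormalDist().inv_cdf(0.70) * scale  # about 70% of y observed
    a_mis = NormalDist().inv_cdf(0.30) * scale  # about 30% of x1 missing
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z_sel, z_out = rng.gauss(0, 1), rng.gauss(0, 1)
        # Outcome error correlated with the selection error (correlation rho).
        eps_y = rho * z_sel + math.sqrt(1 - rho ** 2) * z_out
        y = 1.0 + x1 + eps_y
        # Selection equation: y is observed when the latent index is positive.
        y_observed = a_obs + 0.5 * x1 + z_sel > 0
        # Optional missingness on x1 that depends only on the always-observed x2.
        x1_missing = mar_predictor and (a_mis + 0.5 * x2 + rng.gauss(0, 1) > 0)
        rows.append({
            "y_full": y,
            "y": y if y_observed else None,
            "x1_full": x1,
            "x1": None if x1_missing else x1,
            "x2": x2,
        })
    return rows


# Example: one dataset of 500 cases with a strong MNAR outcome and a MAR predictor.
data = simulate_dataset(n=500, rho=0.9, mar_predictor=True, seed=42)
complete = [r for r in data if r["y"] is not None and r["x1"] is not None]
print(len(complete) / len(data))  # fraction of complete cases
```

In this sketch the two missingness processes are independent, so the expected fraction of complete cases is 0.7 × 0.7 = 0.49, in line with the roughly 50% of complete cases described above.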

Results

With MNAR outcomes, only methods using Heckman’s model were unbiased, and with a MAR predictor, the developed imputation approach outperformed all the other approaches.

Conclusions

In the presence of MAR predictors, we proposed a simple approach to address MNAR binary or continuous outcomes under a Heckman assumption in a MICE procedure.
Metadata
Title
Heckman imputation models for binary or continuous MNAR outcomes and MAR predictors
Authors
Jacques-Emmanuel Galimard
Sylvie Chevret
Emmanuel Curis
Matthieu Resche-Rigon
Publication date
01-12-2018
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2018
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-018-0547-1
