Published in: BMC Medical Research Methodology 1/2011

Open Access 01-12-2011 | Research article

Imputation strategies for missing binary outcomes in cluster randomized trials

Authors: Jinhui Ma, Noori Akhtar-Danesh, Lisa Dolovich, Lehana Thabane, the CHAT investigators


Abstract

Background

Attrition, which leads to missing data, is a common problem in cluster randomized trials (CRTs), where groups of patients rather than individuals are randomized. Standard multiple imputation (MI) strategies may not be appropriate for imputing missing data from CRTs because they assume independent observations. In this paper, under two missingness assumptions (missing completely at random and covariate-dependent missingness), we used a simulation study to compare six MI strategies that account for the intra-cluster correlation of missing binary outcomes in CRTs with standard imputation strategies and with complete case analysis.
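
The two missingness mechanisms can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function names, the 30% base rate, the single binary covariate `x`, and the logistic form chosen for the covariate-dependent mechanism are all assumptions made for illustration.

```python
import numpy as np

def make_missing_mcar(y, rate, rng):
    """Set each outcome to NaN with the same probability `rate` (MCAR)."""
    y = np.asarray(y, dtype=float).copy()
    y[rng.random(y.size) < rate] = np.nan
    return y

def make_missing_cdm(y, x, base_rate, slope, rng):
    """Set outcomes to NaN with a probability that depends on covariate x only.

    Hypothetical logistic mechanism: the log-odds of missingness equal
    logit(base_rate) at the mean of x and change by `slope` per unit of x.
    """
    y = np.asarray(y, dtype=float).copy()
    x = np.asarray(x, dtype=float)
    logit = np.log(base_rate / (1 - base_rate)) + slope * (x - x.mean())
    p_missing = 1 / (1 + np.exp(-logit))
    y[rng.random(y.size) < p_missing] = np.nan
    return y

rng = np.random.default_rng(2011)
n = 20_000
x = rng.integers(0, 2, size=n)    # hypothetical binary covariate
y = rng.binomial(1, 0.5, size=n)  # hypothetical binary outcome

y_mcar = make_missing_mcar(y, 0.30, rng)
y_cdm = make_missing_cdm(y, x, 0.30, 1.0, rng)
```

Under MCAR the chance of a missing outcome is the same for every participant. Under the covariate-dependent mechanism sketched here it differs between the two levels of `x` but does not depend on the outcome itself.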

Method

We considered three within-cluster and three across-cluster MI strategies for missing binary outcomes in CRTs. The three within-cluster MI strategies are the logistic regression method, the propensity score method, and the Markov chain Monte Carlo (MCMC) method, each of which applies a standard MI strategy within each cluster. The three across-cluster MI strategies are the propensity score method, the random-effects (RE) logistic regression approach, and logistic regression with cluster as a fixed effect. Based on the Community Hypertension Assessment Trial (CHAT), which has complete data, we designed a simulation study to investigate the performance of these MI strategies.
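
The within-cluster idea, in which each cluster's missing outcomes are imputed using only data from that cluster, can be sketched as follows. This is a deliberately simplified stand-in and not any of the three methods named above: it assumes an intercept-only model per cluster with a uniform Beta(1, 1) prior on the cluster's event probability, and the function name `impute_within_cluster` is hypothetical.

```python
import numpy as np

def impute_within_cluster(y, cluster, n_imputations=5, seed=0):
    """Return completed copies of y, where NaN marks a missing binary outcome.

    For every imputation and every cluster, an event probability is drawn
    from the Beta posterior implied by that cluster's observed outcomes
    (uniform prior assumed), and the missing outcomes are then drawn as
    Bernoulli variables with that probability.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    completed = []
    for _ in range(n_imputations):
        y_imp = y.copy()
        for c in np.unique(cluster):
            in_cluster = cluster == c
            observed = in_cluster & ~np.isnan(y)
            missing = in_cluster & np.isnan(y)
            if not missing.any():
                continue
            successes = y[observed].sum()
            failures = observed.sum() - successes
            # Drawing p (rather than plugging in the observed proportion)
            # carries parameter uncertainty into the imputations.
            p = rng.beta(1 + successes, 1 + failures)
            y_imp[missing] = rng.binomial(1, p, size=int(missing.sum()))
        completed.append(y_imp)
    return completed

# Hypothetical toy data: two clusters of four participants each.
y = np.array([1, 0, np.nan, 1, np.nan, 0, 0, np.nan])
cluster = np.array([1, 1, 1, 1, 2, 2, 2, 2])
datasets = impute_within_cluster(y, cluster, n_imputations=5, seed=1)
```

Each completed dataset would then be analysed separately and the results combined. An across-cluster strategy would instead fit one imputation model to all clusters and represent the cluster in that model, for example as a fixed or random effect.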

Results

The estimated treatment effect and its 95% confidence interval (CI) from the generalized estimating equations (GEE) model based on the complete CHAT dataset are 1.14 (0.76, 1.70). When 30% of the binary outcomes are missing completely at random, the simulation study shows that the estimated treatment effects and corresponding 95% CIs from the GEE model are 1.15 (0.76, 1.75) with complete case analysis, 1.12 (0.72, 1.73) with the within-cluster MCMC method, 1.21 (0.80, 1.81) with across-cluster RE logistic regression, and 1.16 (0.82, 1.64) with standard logistic regression, which does not account for clustering.
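
MI analyses conventionally combine the per-imputation estimates with Rubin's rules; the abstract does not spell out this step, so the following is a sketch under stated assumptions rather than the authors' procedure. It assumes the treatment effect is a ratio measure pooled on the log scale, uses the normal quantile 1.96 instead of Rubin's t reference distribution, and runs on hypothetical input values.

```python
import math

def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between          # total variance
    return q_bar, total

# Hypothetical log-scale effects and variances from three imputed datasets.
log_effects = [0.10, 0.20, 0.30]
variances = [0.04, 0.04, 0.04]

q_bar, total_var = pool_rubin(log_effects, variances)
se = math.sqrt(total_var)
ratio = math.exp(q_bar)
ci = (math.exp(q_bar - 1.96 * se), math.exp(q_bar + 1.96 * se))
print(f"pooled ratio {ratio:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

The between-imputation term is what widens the interval to reflect uncertainty about the missing values.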

Conclusion

When the percentage of missing data is low or the intra-cluster correlation coefficient is small, the different approaches for handling missing binary outcome data generate quite similar results. When the percentage of missing data is large, standard MI strategies, which do not take the intra-cluster correlation into account, underestimate the variance of the treatment effect. Within-cluster and across-cluster MI strategies (except for the random-effects logistic regression MI strategy), which take the intra-cluster correlation into account, seem to be more appropriate for handling missing outcomes from CRTs. Under the same imputation strategy and percentage of missingness, the estimates of the treatment effect from the GEE and RE logistic regression models are similar.
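
The variance point can be checked roughly against the intervals reported in Results. The sketch below back-calculates the standard error implied by each 95% CI, assuming the reported effects are ratio measures with symmetric Wald intervals on the log scale; the abstract does not state this, so the calculation is illustrative only.

```python
import math

# 95% confidence limits as reported in the Results section.
reported = {
    "complete data": (0.76, 1.70),
    "complete case analysis": (0.76, 1.75),
    "within-cluster MCMC": (0.72, 1.73),
    "across-cluster RE logistic regression": (0.80, 1.81),
    "standard logistic regression": (0.82, 1.64),
}

def log_scale_se(lower, upper, z=1.96):
    """Standard error implied by a Wald interval on the log scale (assumed form)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

implied_se = {name: log_scale_se(lo, hi) for name, (lo, hi) in reported.items()}
for name, se in implied_se.items():
    print(f"{name}: implied SE on the log scale = {se:.3f}")
```

Under these assumptions the interval for standard logistic regression imputation is the narrowest of the five (an implied standard error of roughly 0.18, against roughly 0.21 for the complete data), which is consistent with the variance underestimation described above.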
Metadata
Title: Imputation strategies for missing binary outcomes in cluster randomized trials
Authors: Jinhui Ma, Noori Akhtar-Danesh, Lisa Dolovich, Lehana Thabane, the CHAT investigators
Publication date: 01-12-2011
Publisher: BioMed Central
Published in: BMC Medical Research Methodology, Issue 1/2011
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/1471-2288-11-18
