Published in: BMC Medical Research Methodology 1/2014

Open Access 01-12-2014 | Research article

Joint modelling rationale for chained equations

Authors: Rachael A Hughes, Ian R White, Shaun R Seaman, James R Carpenter, Kate Tilling, Jonathan AC Sterne

Abstract

Background

Chained equations imputation is widely used in medical research. It uses a set of conditional models, so is more flexible than joint modelling imputation for the imputation of different types of variables (e.g. binary, ordinal or unordered categorical). However, chained equations imputation does not correspond to drawing from a joint distribution when the conditional models are incompatible. Concurrently with our work, other authors have shown the equivalence of the two imputation methods in finite samples.
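
For readers unfamiliar with the term, compatibility can be written out as follows. This is a sketch in notation introduced here for illustration (the symbols $f_j$, $g$, $\theta_j$, $\theta$ and $t_j$ do not appear in the abstract), following the usual definition in the literature on conditionally specified distributions:

```latex
% Conditional models f_j for X_j given the remaining variables X_{-j}, j = 1, ..., p,
% are compatible if there is a joint model g and surjective maps t_j such that
\[
  f_j\bigl(x_j \mid x_{-j},\, t_j(\theta)\bigr)
  \;=\;
  \frac{g(x_1,\dots,x_p \mid \theta)}{\int g(x_1,\dots,x_p \mid \theta)\,\mathrm{d}x_j}
  \qquad \text{for all } j \text{ and all } \theta .
\]
% Example: two normal linear regressions are compatible, because both are
% the conditionals of a bivariate normal joint model.
\[
  X_1 \mid X_2 = x_2 \sim N(\alpha_0 + \alpha_1 x_2,\ \sigma_1^2),
  \qquad
  X_2 \mid X_1 = x_1 \sim N(\beta_0 + \beta_1 x_1,\ \sigma_2^2).
\]
```

When no such joint model $g$ exists, cycling through the conditional models is not a Gibbs sampler for any joint distribution, which is the situation the last sentence but one above refers to.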

Methods

Taking a different approach, we prove sufficient conditions under which, in finite samples, chained equations and joint modelling yield imputations from the same predictive distribution. Further, we apply this proof in four specific cases and conduct a simulation study that explores the consequences when the conditional models are compatible but the remaining conditions are not satisfied.

Results

We provide an additional “non-informative margins” condition which, together with compatibility, is sufficient. We show that the non-informative margins condition is not satisfied, despite compatible conditional models, in a situation as simple as two continuous variables and one binary variable. Our simulation study demonstrates that, as a consequence of this violation, order effects can occur; that is, the imputations can differ systematically depending upon the order in which the variables are visited in the chained equations algorithm. However, the order effects appear to be small, especially when associations between variables are weak.
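
To make the role of the variable ordering concrete, the following is a minimal sketch of a chained equations cycle for two continuous variables and one binary variable. It is not the authors' simulation code. The function names (`draw_linear`, `draw_logistic`, `chained_equations`), the simulated data and the choice of models (normal linear regression for the continuous variables, logistic regression with a normal approximation to the posterior for the binary variable) are all assumptions made here for illustration.

```python
import numpy as np

def draw_linear(X, y, rng):
    """Draw (beta, sigma) from the posterior of a normal linear model
    under the prior p(beta, sigma^2) proportional to 1/sigma^2."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n - k)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)

def draw_logistic(X, y, rng, n_iter=25):
    """Approximate posterior draw of logistic coefficients: Newton-Raphson
    estimate, then a draw from N(beta_hat, inverse information).
    This is a large-sample approximation, not an exact posterior draw."""
    k = X.shape[1]
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(k)
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(k)
    return rng.multivariate_normal(beta, np.linalg.inv(info))

def chained_equations(data, kinds, order, n_cycles=10, seed=0):
    """Return one completed copy of `data` (NaN marks a missing value).
    `order` is the sequence in which the variables are visited in each cycle."""
    rng = np.random.default_rng(seed)
    data = data.copy()
    miss = np.isnan(data)
    n, p = data.shape
    # Initialise missing entries by sampling from the observed values.
    for j in range(p):
        if miss[:, j].any():
            observed = data[~miss[:, j], j]
            data[miss[:, j], j] = rng.choice(observed, size=miss[:, j].sum())
    for _ in range(n_cycles):
        for j in order:
            if not miss[:, j].any():
                continue
            others = [c for c in range(p) if c != j]
            X = np.column_stack([np.ones(n), data[:, others]])
            X_obs, y_obs = X[~miss[:, j]], data[~miss[:, j], j]
            X_mis = X[miss[:, j]]
            if kinds[j] == "continuous":
                beta, sigma = draw_linear(X_obs, y_obs, rng)
                draws = X_mis @ beta + sigma * rng.standard_normal(len(X_mis))
            else:  # binary variable
                beta = draw_logistic(X_obs, y_obs, rng)
                prob = 1.0 / (1.0 + np.exp(-X_mis @ beta))
                draws = (rng.random(len(X_mis)) < prob).astype(float)
            data[miss[:, j], j] = draws
    return data

# Hypothetical data: two continuous variables and one binary variable,
# with about 20% of the values set to missing completely at random.
sim = np.random.default_rng(1)
n_obs = 300
z = sim.binomial(1, 0.5, n_obs).astype(float)
x1 = 0.5 * z + sim.standard_normal(n_obs)
x2 = 0.5 * x1 + 0.5 * z + sim.standard_normal(n_obs)
incomplete = np.column_stack([x1, x2, z])
incomplete[sim.random(incomplete.shape) < 0.2] = np.nan

kinds = ["continuous", "continuous", "binary"]
imp_a = chained_equations(incomplete, kinds, order=[0, 1, 2], seed=2)
imp_b = chained_equations(incomplete, kinds, order=[2, 1, 0], seed=2)
```

A single pair of runs such as `imp_a` and `imp_b` cannot separate an order effect from Monte Carlo noise. Looking for a systematic difference would mean repeating the imputation many times under each ordering and comparing the resulting estimates.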

Conclusions

Since chained equations is typically used in medical research for datasets with different types of variables, researchers must be aware that order effects are likely to be ubiquitous, but our results suggest they may be small enough to be negligible.
Metadata
Title
Joint modelling rationale for chained equations
Authors
Rachael A Hughes
Ian R White
Shaun R Seaman
James R Carpenter
Kate Tilling
Jonathan AC Sterne
Publication date
01-12-2014
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2014
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-14-28
