Published in: BMC Medical Research Methodology 1/2017

Open Access 01-12-2017 | Technical advance

Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

Authors: Iris Eekhout, Mark A. van de Wiel, Martijn W. Heymans

Abstract

Background

Multiple imputation is a recommended method for handling missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, several methods are available to test whether a categorical covariate with more than two levels contributes significantly to the model: pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always achieve optimal power. We argue that the median of the p-values from the overall significance tests in the analyses of the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods for testing the significance of a categorical variable after multiple imputation in terms of applicability and power.
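
As a point of reference for the pooling of parameter estimates mentioned above, the following is a minimal sketch of Rubin's Rules for a single regression coefficient. The function name `pool_rubin` and the numeric inputs are hypothetical and chosen only for illustration; they are not taken from the article. The sketch pools one coefficient at a time, which is why a categorical covariate represented by several dummy coefficients calls for a joint test instead.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's Rules.

    estimates: the coefficient estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching inputs")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance

    # Rubin's (1987) large-sample degrees of freedom for the t reference
    # distribution; infinite when the estimates do not vary across imputations.
    df = inf if b == 0 else (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, sqrt(t), df


# Hypothetical estimates and squared standard errors from three imputed datasets.
q, se, df = pool_rubin([0.5, 0.7, 0.6], [0.04, 0.05, 0.045])
print(q, se, df)
```

The pooled estimate divided by its pooled standard error would then be compared with a t distribution on `df` degrees of freedom.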

Methods

In a large simulation study, we assessed type I error control and power for different methods of pooling significance tests of categorical variables.

Results

The simulation study showed that, for non-significant categorical covariates, the type I error was controlled and that the statistical power of the median pooling rule was at least equal to that of current multiple parameter tests. An empirical data example showed similar results.

Conclusions

We therefore conclude that the median of the p-values from the imputed data analyses is an attractive and easy-to-use alternative method for significance testing of categorical variables.
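
To illustrate how little is needed to apply the median pooling rule, here is a minimal sketch. It assumes that the overall significance test for the categorical covariate (for example a multiple-degree-of-freedom chi-square or likelihood ratio test) has already been run on each imputed dataset. The function name `median_p_rule`, the 0.05 threshold, and the p-values are hypothetical and serve only as an illustration.

```python
from statistics import median


def median_p_rule(p_values, alpha=0.05):
    """Pool per-imputation p-values by taking their median.

    p_values: one overall p-value for the categorical covariate per imputed dataset
    alpha: significance level (assumed here to be 0.05)
    Returns (median p-value, whether it falls below alpha).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p < 0 or p > 1 for p in p_values):
        raise ValueError("p-values must lie between 0 and 1")

    pooled = median(p_values)
    return pooled, pooled < alpha


# Hypothetical p-values from the overall test in five imputed datasets.
pooled_p, significant = median_p_rule([0.03, 0.08, 0.04, 0.02, 0.06])
print(pooled_p, significant)
```

With an even number of imputations, `statistics.median` returns the mean of the two middle p-values.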
Metadata
Title: Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis
Authors: Iris Eekhout, Mark A. van de Wiel, Martijn W. Heymans
Publication date: 01-12-2017
Publisher: BioMed Central
Published in: BMC Medical Research Methodology, Issue 1/2017
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/s12874-017-0404-7