Published in: BMC Medical Research Methodology 1/2010

Open Access 01-12-2010 | Research article

Model selection in Medical Research: A simulation study comparing Bayesian Model Averaging and Stepwise Regression

Authors: Anna Genell, Szilard Nemes, Gunnar Steineck, Paul W Dickman


Abstract

Background

Automatic variable selection methods are usually discouraged in medical research, although we believe they might be valuable for studies where subject-matter knowledge is limited. Bayesian model averaging may be useful for model selection, but few comparisons with stepwise regression have been published. We therefore performed a simulation study to compare stepwise regression with Bayesian model averaging.

Methods

We simulated data corresponding to five different data generating processes and thirty different values of the effect size (the parameter estimate divided by its standard error). Each data generating process contained twenty explanatory variables in total, of which between zero and two were true predictors. Three data generating processes consisted of uncorrelated predictor variables, while two had a mixture of correlated and uncorrelated variables. We fitted linear regression models to the simulated data, applied Bayesian model averaging and stepwise regression as model selection procedures, and compared their estimated selection probabilities.
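The following is a minimal sketch of one simulated data set of this kind, not the authors' code. It makes several assumptions that the abstract does not state: only four candidate variables are used instead of twenty so that all 16 models can be enumerated, posterior model probabilities are approximated with BIC under equal prior model probabilities, and the sample size, coefficient, and all function names (`solve`, `rss`, `inclusion_probabilities`) are hypothetical choices made for illustration.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))


def inclusion_probabilities(xs, y):
    """Posterior inclusion probability of each variable, using BIC-approximated
    model weights over all subsets (assumed equal prior model probabilities)."""
    n, k = len(y), len(xs)
    models = [s for r in range(k + 1) for s in itertools.combinations(range(k), r)]
    bic = [n * math.log(rss([xs[j] for j in m], y) / n) + (len(m) + 1) * math.log(n)
           for m in models]
    best = min(bic)
    weights = [math.exp(-0.5 * (b - best)) for b in bic]
    total = sum(weights)
    post = [w / total for w in weights]
    return [sum(p for p, m in zip(post, models) if j in m) for j in range(k)]


# Assumed setup: one true predictor (index 0) and three redundant, uncorrelated variables.
rng = random.Random(1)
n, k, beta = 200, 4, 1.0
xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
y = [beta * xs[0][i] + rng.gauss(0, 1) for i in range(n)]

pip = inclusion_probabilities(xs, y)
print([round(p, 3) for p in pip])
```

In this sketch a variable would count as selected when its inclusion probability exceeds some threshold, such as 0.5. The threshold is also an assumption, since the abstract does not say how selection under Bayesian model averaging was defined.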

Results

When the redundant variable was not correlated with a true predictor, the estimated probability of not selecting it was between 0.99 and 1 for Bayesian model averaging and approximately 0.95 for stepwise regression. These probabilities did not depend on the effect size of the true predictor. When a redundant variable was correlated with a true predictor, the probability of not selecting it was 0.95 to 1 for Bayesian model averaging and between 0.7 and 0.9 for stepwise regression, depending on the effect size of the true predictor. The probability of selecting a true predictor increased with the effect size of the true predictor and leveled out at between 0.9 and 1 for stepwise regression and at 1 for Bayesian model averaging.
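A selection probability of this kind can be estimated as the fraction of simulated data sets in which a variable is, or is not, selected. The sketch below illustrates that calculation for a deliberately simplified stand-in for stepwise regression: a single two-sided test of one redundant variable at an assumed 5% significance level, with a normal approximation for the critical value. The significance level, sample size, number of replicates, and function names are assumptions for illustration and are not taken from the study.

```python
import math
import random
from statistics import NormalDist


def t_statistic(x, y):
    """Slope estimate divided by its standard error in a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def prob_not_selected(n=100, replicates=2000, alpha=0.05, seed=7):
    """Monte Carlo estimate of the probability that a redundant variable is NOT selected."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation to the t critical value
    kept_out = 0
    for _ in range(replicates):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # y is generated independently of x
        if abs(t_statistic(x, y)) <= crit:
            kept_out += 1
    return kept_out / replicates


estimate = prob_not_selected()
print(round(estimate, 3))
```

Under these assumptions the estimate should fall close to one minus the assumed significance level. A full stepwise procedure over twenty variables involves repeated tests, so this single-test version only shows how such a proportion is computed.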

Conclusions

Our simulation study showed that, under the given conditions, Bayesian model averaging had a higher probability of not selecting a redundant variable than stepwise regression and a similar probability of selecting a true predictor. Medical researchers building regression models with limited subject-matter knowledge could thus benefit from using Bayesian model averaging.
Metadata
Title
Model selection in Medical Research: A simulation study comparing Bayesian Model Averaging and Stepwise Regression
Authors
Anna Genell
Szilard Nemes
Gunnar Steineck
Paul W Dickman
Publication date
01-12-2010
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2010
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-10-108
