Published in: BMC Medical Research Methodology 1/2019

Open Access 01-12-2019 | Research article

The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach

Authors: Oscar L. Olvera Astivia, Anne Gadermann, Martin Guhn

Abstract

Background

Despite the popularity of multilevel logistic regression, problems in estimating its statistical power remain prevalent because the calculation is complex and typically requires computer-simulation-based approaches. These problems are compounded by the fact that the distribution of the predictors can affect the power to detect their effects. To address both matters, we present a sample of cases documenting the influence that predictor distribution has on statistical power, as well as a user-friendly, web-based application for conducting power analysis for multilevel logistic regression.

Method

Computer simulations were implemented to estimate statistical power in multilevel logistic regression with varying numbers of clusters, varying cluster sample sizes, and non-normal and non-symmetrical distributions of the Level 1/2 predictors. Power curves were simulated to examine how non-normal/unbalanced distributions of a binary predictor and a continuous predictor affect the detection of population effect sizes for main effects, a cross-level interaction and the variance of the random effects.
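The data-generating step of such a simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the coefficient values, the random-intercept-only structure, and the assignment of the binary predictor to Level 1 and the continuous predictor to Level 2 are all hypothetical choices made for the example.

```python
import math
import random


def simulate_two_level_data(n_clusters, cluster_size, rng,
                            g00=0.0, g10=0.5, g01=0.5, g11=0.3,
                            tau0=1.0, incidence=0.1):
    """Draw one data set from a hypothetical random-intercept logistic model.

    Every coefficient (g00, g10, g01, g11) and the random-intercept
    standard deviation tau0 are illustrative assumptions.
    Returns a list of (cluster_id, x, w, y) tuples.
    """
    rows = []
    for j in range(n_clusters):
        # Level 2 continuous predictor: a chi-square(1) draw (a squared
        # standard normal), standardized to mean 0 and variance 1.
        w = (rng.gauss(0.0, 1.0) ** 2 - 1.0) / math.sqrt(2.0)
        # Random intercept for cluster j.
        u0 = rng.gauss(0.0, tau0)
        for _ in range(cluster_size):
            # Level 1 binary predictor; incidence=0.5 would be balanced.
            x = 1 if rng.random() < incidence else 0
            # Linear predictor with main effects and a cross-level interaction.
            eta = g00 + g10 * x + g01 * w + g11 * x * w + u0
            p = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < p else 0
            rows.append((j, x, w, y))
    return rows


# One replication with 110 clusters of 100 observations each.
data = simulate_two_level_data(110, 100, random.Random(2019))
```

In a full power simulation, each replication like this would be passed to a multilevel logistic regression routine and the resulting tests recorded; that fitting step is deliberately left out here.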

Results

Skewed continuous predictors and unbalanced binary predictors require larger sample sizes at both levels than normally distributed continuous predictors and balanced binary predictors. In the most extreme case, which combined imbalance (10% incidence) with the skewness of a chi-square distribution with 1 degree of freedom, even 110 Level 2 units and 100 Level 1 units were not sufficient for all predictors to reach 80% power; power mostly hovered around 50%, with the exception of the skewed, continuous Level 2 predictor.
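For a sense of scale, the two extreme conditions can be quantified with standard formulas: a binary variable with incidence p has variance p(1 - p), and a chi-square distribution with k degrees of freedom has skewness sqrt(8/k). The short sketch below only evaluates these textbook quantities; the function names are illustrative, and it makes no claim about how the authors parameterized their simulations.

```python
import math


def bernoulli_variance(p):
    """Variance of a binary (0/1) variable with incidence p."""
    return p * (1.0 - p)


def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8.0 / df)


# Balanced versus 10% incidence: the variance drops from 0.25 to about 0.09.
print(round(bernoulli_variance(0.5), 2), round(bernoulli_variance(0.1), 2))

# A chi-square distribution with 1 degree of freedom has skewness of about
# 2.83, whereas a normal distribution has skewness 0.
print(round(chi_square_skewness(1), 2))
```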

Conclusions

Given the complex interactive influence among sample sizes, effect sizes and predictor distribution characteristics, generic rule-of-thumb sample size recommendations for multilevel logistic regression seem unwarranted, beyond the observation that larger sample sizes are required when the distributions of the predictors are not symmetric or balanced. The more skewed or imbalanced a predictor is, the larger the required sample size. To assist researchers in planning studies, we provide a user-friendly web application that conducts power analysis via computer simulations in the R programming language. With this web application, users can run simulations tailored to their study design to estimate statistical power for multilevel logistic regression models.
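In a simulation-based power analysis, power is usually estimated as the share of replications in which the test of interest rejects the null hypothesis, and that share carries its own Monte Carlo error. The sketch below shows this general calculation in Python; it is not taken from the web application, and the function name, the 0.05 significance level, and the example p-values are assumptions made for illustration.

```python
import math


def power_estimate(p_values, alpha=0.05):
    """Estimate power from the p-values of simulated replications.

    Returns the proportion of replications with p < alpha and the
    Monte Carlo standard error of that proportion.
    """
    n_reps = len(p_values)
    rejections = sum(1 for p in p_values if p < alpha)
    power = rejections / n_reps
    mc_se = math.sqrt(power * (1.0 - power) / n_reps)
    return power, mc_se


# Hypothetical p-values from four replications (real studies use many more).
power, mc_se = power_estimate([0.01, 0.20, 0.03, 0.50])
print(power, mc_se)  # 0.5 0.25
```

The Monte Carlo standard error shrinks with the square root of the number of replications, so an estimate near the 80% target needs enough replications to tell it apart from neighboring values.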
Footnotes
1
Preliminary simulations were conducted to ensure that the Type I error rate was maintained. Symmetric distributions with non-zero kurtosis were also examined. These distributions had no detrimental effect on the power of the tests.
 
Metadata
Title
The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach
Authors
Oscar L. Olvera Astivia
Anne Gadermann
Martin Guhn
Publication date
01-12-2019
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2019
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-019-0742-8
