Published in: BMC Medical Research Methodology 1/2016

Open Access 01-12-2016 | Research Article

Can the buck always be passed to the highest level of clustering?

Authors: Christian Bottomley, Matthew J. Kirby, Steve W. Lindsay, Neal Alexander


Abstract

Background

Clustering commonly affects the uncertainty of parameter estimates in epidemiological studies. Cluster-robust variance estimates (CRVE) are used to construct confidence intervals that account for single-level clustering, and are easily implemented in standard software. When data are clustered at more than one level (e.g. village and household) the level for the CRVE must be chosen. CRVE are consistent when used at the higher level of clustering (village), but since there are fewer clusters at the higher level, and consistency is an asymptotic property, there may be circumstances under which coverage is better from lower- rather than higher-level CRVE. Here we assess the relative importance of adjusting for clustering at the higher and lower level in a logistic regression model.
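For orientation, the cluster-robust ("sandwich") variance estimate referred to above can be written in its standard form for a logistic regression fitted under working independence. The notation here (x_i, p_i, U_g, A) is introduced only for illustration and is not taken from the article; the formula shows the basic uncorrected estimator, without any small-sample adjustment.

```latex
% Standard cluster-robust (sandwich) variance for logistic regression
% with clusters g = 1, ..., G at the chosen level (e.g. village or household).
\[
\widehat{V}(\hat\beta) = A^{-1}\left(\sum_{g=1}^{G} U_g U_g^{\top}\right) A^{-1},
\qquad
A = \sum_{i} \hat p_i (1 - \hat p_i)\, x_i x_i^{\top},
\qquad
U_g = \sum_{i \in g} x_i \,(y_i - \hat p_i).
\]
```

Choosing the level of clustering amounts to choosing which units g index the sum of score contributions U_g. Using villages gives fewer, larger terms in the middle sum than using households.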

Methods

We performed a simulation study in which the coverage of 95 % confidence intervals was compared between adjustments at the higher and lower levels.
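A simulation of this kind might be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the data-generating model (normal village and household random effects on the logit scale, a predictor with village-level intra-cluster correlation `rho_x`) and all default parameter values are assumptions. The true coefficient of the predictor is set to zero so that the target of the confidence interval is unambiguous, and the intervals use the uncorrected sandwich estimator with a normal quantile of 1.96.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson and return the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta


def cluster_robust_se(X, y, beta, cluster_ids):
    """Uncorrected cluster-robust (sandwich) standard errors for a logistic fit."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    bread = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    scores = X * (y - p)[:, None]              # per-observation score contributions
    _, idx = np.unique(cluster_ids, return_inverse=True)
    cluster_scores = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(cluster_scores, idx, scores)     # sum the scores within each cluster
    meat = cluster_scores.T @ cluster_scores
    return np.sqrt(np.diag(bread @ meat @ bread))


def simulate_coverage(n_sims=200, n_villages=10, n_households=5,
                      n_per_household=4, rho_x=0.3, sd_village=0.5,
                      sd_household=0.5, seed=0):
    """Estimate coverage of nominal 95% intervals for the predictor's
    coefficient (true value 0) when clustering on village or on household."""
    rng = np.random.default_rng(seed)
    n_hh_total = n_villages * n_households
    village = np.repeat(np.arange(n_villages), n_households * n_per_household)
    household = np.repeat(np.arange(n_hh_total), n_per_household)
    n = village.size
    covered = {"village": 0, "household": 0}
    for _ in range(n_sims):
        # Predictor with unit variance and intra-village correlation rho_x.
        x = (np.sqrt(rho_x) * rng.normal(size=n_villages)[village]
             + np.sqrt(1.0 - rho_x) * rng.normal(size=n))
        # Outcome depends on village and household effects only (assumed model).
        eta = (-1.0
               + sd_village * rng.normal(size=n_villages)[village]
               + sd_household * rng.normal(size=n_hh_total)[household])
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
        X = np.column_stack([np.ones(n), x])
        beta = fit_logistic(X, y)
        for level, ids in (("village", village), ("household", household)):
            se = cluster_robust_se(X, y, beta, ids)[1]
            covered[level] += int(abs(beta[1]) <= 1.96 * se)
    return {level: count / n_sims for level, count in covered.items()}
```

Calling `simulate_coverage(rho_x=0.1)` and `simulate_coverage(rho_x=0.8)` and comparing the two returned proportions gives a rough sense of how the predictor's intra-cluster correlation interacts with the choice of level. A serious replication would need many more simulated data sets and the scenarios described in the article itself.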

Results

Confidence intervals adjusted for the higher level of clustering had coverage close to 95 %, even when there were few clusters, provided that the intra-cluster correlation of the predictor was less than 0.5 for models with a single predictor and less than 0.2 for models with multiple predictors.

Conclusions

When there are multiple levels of clustering it is generally preferable to use confidence intervals that account for the highest level of clustering. This only fails if there are few clusters at this level and the intra-cluster correlation of the predictor is high.
Metadata
Title
Can the buck always be passed to the highest level of clustering?
Authors
Christian Bottomley
Matthew J. Kirby
Steve W. Lindsay
Neal Alexander
Publication date
01-12-2016
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2016
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-016-0127-1
