Published in: BMC Medical Research Methodology 1/2017

Open Access 01-12-2017 | Research Article

Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants

Authors: Odile Sauzet, Janet L. Peacock


Abstract

Background

The analysis of perinatal outcomes often involves datasets with some multiple births. These datasets consist mostly of independent observations together with a limited number of clusters of size two (twins) and possibly of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants, we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes, but very little is known about their reliability when only a limited number of small clusters are present.

Methods

Using simulated data based on a dataset of preterm infants, we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. We compared logistic regression models, several estimation methods for logistic random intercept models, and generalised estimating equations.
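To make the data structure concrete, the following is a minimal sketch of how a dataset of singletons plus some twin pairs might be simulated from a logistic random intercept model. It is an illustration only, not the authors' simulation code: the function names, the single binary covariate, and all parameter values are assumptions chosen for the example. Each mother receives a normally distributed random intercept that her infants share, which is what induces the dependence between twins.

```python
import math
import random


def simulate_infants(n_mothers, twin_fraction, beta0, beta1, sigma_u, seed=0):
    """Simulate rows (mother_id, x, y) with singletons and some twin pairs.

    Hypothetical model: logit P(y = 1) = beta0 + beta1 * x + u_mother,
    with u_mother ~ Normal(0, sigma_u^2) shared by infants of the same mother.
    """
    rng = random.Random(seed)
    rows = []
    for mother in range(n_mothers):
        u = rng.gauss(0.0, sigma_u)  # level-two random intercept
        size = 2 if rng.random() < twin_fraction else 1  # twin pair or singleton
        for _ in range(size):
            x = rng.randint(0, 1)  # illustrative binary covariate
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x + u)))
            y = 1 if rng.random() < p else 0
            rows.append((mother, x, y))
    return rows


def twin_pair_correlation(rows):
    """Empirical correlation between the binary outcomes of twins in a pair."""
    by_mother = {}
    for mother, _, y in rows:
        by_mother.setdefault(mother, []).append(y)
    pairs = [ys for ys in by_mother.values() if len(ys) == 2]
    # Use each pair in both orders so the estimate does not depend on twin order.
    a = [p[0] for p in pairs] + [p[1] for p in pairs]
    b = [p[1] for p in pairs] + [p[0] for p in pairs]
    mean = sum(a) / len(a)
    cov = sum((ai - mean) * (bi - mean) for ai, bi in zip(a, b)) / len(a)
    var = sum((ai - mean) ** 2 for ai in a) / len(a)
    return cov / var


if __name__ == "__main__":
    data = simulate_infants(5000, 0.2, beta0=-1.0, beta1=0.5, sigma_u=1.5, seed=1)
    print("infants:", len(data))
    print("twin outcome correlation:", round(twin_pair_correlation(data), 3))
```

Setting `sigma_u` to zero removes the shared intercept, so twin outcomes become independent given the covariate; lowering `twin_fraction` leaves fewer pairs from which any method could estimate the dependence.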

Results

The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters. However, a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small, and will then provide estimates similar to those of logistic regression. The method that seems to provide the best balance between estimation of the standard error and of the parameter, for any percentage of twins, is generalised estimating equations.

Conclusions

This study has shown that the number of covariates and the level-two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins. However, when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
Metadata
Title
Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants
Authors
Odile Sauzet
Janet L. Peacock
Publication date
01-12-2017
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2017
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-017-0369-6
