Published in: BMC Medical Research Methodology 1/2013

Open Access 01-12-2013 | Research article

Prediction models for clustered data: comparison of a random intercept and standard regression model

Authors: Walter Bouwmeester, Jos WR Twisk, Teus H Kappen, Wilton A van Klei, Karel GM Moons, Yvonne Vergouwe

Abstract

Background

When study data are clustered, standard regression analysis is considered inappropriate, and analytical techniques that account for clustering need to be used. For prediction research in which interest lies in predictor effects at the patient level, random effect regression models are probably preferable to standard regression analysis. It is well known that parameter estimates from random effect models differ from those of standard logistic regression. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions.
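
As a sketch of why the two sets of parameter estimates differ, the two models can be written as follows. The notation (patient i in cluster j, coefficients with and without an asterisk) is an assumption introduced here for illustration, and the last line is a widely used approximation from the general literature on marginal versus cluster-specific logistic models, not a result stated in this abstract.

```latex
% Standard (marginal) logistic regression, patient i in cluster j:
\operatorname{logit} \Pr(Y_{ij} = 1 \mid x_{ij}) = \alpha + \beta^{\top} x_{ij}

% Random intercept logistic regression (cluster-specific effects):
\operatorname{logit} \Pr(Y_{ij} = 1 \mid x_{ij}, u_j) = \alpha^{*} + \beta^{*\top} x_{ij} + u_j,
\qquad u_j \sim N(0, \sigma_u^{2})

% Approximate relation between the two (assumption, not from the source):
\beta \approx \frac{\beta^{*}}{\sqrt{1 + c^{2}\sigma_u^{2}}},
\qquad c = \frac{16\sqrt{3}}{15\pi}
```

Under this approximation the marginal coefficients shrink toward zero as the random intercept variance grows, so the gap between the two sets of estimates would be expected to widen with stronger clustering.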

Methods

Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, each treated by one of 19 anesthesiologists (clusters), we developed prognostic models with either standard or random intercept logistic regression. External validity of these models was assessed in new patients treated by other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Both standard performance measures and measures adapted for the clustered data structure were estimated.
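
A minimal sketch of how clustered binary outcomes with a chosen ICC might be simulated is given below. It assumes the latent-scale definition ICC = σ²ᵤ / (σ²ᵤ + π²/3), which is common for random intercept logistic models; the abstract does not state which definition or simulation design was used. The function names, the single predictor, and the coefficient values are hypothetical.

```python
import math
import numpy as np


def random_intercept_variance(icc: float) -> float:
    """Random intercept variance implied by an ICC on the latent logistic scale.

    Assumption: ICC = var_u / (var_u + pi^2 / 3). This is one common
    definition, not necessarily the one used in the study.
    """
    return icc * (math.pi ** 2 / 3.0) / (1.0 - icc)


def simulate_clustered_binary(n_clusters, n_per_cluster, icc,
                              beta0=-1.0, beta1=1.0, seed=0):
    """Simulate binary outcomes from a random intercept logistic model.

    beta0 and beta1 are illustrative values, not estimates from the study.
    Returns cluster labels, the predictor, outcomes, and cluster effects.
    """
    rng = np.random.default_rng(seed)
    sd_u = math.sqrt(random_intercept_variance(icc))
    u = rng.normal(0.0, sd_u, size=n_clusters)            # cluster effects
    cluster = np.repeat(np.arange(n_clusters), n_per_cluster)
    x = rng.normal(size=cluster.size)                     # one patient-level predictor
    linear_predictor = beta0 + beta1 * x + u[cluster]
    prob = 1.0 / (1.0 + np.exp(-linear_predictor))
    y = rng.binomial(1, prob)
    return cluster, x, y, u


if __name__ == "__main__":
    for icc in (0.05, 0.15, 0.30):
        print(f"ICC {icc:.2f} -> random intercept variance "
              f"{random_intercept_variance(icc):.3f}")
```

Equal cluster sizes are used here only to keep the sketch short; the empirical data had 19 clusters of presumably unequal size.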

Results

The model developed with random effect analysis showed better discrimination than the standard approach when the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration in external subjects was adequate only if the performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the model developed with standard regression, whereas calibration measures adapted to the clustered data structure showed good calibration for the prediction model with random intercept.
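
The role of the cluster effects in discrimination can be illustrated with a small sketch. It computes a standard c-index over all event/non-event pairs and a within-cluster version that compares only patients from the same cluster. The within-cluster variant is one possible adaptation to clustered data and is assumed here for illustration; the function names and toy numbers are hypothetical and are not the study data.

```python
import numpy as np


def c_index(y, score):
    """Standard c-index: share of event/non-event pairs in which the event
    has the higher score; ties count as one half."""
    y = np.asarray(y)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y == 1], score[y == 0]
    if pos.size == 0 or neg.size == 0:
        return float("nan")
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def within_cluster_c_index(y, score, cluster):
    """C-index restricted to pairs of patients from the same cluster,
    pooled over clusters (clusters without both outcomes are skipped)."""
    y = np.asarray(y)
    score = np.asarray(score, dtype=float)
    cluster = np.asarray(cluster)
    concordant, pairs = 0.0, 0
    for g in np.unique(cluster):
        in_g = cluster == g
        pos = score[in_g][y[in_g] == 1]
        neg = score[in_g][y[in_g] == 0]
        if pos.size and neg.size:
            diff = pos[:, None] - neg[None, :]
            concordant += np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
            pairs += diff.size
    return float(concordant / pairs) if pairs else float("nan")


if __name__ == "__main__":
    # Toy data: two clusters, scores on the linear predictor scale.
    y = np.array([1, 0, 1, 0])
    cluster = np.array([0, 0, 1, 1])
    lp = np.array([0.8, 0.3, 0.4, 0.6])
    u = np.array([0.0, 1.0])              # hypothetical cluster effects
    lp_with_u = lp + u[cluster]
    print(c_index(y, lp), c_index(y, lp_with_u))
    print(within_cluster_c_index(y, lp, cluster),
          within_cluster_c_index(y, lp_with_u, cluster))
```

Adding a cluster effect shifts every score in a cluster by the same amount, so rankings within a cluster are unchanged and only comparisons across clusters are affected. This is in line with the finding that the random intercept model discriminated better only when the cluster effects were used for prediction.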

Conclusion

The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.
Metadata
Title: Prediction models for clustered data: comparison of a random intercept and standard regression model
Authors: Walter Bouwmeester, Jos WR Twisk, Teus H Kappen, Wilton A van Klei, Karel GM Moons, Yvonne Vergouwe
Publication date: 01-12-2013
Publisher: BioMed Central
Published in: BMC Medical Research Methodology, Issue 1/2013
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/1471-2288-13-19