Published in: BMC Medical Research Methodology 1/2009

Open Access 01-12-2009 | Research article

Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

Authors: Andrea Marshall, Douglas G Altman, Roger L Holder, Patrick Royston

Abstract

Background

Multiple imputation (MI) provides an effective approach to handling missing covariate data in prognostic modelling studies, as it can properly account for the uncertainty due to missing data. Each of the multiply imputed datasets is analysed using standard prognostic modelling techniques to obtain the estimates of interest. These estimates are then combined into a single overall estimate and variance that incorporate both the within-imputation and the between-imputation variability. Rubin's rules for combining multiply imputed estimates are based on asymptotic theory, so the combined estimates may be more accurate when the posterior distribution of the population parameter of interest is better approximated by a normal distribution. However, the normality assumption may not be appropriate for all parameters of interest in prognostic modelling studies, such as predicted survival probabilities and model performance measures.
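
The combining step can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard textbook form of Rubin's rules: the pooled estimate is the mean over the m imputations, W is the mean within-imputation variance, B is the between-imputation variance of the estimates, the total variance is T = W + (1 + 1/m)B, and Rubin's degrees of freedom are (m - 1)(1 + W / ((1 + 1/m)B))². The function name `rubins_rules` and the input values are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool m per-imputation estimates and their variances (squared SEs).

    Returns (pooled estimate, total variance, degrees of freedom).
    Hypothetical helper illustrating Rubin's rules; not code from the paper.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)        # combined point estimate: mean over imputations
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b        # total variance
    if b == 0:
        df = math.inf              # no between-imputation variability: normal reference
    else:
        # Rubin's degrees of freedom: (m - 1) * (1 + W / ((1 + 1/m) * B))^2
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


# Illustrative (made-up) log hazard ratios and squared SEs from m = 3 imputations
q_bar, total_var, df = rubins_rules([0.5, 0.7, 0.6], [0.010, 0.012, 0.011])
print(q_bar, total_var, df)
```

A confidence interval for the pooled estimate would then be q̄ ± t·√T, with t the appropriate quantile of a t distribution on `df` degrees of freedom (for example `scipy.stats.t.ppf(0.975, df)` for a 95% interval, if SciPy is available). Because this reference distribution is assumed to be approximately normal or t, the rules work best for quantities whose sampling distribution is close to normal.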

Methods

Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies.
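
As an illustration of combining an estimate that is unlikely to be normally distributed, the sketch below pools a predicted survival probability on the complementary log-log scale, g(S) = log(-log S), using delta-method variances, and then back-transforms the point estimate and interval limits. The choice of transformation, the helper names, the input values, and the use of a normal rather than t quantile are assumptions made for brevity; this is one common approach, not necessarily the specific guideline given in the paper.

```python
import math
from statistics import NormalDist, mean, variance


def cloglog(s):
    """Complementary log-log transform, defined for 0 < s < 1."""
    return math.log(-math.log(s))


def inv_cloglog(g):
    """Inverse of cloglog: maps any real g back into (0, 1)."""
    return math.exp(-math.exp(g))


def pool_survival_probability(probs, ses, level=0.95):
    """Hypothetical helper: Rubin's rules on the cloglog scale, then back-transform.

    Returns (lower, estimate, upper) on the probability scale.
    """
    m = len(probs)
    if m < 2 or len(ses) != m or not all(0 < s < 1 for s in probs):
        raise ValueError("need >= 2 imputations with probabilities strictly in (0, 1)")
    g = [cloglog(s) for s in probs]
    # Delta method: Var(g(S)) ~ Var(S) / (S * log S)^2
    g_var = [(se / (s * math.log(s))) ** 2 for s, se in zip(probs, ses)]
    g_bar = mean(g)
    total = mean(g_var) + (1 + 1 / m) * variance(g)   # Rubin's total variance
    z = NormalDist().inv_cdf(0.5 + level / 2)          # normal approximation to t
    half_width = z * math.sqrt(total)
    # cloglog is decreasing in S, so the upper g-limit gives the lower S-limit
    return (inv_cloglog(g_bar + half_width),
            inv_cloglog(g_bar),
            inv_cloglog(g_bar - half_width))


# Illustrative (made-up) survival probabilities and SEs from m = 3 imputations
lower, estimate, upper = pool_survival_probability([0.97, 0.98, 0.96], [0.015, 0.012, 0.020])
print(lower, estimate, upper)
```

Because the interval is built on an unbounded scale and mapped back through `inv_cloglog`, both limits necessarily stay inside (0, 1).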

Results

Methods for combining all reported estimates after MI were not well reported in the current literature. When a method was stated, the standard approach was to apply Rubin's rules without any transformation.
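
To see why applying Rubin's rules without transformation can be problematic for bounded quantities, the sketch below pools illustrative (made-up) survival probabilities close to 1 directly on the probability scale. The helper name `naive_rubin_interval` is hypothetical, and a normal quantile is used instead of a t quantile for brevity. With these inputs the upper 95% limit exceeds 1, which is not a valid probability.

```python
import math
from statistics import NormalDist, mean, variance


def naive_rubin_interval(estimates, ses, level=0.95):
    """Hypothetical helper: Rubin's rules on the raw scale, no transformation.

    Returns (lower, estimate, upper); uses a normal quantile instead of t for brevity.
    """
    m = len(estimates)
    q_bar = mean(estimates)
    # Total variance = mean within-imputation variance + (1 + 1/m) * between variance
    total = mean([se ** 2 for se in ses]) + (1 + 1 / m) * variance(estimates)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(total)
    return q_bar - half_width, q_bar, q_bar + half_width


# Illustrative survival probabilities near 1 from m = 3 imputations
lower, estimate, upper = naive_rubin_interval([0.97, 0.98, 0.96], [0.015, 0.012, 0.020])
print(f"{lower:.3f} {estimate:.3f} {upper:.3f}")  # upper limit is above 1
```

Nothing in the raw-scale calculation constrains the limits to [0, 1]; the symmetric interval simply reflects the normality assumption, which is poorest for probabilities near the boundary.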

Conclusion

The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies.
Metadata
Publisher
BioMed Central
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-9-57
