Skip to main content
Top
Published in: BMC Medical Research Methodology 1/2013

Open Access 01-12-2013 | Research article

Derivation and assessment of risk prediction models using case-cohort data

Authors: Jean Sanderson, Simon G Thompson, Ian R White, Thor Aspelund, Lisa Pennells

Published in: BMC Medical Research Methodology | Issue 1/2013

Login to get access

Abstract

Background

Case-cohort studies are increasingly used to quantify the association of novel factors with disease risk. Conventional measures of predictive ability need modification for this design. We show how Harrell’s C-index, Royston’s D, and the category-based and continuous versions of the net reclassification index (NRI) can be adapted.

Methods

We simulated full cohort and case-cohort data, with sampling fractions ranging from 1% to 90%, using covariates from a cohort study of coronary heart disease, and two incidence rates. We then compared the accuracy and precision of the proposed risk prediction metrics.

Results

The C-index and D must be weighted in order to obtain unbiased results. The NRI does not need modification, provided that the relevant non-subcohort cases are excluded from the calculation. The empirical standard errors across simulations were consistent with analytical standard errors for the C-index and D but not for the NRI. Good relative efficiency of the prediction metrics was observed in our examples, provided the sampling fraction was above 40% for the C-index, 60% for D, or 30% for the NRI. Stata code is made available.

Conclusions

Case-cohort designs can be used to provide unbiased estimates of the C-index, D measure and NRI.
Appendix
Available only for authorised users
Literature
1.
go back to reference Prentince RL: A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986, 73: 1-11. 10.1093/biomet/73.1.1.CrossRef Prentince RL: A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986, 73: 1-11. 10.1093/biomet/73.1.1.CrossRef
2.
go back to reference Barlow WE, Ichikawa L, Rosner D, Izumi S: Analysis of case-cohort designs. J Clin Epidemiol. 1999, 52: 1165-1172. 10.1016/S0895-4356(99)00102-X.CrossRefPubMed Barlow WE, Ichikawa L, Rosner D, Izumi S: Analysis of case-cohort designs. J Clin Epidemiol. 1999, 52: 1165-1172. 10.1016/S0895-4356(99)00102-X.CrossRefPubMed
3.
go back to reference Onland-Moret N, Vandera D, Vanderschouw Y, Buschers W, Elias S, Vangils C, Koerselman J, Roest M, Grobbee D, Peeters P: Analysis of case-cohort data: a comparison of different methods. J Clin Epidemiol. 2007, 60: 350-355. 10.1016/j.jclinepi.2006.06.022.CrossRefPubMed Onland-Moret N, Vandera D, Vanderschouw Y, Buschers W, Elias S, Vangils C, Koerselman J, Roest M, Grobbee D, Peeters P: Analysis of case-cohort data: a comparison of different methods. J Clin Epidemiol. 2007, 60: 350-355. 10.1016/j.jclinepi.2006.06.022.CrossRefPubMed
4.
go back to reference Ganna A, Reilly M, de Faire U, Pedersen N, Magnusson P, Ingelsson E: Risk prediction measures for case-cohort and nested case–control designs: an application to cardiovascular disease. Am J Epidemiol. 2012, 175: 715-724. 10.1093/aje/kwr374.CrossRefPubMedPubMedCentral Ganna A, Reilly M, de Faire U, Pedersen N, Magnusson P, Ingelsson E: Risk prediction measures for case-cohort and nested case–control designs: an application to cardiovascular disease. Am J Epidemiol. 2012, 175: 715-724. 10.1093/aje/kwr374.CrossRefPubMedPubMedCentral
5.
go back to reference Chambless LE, Diao G: Estimation of time-dependent area under the ROC curve for long-term risk prediction. Stat Med. 2006, 25: 3474-3486. 10.1002/sim.2299.CrossRefPubMed Chambless LE, Diao G: Estimation of time-dependent area under the ROC curve for long-term risk prediction. Stat Med. 2006, 25: 3474-3486. 10.1002/sim.2299.CrossRefPubMed
6.
go back to reference Folsom AR, Chambless LE, Ballantyne CM, Coresh J, Heiss G, Wu KK, Boerwinkle E, Mosley TH, Sorlie P, Diao G, et al: An assessment of incremental coronary risk prediction using C-reactive protein and other novel risk markers: the atherosclerosis risk in communities study. Arch Intern Med. 2006, 166: 1368-1373. 10.1001/archinte.166.13.1368.CrossRefPubMed Folsom AR, Chambless LE, Ballantyne CM, Coresh J, Heiss G, Wu KK, Boerwinkle E, Mosley TH, Sorlie P, Diao G, et al: An assessment of incremental coronary risk prediction using C-reactive protein and other novel risk markers: the atherosclerosis risk in communities study. Arch Intern Med. 2006, 166: 1368-1373. 10.1001/archinte.166.13.1368.CrossRefPubMed
7.
go back to reference Herder C, Baumert J, Zierer A, Roden M, Meisinger C, Karakas M, Chambless L, Rathmann W, Peters A, Koenig W, et al: Immunological and cardiometabolic risk factors in the prediction of type 2 diabetes and coronary events: MONICA/KORA Augsburg case-cohort study. PLoS One. 2011, 6: e19852-10.1371/journal.pone.0019852.CrossRefPubMedPubMedCentral Herder C, Baumert J, Zierer A, Roden M, Meisinger C, Karakas M, Chambless L, Rathmann W, Peters A, Koenig W, et al: Immunological and cardiometabolic risk factors in the prediction of type 2 diabetes and coronary events: MONICA/KORA Augsburg case-cohort study. PLoS One. 2011, 6: e19852-10.1371/journal.pone.0019852.CrossRefPubMedPubMedCentral
8.
go back to reference Vaarhorst AA, Lu Y, Heijmans BT, Dolle ME, Bohringer S, Putter H, Imholz S, Merry AH, van Greevenbroek MM, Jukema JW, et al: Literature-based genetic risk scores for coronary heart disease: the Cardiovascular Registry Maastricht (CAREMA) prospective cohort study. Circ Cardiovasc Genet. 2012, 5: 202-209. 10.1161/CIRCGENETICS.111.960708.CrossRefPubMed Vaarhorst AA, Lu Y, Heijmans BT, Dolle ME, Bohringer S, Putter H, Imholz S, Merry AH, van Greevenbroek MM, Jukema JW, et al: Literature-based genetic risk scores for coronary heart disease: the Cardiovascular Registry Maastricht (CAREMA) prospective cohort study. Circ Cardiovasc Genet. 2012, 5: 202-209. 10.1161/CIRCGENETICS.111.960708.CrossRefPubMed
9.
go back to reference Danesh J, Saracci R, Berglund G, Feskens E, Overvad K, Panico S, Thompson S, Fournier A, Clavel-Chapelon F, Canonico M, et al: EPIC-Heart: the cardiovascular component of a prospective study of nutritional, lifestyle and biological factors in 520,000 middle-aged participants from 10 European countries. Eur J Epidemiol. 2007, 22: 129-141. 10.1007/s10654-006-9096-8.CrossRefPubMed Danesh J, Saracci R, Berglund G, Feskens E, Overvad K, Panico S, Thompson S, Fournier A, Clavel-Chapelon F, Canonico M, et al: EPIC-Heart: the cardiovascular component of a prospective study of nutritional, lifestyle and biological factors in 520,000 middle-aged participants from 10 European countries. Eur J Epidemiol. 2007, 22: 129-141. 10.1007/s10654-006-9096-8.CrossRefPubMed
10.
go back to reference Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA: Evaluating the yield of medical tests. JAMA. 1982, 247: 2543-2546. 10.1001/jama.1982.03320430047030.CrossRefPubMed Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA: Evaluating the yield of medical tests. JAMA. 1982, 247: 2543-2546. 10.1001/jama.1982.03320430047030.CrossRefPubMed
11.
go back to reference Harrell FE, Lee KL, Mark DB: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996, 15: 361-387. 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.CrossRefPubMed Harrell FE, Lee KL, Mark DB: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996, 15: 361-387. 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.CrossRefPubMed
12.
go back to reference Royston P, Sauerbrei W: A new measure of prognostic separation in survival data. Stat Med. 2004, 23: 723-748. 10.1002/sim.1621.CrossRefPubMed Royston P, Sauerbrei W: A new measure of prognostic separation in survival data. Stat Med. 2004, 23: 723-748. 10.1002/sim.1621.CrossRefPubMed
13.
go back to reference Pencina MJ, D’Agostino RB, D’Agostino RB, Vasan RS: Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008, 27: 157-172. 10.1002/sim.2929.CrossRefPubMed Pencina MJ, D’Agostino RB, D’Agostino RB, Vasan RS: Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008, 27: 157-172. 10.1002/sim.2929.CrossRefPubMed
14.
go back to reference Pencina MJ, D’Agostino RB, Steyerberg EW: Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med. 2011, 30: 11-21. 10.1002/sim.4085.CrossRefPubMed Pencina MJ, D’Agostino RB, Steyerberg EW: Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med. 2011, 30: 11-21. 10.1002/sim.4085.CrossRefPubMed
15.
go back to reference Jonsdottir LS, Sigfusson N, Gudnason V, Sigvaldason H, Thorgeirsson G: Do lipids, blood pressure, diabetes, and smoking confer equal risk of myocardial infarction in women as in men? The Reykjavik Study. J Cardiovasc Risk. 2002, 9: 67-76. 10.1097/00043798-200204000-00001.CrossRefPubMed Jonsdottir LS, Sigfusson N, Gudnason V, Sigvaldason H, Thorgeirsson G: Do lipids, blood pressure, diabetes, and smoking confer equal risk of myocardial infarction in women as in men? The Reykjavik Study. J Cardiovasc Risk. 2002, 9: 67-76. 10.1097/00043798-200204000-00001.CrossRefPubMed
16.
go back to reference Cox DR: Regression Models and Life-Tables. J R Stat Soc Ser B Methodol. 1972, 37: 187-220. Cox DR: Regression Models and Life-Tables. J R Stat Soc Ser B Methodol. 1972, 37: 187-220.
17.
go back to reference Self SG, Prentice RL: Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Stat. 1988, 16: 64-81. 10.1214/aos/1176350691.CrossRef Self SG, Prentice RL: Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Stat. 1988, 16: 64-81. 10.1214/aos/1176350691.CrossRef
18.
go back to reference Langholz B, Jiao J: Computational methods for case-cohort studies. Comput Stat Data Anal. 2007, 51: 3737-3748. 10.1016/j.csda.2006.12.028.CrossRef Langholz B, Jiao J: Computational methods for case-cohort studies. Comput Stat Data Anal. 2007, 51: 3737-3748. 10.1016/j.csda.2006.12.028.CrossRef
19.
go back to reference Kulathinal S, Karvanen J, Saarela O, Kuulasmaa K: Case-cohort design in practice - experiences from the MORGAM Project. Epidemiol Perspect Innov. 2007, 4: 15-10.1186/1742-5573-4-15.CrossRefPubMedPubMedCentral Kulathinal S, Karvanen J, Saarela O, Kuulasmaa K: Case-cohort design in practice - experiences from the MORGAM Project. Epidemiol Perspect Innov. 2007, 4: 15-10.1186/1742-5573-4-15.CrossRefPubMedPubMedCentral
20.
go back to reference Graf E, Schmoor C, Sauerbrei W, Schumacher M: Assessment and comparison of prognostic classification schemes for survival data. Stat Med. 1999, 18: 2529-2545. 10.1002/(SICI)1097-0258(19990915/30)18:17/18<2529::AID-SIM274>3.0.CO;2-5.CrossRefPubMed Graf E, Schmoor C, Sauerbrei W, Schumacher M: Assessment and comparison of prognostic classification schemes for survival data. Stat Med. 1999, 18: 2529-2545. 10.1002/(SICI)1097-0258(19990915/30)18:17/18<2529::AID-SIM274>3.0.CO;2-5.CrossRefPubMed
21.
go back to reference Schemper M, Stare J: Explained variation in survival analysis. Stat Med. 1996, 15: 1999-2012. 10.1002/(SICI)1097-0258(19961015)15:19<1999::AID-SIM353>3.0.CO;2-D.CrossRefPubMed Schemper M, Stare J: Explained variation in survival analysis. Stat Med. 1996, 15: 1999-2012. 10.1002/(SICI)1097-0258(19961015)15:19<1999::AID-SIM353>3.0.CO;2-D.CrossRefPubMed
22.
go back to reference Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW: Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010, 21: 128-138. 10.1097/EDE.0b013e3181c30fb2.CrossRefPubMedPubMedCentral Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW: Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010, 21: 128-138. 10.1097/EDE.0b013e3181c30fb2.CrossRefPubMedPubMedCentral
23.
go back to reference Newson R: Confidence intervals for rank statistics: Somers’ D and extensions. Stata J. 2006, 6: 309-334. Newson R: Confidence intervals for rank statistics: Somers’ D and extensions. Stata J. 2006, 6: 309-334.
24.
go back to reference Stata Statistical Software: Release 11. 2009, College Station, TX: StataCorp LP Stata Statistical Software: Release 11. 2009, College Station, TX: StataCorp LP
25.
go back to reference The Emerging Risk Factors Collaboration: Lipid-related markers and cardiovascular disease prediction. JAMA. 2012, 307: 2499-2506.PubMedCentral The Emerging Risk Factors Collaboration: Lipid-related markers and cardiovascular disease prediction. JAMA. 2012, 307: 2499-2506.PubMedCentral
26.
go back to reference The Emerging Risk Factors Collaboration: C-reactive protein, fibrinogen, and cardiovascular disease prediction. NEJM. 2012, 367: 1310-1320.CrossRefPubMedCentral The Emerging Risk Factors Collaboration: C-reactive protein, fibrinogen, and cardiovascular disease prediction. NEJM. 2012, 367: 1310-1320.CrossRefPubMedCentral
27.
go back to reference Gonen M, Heller G: Concordance probability and discriminatory power in proportional hazards regression. Biometrika. 2005, 92: 965-970. 10.1093/biomet/92.4.965.CrossRef Gonen M, Heller G: Concordance probability and discriminatory power in proportional hazards regression. Biometrika. 2005, 92: 965-970. 10.1093/biomet/92.4.965.CrossRef
28.
go back to reference Wolbers M, Koller MT, Witteman JC, Steyerberg EW: Prognostic models with competing risks: methods and application to coronary risk prediction. Epidemiology. 2009, 20: 555-561. 10.1097/EDE.0b013e3181a39056.CrossRefPubMed Wolbers M, Koller MT, Witteman JC, Steyerberg EW: Prognostic models with competing risks: methods and application to coronary risk prediction. Epidemiology. 2009, 20: 555-561. 10.1097/EDE.0b013e3181a39056.CrossRefPubMed
Metadata
Title
Derivation and assessment of risk prediction models using case-cohort data
Authors
Jean Sanderson
Simon G Thompson
Ian R White
Thor Aspelund
Lisa Pennells
Publication date
01-12-2013
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2013
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-13-113

Other articles of this Issue 1/2013

BMC Medical Research Methodology 1/2013 Go to the issue