BMC Medical Research Methodology

Open Access 01-12-2013 | Research article

Derivation and assessment of risk prediction models using case-cohort data

Authors: Jean Sanderson, Simon G Thompson, Ian R White, Thor Aspelund, Lisa Pennells

Published in: BMC Medical Research Methodology | Issue 1/2013

Abstract

Background

Case-cohort studies are increasingly used to quantify the association of novel factors with disease risk. Conventional measures of predictive ability need modification for this design. We show how Harrell’s C-index, Royston’s D, and the category-based and continuous versions of the net reclassification index (NRI) can be adapted.

Methods

We simulated full cohort and case-cohort data, with sampling fractions ranging from 1% to 90%, using covariates from a cohort study of coronary heart disease, and two incidence rates. We then compared the accuracy and precision of the proposed risk prediction metrics.
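
As an illustration of the design described above, the following minimal sketch draws a case-cohort sample from a full cohort: a random subcohort is selected regardless of outcome, and every case outside the subcohort is added. The function name, the record layout, the toy cohort, and the use of independent Bernoulli selection (rather than a fixed-size subcohort) are assumptions made for this example and are not taken from the authors' Stata code.

```python
import random

def draw_case_cohort(cohort, sampling_fraction, seed=0):
    """Return a case-cohort sample: a random subcohort plus all cases outside it.

    cohort: list of dicts with at least an 'event' key (1 = case, 0 = non-case).
    Each returned record gains an 'in_subcohort' flag.
    Assumption: each person enters the subcohort independently with
    probability `sampling_fraction`.
    """
    rng = random.Random(seed)
    sample = []
    for person in cohort:
        in_subcohort = rng.random() < sampling_fraction
        # Keep subcohort members whatever their outcome, and keep every case.
        if in_subcohort or person["event"] == 1:
            sample.append({**person, "in_subcohort": in_subcohort})
    return sample

# Hypothetical toy cohort: 1,000 people, each a case with probability 0.10.
toy_rng = random.Random(42)
cohort = [{"id": i, "event": int(toy_rng.random() < 0.10)} for i in range(1000)]
sample = draw_case_cohort(cohort, sampling_fraction=0.2)
```

Non-cases appear in the sample only if they were selected into the subcohort, so they are under-represented relative to cases. This is why unweighted measures computed on such a sample can be biased.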

Results

The C-index and D must be weighted in order to obtain unbiased results. The NRI does not need modification, provided that the relevant non-subcohort cases are excluded from the calculation. The empirical standard errors across simulations were consistent with analytical standard errors for the C-index and D but not for the NRI. Good relative efficiency of the prediction metrics was observed in our examples, provided the sampling fraction was above 40% for the C-index, 60% for D, or 30% for the NRI. Stata code is made available.
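
To make the weighting requirement concrete, the sketch below computes a weighted version of a Harrell-type C-index in which each usable pair contributes the product of the two subjects' weights. The weighting scheme shown (weight 1 for cases, the inverse of the sampling fraction for subcohort non-cases) is an assumption chosen for illustration. It is one common choice for case-cohort data and may differ from the scheme used in the paper. The function name and toy data are hypothetical.

```python
def weighted_c_index(times, events, scores, weights):
    """Weighted Harrell-type concordance.

    A pair (i, j) is usable when subject i had an event strictly before
    subject j's observed time. Each usable pair contributes
    weights[i] * weights[j]. The pair is concordant when the subject who
    failed earlier has the higher risk score; tied scores count one half.
    Simplification: pairs with tied times are skipped.
    """
    concordant = 0.0
    usable = 0.0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if i == j or times[i] >= times[j]:
                continue
            w = weights[i] * weights[j]
            usable += w
            if scores[i] > scores[j]:
                concordant += w
            elif scores[i] == scores[j]:
                concordant += 0.5 * w
    return concordant / usable

# Hypothetical toy data: two cases, then two subcohort non-cases who
# were sampled with fraction 0.2 (so weight 1 / 0.2 = 5).
times = [2, 4, 5, 7]
events = [1, 1, 0, 0]
scores = [0.9, 0.3, 0.5, 0.1]
weights = [1, 1, 5, 5]

c_weighted = weighted_c_index(times, events, scores, weights)
c_unweighted = weighted_c_index(times, events, scores, [1, 1, 1, 1])
```

On this toy input the unweighted value is 4/5 and the weighted value is 16/21, which shows how up-weighting the sampled non-cases changes the estimate.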

Conclusions

Case-cohort designs can be used to provide unbiased estimates of the C-index, D measure and NRI.

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/13/113/prepub

Title: Derivation and assessment of risk prediction models using case-cohort data
Authors: Jean Sanderson, Simon G Thompson, Ian R White, Thor Aspelund, Lisa Pennells
Publication date: 01-12-2013
Publisher: BioMed Central
Published in: BMC Medical Research Methodology / Issue 1/2013
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/1471-2288-13-113
