Published in: BMC Medical Research Methodology 1/2011

Open Access 01-12-2011 | Debate

One statistical test is sufficient for assessing new predictive markers

Authors: Andrew J Vickers, Angel M Cronin, Colin B Begg

Abstract

Background

We have observed that the area under the receiver operating characteristic curve (AUC) is increasingly being used to evaluate whether a novel predictor should be incorporated in a multivariable model to predict risk of disease. Frequently, investigators will approach the issue in two distinct stages: first, by testing whether the new predictor variable is significant in a multivariable regression model; second, by testing differences between the AUC of models with and without the predictor using the same data from which the predictive models were derived. These two steps often lead to discordant conclusions.

Discussion

We conducted a simulation study in which two predictors, X and X*, were generated as standard normal variables with varying levels of predictive strength, represented by means that differed depending on the binary outcome Y. The data sets were analyzed using logistic regression, and likelihood ratio and Wald tests for the incremental contribution of X* were performed. The patient-specific predictors for each of the models were then used as data for a test comparing the two AUCs. Under the null, the sizes of the likelihood ratio and Wald tests were close to nominal, but the area test was extremely conservative, with test sizes less than 0.006 for all configurations studied. Where X* was associated with outcome, the area test had much lower power than the likelihood ratio and Wald tests.
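The simulation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the outcome prevalence of 0.5, the independence of X and X*, the sample size, the mean shifts, and the choice of a DeLong-type statistic as the "area test" are all assumptions, and the function names (`simulate`, `fit_logistic`, `three_tests`) are hypothetical.

```python
import math
import random

def simulate(n, mu_x, mu_xstar, rng):
    """Y ~ Bernoulli(0.5) (assumed); X and X* are unit-variance normals with mean mu * Y."""
    ys = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    return [(y, rng.gauss(mu_x * y, 1.0), rng.gauss(mu_xstar * y, 1.0)) for y in ys]

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def score_and_info(X, y, beta):
    """Log-likelihood, gradient and Fisher information of a logistic model at beta."""
    k = len(beta)
    ll, grad, info = 0.0, [0.0] * k, [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-eta))
        ll += yi * eta - math.log1p(math.exp(eta))
        for a in range(k):
            grad[a] += (yi - p) * xi[a]
            for c in range(k):
                info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
    return ll, grad, info

def fit_logistic(X, y, iters=15):
    """Maximum likelihood fit by Newton-Raphson; returns (beta, loglik, info)."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        _, grad, info = score_and_info(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    ll, _, info = score_and_info(X, y, beta)
    return beta, ll, info

def placements(scores, y):
    """DeLong placement values: per-case and per-control mean of psi(case, control)."""
    cases = [s for s, yi in zip(scores, y) if yi == 1]
    ctrls = [s for s, yi in zip(scores, y) if yi == 0]
    psi = lambda a, b: 1.0 if a > b else (0.5 if a == b else 0.0)
    v10 = [sum(psi(a, b) for b in ctrls) / len(ctrls) for a in cases]
    v01 = [sum(psi(a, b) for a in cases) / len(cases) for b in ctrls]
    return v10, v01

def sample_var(v):
    """Unbiased sample variance."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

def three_tests(rows):
    """Return (p_LR, p_Wald, p_area) for the incremental contribution of X*."""
    y = [r[0] for r in rows]
    X_base = [[1.0, r[1]] for r in rows]
    X_full = [[1.0, r[1], r[2]] for r in rows]
    b0, ll0, _ = fit_logistic(X_base, y)
    b1, ll1, info1 = fit_logistic(X_full, y)
    # Likelihood ratio test: 2 * (ll_full - ll_base) referred to chi-square, 1 df.
    p_lr = math.erfc(math.sqrt(max(2.0 * (ll1 - ll0), 0.0) / 2.0))
    # Wald test: z = beta / se, with the variance taken from the inverse information.
    var_b = solve(info1, [0.0, 0.0, 1.0])[2]
    p_wald = math.erfc(abs(b1[2]) / math.sqrt(var_b) / math.sqrt(2.0))
    # Area test (assumed DeLong-type): compare AUCs of the two fitted linear predictors.
    s0 = [sum(b * v for b, v in zip(b0, xi)) for xi in X_base]
    s1 = [sum(b * v for b, v in zip(b1, xi)) for xi in X_full]
    (v10a, v01a), (v10b, v01b) = placements(s0, y), placements(s1, y)
    d10 = [q - p for p, q in zip(v10a, v10b)]
    d01 = [q - p for p, q in zip(v01a, v01b)]
    var_d = sample_var(d10) / len(d10) + sample_var(d01) / len(d01)
    diff = sum(d10) / len(d10)  # AUC_full - AUC_base
    p_area = math.erfc(abs(diff) / math.sqrt(2.0 * var_d)) if var_d > 0 else 1.0
    return p_lr, p_wald, p_area

rng = random.Random(2011)  # arbitrary seed for reproducibility
p_lr, p_wald, p_area = three_tests(simulate(400, 0.5, 0.5, rng))
print(f"LR p={p_lr:.4g}  Wald p={p_wald:.4g}  area p={p_area:.4g}")
```

To estimate test size, one could call `three_tests` repeatedly with `mu_xstar=0.0` and record how often each p-value falls below 0.05; repeating with a nonzero `mu_xstar` would estimate power. The pure-Python fit is slow, so a large number of replications may call for a numerical library.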

Summary

Evaluation of the statistical significance of a new predictor when there are existing clinical predictors is most appropriately accomplished in the context of a regression model. Although comparison of AUCs is conceptually equivalent to the likelihood ratio and Wald tests, it has vastly inferior statistical properties. Use of both approaches will frequently lead to inconsistent conclusions. Nonetheless, comparison of receiver operating characteristic curves remains a useful descriptive tool for initial evaluation of whether a new predictor might be of clinical relevance.
Metadata
Title
One statistical test is sufficient for assessing new predictive markers
Authors
Andrew J Vickers
Angel M Cronin
Colin B Begg
Publication date
01-12-2011
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2011
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-11-13
