Published in: BMC Medical Research Methodology 1/2013

Open Access 01-12-2013 | Debate

On the assessment of the added value of new predictive biomarkers

Authors: Weijie Chen, Frank W Samuelson, Brandon D Gallas, Le Kang, Berkman Sahiner, Nicholas Petrick



Abstract

Background

The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. attributed this inconsistency to the AUC, which they said “has vastly inferior statistical properties,” i.e., is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test with diagnostic accuracy (AUC) tests.
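For readers unfamiliar with the DeLong et al. method mentioned above, the sketch below shows its basic computation for two scores measured on the same cases: empirical AUCs from the Mann-Whitney kernel, per-case placement values (structural components), and a z statistic for the AUC difference. It is a minimal illustration written for this page, not the authors' simulation code; the function names are hypothetical. Note that it treats both scores as fixed, so it accounts only for the variability of the test cases, not for variability from fitting the scores (for example, nested logistic models) on the same data.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the diseased score is higher, 0.5 on a tie."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def _cov(a, b):
    """Sample covariance of two equal-length lists (divisor len - 1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_test(pos1, neg1, pos2, neg2):
    """Two-sided DeLong z test of AUC1 == AUC2 for two scores on the SAME cases.

    pos1[i] and pos2[i] are the two scores of diseased case i;
    neg1[j] and neg2[j] are the two scores of non-diseased case j.
    Returns (auc1, auc2, z, p_value).
    """
    m, n = len(pos1), len(neg1)
    v10, v01, aucs = [], [], []
    for pos, neg in ((pos1, neg1), (pos2, neg2)):
        # Placement value of each diseased case among the non-diseased cases.
        v10.append([sum(_psi(x, y) for y in neg) / n for x in pos])
        # Placement value of each non-diseased case among the diseased cases.
        v01.append([sum(_psi(x, y) for x in pos) / m for y in neg])
        # The empirical AUC equals the mean diseased-case placement value.
        aucs.append(sum(v10[-1]) / m)
    # Var(AUC1 - AUC2) from the 2x2 covariances of the placement values.
    var = ((_cov(v10[0], v10[0]) + _cov(v10[1], v10[1]) - 2 * _cov(v10[0], v10[1])) / m
           + (_cov(v01[0], v01[0]) + _cov(v01[1], v01[1]) - 2 * _cov(v01[0], v01[1])) / n)
    if var <= 0:
        raise ValueError("zero variance: the two scores rank the cases identically")
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return aucs[0], aucs[1], z, p_value
```

Plain Python lists keep the sketch dependency-free; the pairwise loops cost O(mn) per score, which is fine for illustration but slower than the sorting-based algorithms used in practice.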

Discussion

We present an F test to compare the ideal AUCs of nested linear discriminant functions, and we compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test, whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small and, as expected, that their type I errors converge to the nominal value as the sample size increases. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods considered in this paper.
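A hedged illustration of the F test idea: under the model that motivates linear discriminant analysis (multivariate normal markers with a common covariance matrix in both classes), the ideal AUC of the linear discriminant is Φ(Δ/√2), where Δ is the Mahalanobis distance between the class means. Equal ideal AUCs for nested discriminants is therefore the same null hypothesis as equal Mahalanobis distances, and one standard exact test of that null is Rao's F test for additional information. The sketch below assumes this is the kind of test meant here (the abstract does not give the formula); the function names and the column convention (the first p columns are the existing markers, the remaining columns are new) are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def mahalanobis_sq(x0, x1):
    """Sample squared Mahalanobis distance D^2 between the two class means,
    using the pooled within-class covariance (divisor n0 + n1 - 2)."""
    n0, n1 = len(x0), len(x1)
    diff = x1.mean(axis=0) - x0.mean(axis=0)
    pooled = ((n0 - 1) * np.cov(x0, rowvar=False)
              + (n1 - 1) * np.cov(x1, rowvar=False)) / (n0 + n1 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov returns a 0-d array for one column
    return float(diff @ np.linalg.solve(pooled, diff))


def plugin_ideal_auc(d2):
    """Plug-in estimate of the ideal LDA AUC, Phi(Delta / sqrt(2)).
    Sample D^2 is biased upward, so this tends to be optimistic."""
    return float(stats.norm.cdf(np.sqrt(d2 / 2.0)))


def rao_f_test(x0, x1, p):
    """Rao's F test that columns p..k-1 (new markers) add nothing to the
    Mahalanobis distance of columns 0..p-1 (existing markers).

    x0, x1: arrays of shape (n0, k) and (n1, k) for the two classes.
    Exact under multivariate normality with a common covariance matrix.
    Returns (F statistic, (df1, df2), p-value).
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    k = x0.shape[1]
    if not 1 <= p < k:
        raise ValueError("need at least one existing and one new marker")
    n0, n1 = len(x0), len(x1)
    n = n0 + n1
    q = k - p                      # number of added markers
    c = n0 * n1 / n
    d2_full = mahalanobis_sq(x0, x1)
    d2_reduced = mahalanobis_sq(x0[:, :p], x1[:, :p])
    df1, df2 = q, n - k - 1
    f_stat = (df2 / df1) * c * (d2_full - d2_reduced) / ((n - 2) + c * d2_reduced)
    p_value = float(stats.f.sf(f_stat, df1, df2))
    return f_stat, (df1, df2), p_value


if __name__ == "__main__":
    # Hypothetical data: two existing markers and one new marker that is
    # shifted between classes by 0.5 standard deviations.
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=(60, 3))
    x1 = rng.normal(size=(60, 3)) + np.array([1.0, 0.5, 0.5])
    f_stat, dfs, p_value = rao_f_test(x0, x1, p=2)
    print(f"F = {f_stat:.3f}, df = {dfs}, p = {p_value:.4f}")
    print("plug-in AUC, existing:", plugin_ideal_auc(mahalanobis_sq(x0[:, :2], x1[:, :2])))
    print("plug-in AUC, all:", plugin_ideal_auc(mahalanobis_sq(x0, x1)))
```

The exactness of this test rests on the normal, equal-covariance assumptions; the LR and Wald tests for logistic regression avoid those distributional assumptions but rely on large-sample approximations instead.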

Summary

We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies.
Literature
1. Begg C, Vickers A: One statistical test is sufficient for assessing new predictive markers. BMC Med Res Methodol. 2011, 11 (13): 1-7.
2.
3. DeLong ER, DeLong DM, Clarke-Pearson DL: Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988, 44 (3): 837-845. 10.2307/2531595.
4. Efron B, Tibshirani R: Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc. 1997, 92 (438): 548-560.
5. Hosmer DW, Lemeshow S: Applied Logistic Regression. 2004, New York, NY: John Wiley & Sons, 2nd edition.
6. Su JQ, Liu JS: Linear combinations of multiple diagnostic markers. J Am Stat Assoc. 1993, 88 (424): 1350-1355. 10.1080/01621459.1993.10476417.
8. Demler OV, Pencina MJ, D’Agostino R: Equivalence of improvement in area under ROC curve and linear discriminant analysis coefficient under assumption of normality. Stat Med. 2011, 30: 1410-1418.
9. Chen W, Gallas BD, Yousef WA: Classifier variability: accounting for training and testing. Pattern Recognit. 2012, 45 (7): 2661-2671. 10.1016/j.patcog.2011.12.024.
10. Efron B: The efficiency of logistic regression compared to normal discriminant analysis. J Am Stat Assoc. 1975, 70 (352): 892-898. 10.1080/01621459.1975.10480319.
11. Hoeffding W: A class of statistics with asymptotically normal distribution. Ann Math Stat. 1948, 19 (3): 293-325. 10.1214/aoms/1177730196.
13. Kerr KF, McClelland RL, Brown ER, Lumley T: Evaluating the incremental value of new biomarkers with integrated discrimination improvement. Am J Epidemiol. 2011, 174: 364-374. 10.1093/aje/kwr086.
14. Chen W, Wagner RF, Yousef WA, Gallas BD: Comparison of classifier performance estimators: a simulation study. 2009.
15. Sahiner B, Chan HP, Hadjiiski L: Classifier performance prediction for computer-aided diagnosis using a limited dataset. Med Phys. 2008, 35 (4): 1559. 10.1118/1.2868757.
16. Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, Winget M, Yasui Y: Phases of biomarker development for early detection of cancer. J Natl Cancer Inst. 2001, 93 (14): 1054-1061. 10.1093/jnci/93.14.1054.
17. Baker SG, Kramer BS, McIntosh M, Patterson BH, Shyr Y, Skates S: Evaluating markers for the early detection of cancer: overview of study designs and methods. Clin Trials. 2006, 3: 43-56. 10.1191/1740774506cn130oa.
18. Pepe MS, Feng Z, Janes H, Bossuyt PM, Potter JD: Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design. J Natl Cancer Inst. 2008, 100 (20): 1432-1438. 10.1093/jnci/djn326.
19. Sen PK: On some convergence properties of U-statistics. Calcutta Stat Assoc Bull. 1960, 10: 1-18.
Metadata
Publication date: 01-12-2013
Publisher: BioMed Central
Published in: BMC Medical Research Methodology / Issue 1/2013
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/1471-2288-13-98
