Published in: BMC Medical Research Methodology 1/2017

Open Access 01-12-2017 | Debate

Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies

Authors: Anne-Laure Boulesteix, Rory Wilson, Alexander Hapfelmeier

Abstract

Background

The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly “evidence-based”. Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research.

Main message

In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of “evidence-based” statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments.
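
The analogy can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' procedure: datasets are treated as the units of observation (the "patients"), an inclusion rule is fixed before any method is run, and two methods (the "interventions") are compared through paired per-dataset differences with an exact sign test. The function names, the `min_samples` threshold, the dictionary fields and the error values are all hypothetical and made up for illustration.

```python
from math import comb
from statistics import mean


def meets_inclusion_criteria(dataset, min_samples=50):
    """Hypothetical inclusion rule, fixed before any method is evaluated."""
    return dataset["n_samples"] >= min_samples


def sign_test_p_value(differences):
    """Exact two-sided sign test on paired differences; ties are dropped."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    tail = min(positives, n - positives)
    # Probability of a split at least this uneven under a fair coin, doubled.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def compare_methods(datasets):
    """Compare methods A and B across all datasets that pass the inclusion rule."""
    included = [d for d in datasets if meets_inclusion_criteria(d)]
    if not included:
        raise ValueError("no dataset meets the inclusion criteria")
    # Paired design: each dataset contributes one difference (error of A minus error of B).
    diffs = [d["error_a"] - d["error_b"] for d in included]
    return {
        "n_included": len(included),
        "mean_difference": mean(diffs),
        "p_value": sign_test_p_value(diffs),
    }


# Made-up error rates, for illustration only (not results from any study).
toy_datasets = [
    {"name": "d1", "n_samples": 120, "error_a": 0.20, "error_b": 0.25},
    {"name": "d2", "n_samples": 30, "error_a": 0.10, "error_b": 0.40},  # excluded: too small
    {"name": "d3", "n_samples": 200, "error_a": 0.15, "error_b": 0.18},
    {"name": "d4", "n_samples": 80, "error_a": 0.30, "error_b": 0.28},
]

if __name__ == "__main__":
    print(compare_methods(toy_datasets))
```

With so few datasets the sign test has very little power, which mirrors the situation of a clinical trial with only a handful of patients.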

Conclusion

We suggest that benchmark studies, which assess statistical methods using real-world datasets, might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.
Metadata
Title
Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies
Authors
Anne-Laure Boulesteix
Rory Wilson
Alexander Hapfelmeier
Publication date
01-12-2017
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2017
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-017-0417-2
