Published in: Insights into Imaging 1/2021

Open Access 01-12-2021 | Original Article

Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics

Author: Aydin Demircioğlu

Abstract

Background

Many radiomics studies use feature selection methods to identify the most predictive features and, at the same time, employ cross-validation to estimate the performance of the developed models. However, if feature selection is performed before cross-validation, data leakage can occur and the results can be biased. To measure the extent of this bias, we collected ten publicly available radiomics datasets and conducted two experiments. First, models were developed by incorrectly applying feature selection prior to cross-validation. Then, the same experiment was repeated with feature selection applied correctly within cross-validation, in each fold. The resulting models were then compared in terms of AUC-ROC, AUC-F1, and accuracy.
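
The difference between the two experimental setups can be illustrated with a minimal sketch. It is not the study's code: the pure-noise synthetic data, the mean-difference filter, the nearest-centroid classifier, and all function names are assumptions chosen only to keep the example self-contained. Because the labels are unrelated to the features, an unbiased estimate should sit near chance level, so any clearly higher accuracy in the "selection before cross-validation" variant is attributable to leakage.

```python
import random

def make_noise_data(n_samples=40, n_features=500, seed=0):
    """Synthetic stand-in for a radiomics dataset: features are pure noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced labels, unrelated to X
    return X, y

def class_means(X, y, rows, feats):
    """Per-class mean of the given features, computed on `rows` only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members) for j in feats]
    return means

def select_top_k(X, y, rows, k):
    """Filter selection: keep the k features with the largest class-mean gap."""
    all_feats = list(range(len(X[0])))
    means = class_means(X, y, rows, all_feats)
    gap = [abs(a - b) for a, b in zip(means[0], means[1])]
    return sorted(all_feats, key=lambda j: gap[j], reverse=True)[:k]

def predict(X, y, train, test, feats):
    """Nearest-centroid classifier fitted on `train`, applied to `test`."""
    centroids = class_means(X, y, train, feats)
    preds = []
    for i in test:
        dist = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(feats, centroids[label]))
            for label in (0, 1)
        }
        preds.append(min(dist, key=dist.get))
    return preds

def cv_accuracy(X, y, k_features, n_folds=5, select_inside=True):
    """Cross-validated accuracy with selection inside or before the CV loop."""
    n = len(X)
    all_rows = list(range(n))
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    # Incorrect variant: features are chosen once, using every sample,
    # including those that later serve as test data.
    leaked_feats = select_top_k(X, y, all_rows, k_features)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        # Correct variant: selection sees only the training part of the fold.
        feats = select_top_k(X, y, train, k_features) if select_inside else leaked_feats
        preds = predict(X, y, train, test, feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / n

X, y = make_noise_data()
biased = cv_accuracy(X, y, k_features=10, select_inside=False)
honest = cv_accuracy(X, y, k_features=10, select_inside=True)
print(f"selection before CV: accuracy = {biased:.2f}")
print(f"selection within CV: accuracy = {honest:.2f}")
print(f"estimated bias:      {biased - honest:.2f}")
```

The only difference between the two calls is where `select_top_k` is run: once on all samples, or again on the training portion of every fold. In practice, the same effect is usually achieved by wrapping selection and classifier in a single pipeline object that is refitted in each fold.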

Results

Incorrectly applying feature selection prior to cross-validation resulted in a bias of up to 0.15 in AUC-ROC, 0.29 in AUC-F1, and 0.17 in accuracy.

Conclusions

Incorrect application of feature selection and cross-validation can lead to highly biased results for radiomic datasets.
Metadata
Title
Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics
Author
Aydin Demircioğlu
Publication date
01-12-2021
Publisher
Springer International Publishing
Published in
Insights into Imaging / Issue 1/2021
Electronic ISSN: 1869-4101
DOI
https://doi.org/10.1186/s13244-021-01115-1
