Published in: BMC Cancer 1/2023

Open Access 01-12-2023 | Colorectal Cancer | Research

Machine learning-based colorectal cancer prediction using global dietary data

Authors: Hanif Abdul Rahman, Mohammad Ashraf Ottom, Ivo D. Dinov

Abstract

Background

Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. Active health screening for CRC has led to detection of the disease in increasingly younger adults. However, current machine learning algorithms, which are trained on older adults and smaller datasets, may not perform well in practice for large populations.

Aim

To evaluate machine learning algorithms using large datasets accounting for both younger and older adults from multiple regions and diverse sociodemographics.

Methods

A large dataset including 109,343 participants in a dietary-based colorectal cancer case study from Canada, India, Italy, South Korea, Mexico, Sweden, and the United States was collected by the Centers for Disease Control and Prevention. This global dietary database was augmented with other publicly accessible information from multiple sources. Nine supervised and unsupervised machine learning algorithms were evaluated on the aggregated dataset.

Results

Both supervised and unsupervised models performed well in predicting CRC and non-CRC phenotypes. A prediction model based on an artificial neural network (ANN) was found to be the optimal algorithm with CRC misclassification of 1% and non-CRC misclassification of 3%.
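
The abstract does not spell out how the per-class misclassification figures are computed. A minimal sketch follows, assuming each figure is the fraction of samples with a given true label that the model assigns to the other class. The function name `per_class_misclassification` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_misclassification(y_true, y_pred):
    """Map each true label to the fraction of its samples predicted as another label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Number of samples carrying each true label.
    totals = Counter(y_true)
    # Number of wrongly predicted samples, grouped by their true label.
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {label: errors[label] / totals[label] for label in totals}


# Toy labels invented for illustration only (not the study's data):
# 1 of 100 CRC cases and 3 of 100 non-CRC cases are misclassified.
y_true = ["CRC"] * 100 + ["non-CRC"] * 100
y_pred = ["CRC"] * 99 + ["non-CRC"] * 1 + ["non-CRC"] * 97 + ["CRC"] * 3

rates = per_class_misclassification(y_true, y_pred)
print(rates)
```

Under this assumed definition, the CRC rate equals one minus sensitivity and the non-CRC rate equals one minus specificity, so the two numbers describe different kinds of error and should be read separately rather than averaged.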

Conclusions

ANN models trained on large heterogeneous datasets may be applicable for both younger and older adults. Such models provide a solid foundation for building effective clinical decision support systems assisting healthcare providers in dietary-related, non-invasive screening that can be applied in large studies. Using optimal algorithms coupled with high compliance to cancer screening is expected to significantly improve early diagnoses and boost the success rate of timely and appropriate cancer interventions.
Metadata
Title
Machine learning-based colorectal cancer prediction using global dietary data
Authors
Hanif Abdul Rahman
Mohammad Ashraf Ottom
Ivo D. Dinov
Publication date
01-12-2023
Publisher
BioMed Central
Published in
BMC Cancer / Issue 1/2023
Electronic ISSN: 1471-2407
DOI
https://doi.org/10.1186/s12885-023-10587-x
