The application of feature engineering in establishing a rapid and robust model for identifying patients with glioma

  • Original Article
Lasers in Medical Science

Abstract

This study evaluates how well Raman spectroscopy, combined with feature engineering and machine learning algorithms, identifies patients with glioma. We collected serum Raman spectra from glioma patients and healthy controls and classified them with feature engineering-based models. First, to reduce the dimensionality of the data, we applied two feature extraction algorithms: partial least squares (PLS) and principal component analysis (PCA). The extracted components were then selected using four correlation indexes, namely Relief-F (RF), the Pearson correlation coefficient (PCC), the F-score (FS) and term variance (TV). Finally, back-propagation neural network (BP), linear discriminant analysis (LDA) and support vector machine (SVM) classification models were established. To make the comparison reliable, the prediction performance of each model was measured with fivefold cross-validation. In total, 33 classification models were established. Judged on four classification criteria, PLS-Relief-F-BP, PLS-F-Score-BP, PLS-LDA and PLS-Relief-F-SVM performed best, with accuracies of 97.58%, 96.33%, 97.87% and 96.19%, respectively. The experimental results show that feature engineering can select more representative features, reduce computation time and simplify the model. The resulting classification models are more robust and shorten the discrimination time, and they enable rapid, stable and accurate identification of glioma patients, which gives the approach high clinical application value.
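The feature selection step described in the abstract can be illustrated with a minimal sketch. It uses commonly cited textbook definitions of the F-score, the Pearson correlation coefficient and term variance, which may differ from the exact formulations used in the study. The function names (`f_score`, `pearson_abs`, `term_variance`, `top_k`, `k_fold_indices`) and the toy matrix `X` are hypothetical: the numbers are invented for illustration and are not study data. Relief-F, the PLS/PCA extraction step and the BP, LDA and SVM classifiers are left out for brevity.

```python
import math
import random
from statistics import mean, pvariance, variance


def f_score(column, labels):
    """Two-class F-score: between-class separation over within-class spread."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    overall = mean(column)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def pearson_abs(column, labels):
    """Absolute Pearson correlation between one feature and the class label."""
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in column))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0


def term_variance(column, labels=None):
    """Term variance: population variance of the feature; labels are ignored."""
    return pvariance(column)


def top_k(X, labels, score, k):
    """Return the indices of the k highest-scoring feature columns."""
    n_features = len(X[0])
    scores = [score([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy data (invented): rows are samples, columns stand in for extracted components.
# Column 0 separates the classes, column 1 is nearly constant,
# column 2 has a large spread but carries little class information.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
X = [
    [1.0, 5.0, 10.0], [1.2, 5.1, -10.0], [0.9, 4.9, 10.0],
    [1.1, 5.0, -10.0], [1.0, 5.1, 10.0], [3.0, 5.0, -10.0],
    [3.1, 4.9, 10.0], [2.9, 5.1, -10.0], [3.2, 5.0, 10.0],
    [3.0, 5.0, -10.0],
]

for name, score in [("F-score", f_score), ("PCC", pearson_abs), ("TV", term_variance)]:
    print(name, "keeps component", top_k(X, labels, score, k=1))

for fold_number, test_idx in enumerate(k_fold_indices(len(X), k=5), start=1):
    train_idx = [i for i in range(len(X)) if i not in test_idx]
    print("fold", fold_number, "train:", len(train_idx), "test:", len(test_idx))
```

On this toy matrix the two supervised scores (F-score and PCC) keep column 0, while term variance, which never looks at the labels, keeps the high-spread column 2. This is one reason it can be worth comparing several selection indexes under the same cross-validation protocol rather than relying on a single one.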

Funding

This work was supported by the Special Scientific Research Project for Young Medical Science (2019Q003), the Xinjiang Uygur Autonomous Region Science and Technology Branch Project of China (2019E0282) and the National Natural Science Foundation of China (Grant No. 81760444).

Author information

Corresponding authors

Correspondence to Wenjia Guo or Xiaoyi Lv.

Ethics declarations

Ethics approval

All procedures performed in this study involving human participants were in accordance with the ethical standards of the Ethics Committee of the Affiliated Tumor Hospital of Xinjiang Medical University.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ma, M., Tian, X., Chen, F. et al. The application of feature engineering in establishing a rapid and robust model for identifying patients with glioma. Lasers Med Sci 37, 1007–1015 (2022). https://doi.org/10.1007/s10103-021-03346-6

  • DOI: https://doi.org/10.1007/s10103-021-03346-6
