Published in: BMC Medical Informatics and Decision Making 1/2021

Open Access 01-12-2021 | Respiratory Microbiota | Research article

Kernel principal components based cascade forest towards disease identification with human microbiota

Authors: Jiayu Zhou, Yanqing Ye, Jiang Jiang


Abstract

Background

Numerous clinical studies have shown that many phenotypic traits of human disease, e.g., inflammation, obesity, HIV, and diabetes, are related to the gut microbiome. Through supervised classification, it is feasible to determine human disease states from the compositional information of the intestinal microbiota. However, because the abundance matrix of microbiome data is very sparse, an interpretable deep model, such as the deep forest model, is crucial for further representing and mining the data. Moreover, overfitting can still occur in the original deep forest model when dealing with such “large p, small n” biological data. Feature reduction is therefore considered as a way to improve the ensemble forest model, especially for disease identification with the human microbiota.

Methods

In this work, we propose the kernel principal components based cascade forest method, called KPCCF, to classify the disease states of patients using taxonomic profiles of the microbiome at the family level. Specifically, kernel principal component analysis is first used to reduce the original dimension of the human microbiota datasets. The processed data are then fed into the cascade forest to preliminarily discriminate the disease state of the samples.
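The two-stage idea described above (kernel PCA for dimension reduction, followed by a cascade of forests) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic abundance matrix, the RBF kernel, the number of components, the forest sizes, and the helper names `make_forests` and `fit_predict_cascade` are all assumptions chosen for illustration, and the cascade here is a simplified version that only concatenates class-probability vectors to the reduced features at each level.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for a sparse "large p, small n" abundance matrix
# (hypothetical data: 60 samples, 200 taxa, binary disease label).
rng = np.random.default_rng(0)
y = np.array([0, 1] * 30)
X = rng.poisson(0.3, size=(60, 200)).astype(float)
X[y == 1, :10] += 2.0  # inject a weak signal into the first 10 taxa

X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]

# Stage 1: kernel PCA fitted on the training samples only.
# Kernel and n_components are illustrative choices, not the paper's settings.
kpca = KernelPCA(n_components=10, kernel="rbf")
Z_train = kpca.fit_transform(X_train)
Z_test = kpca.transform(X_test)


def make_forests(seed):
    """One random forest and one extra-trees forest per cascade level."""
    return [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, random_state=seed),
    ]


def fit_predict_cascade(Z_tr, y_tr, Z_te, n_levels=2, seed=0):
    """Train a small cascade and return class probabilities for Z_te."""
    aug_tr, aug_te = Z_tr, Z_te
    for level in range(n_levels):
        tr_probs, te_probs = [], []
        for forest in make_forests(seed + level):
            # Out-of-fold probabilities keep the next level from seeing
            # predictions made on the same samples a forest was trained on.
            tr_probs.append(
                cross_val_predict(forest, aug_tr, y_tr, cv=3,
                                  method="predict_proba")
            )
            forest.fit(aug_tr, y_tr)
            te_probs.append(forest.predict_proba(aug_te))
        # The next level sees the reduced features plus all class vectors.
        aug_tr = np.hstack([Z_tr] + tr_probs)
        aug_te = np.hstack([Z_te] + te_probs)
    # Final prediction: average the class vectors of the last level.
    return np.mean(te_probs, axis=0)


proba = fit_predict_cascade(Z_train, y_train, Z_test)
y_pred = proba.argmax(axis=1)
print("test accuracy on synthetic data:", (y_pred == y_test).mean())
```

Fitting the kernel PCA on the training split alone avoids leaking test information into the reduced representation. A full deep forest would also grow levels adaptively until validation performance stops improving, which this sketch replaces with a fixed `n_levels`.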

Results

The proposed KPCCF algorithm can represent small-scale, high-dimensional human microbiota datasets with sparse feature matrices. Systematic comparison experiments on 4 datasets demonstrate that our method consistently outperforms the state-of-the-art methods.

Conclusion

Although problems in different domains share some common characteristics, no one-size-fits-all solution exists. Traditional deep models are limited in biological applications by the imbalance between small sample sizes and high dimensionality. KPCCF distinguishes itself from the standard deep forest model by its excellent performance in the microbiota field. Additionally, compared to other dimensionality reduction methods, kernel principal component analysis is more suitable for microbiota datasets.
Metadata
Title
Kernel principal components based cascade forest towards disease identification with human microbiota
Authors
Jiayu Zhou
Yanqing Ye
Jiang Jiang
Publication date
01-12-2021
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2021
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-021-01705-5
