Published in: Journal of Translational Medicine 1/2021

Open Access 01-12-2021 | Research

iTTCA-RF: a random forest predictor for tumor T cell antigens

Authors: Shihu Jiao, Quan Zou, Huannan Guo, Lei Shi


Abstract

Background

Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy because of its high efficacy, high selectivity and fewer side effects compared with traditional treatments. The identification of tumor T cell antigens is one of the most important tasks in antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methods remains challenging.

Methods

In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition. To improve the representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm.
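As an illustration of the simplest of these encodings, the sketch below computes amino acid composition in its standard form: the fraction of each of the 20 standard residues in a peptide. The abstract does not give the authors' exact implementation or parameters, so the function name, the residue ordering and the example peptide are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order
# (the ordering is an assumption; any fixed order works).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide: str) -> list[float]:
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues are ignored, so the
    frequencies are normalised by the number of standard residues.
    """
    counts = Counter(peptide.upper())
    length = sum(counts[aa] for aa in AMINO_ACIDS)
    if length == 0:
        raise ValueError("peptide contains no standard amino acids")
    return [counts[aa] / length for aa in AMINO_ACIDS]


# Hypothetical 9-residue peptide, used only to show the call.
features = amino_acid_composition("KLAAVWKLA")
```

Each peptide becomes a fixed-length numeric vector regardless of its length, which is what allows vectors from several encodings to be concatenated into hybrid features before feature selection.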

Results

Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server was freely accessible at http://lab.malab.cn/~acy/iTTCA.
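For readers less familiar with these metrics, the sketch below uses the standard definitions: sensitivity is the true positive rate, specificity is the true negative rate, and balanced accuracy is their mean. The function names are illustrative and are not taken from the paper. The reported figures are consistent with this definition, since each balanced accuracy equals the mean of the corresponding specificity and sensitivity.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of positives correctly identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of negatives correctly identified."""
    return tn / (tn + fp)


def balanced_accuracy(sens: float, spec: float) -> float:
    """Mean of sensitivity and specificity."""
    return (sens + spec) / 2


# Values reported in the abstract, in percent.
cv_bacc = balanced_accuracy(sens=88.69, spec=78.73)    # tenfold cross-validation
test_bacc = balanced_accuracy(sens=83.61, spec=62.67)  # independent test

print(round(cv_bacc, 2), round(test_bacc, 2))
```

Balanced accuracy is a natural choice here because the dataset is imbalanced (592 positive versus 393 negative samples), and plain accuracy would favor the majority class.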

Conclusions

We have shown that the proposed predictor, iTTCA-RF, outperforms other recent models, and we hope it will become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
Metadata
Title
iTTCA-RF: a random forest predictor for tumor T cell antigens
Authors
Shihu Jiao
Quan Zou
Huannan Guo
Lei Shi
Publication date
01-12-2021
Publisher
BioMed Central
Published in
Journal of Translational Medicine / Issue 1/2021
Electronic ISSN: 1479-5876
DOI
https://doi.org/10.1186/s12967-021-03084-x
