Published in: Journal of Translational Medicine 1/2019

Open Access 01-12-2019 | Biomarkers | Research

-Omics biomarker identification pipeline for translational medicine

Authors: Laura Bravo-Merodio, John A. Williams, Georgios V. Gkoutos, Animesh Acharjee


Abstract

Background

Translational medicine (TM) is an emerging domain that aims to move medical and biological advances efficiently from the scientist to the clinician. Central to the TM vision is narrowing the gap between basic and applied science in terms of time, cost and early diagnosis of disease. Biomarker identification is one of the main challenges within TM. Identifying disease biomarkers from -omics data will not only help stratify diverse patient cohorts but will also provide early diagnostic information that could improve patient management and potentially prevent adverse outcomes. However, biomarker identification needs to be robust and reproducible. Hence, a robust, unbiased computational framework that helps clinicians identify such biomarkers is necessary.

Methods

We developed a pipeline (workflow) that uses two supervised classification techniques based on regularization methods to identify biomarkers from -omics or other high-dimensional clinical datasets. The pipeline includes several important steps, such as quality control and an assessment of the stability of the selected biomarkers. The process takes input files (outcome and independent variables or -omics data) and pre-processes them (normalization, handling of missing values). After a random division of samples into training and test sets, Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net feature selection methods are applied to identify the most important features, which represent potential biomarker candidates. The penalization parameters are optimised using 10-fold cross-validation, and the process undergoes 100 iterations and a combinatorial analysis to select the best-performing multivariate model. An empirical, unbiased assessment of the quality of the selected features as biomarkers for clinical use is performed through Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) analysis on both permuted and real data for 1000 different randomized training and test sets. We validated this pipeline against previously published biomarkers.
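The core loop described above (random split, penalized feature selection tuned by 10-fold cross-validation, and AUC on real versus permuted outcomes) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes Python with scikit-learn, uses synthetic data, fits LASSO and Elastic Net by least squares on a 0/1 outcome as a simplification, fixes the Elastic Net mixing parameter at 0.5, and runs 10 iterations instead of the 100 and 1000 mentioned in the text. The helper names `run_split`, `evaluate` and `MODELS` are hypothetical, and the combinatorial model-selection step is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import ElasticNetCV, LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def run_split(X, y, seed, make_model):
    """One random train/test split; returns (selected-feature mask, test AUC)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scaler = StandardScaler().fit(X_tr)            # normalise using training data only
    model = make_model().fit(scaler.transform(X_tr), y_tr)
    selected = model.coef_ != 0                    # non-zero coefficients = candidates
    scores = model.predict(scaler.transform(X_te)) # continuous scores for the ROC curve
    return selected, roc_auc_score(y_te, scores)


def evaluate(X, y, make_model, n_iter=10):
    """Selection frequency per feature, plus AUCs on real and permuted outcomes."""
    rng = np.random.default_rng(0)
    freq = np.zeros(X.shape[1])
    real_auc, perm_auc = [], []
    for seed in range(n_iter):
        selected, auc = run_split(X, y, seed, make_model)
        freq += selected
        real_auc.append(auc)
        # Same procedure on shuffled outcomes gives the chance-level reference.
        _, auc_perm = run_split(X, rng.permutation(y), seed, make_model)
        perm_auc.append(auc_perm)
    return freq / n_iter, np.array(real_auc), np.array(perm_auc)


# Hypothetical model factories; cv=10 tunes the penalty by 10-fold cross-validation.
MODELS = {
    "lasso": lambda: LassoCV(cv=10),
    "elastic_net": lambda: ElasticNetCV(l1_ratio=0.5, cv=10),
}

# Synthetic stand-in for an -omics matrix: 150 samples, 100 features,
# of which the first 5 columns carry signal (shuffle=False keeps them first).
X, y = make_classification(
    n_samples=150, n_features=100, n_informative=5, n_redundant=0,
    n_clusters_per_class=1, class_sep=1.5, shuffle=False, random_state=0,
)

results = {name: evaluate(X, y, make_model) for name, make_model in MODELS.items()}
for name, (freq, real, perm) in results.items():
    top = np.argsort(freq)[::-1][:5]
    print(f"{name}: most frequently selected features {top.tolist()}, "
          f"mean AUC real {real.mean():.2f} vs permuted {perm.mean():.2f}")
```

Features that are selected in most iterations are the more stable candidates, and a real-data AUC distribution that sits clearly above the permuted one suggests the selected features carry signal beyond chance. For a binary outcome, a penalized logistic model would be a closer match to a classification setting than the least-squares fit assumed here.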

Results

We applied this pipeline to three different datasets with previously published biomarkers: lipidomics data from Acharjee et al. (Metabolomics 13:25, 2017), and transcriptomics data from Rajamani and Bhasin (Genome Med 8:38, 2016) and Mills et al. (Blood 114:1063–1072, 2009). Our results demonstrate that the method identified both the previously published biomarkers and new variables that add value to the published results.

Conclusions

We developed a robust pipeline to identify clinically relevant biomarkers that can be applied to different -omics datasets. Such identification can reveal potentially novel drug targets and can be used as part of a machine-learning-based patient stratification framework in translational medicine settings.
Literature
1. Howells DW, Sena ES, Macleod MR. Bringing rigour to translational medicine. Nat Rev Neurol. 2014;10:37–43.
2. Han H. Diagnostic biases in translational bioinformatics. BMC Med Genomics. 2015;8:46.
3. Fang FC, Casadevall A. Lost in translation–basic science in the era of translational research. Infect Immun. 2010;78:563–6.
4. Mischak H, Allmaier G, Apweiler R, Attwood T, Baumann M, Benigni A, et al. Recommendations for biomarker identification and qualification in clinical proteomics. Sci Transl Med. 2010;2:46ps42.
5. Satagopam V, Gu W, Eifes S, Gawron P, Ostaszewski M, Gebel S, et al. Integration and visualization of translational medicine data for better understanding of human diseases. Big Data. 2016;4:97–108.
6. Narayanasamy S, Jarosz Y, Muller EEL, Heintz-Buschart A, Herold M, Kaysen A, et al. IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses. Genome Biol. 2016;17:260.
7. Feng J, Ding C, Qiu N, Ni X, Zhan D, Liu W, et al. Firmiana: towards a one-stop proteomic cloud platform for data processing and analysis. Nat Biotechnol. 2017;35:409–12.
8. Xia J, Wishart DS. Using MetaboAnalyst 3.0 for comprehensive metabolomics data analysis. Curr Protoc Bioinform. 2016;55:14.10.1–10.91.
9. Acharjee A, Finkers R, Visser RG, Maliepaard C. Comparison of regularized regression methods for ~omics data. Metabolomics. 2013;3:1–9.
10. Hermida L, Poussin C, Stadler MB, Gubian S, Sewer A, Gaidatzis D, et al. Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data. BMC Genomics. 2013;14:514.
11. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Methodol. 1996;58:267–88.
12. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol. 2005;67:301–20.
13. Hoerl AE. Application of ridge analysis to regression problems. Chem Eng Prog. 1962;58:54–9.
15. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 2013;14:128.
16. Acharjee A, Prentice P, Acerini C, Smith J, Hughes IA, Ong K, et al. The translation of lipid profiles to nutritional biomarkers in the study of infant metabolism. Metabolomics. 2017;13:25.
17. Prentice P, Koulman A, Matthews L, Acerini CL, Ong KK, Dunger DB. Lipidomic analyses, breast- and formula-feeding, and growth in infants. J Pediatr. 2015;166(276–281):e6.
18. Rajamani D, Bhasin MK. Identification of key regulators of pancreatic cancer progression through multidimensional systems-level analysis. Genome Med. 2016;8:38.
19. Mills KI, Kohlmann A, Williams PM, Wieczorek L, Liu W, Li R, et al. Microarray-based classifiers and prognosis models identify subgroups with distinct clinical outcomes and high risk of AML transformation of myelodysplastic syndrome. Blood. 2009;114:1063–72.
20. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008;9:559.
22. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1–22.
23. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–7.
24. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30.
25. Hornick NI, Doron B, Abdelhamed S, Huan J, Harrington CA, Shen R, et al. AML suppresses hematopoiesis by releasing exosomes that contain microRNAs targeting c-MYB. Sci Signal. 2016;9:ra88.
26. Uttarkar S, Frampton J, Klempnauer K-H. Targeting the transcription factor Myb by small-molecule inhibitors. Exp Hematol. 2017;47:31–5.
27. Ma X, Liu Y, Liu Y, Alexandrov LB, Edmonson MN, Gawad C, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature. 2018;555:371–6.
28. Perera RM, Stoykova S, Nicolay BN, Ross KN, Fitamant J, Boukhali M, et al. Transcriptional control of autophagy-lysosome function drives pancreatic cancer metabolism. Nature. 2015;524:361–5.
29. Yang M-C, Wang H-C, Hou Y-C, Tung H-L, Chiu T-J, Shan Y-S. Blockade of autophagy reduces pancreatic cancer stem cell activity and potentiates the tumoricidal effect of gemcitabine. Mol Cancer. 2015;14:179.
30. Clarke R, Ressom HW, Wang A, Xuan J, Liu MC, Gehan EA, et al. The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nat Rev Cancer. 2008;8:37–49.
31. Kong Y, Yu T. A deep neural network model using random forest to extract feature representation for gene expression data classification. Sci Rep. 2018;8:16477.
32. Shen R, Mo Q, Schultz N, Seshan VE, Olshen AB, Huse J, et al. Integrative subtype discovery in glioblastoma using iCluster. PLoS ONE. 2012;7:e35236.
33. Seoane JA, Day INM, Gaunt TR, Campbell C. A pathway-based data integration framework for prediction of disease progression. Bioinform Oxf Engl. 2014;30:838–45.
34. Zhu B, Song N, Shen R, Arora A, Machiela MJ, Song L, et al. Integrating clinical and multiple omics data for prognostic assessment across human cancers. Sci Rep. 2017;7:16954.
35. Acharjee A, Ament Z, West JA, Stanley E, Griffin JL. Integration of metabolomics, lipidomics and clinical data using a machine learning method. BMC Bioinform. 2016;17:37–49.
36. Bakker OB, Aguirre-Gamboa R, Sanna S, Oosting M, Smeekens SP, Jaeger M, et al. Integration of multi-omics data and deep phenotyping enables prediction of cytokine responses. Nat Immunol. 2018;19:776–86.
37. Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, et al. Multi-omics factor analysis–a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol. 2018;14:e8124.
38. López de Maturana E, Alonso L, Alarcón P, Martín-Antoniano IA, Pineda S, Piorno L, et al. Challenges in the integration of omics and non-omics data. Genes. 2019;10:238.
40. Macaulay IC, Ponting CP, Voet T. Single-cell multiomics: multiple measurements from single cells. Trends Genet. 2017;33:155–68.
41. Shapiro E, Biezuner T, Linnarsson S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat Rev Genet. 2013;14:618–30.
42. Levitin HM, Yuan J, Sims PA. Single-cell transcriptomic analysis of tumor heterogeneity. Trends Cancer. 2018;4:264–8.
43. Winterhoff B, Talukdar S, Chang Z, Wang J, Starr TK. Single-cell sequencing in ovarian cancer: a new frontier in precision medicine. Curr Opin Obstet Gynecol. 2019;31:49–55.
45. Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet. 2015;16:133–45.
46. Kim K-T, Lee HW, Lee H-O, Song HJ, Jeong DE, Shin S, et al. Application of single-cell RNA sequencing in optimizing a combinatorial therapeutic strategy in metastatic renal cell carcinoma. Genome Biol. 2016;17:80.
Metadata
Title: -Omics biomarker identification pipeline for translational medicine
Authors: Laura Bravo-Merodio, John A. Williams, Georgios V. Gkoutos, Animesh Acharjee
Publication date: 01-12-2019
Publisher: BioMed Central
Keyword: Biomarkers
Published in: Journal of Translational Medicine, Issue 1/2019
Electronic ISSN: 1479-5876
DOI: https://doi.org/10.1186/s12967-019-1912-5
