Abstract
Biomarker discovery is one of the leading research areas in proteomics. One of the most widely used approaches identifies potential biomarkers from spot volume datasets produced by 2D gel electrophoresis. Problems may arise here because each map contains a large number of spots while only a small number of maps is available for each class (control/pathological). Multivariate methods are therefore usually applied together with variable selection procedures to provide a subset of potential candidates. The available variable selection procedures usually follow the so-called principle of parsimony: they select the smallest set of spots that provides the best classification performance. This approach is not effective in proteomics, where all potential biomarkers must be identified: not only the most discriminating spots, which are usually related to general responses to inflammatory events, but also the smallest differences and all redundant molecules, i.e. biomarkers showing similar behaviour. The principle of exhaustiveness should therefore be pursued rather than that of parsimony. To this end, a new ranking and classification method, “Ranking-PCA”, based on principal component analysis and forward-search variable selection, is proposed here for the exhaustive identification of all possible biomarkers. The method is successfully applied to three different proteomic datasets to demonstrate its effectiveness.
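The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of combining principal component analysis with a forward search that ranks every variable instead of stopping at a parsimonious subset. It is not the authors' Ranking-PCA implementation. The function names, the autoscaling step, and the Fisher-like separation criterion computed on the first principal component are all assumptions made for illustration.

```python
import numpy as np


def pc1_scores(X):
    """Scores of the samples on the first principal component of X."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0] * S[0]


def class_separation(scores, y):
    """Assumed criterion: Fisher-like ratio between two classes (labels 0/1)."""
    a, b = scores[y == 0], scores[y == 1]
    spread = np.sqrt(a.var(ddof=1) + b.var(ddof=1) + 1e-12)
    return abs(a.mean() - b.mean()) / spread


def forward_pca_ranking(X, y):
    """Rank ALL variables (spots) by forward search; nothing is discarded.

    X: (n_maps, n_spots) spot volumes; y: class labels, 0 or 1.
    Returns the ranked variable indices and the separation reached
    after each variable was added.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant spots
    Z = (X - X.mean(axis=0)) / std           # autoscaling (an assumption)

    selected, history = [], []
    remaining = list(range(Z.shape[1]))
    while remaining:                         # exhaustive: continue to the end
        best_j, best_score = None, -np.inf
        for j in remaining:
            s = class_separation(pc1_scores(Z[:, selected + [j]]), y)
            if s > best_score:
                best_j, best_score = j, s
        selected.append(best_j)
        remaining.remove(best_j)
        history.append(best_score)
    return selected, history
```

Because the loop runs until no variables remain, redundant spots that behave like an already selected one are still ranked rather than dropped, which is the contrast with parsimony-driven selection that the abstract draws. The separation values recorded at each step could then be inspected to decide where discriminating information ends.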
Acknowledgements
The authors gratefully acknowledge the collaboration of Prof Pier Giorgio Righetti (Polytechnic of Milan, Italy) and Dr Daniela Cecconi (University of Verona, Italy) who provided the proteomic datasets used in this study.
Additional information
Awarded an ABC Poster Prize on the occasion of ‘Euroanalysis 2009’ held in Innsbruck, Austria, from 6-10 September 2009.
Cite this article
Marengo, E., Robotti, E., Bobba, M. et al. The principle of exhaustiveness versus the principle of parsimony: a new approach for the identification of biomarkers from proteomic spot volume datasets based on principal component analysis. Anal Bioanal Chem 397, 25–41 (2010). https://doi.org/10.1007/s00216-009-3390-8
DOI: https://doi.org/10.1007/s00216-009-3390-8