
Machine Learning: An Indispensable Tool in Bioinformatics

  • Protocol
Bioinformatics Methods in Clinical Research

Part of the book series: Methods in Molecular Biology (MIMB, volume 593)

Abstract

The increase in the number and complexity of biological databases has raised the need for modern and powerful data analysis tools and techniques. To meet these requirements, machine learning has become an everyday tool in biology laboratories, and its techniques are now applied across a wide spectrum of bioinformatics applications. Machine learning is broadly used to investigate the mechanisms and interactions between biological molecules that underlie many diseases, and it is an essential tool in any biomarker discovery process.

In this chapter, we provide a basic taxonomy of machine learning algorithms and describe the characteristics of the main data preprocessing, supervised classification, and clustering techniques. We also present feature selection, classifier evaluation, and two supervised classification topics that have a deep impact on current bioinformatics. Finally, we point the interested reader to a set of popular web resources, open source software tools, and benchmarking data repositories that are frequently used by the machine learning community.
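The abstract groups feature selection, supervised classification, and classifier evaluation together. The following minimal pure-Python sketch shows one common way these pieces combine. It is an illustration only and not the chapter's own procedure. The signal-to-noise filter score, the nearest-centroid classifier, the stratified k-fold scheme, the binary 0/1 labels, the toy data, and all function names (`rank_features`, `fit_centroids`, `predict`, `stratified_folds`, `cv_accuracy`) are assumptions chosen for brevity.

```python
import random
import statistics


def rank_features(X, y, k):
    """Filter-style feature selection (illustrative): score each feature with a
    signal-to-noise ratio |mean0 - mean1| / (sd0 + sd1) and return the indices
    of the k highest-scoring features. Assumes binary labels 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b)
        score = abs(statistics.mean(a) - statistics.mean(b)) / max(spread, 1e-12)
        scores.append((score, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, features):
    """Nearest-centroid classifier: store the per-class mean of the selected features."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds that keep the class proportions of y."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_accuracy(X, y, k_folds=5, n_features=2, seed=0):
    """Estimate classification accuracy by stratified k-fold cross-validation."""
    correct = 0
    for test_idx in stratified_folds(y, k_folds, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Feature selection is redone inside every fold on training data only,
        # so the held-out samples never influence which features are kept.
        feats = rank_features(X_tr, y_tr, n_features)
        cents = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(cents, X[i], feats) == y[i] for i in test_idx)
    return correct / len(y)


if __name__ == "__main__":
    # Hypothetical toy data: 40 samples x 20 features; only features 0 and 1
    # differ between the two classes (shifted by 3 units in class 1).
    rng = random.Random(42)
    y = [0] * 20 + [1] * 20
    X = [[rng.gauss(0, 1) + (3.0 if label == 1 and j < 2 else 0.0) for j in range(20)]
         for label in y]
    print(f"5-fold CV accuracy: {cv_accuracy(X, y):.2f}")
```

Running the module prints the cross-validated accuracy on the synthetic data. The key design choice is that feature ranking is repeated inside each fold. If features were ranked on all samples before splitting, the test samples would leak into the model and inflate the accuracy estimate, which is a well-known pitfall with small-sample, high-dimensional data such as microarray or mass spectrometry profiles.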




Copyright information

© 2010 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Inza, I., Calvo, B., Armañanzas, R., Bengoetxea, E., Larrañaga, P., Lozano, J.A. (2010). Machine Learning: An Indispensable Tool in Bioinformatics. In: Matthiesen, R. (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_2


  • DOI: https://doi.org/10.1007/978-1-60327-194-3_2


  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-193-6

  • Online ISBN: 978-1-60327-194-3

  • eBook Packages: Springer Protocols
