Bioinformatics—an introduction for computer scientists

Published: 01 June 2004

Abstract

The article aims to introduce computer scientists to the new field of bioinformatics. This area has arisen from the needs of biologists to utilize and help interpret the vast amounts of data that are constantly being gathered in genomic research and in its more recent counterparts, proteomics and functional genomics. The ultimate goal of bioinformatics is to develop in silico models that will complement in vitro and in vivo biological experiments. The article provides a bird's-eye view of the basic concepts in molecular cell biology, outlines the nature of the existing data, and describes the kinds of computer algorithms and techniques that are necessary to understand cell behavior. The underlying motivation for many of the bioinformatics approaches is the evolution of organisms and the complexity of working with incomplete and noisy data. The topics covered include descriptions of the current software especially developed for biologists, computer and mathematical cell models, and areas of computer science that play an important role in bioinformatics.
