Abstract
This article introduces computer scientists to the new field of bioinformatics. The field has arisen from biologists' need to use and interpret the vast amounts of data that are constantly being gathered in genomic research and in its more recent counterparts, proteomics and functional genomics. The ultimate goal of bioinformatics is to develop in silico models that will complement in vitro and in vivo biological experiments. The article provides a bird's-eye view of the basic concepts in molecular cell biology, outlines the nature of the existing data, and describes the kinds of computer algorithms and techniques that are necessary to understand cell behavior. The underlying motivation for many bioinformatics approaches is the evolution of organisms and the complexity of working with incomplete and noisy data. The topics covered include descriptions of current software developed especially for biologists, computer and mathematical cell models, and areas of computer science that play an important role in bioinformatics.
- For an interesting graphical gallery of biology (downloadable drawings) sponsored by the National Health Museum, consult http://www.accessexcellence.org/AB/GG/.
- A recommended glossary of genetic terms http://www.ornl.gov/TechResources/Human_Genome/publicat/primer2001/glossary.html.
- NCBI (National Center for Biotechnology Information) http://www.ncbi.nlm.nih.gov.
- A summary of interesting sites in bioinformatics is given by the following URLs.
- Online lectures in bioinformatics (Heidelberg) http://www.dkfz-heidelberg.de/tbi/bioinfo/Biol/Intro/.
- A special interest group with news and pointers http://www.bioinformatrix.com.
- Bioinformatics Bulletin Board http://bioinformatics.org/faq/#education.
- Bioinformatics resources http://www.brc.dcs.gla.ac.uk/~actan/resources.html.
- Interesting and useful URLs on existing courses.
- Jackson's Laboratory Web Page with educational links http://www.jax.org/courses.
- Course in bioinformatics (recommended set of slides by R. L. Bernstein) http://www.swbic.org/education/bioinfo/.
- Highly recommended texts in molecular cell biology {Alberts et al. 2004; Lodish et al. 2003}.
- Some texts in computational biology or bioinformatics {Baldi and Brunak 2002; Baxevanis and Ouellette 1998; Campbell and Heyer 2002; Claverie and Notredame 2003; Durbin et al. 1998; Dwyer 2002; Felsenstein 2003; Gonick and Wheelis 1991; Gusfield 1997; Krane and Raymer 2003; Jones and Pevzner 2004; Mount 2001; Orengo et al. 2003; Pevsner 2003; Pevzner 2000; Setubal and Meidanis 1997; Salzberg et al. 1998; Waterman 1995}.
- Main journals in bioinformatics: Bioinformatics, Oxford University Press; IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB); Journal of Computational Biology, Mary Ann Liebert, Inc., Publishers.
- Note: Many biology journals publish articles related to bioinformatics, e.g., Science, Nature, Nucleic Acids Research, Journal of Molecular Biology, Proceedings of the National Academy of Sciences (PNAS), etc. In particular, Nucleic Acids Research publishes a compendium of URLs in its yearly January issue.
- Yearly conferences: RECOMB, Research in Computational Molecular Biology; IEEE Computer Society Bioinformatics Conference; PSB, Pacific Symposium on Biocomputing; ISMB, Intelligent Systems for Molecular Biology.
- Articles and Books
- Abelson, H., Allen, D., Coore, D., Hanson, C., Homsy, G., Knight, Jr., T. F., Nagpal, R., Rauch, E., Sussman, G. J., and Weiss, R. 1995. Amorphous Computing. Commun. ACM.
- Alberts, B., Bray, D., Johnson, A., Lewis, J., Raff, M., Roberts, K., and Walter, F. 2004. Essential Cell Biology, 2nd ed. Garland Publishing.
- Ashburner, M. and Goodman, N. 1997. Informatics: Genome and genetic databases. Curr. Op. Gen. Develop. 7, 750--756.
- Baldi, P. and Brunak, S. 2002. Bioinformatics: The Machine Learning Approach. MIT Press.
- Bar-Joseph, Z., Gerber, G., Gifford, D., and Jaakkola, T. 2002. A new approach to analyzing gene expression time series data. In RECOMB: The Sixth Annual International Conference on Research in Computational Molecular Biology.
- Baxevanis, A. and Ouellette, B. F. F. (Eds.). 1998. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley, New York.
- Bennett, C., Li, M., and Ma, B. 2003. Linking chain letters. Sci. Amer. (June) 77--81.
- Brown, M. P. S., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., Ares, Jr., M., and Haussler, D. 2000. Knowledge-based analysis of microarray gene expression data using support vector machines. Proc. Nat. Acad. Sci. 97, 1, 262--267.
- Campbell, A. M. and Heyer, L. 2002. Discovering Genomics, Proteomics and BioInformatics. Benjamin Cummings.
- Claverie, J. M. and Notredame, C. 2003. Bioinformatics for Dummies. Wiley, New York.
- Cohen, J. 2001. Classification of approaches used to study cell regulation: Search for a unified view using constraints and machine learning. Electronic Transactions in Artificial Intelligence, Machine Intelligence 18. Linköping Electronic Articles in Computer and Information Science ISSN 1401-9841, 6(025).
- Cohen, J. 2003. Guidelines for establishing undergraduate bioinformatics courses. J. Sci. Educat. Tech. 12, 4 (Dec.) 449--456.
- DeJong, H. 2002. Modeling and simulation of genetic regulatory systems: A literature review. J. Comput. Biol. 9, 1, 67--103.
- Delcher, A., Kasif, S., Fleischmann, R. D., Peterson, J., White, O., and Salzberg, S. L. 1999. Alignment of whole genomes. Nucl. Acids Res. 27, 11, 2369--2376.
- Duenwald, M. 2003. Gene is linked to susceptibility to depression. The New York Times, July 18, Sect. A, Page 14, Col. 1.
- Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. 1998. Biological Sequence Analysis. Cambridge University Press, Cambridge, Mass.
- Dwyer, R. A. 2002. Genomic Perl: From Bioinformatics Basics to Working Code. Cambridge University Press, Cambridge, Mass.
- Felsenstein, J. 2003. Inferring Phylogenies. Sinauer Associates.
- Friedman, N., Linial, M., Nachman, I., and Peer, D. 2000. Using Bayesian networks to analyze expression data. In Proceedings RECOMB---Computational Molecular Biology, pp. 127--135.
- Gilbert, D. R., Westhead, D. R., Nagano, N., and Thornton, J. M. 1999. Motif-based searching in TOPS protein topology databases. Bioinformatics 5, 4, 317--326. Also see http://www.sander.embl-ebi.ac.uk/tops/.
- Gonick, L. and Wheelis, M. 1991. A Cartoon Guide to Genetics. Harper Perennial.
- Goodman, N. 2002. Biological data becomes computer literate: new advances in bioinformatics. Curr. Op. Biotech. 13, 66--71.
- Gusfield, D. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press.
- Hand, D. J., Mannila, H., and Smyth, P. 2000. Principles of Data Mining. MIT Press, Cambridge, Mass.
- Huson, D. H., Reinert, K., and Myers, E. W. 2002. The greedy path-merging algorithm for contig scaffolding. J. ACM 49, 5 (Sept.), 603--615.
- Jain, A., Murty, M., and Flynn, P. 1999. Data clustering: A review. ACM Comput. Surv. 31, 3, 264--323.
- Jones, N. C. and Pevzner, P. A. 2004. An Introduction to Bioinformatics Algorithms. MIT Press, Cambridge, Mass.
- Karp, P. 2001. Pathway databases: A case study in computational symbolic theories. Science 293, 2040--2044.
- Kelly, H. C. 2003. Terrorism and the biology lab. New York Times Op-Ed Page, July 2.
- Knuth, D. E. 1993. Computer Literacy Bookshops Interview (Dec.) (Available at http://dmoz.org/Computers/History/Pioneers/Knuth,_Donald/).
- Krane, D. and Raymer, M. 2003. Fundamental Concepts of BioInformatics. Benjamin Cummings.
- Krogh, A. 1998. An introduction to hidden Markov models for biological sequences. In S. L. Salzberg, D. B. Searls, and S. Kasif (Eds.), Computational Methods in Molecular Biology. Elsevier, Amsterdam, The Netherlands, pp. 45--63.
- Kuipers, B. J. 1994. Qualitative Reasoning: Modeling and Simulation with Incomplete Knowledge. MIT Press, Cambridge, Mass.
- Lathrop, R. H. and Smith, T. F. 1996. Global optimum protein threading with gapped alignment and empirical pair potentials. J. Molec. Biol. 255, 641--665.
- Li, H., Helling, R., Tang, C., and Wingreen, N. 1996. Emergence of preferred structures in a simple model of protein folding. Science 273, 666--669.
- Liang, S., Fuhrman, S., and Somogyi, R. 1998. REVEAL, A general reverse engineering algorithm for inference of genetic network architectures. In Pacific Symposium on Biocomputing 3, pp. 18--29.
- Lodish, H., Berk, A., Matsudaira, P., Kaiser, C. A., Krieger, M., Scott, M. P., Zipursky, L., and Darnell, J. 2003. Molecular Cell Biology. W.H. Freeman.
- Luscombe, N. M., Greenbaum, D., and Gerstein, M. 2001. What is bioinformatics? A proposed definition and overview of the field. Methods Inf. Med. 40, 346--358 (Also available at http://bioinfo.mbb.yale.edu/papers/).
- Miller, W. 2001. Comparison of genomic DNA sequences: Solved and unsolved problems. Bioinformatics 17, 5, 391--397.
- Mitchell, T. 1997. Machine Learning. McGraw Hill, New York.
- Mount, D. W. 2001. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Press, Cold Spring Harbor, N.Y.
- Myers, E. 1999. Whole genome DNA-sequencing. IEEE Computat. Eng. Sci. 3, 1, 33--43.
- Orengo, C. A., Jones, D. T., and Thornton, J. M. 2003. Bioinformatics: Genes, Proteins and Computers. BIOS Scientific Publishers, Oxford, England.
- Parsons, R. J., Forrest, S., and Burks, C. 1995. Genetic algorithms, operators, and DNA fragment assembly. Mach. Learn. 21, 1--2, 11--33. (Also see paper by Parsons in Computational Methods in Molecular Biology, S. L. Salzberg, D. B. Searls, and S. Kasif (Eds.). Elsevier, Amsterdam, The Netherlands, 1998.)
- Pevsner, J. 2003. Bioinformatics and Functional Genomics. Wiley-Liss.
- Pevzner, P. A. 2000. Computational Molecular Biology: An Algorithmic Approach. MIT Press, Cambridge, Mass.
- Rabiner, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 2, 257--286.
- Regev, E. and Shapiro, E. 2002. Cellular abstractions: Cells as computation. Nature 419 (Sept.), 419--443.
- Rivas, E. and Eddy, S. R. 2000. The language of RNA: A formal grammar that includes pseudo knots. Bioinformatics 18, 4, 334--340.
- Salzberg, S. L., Searls, D. B., and Kasif, S., Eds. 1998. Computational Methods in Molecular Biology. Elsevier, Amsterdam, The Netherlands.
- Schwartz, S., Zhang, Z., Frazer, K. A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. 2000. PipMaker---A web server for aligning two genomic DNA sequences. Genome Res. 10, 4 (Apr.), 577--586.
- Searls, D. B. 1992. The linguistics of DNA. Amer. Sci. 80, 579--591.
- Searls, D. B. 1998. Grand challenges in computational biology. In Computational Methods in Molecular Biology, S. L. Salzberg, D. B. Searls, and S. Kasif, Eds. Elsevier, Amsterdam, The Netherlands.
- Searls, D. B. 2002. The language of genes. Nature 420 (November), 211--217.
- Setubal, J. and Meidanis, J. 1997. Introduction to Computational Molecular Biology. PWS Publishing.
- Thierry-Mieg, N. 2000. Protein-protein interaction prediction for C. elegans. In Knowledge Discovery in Biology, Workshop at the PKDD2000 (Conference on Principles and Practice of Knowledge Discovery in Databases) (Lyon, France, Sept.).
- Thompson, J. D., Higgins, D. G., and Gibson, T. J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22, 4673--4680.
- Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T. S., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J. C., and Hutchinson III, C. A. 1999. E-CELL: Software environment for whole cell simulation. Bioinformatics 15, 1, 72--84.
- Waterman, M. S. 1995. Introduction to Computational Biology: Maps, Sequences and Genomes. CRC Press.
- Watson, J. D. and Berry, A. 2003. DNA: The Secret of Life. Knopf.
- Wetherell, C. S. 1980. Probabilistic languages: A review and some open questions. ACM Comput. Surv. 12, 4, 361--379.
- Zuker, M. and Stiegler, P. 1981. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucl. Acids Res. 9, 133--148. (Also see http://www.bioinfo.rpi.edu/~zukerm/).
Index Terms
- Bioinformatics—an introduction for computer scientists