Abstract
Background: The composition and sequence of amino acids in a protein may serve the underlying needs of the nucleic acids that encode the protein (the genome phenotype). In extreme form, amino acids become mere placeholders that are inserted between functional segments or domains and that, apart from increasing protein length, play no role in the specific function or structure of a protein (the conventional phenotype).
Methods: We studied the genomes of two malarial parasites and 521 prokaryotes (144 with complete genomes) that differ widely in GC% and optimum growth temperature, comparing the base compositions of the protein-coding regions with the corresponding lengths (in kilobases).
Results: Malarial parasites show distinctive responses to base-compositional pressures that increase as protein lengths increase. A low-GC% species (Plasmodium falciparum) is likely to have more placeholder amino acids than an intermediate-GC% species (P. vivax), so that homologous proteins are longer. In prokaryotes, GC% is generally greater and AG% is generally less in open reading frames (ORFs) encoding long proteins. This GC% excess in long ORFs becomes larger as species’ GC% increases and smaller as species’ AG% increases. In low- and intermediate-GC% prokaryotic species, increases in ORF GC% as encoded proteins increase in length are largely accounted for by the base compositions of first and second (amino acid-determining) codon positions. In high-GC% prokaryotic species, first and third (non-amino acid-determining) codon positions play this role.
Conclusion: In low- and intermediate-GC% prokaryotes, placeholder amino acids are likely to be well defined, corresponding to codons enriched in G and/or C at first and second positions. In high-GC% prokaryotes, placeholder amino acids are likely to be less well defined. Increases in ORF GC% as encoded proteins increase in length are greater in mesophiles than in thermophiles, which are constrained from increasing protein lengths in response to base-composition pressures.
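The quantities compared in the abstract (GC% and AG% of an ORF, broken down by codon position, alongside ORF length in kilobases) can be illustrated with a minimal sketch. This is not the authors' program, which worked from codon usage tables; the function name `codon_position_composition`, the input format (a plain in-frame DNA string) and the choice to count the stop codon are assumptions made here for illustration only.

```python
from collections import Counter


def codon_position_composition(orf: str) -> dict:
    """Return GC% and AG% for a whole ORF and for each codon position.

    Assumptions (not from the source): the ORF is an in-frame DNA string,
    a trailing partial codon is dropped, and the stop codon (if present)
    is counted like any other codon.
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3  # keep whole codons only
    seq = seq[:usable]

    slices = {
        "all": seq,
        "pos1": seq[0::3],  # first codon position
        "pos2": seq[1::3],  # second codon position
        "pos3": seq[2::3],  # third codon position
    }

    result = {"length_kb": usable / 1000.0}
    for label, bases in slices.items():
        counts = Counter(bases)
        # Ignore ambiguity codes such as N when computing percentages.
        total = sum(counts[b] for b in "ACGT")
        if total == 0:
            result[label] = {"GC%": None, "AG%": None}
            continue
        result[label] = {
            "GC%": 100.0 * (counts["G"] + counts["C"]) / total,
            "AG%": 100.0 * (counts["A"] + counts["G"]) / total,
        }
    return result


if __name__ == "__main__":
    # Toy ORF of four codons: ATG GCC AAA TAA
    for key, value in codon_position_composition("ATGGCCAAATAA").items():
        print(key, value)
```

Applied to every ORF in a genome, values of this kind could be plotted against `length_kb` to look for the length-dependent trends in GC% and AG% that the Results describe; how the authors binned or regressed their data is not stated in this abstract.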
Acknowledgments
We thank J.R. Mortimer for programs that extract base compositions from codon usage tables. Queen’s University hosts the webpages of Dr Forsdyke where partial or full-text versions of some of the cited references may be found (http://post.queensu.ca/~forsdyke/homepage.htm).
The authors have provided no information on sources of funding or on conflicts of interest directly relevant to the content of this article.
Cite this article
Rayment, J.H., Forsdyke, D.R. Amino Acids as Placeholders. Appl-Bioinformatics 4, 117–130 (2005). https://doi.org/10.2165/00822942-200504020-00005
DOI: https://doi.org/10.2165/00822942-200504020-00005