Amino Acids as Placeholders

Base-Composition Pressures on Protein Length in Malaria Parasites and Prokaryotes

  • Original Research Article
Applied Bioinformatics

Abstract

Background: The composition and sequence of amino acids in a protein may serve the underlying needs of the nucleic acids that encode the protein (the genome phenotype). In extreme form, amino acids become mere placeholders that are inserted between functional segments or domains and that, apart from increasing protein length, play no role in the specific function or structure of the protein (the conventional phenotype).

Methods: We studied the genomes of two malarial parasites and 521 prokaryotes (144 of them complete genomes) that differ widely in GC% and optimum growth temperature, comparing the base compositions of protein-coding regions with their corresponding lengths (in kilobases).
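The Methods compare the base composition of each protein-coding region with its length, and the Acknowledgments note that base compositions were extracted from codon usage tables by programs written by J.R. Mortimer. Those programs are not described here, so the following is only a minimal sketch, assuming a table that maps each codon of one ORF to its count; the function name `composition_from_codon_counts` is hypothetical. It derives ORF length in kilobases (three bases per codon), overall GC% and AG%, and GC% at each of the three codon positions. Stop codons are counted if the table lists them.

```python
def composition_from_codon_counts(counts: dict[str, int]) -> dict[str, float]:
    """Base composition of one ORF from its codon counts (hypothetical helper).

    `counts` maps codons such as "GCU" or "GCT" to counts; U is read as T.
    Codons containing characters other than A/C/G/T/U are skipped.
    """
    base_totals = dict.fromkeys("ACGT", 0)
    gc_by_position = [0, 0, 0]  # G+C counts at codon positions 1, 2, 3
    n_codons = 0
    for codon, n in counts.items():
        codon = codon.upper().replace("U", "T")
        if len(codon) != 3 or any(b not in "ACGT" for b in codon):
            continue
        n_codons += n
        for pos, base in enumerate(codon):
            base_totals[base] += n
            if base in "GC":
                gc_by_position[pos] += n
    if n_codons == 0:
        raise ValueError("no valid codons in table")
    n_bases = 3 * n_codons
    gc1, gc2, gc3 = (100.0 * g / n_codons for g in gc_by_position)
    return {
        "length_kb": n_bases / 1000.0,
        "GC%": 100.0 * (base_totals["G"] + base_totals["C"]) / n_bases,
        "AG%": 100.0 * (base_totals["A"] + base_totals["G"]) / n_bases,  # purines
        "GC1%": gc1,
        "GC2%": gc2,
        "GC3%": gc3,
    }


# Toy codon table for a very short ORF (made-up counts, illustration only).
print(composition_from_codon_counts({"AUG": 1, "GCU": 2, "AAA": 2, "UAA": 1}))
```

Working from codon counts rather than raw sequence keeps the positional information (first, second, third base) that the Results rely on, while discarding codon order, which these composition measures do not need.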

Results: Malarial parasites show distinctive responses to base-composition pressures that increase as protein length increases. A low-GC% species (Plasmodium falciparum) is likely to have more placeholder amino acids than an intermediate-GC% species (P. vivax), so that its proteins are longer than their homologues in P. vivax. In prokaryotes, GC% is generally higher and AG% generally lower in open reading frames (ORFs) encoding long proteins. The extra GC% of long ORFs increases as species GC% increases and decreases as species AG% increases. In low- and intermediate-GC% prokaryotic species, the rise in ORF GC% with increasing length of the encoded protein is largely accounted for by the base compositions of the first and second (amino acid-determining) codon positions. In high-GC% prokaryotic species, the first and third codon positions (the latter non-amino acid-determining) play this role.
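A within-species trend such as "ORF GC% rises with the length of the encoded protein" can be summarized as a regression slope. The abstract does not say which statistic the authors used, so the sketch below simply assumes an ordinary least-squares slope; `ols_slope` is a hypothetical helper, and the ORF values are invented solely to exercise the code, not data from the study. The same function could be applied to GC1%, GC2% or GC3% in place of overall GC%, or across species (per-species slopes regressed on species GC%).

```python
from statistics import mean


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented (length in kb, GC%) pairs for ORFs of one imaginary species.
orfs = [(0.3, 38.0), (0.9, 39.5), (1.5, 40.2), (2.4, 41.8)]
lengths, gcs = zip(*orfs)
gc_per_kb = ols_slope(list(lengths), list(gcs))
print(f"change in ORF GC% per kb of ORF length: {gc_per_kb:.2f}")
```

A positive slope corresponds to longer ORFs being GC-richer; comparing such slopes between species is one way to express how the trend varies with species GC%, AG% or growth temperature.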

Conclusion: In low- and intermediate-GC% prokaryotes, placeholder amino acids are likely to be well defined, corresponding to codons enriched in G and/or C at first and second positions. In high-GC% prokaryotes, placeholder amino acids are likely to be less well defined. Increases in ORF GC% as encoded proteins increase in length are greater in mesophiles than in thermophiles, which are constrained from increasing protein lengths in response to base-composition pressures.
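To make "codons enriched in G and/or C at first and second positions" concrete, the standard genetic code can be scanned for codons with G or C at both of those positions. This is our illustration, not an analysis from the article; the table is NCBI translation table 1 written as a 64-letter string, and the function name `strong_first_second` is hypothetical. Such codons encode Ala (GCN), Gly (GGN), Pro (CCN) and Arg (CGN); only Ala, Gly and Pro are encoded exclusively by them, because Arg also has the codons AGA and AGG.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are listed in
# TCAG order with the third base varying fastest: TTT, TTC, TTA, TTG, TCT, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def strong_first_second() -> tuple[set[str], set[str]]:
    """Return (amino acids with any codon that has G/C at positions 1 and 2,
    amino acids all of whose codons have G/C at positions 1 and 2)."""
    ss_codons = {c for c in CODE if c[0] in "GC" and c[1] in "GC"}
    any_ss = {CODE[c] for c in ss_codons}
    only_ss = {
        aa for aa in any_ss
        if all(c in ss_codons for c, a in CODE.items() if a == aa)
    }
    return any_ss, only_ss


any_ss, only_ss = strong_first_second()
print("encoded by some G/C-G/C codon:", sorted(any_ss))
print("encoded only by G/C-G/C codons:", sorted(only_ss))
```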

Figs. 1–6 and Tables 1–2

References

  1. Pizzi E, Frontali C. Low-complexity regions in Plasmodium falciparum proteins. Genome Res 2001; 11: 218–29

  2. Forsdyke DR. Selective pressures that decrease synonymous mutations in Plasmodium falciparum. Trends Parasitol 2002; 18: 411–8

  3. Xue HY, Forsdyke DR. Low complexity segments in Plasmodium falciparum are primarily nucleic acid level adaptations. Mol Biochem Parasitol 2003; 128: 21–32

  4. Forsdyke DR, Mortimer JR. Chargaff’s legacy. Gene 2000; 261: 127–37

  5. Forsdyke DR. Functional constraint and molecular evolution. In: Atkins D, editor. Nature encyclopedia of life sciences. Vol. 7. London: Macmillan Reference Ltd, 2002: 396–403

  6. Schaap T. Dual information in DNA and the evolution of the genetic code. J Theor Biol 1971; 32: 293–8

  7. Ball LA. Implications of secondary structure in messenger RNA. J Theor Biol 1972; 36: 313–20

  8. Wan H, Wootton JC. A global compositional complexity measure for biological sequences: AT-rich and GC-rich genomes encode less complex proteins. Comput Chem 2000; 24: 71–94

  9. Cristillo AD, Mortimer JR, Barrette IH, et al. Double stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1, EBV) pyrimidine-load. J Theor Biol 2001; 208: 475–91

  10. Lao PJ, Forsdyke DR. Thermophilic bacteria strictly obey Szybalski’s transcription direction rule and politely purine-load RNAs with both adenine and guanine. Genome Res 2000; 10: 228–36

  11. Lambros RJ, Mortimer JR, Forsdyke DR. Optimum growth temperature and the base composition of open reading frames in prokaryotes. Extremophiles 2003; 7: 443–50

  12. Paz A, Mester D, Baca I, et al. Adaptive role of increased frequency of polypurine tracts in mRNA sequences of thermophilic prokaryotes. Proc Natl Acad Sci U S A 2004; 101: 2951–6

  13. Friedman R, Drake JW, Hughes AL. Genome-wide patterns of nucleotide substitution reveal stringent functional constraints on the protein sequences of thermophiles. Genetics 2004; 167: 1507–12

  14. Nakamura Y, Gojobori T, Ikemura T. Codon usage tabulated from the international DNA sequence databases: status for the year 2000. Nucleic Acids Res 2000; 28: 292

  15. Skovgaard M, Jensen LJ, Brunak S, et al. On the total number of genes and their length distribution in complete microbial genomes. Trends Genet 2001; 17: 425–8

  16. Mortimer JR, Forsdyke DR. Comparison of responses by bacteriophages and bacteria to pressures on the base composition of open reading frames. Appl Bioinformatics 2003; 2: 47–62

  17. Lobry JR, Chessel D. Internal correspondence analysis of codon and amino acid usage in thermophilic bacteria. J Appl Genet 2003; 44: 235–61

  18. Osawa S, Jukes TH, Muto A, et al. Role of directional mutation pressure on the evolution of the eubacterial genetic code. Cold Spring Harb Symp Quant Biol 1987; 52: 777–89

  19. Forsdyke DR. The origin of species, revisited. Montreal: McGill-Queen’s University Press, 2001

  20. Chen SL, Lee W, Hottes AK, et al. Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A 2004; 101: 3480–5

  21. Lee S-J, Mortimer JR, Forsdyke DR. Genomic conflict settled in favour of the species rather than the gene at extreme GC percentage values. Appl Bioinformatics 2004; 3: 219–28

  22. Khinchin AI. Mathematical foundations of information theory. New York: Dover Publications, 1957

  23. Chang MSS, Benner SA. Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments. J Mol Biol 2004; 341: 617–31

  24. Kurland CG. Major codon preference: theme and variation. Biochem Soc Trans 1993; 21: 841–6

  25. Muto A, Osawa S. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci U S A 1987; 84: 166–9

  26. Marais G, Duret L. Synonymous codon usage, accuracy of translation, and gene length in Caenorhabditis elegans. J Mol Evol 2001; 52: 275–80

  27. Xia X, Xie Z, Li W-H. Effects of GC content and mutational pressure on the lengths of exons and coding sequences. J Mol Evol 2003; 56: 362–70

  28. Wang H-C, Singer GAC, Hickey DA. Mutational bias affects protein evolution in flowering plants. Mol Biol Evol 2004; 21: 90–6

  29. Russell RJM, Ferguson JMC, Hough DW, et al. The crystal structure of citrate synthase from the hyperthermophilic archaeon Pyrococcus furiosus at 1.9 Å resolution. Biochemistry 1997; 36: 9983–94

  30. Tekaia F, Yeramian E, Dujon B. Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis. Gene 2002; 297: 51–60

  31. Forsdyke DR. Sense in antisense? J Mol Evol 1995; 41: 582–6

  32. Hooper SD, Berg OG. Gradients in nucleotide and codon usage along Escherichia coli genes. Nucleic Acids Res 2000; 28: 3517–23

  33. Vinogradov AE. Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet 2004; 20: 248–53

  34. Brocchieri L. Environmental signatures on proteome properties. Proc Natl Acad Sci U S A 2004; 101: 8257–8

  35. Yagil G. The over-representation of binary DNA tracts in seven sequenced chromosomes. BMC Genomics 2004; 5: 19. Available from URL: http://www.biomedcentral.com/1471-2164/5/19 [Accessed 2005 Apr]

  36. Desai D, Zhang K, Barik S, et al. Intragenic codon bias in a set of mouse and human genes. J Theor Biol 2004; 230: 215–25

  37. Punnett RC. William Bateson. The Edinburgh Review 1926; 244: 71–86


Acknowledgments

We thank J.R. Mortimer for programs that extract base compositions from codon usage tables. Queen’s University hosts the webpages of Dr Forsdyke where partial or full-text versions of some of the cited references may be found (http://post.queensu.ca/~forsdyke/homepage.htm).

The authors have provided no information on sources of funding or on conflicts of interest directly relevant to the content of this article.

Author information

Correspondence to Donald R. Forsdyke.

About this article

Cite this article

Rayment, J.H., Forsdyke, D.R. Amino Acids as Placeholders. Appl Bioinformatics 4, 117–130 (2005). https://doi.org/10.2165/00822942-200504020-00005

  • DOI: https://doi.org/10.2165/00822942-200504020-00005