Identification of Promoter Regions in the Human Genome by Using a Retroviral Plasmid Library-Based Functional Reporter Gene Assay

  1. Shirin Khambata-Ford1,5,
  2. Yueyi Liu2,
  3. Christopher Gleason1,
  4. Mark Dickson3,
  5. Russ B. Altman2,
  6. Serafim Batzoglou4, and
  7. Richard M. Myers1,3,6
  1. Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
  2. Stanford Medical Informatics, Stanford University School of Medicine, Stanford, California 94305, USA
  3. Stanford Human Genome Center, Stanford University School of Medicine, Stanford, California 94305, USA
  4. Department of Computer Science, Stanford University, Stanford, California 94305, USA

Abstract

Attempts to identify regulatory sequences in the human genome have involved experimental and computational methods such as cross-species sequence comparisons and the detection of transcription factor binding-site motifs in coexpressed genes. Although these strategies provide information on which genomic regions are likely to be involved in gene regulation, they do not give information on their functions. We have developed a functional selection for promoter regions in the human genome that uses a retroviral plasmid library-based system. This approach enriches for and detects promoter function of isolated DNA fragments in an in vitro cell culture assay. By using this method, we have discovered likely promoters of known and predicted genes, as well as many other putative promoter regions based on the presence of features such as CpG islands. Comparison of sequences of 858 plasmid clones selected by this assay with the human genome draft sequence indicates that a significantly higher percentage of sequences align to the 500-bp segment upstream of the transcription start sites of known genes than would be expected from random genomic sequences. We also observed enrichment for putative promoter regions of genes predicted in at least two annotation databases and for clones overlapping with CpG islands. Functional validation of randomly selected clones enriched by this method showed that a large fraction of these putative promoters can drive the expression of a reporter gene in transient transfection experiments. This method promises to be a useful genome-wide function-based approach that can complement existing methods to look for promoters.
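The comparison described above (asking how many selected clones align to the 500-bp segment upstream of a transcription start site) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Interval` and `Gene` types, the 0-based half-open coordinate convention, and the function names are assumptions made for illustration, and the alignment step that would produce clone coordinates is taken as already done.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical type; 0-based, end-exclusive)."""
    chrom: str
    start: int
    end: int


class Gene(NamedTuple):
    """A gene reduced to its transcription start site (hypothetical type)."""
    chrom: str
    tss: int      # 0-based position of the first transcribed base
    strand: str   # "+" or "-"


def upstream_window(gene: Gene, size: int = 500) -> Interval:
    """Return the `size`-bp segment immediately upstream of the TSS."""
    if gene.strand == "+":
        # Upstream of a plus-strand gene lies at lower coordinates.
        return Interval(gene.chrom, max(0, gene.tss - size), gene.tss)
    # Upstream of a minus-strand gene lies at higher coordinates.
    return Interval(gene.chrom, gene.tss + 1, gene.tss + 1 + size)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fraction_in_upstream_windows(
    clones: List[Interval], genes: List[Gene], size: int = 500
) -> float:
    """Fraction of clone alignments overlapping any upstream window."""
    if not clones:
        return 0.0
    windows = [upstream_window(g, size) for g in genes]
    hits = sum(any(overlaps(c, w) for w in windows) for c in clones)
    return hits / len(clones)


# Toy data, invented for illustration only.
genes = [Gene("chr1", 1000, "+"), Gene("chr1", 5000, "-")]
clones = [
    Interval("chr1", 900, 1100),   # overlaps the plus-strand window
    Interval("chr1", 1000, 1200),  # starts at the TSS, not upstream
    Interval("chr1", 5400, 5600),  # overlaps the minus-strand window
    Interval("chr2", 900, 1100),   # different chromosome
]
fraction = fraction_in_upstream_windows(clones, genes)
```

Computing the same fraction for a set of randomly sampled genomic intervals of matching length would give the baseline against which enrichment is judged. How that baseline was constructed in the study is not specified in this abstract.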

Footnotes

  • Article published online before print in June 2003.

    [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to GenBank under accession nos. AY270202–AY271252.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.529803.

  • 5 Present address: Pharmacogenomics, Bristol-Myers Squibb Pharmaceutical Research Institute, Princeton NJ 08543, USA.

  • 6 Corresponding author. E-MAIL myers{at}shgc.stanford.edu; FAX (650) 725-9689.

  • Received June 14, 2002; accepted April 1, 2003.