Introduction

The remarkable chromatin-organizing factor CTCF was discovered in 1990 (Baniahmad et al. 1990; Lobanenkov et al. 1990) and has gained increasing attention, especially during the last decade (Table 1). Important results in chromatin-mediated molecular mechanisms have been catalyzed by the CTCF connection. During this time, many key aspects of chromatin structure and function in general, and the key role of CTCF in particular, were brought to light. Here we detail important chromatin-mediated molecular mechanisms and highlight the fundamental role played in them by CTCF. The different levels of CTCF action start with chromatin binding and nucleosomal positioning. Three-dimensional enhancer function and its blocking by CTCF form the next level. How are interactions in cis or in trans mediated, and what is the role of cohesin binding in these? Furthermore, the involvement of CTCF in specific chromatin features such as imprinting, X-chromosome inactivation, and heterochromatin barrier function is discussed.

Table 1 Time line in CTCF milestones

How are nucleosomes positioned, and what does it have to do with CTCF?

In eukaryotes, DNA is packaged into a protein–DNA complex termed chromatin where 147 bp of DNA is wrapped around histone octamers to form nucleosomes that are typically separated by 20–50 bp of linker DNA. For about half of the genome, the precise position of a given nucleosome relative to the genomic sequence varies between individual nuclei, i.e., the nucleosomes are not regularly positioned. However, in order for DNA-binding factors to bind, nucleosomes have to be properly positioned. Currently, a number of factors are known to play a role in determining the position of nucleosomes such as the underlying sequence of the DNA, the availability/binding of transcription factors, and subsequent recruitment of nucleosome remodeling activities (Segal and Widom 2009).

A first indication that the sequences in the vicinity of a CTCF binding site (CTS) are marked by a specific pattern of nucleosome occupancy came from the analysis of the H19 imprinting control region (ICR). In this study, the CTSs were found at linker regions between positioned nucleosomes (Kanduri et al. 2002). Recent advances in high-throughput analyses of nucleosome occupancy have now shown that CTSs are associated with precise positioning of up to 20 nucleosomes (Fu et al. 2008).
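The geometry described above can be made concrete with a minimal sketch. It assumes, purely for illustration, a perfectly regular array with 10 nucleosomes on each side of the CTS (20 in total), a CTS centered in a linker, and a linker length of 33 bp chosen from within the 20–50 bp range. The function name `phased_dyads` and its parameters are hypothetical and do not come from the cited studies.

```python
CORE_BP = 147  # DNA wrapped around one histone octamer (value from the text)

def phased_dyads(cts_center: int, linker_bp: int = 33, per_side: int = 10) -> list[int]:
    """Return nucleosome center (dyad) coordinates for a regular array flanking a CTS.

    Illustrative assumptions: the CTS lies at the midpoint of a linker,
    spacing is perfectly regular, and `per_side` nucleosomes sit on each side.
    """
    repeat = CORE_BP + linker_bp  # nucleosome repeat length: core plus one linker
    # The k-th dyad lies (k + 1/2) repeats away from the linker midpoint.
    offsets = [(2 * k + 1) * repeat // 2 for k in range(per_side)]
    left = [cts_center - o for o in reversed(offsets)]
    right = [cts_center + o for o in offsets]
    return left + right

dyads = phased_dyads(1000)
print(len(dyads), dyads[9], dyads[10])
```

In a real data set, nucleosome positions deviate from such an idealized grid; the sketch only shows what "positioned relative to the CTS" means in coordinate terms.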

What causes this arrangement of precisely positioned nucleosomes in relation to CTSs? Since no sequence conservation in the regions adjacent to the CTSs has been found, CTCF binding on its own was suggested to be able to move and arrange nucleosomes into evenly spaced positions (Fu et al. 2008). However, in the case of the H19 ICR, it was conclusively shown that the nucleosome positioning is a feature of the underlying sequence rather than of the presence of CTCF (Kanduri et al. 2002). Moreover, there is intrinsic sequence information in a large fraction of the genome that controls nucleosome positioning despite the absence of any detectable sequence motif (Segal and Widom 2009). Thus, CTCF binding within an array of positioned nucleosomes may work through two different mechanisms. On the one hand, large stretches of sequence capable of positioning nucleosomes may have co-evolved with the emergence of CTSs within the corresponding linker regions. On the other hand, the finding that CTCF interacts with the ATP-dependent chromatin remodeler CHD8 (Ishihara et al. 2006) may suggest an active role in controlling nucleosome movement to initiate nucleosomal phasing.

How do enhancers function?

Enhancers operate by increasing the likelihood of transcriptional activation of nearby genes (Fiering et al. 2000; Li et al. 2006). The enhancer regions, which are made up of multiple cis-regulatory elements attracting trans-acting factors, can be positioned on either side of the transcriptional start site as well as at long distances from the promoter, sometimes far beyond 100 kb (Sagai et al. 2009). The CTCF-dependent chromatin insulator activity antagonizes both short- and long-range enhancer functions in a manner that, although poorly understood, in all likelihood reflects upon the mode of enhancer action.

Despite several decades of intensive research, the inner workings of enhancer functions remain enigmatic. One mechanism of enhancer function has been suggested to take place via the formation of a chromatin loop, resulting in enhancer contacts with the gene promoter (Carter et al. 2002; Ohlsson et al. 2001; Phillips and Corces 2009; Tolhuis et al. 2002). Such a formation can be achieved either by the enhancer tracking along the chromatin fiber itself, or by the direct formation of enhancer–promoter chromatin loops or combinations thereof. An alternative mechanism for enhancer-mediated transcriptional activation involves the so-called transcription factory (Faro-Trindade and Cook 2006). This transient transcription factory structure, which can be visualized by antibodies against active RNA polymerase II, appears to support simultaneous transcription of many coding genes (Sutherland and Bickmore 2009). Thus, to provide insights into how enhancers work, the question of how these transcription factories are formed will need to be addressed.

Real-time analysis has revealed that transcriptional activation is preceded by increased mobility of chromatin fibers (Chuang et al. 2006). Chromatin marks, such as histone acetylation, are associated with increased chromatin mobility and flexibility (Li et al. 2006), raising the possibility that enhancers regulate these features by tracking along the chromatin fiber to leave acetylated histones in their wake. This process has two pivotal consequences: an enhanced ability of a transcriptional unit to explore its environment may eventually lead to the recognition of a nearby transcription factory and/or to the clustering of transcription units prior to overt transcriptional activation. Such clustering was observed at the TH2-cell locus with neighboring interleukin 4, 5, and 13 genes in type-2 helper T cells (TH2 cells) being primed for transcriptional activation (Cai et al. 2006). Moreover, the formation of a transcription factory on these physically aligned regulatory elements was suggested to coordinate their expression (Cai et al. 2006). Thus, compacting genomic sequences into a small volume increases the chances of enhancer/promoter interaction. Whether formation of a transcription factory may be assisted by the interaction of CTCF molecules flanking such clusters remains to be shown.

How are enhancers blocked to prevent unscheduled promoter activation?

Chromatin insulators may have evolved in concert with the emergence of gene clusters to differentially regulate the activity of cluster members. Indeed, coexpressed genes are generally flanked by CTS elements (Xie et al. 2007). Furthermore, insulators can act over large distances, in excess of 100 kb, to shield particular promoters from enhancer functions in developmentally regulated fashions (Wallace and Felsenfeld 2007). This property requires that the insulator be located between the enhancer(s) and a promoter in order for that promoter to be kept inactive. In contrast to repressors, however, chromatin insulators act in a position-dependent manner, a feature strongly suggesting that this process acts in a linear manner. In support of this notion is the evidence that CTCF binds to the insulator of the maternally inherited H19 ICR, thereby physically preventing the enhancer from tracking along the chromatin fiber (Kurukuti et al. 2006). Finally, given the link between histone acetylation and chromatin mobility mentioned above, the CTCF insulator-mediated blocking of enhancer-induced histone acetylation (Zhao and Dean 2004) could impede the mobility/flexibility of transcription units separated from the enhancer by the insulator.
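The position dependence of enhancer blocking can be stated as a simple rule on linear coordinates. The following is a minimal sketch under the linear (tracking) view only. The function name `is_blocked` and all coordinates are hypothetical, and real insulators are of course not reducible to a single coordinate test.

```python
def is_blocked(enhancer: int, promoter: int, bound_insulators: list[int]) -> bool:
    """Return True if a CTCF-bound insulator lies between enhancer and promoter.

    Illustrative assumption: blocking depends only on linear position,
    regardless of which side of the promoter the enhancer is on.
    """
    lo, hi = sorted((enhancer, promoter))
    return any(lo < site < hi for site in bound_insulators)

# Arbitrary example coordinates (bp) on one chromosome.
print(is_blocked(enhancer=100_000, promoter=10_000, bound_insulators=[50_000]))   # insulator in between
print(is_blocked(enhancer=100_000, promoter=10_000, bound_insulators=[150_000]))  # insulator outside
```

An allele on which CTCF does not occupy the insulator would be modeled here by passing an empty list, in which case the enhancer is not blocked.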

Chromatin insulators were also suggested to play a role in organizing complex chromatin structures with distal cis-regulatory elements that have been alleged to prevent enhancer-mediated transcriptional activation. In the “inactive loop” (Kurukuti et al. 2006) and “knotted loop” (Qiu et al. 2008) models, the insulator is proposed to act as a topological barrier, creating a tight and transcriptionally inactive loop. In the “unproductive loop” model, the insulator competes with the enhancer for promoters, acting in effect as a promoter decoy for enhancers (Yoon et al. 2007). A more elaborate discussion on CTCF-dependent chromatin loops is presented in Phillips and Corces (2009). A unifying hypothesis invokes the possibility that the insulator has a dual function, i.e., that the insulator can provide a physical barrier and that, as a generator of three-dimensional structures, it can also target distal cis-regulatory elements.

What is the function of the CTCF–cohesin connection?

Sister chromatid cohesion is the process that holds together sister chromatids after replication in S phase. Cohesion is promoted by a protein complex that forms a ring-shaped structure consisting of multiple cohesins. This process is necessary for proper chromosome segregation in mitosis as well as for post-replicative DNA-repair mechanisms. Recent studies have extended this canonical function of the cohesin complex toward a role in gene regulatory circuits. For example, in Drosophila and yeast, a role for cohesin subunits in transcriptional processes beyond sister chromatid cohesion was suggested (Dorsett 2007). In fact, these studies pointed to a role for cohesin in gene regulation via a mechanism resulting in organization of a higher order chromatin structure. Subsequently, cohesin components were found to interact with chromatin by CTCF-dependent recruitment (Fig. 1) during interphase (Parelho et al. 2008; Rubio et al. 2008; Stedman et al. 2008; Wendt et al. 2008). Furthermore, CTCF-dependent enhancer blocking at the H19 ICR was found to depend on cohesin components (Wendt et al. 2008), suggesting that cohesin mediates enhancer blocking via CTCF-dependent recruitment to insulator sites.

Fig. 1

Three-dimensional chromatin interaction is observed on two levels of complexity. CTCF–cohesin complexes (green) contract chromatin fibers in cis by linking nearby CTCF binding sites. For longer-range interactions, such as between different chromosomes (light and dark blue), specificity beyond the CTCF–cohesin interaction appears to be necessary. This process may be mediated by unknown factors in concert with the CTCF–cohesin complex bound to one of the chromatin partners.

Expression analysis after CTCF/cohesin knockdown through RNAi in HeLa cells demonstrated an overlap between the target genes of both factors, with a number of them indeed being bound by CTCF and cohesin (Wendt et al. 2008). This finding indicates not only that cohesin connects identical sequences on sister chromatids but also that remote cis-regulatory regions may be connected by cohesin bound to CTCF (Fig. 1). This CTCF/cohesin binding may result in chromatin loop formation, which in turn may play a role in several key aspects of chromatin function, such as enhancer action, enhancer blocking, or immunoglobulin recombination. Indeed, a recent study has demonstrated that CTCF-bound cohesin is required to activate the IFNG gene and that loop formation as well as IFNG expression are augmented by cohesin, as shown by depletion of Rad21, a subunit of the cohesin complex (Hadjur et al. 2009).

Additional evidence that cohesin plays a role in transcriptional regulation has come from recent studies analyzing patients suffering from Cornelia de Lange syndrome (CdLS), a disease caused by mutations in several genes coding for components of the cohesin complex (Liu et al. 2009). A significant overlap between genes changing their expression levels after cohesin reduction or CTCF depletion was detected (Wendt et al. 2008). However, in addition to CTCF binding, a more general role for cohesin in transcription was suggested, as the promoters of genes deregulated in CdLS patients were markedly enriched for cohesin binding in the absence of CTCF (Liu et al. 2009). A correlation between cohesin binding and transcription has been seen in Drosophila as well (Misulovin et al. 2008; Schaaf et al. 2009).

Interaction of remote sites and loop formation is also required for V(D)J recombination at the immunoglobulin loci. Again, multiple binding sites for CTCF/RAD21 have been identified at the Igh and Igκ loci. Interestingly, CTCF binding at these sites appears largely unchanged throughout differentiation. In contrast, RAD21 recruitment to CTSs was found to be both lineage and stage specific (Degner et al. 2009). Thus, CTCF binding to DNA does not appear to determine the site-specific action of cohesin; rather, a site-specific modification of CTCF or the presence of other factors may regulate cohesin binding or function.

How are long-range chromatin contacts made, even between different chromosomes?

The spatial conformation of interphase chromatin enables regions separated far from each other to make physical contact both within and between chromosomes (Gondor and Ohlsson 2009). There is also accumulating evidence to suggest that these interactions are often CTCF-dependent both in cis and in trans. Using methods based on the chromosome conformation capture (3C) technique (Dekker 2006), the insulator sites at the HS5 site at the 5′-boundary of the mouse β-globin gene (Splinter et al. 2006) as well as at the H19 ICR (Kurukuti et al. 2006) were found to generate loops involving CTCF and CTCF binding sites. Moreover, both of these regions can also interact with a wide range of sequences derived from almost all autosomal chromosomes (Ling et al. 2006; Sandhu et al. 2009; Zhao et al. 2006).

How are these regions making contact with each other, and how can the specificity in such long-range interactions be achieved? Stochastic movements of chromatin fibers presumably provide opportunities for chromatin fiber collisions with a frequency directly proportional to their proximities and affinity to each other. Research showing that CTCF–cohesin complexes contract chromatin fibers in cis at the IFNG (Hadjur et al. 2009) and apolipoprotein (Mishiro et al. 2009) loci suggests that the cohesin link brings more proximal CTSs together. Importantly, for the IFNG locus, it was conclusively shown that this interaction was strictly maintained in cis (Hadjur et al. 2009). It is thus conceivable that CTCF and components of the cohesin complex provide sufficient affinity between interacting CTSs to stabilize, perhaps even to immobilize, chromatin fiber interactions (Wallace and Felsenfeld 2007) resulting from proximal stochastic collisions (Fig. 1).

For longer-range interactions, an element of specificity beyond the CTCF–cohesin interaction appears to be necessary to avoid extensive tangling of chromatin fibers to tens of thousands of CTSs. This notion is further supported by the absence of any enrichment of CTSs in sequences from other chromosomes directly or indirectly interacting with the CTSs within the H19 ICR (Sandhu et al., unpublished observation). It thus appears that CTCF when bound to a chromatin fiber has shorter-range affinity for CTSs in cis, but longer-range affinities for other partners both in cis and in trans. Specificity for longer-range interaction may incorporate 3D features of higher order chromatin structures that provide an affinity for a similar region on another chromosome (Gondor and Ohlsson 2009) (Fig. 1).

How are imprinted chromatin marks read?

Our understanding of genomic imprinting, i.e., parent of origin-specific epigenetic marks frequently manifested in mono-allelic expression patterns, has been dramatically improved by the analysis of CTCF and chromatin insulation. Although the current research focus is heavily on H19 ICR, CTCF is also known to associate with several other imprinted domains, such as DLK1/GTL2 (Wylie et al. 2000), Meg1/Grb10 (Hikichi et al. 2003), Rasgrf1 (Yoon et al. 2005), MEG-3 (Rosa et al. 2005), and KvDMR (Fitzpatrick et al. 2007). For most of these domains, CTCF is known to bind in an allele-specific manner to a region pivotal for the regulation of the imprinted status.

CpG methylation, a key parent of origin-specific epigenetic mark, is strongly linked with regulating occupancy of CTSs (Mukhopadhyay et al. 2004), and conversely, DNA binding of CTCF protects against de novo methylation (Pant et al. 2003; Schoenherr et al. 2003). In addition to this dual link between CTCF and an epigenetic state, CTCF also regulates asynchronous replication timing. In other words, for imprinted genes, one parental allele replicates earlier, whereas the opposite allele generally replicates late during S phase. Thus, CTSs within the H19 ICR domain appear to influence the timing by delaying replication of the maternally inherited Igf2/H19 domain (Bergstrom et al. 2007).

In this context, it is of interest to note that the H19 ICR domain interacts preferentially with other imprinted domains in the germline as well as in stem and somatic cells and that CTSs within the H19 ICR domain confer replication timing patterns on the interacting sequences (Sandhu et al. 2009). The question then becomes what are the underlying structural features that enable chromatin to bring about these processes? One possibility is that particular combinations of repeat elements underlie a chromatin scaffold that together with CTCF and other factors provides an epigenetic signature with a high affinity to a chromatin structure on another chromosome (Gondor and Ohlsson 2009). This hypothesis is partially borne out by the observation that imprinted states can be predicted depending on constellations of repeat elements within imprinted domains (Walter et al. 2006). However, it remains to be established whether such features in combination with CTSs were selected to transfer epigenetic states in trans to facilitate evolution of genomic imprinting (Sandhu et al. 2009).

What are the mechanisms of X-chromosomal inactivation?

X-chromosomal inactivation is the process that compensates for the dosage difference in sex chromosomes between male and female mammals. A number of key players involved in this process have been identified over the past decades; however, there is still some debate about the exact mechanisms underlying this biological phenomenon. So far, it has been established that the two X-chromosomes come into close contact in the nuclear space, just at the onset of inactivation of one of the two X-chromosomes (Chow and Heard 2009). Moreover, pairing was found to be dependent on the X inactivation center (XIC), which itself is involved in the counting of X-chromosomes and the choice of inactivation (Bacher et al. 2006). Within the XIC, a short fragment was found to recapitulate pairing, and this pairing was dependent on transcription as well as on CTCF (Xu et al. 2007). Recently, a model was proposed for XIC regulation (Donohoe et al. 2009). This model involves the transcriptional regulator Oct4, which interacts not only with CTCF bound to many sites, but also with the XIC regulator genes Tsix and Xite, which are located in antisense orientation next to the gene coding for the X inactive specific transcript (Xist). Oct4-activated Tsix and Xite transcription inhibits transcription of Xist on the same chromosome. Upon CTCF-mediated pairing and cell differentiation, Oct4 levels are reduced such that only the active X will continue with Tsix transcription, whereas the lack of Tsix activity would enable Xist expression on the inactive chromosome.

How are heterochromatin boundaries maintained?

Heterochromatic (inactive) regions are insulated from euchromatic (active) regions (Fig. 2). If insulation at these boundaries is defective, as observed after chromosomal rearrangements, for example, a spreading of inactive chromatin modifications into the active chromatin is observed, a phenomenon called position-effect variegation (Probst et al. 2009). A role for CTCF in barrier function was already proposed in the 1990s based on the finding that several CTS-containing insulator elements were able to block the position effects on reporter genes stably integrated in the genome (Chung et al. 1993; Li et al. 2002). In contrast, it was shown that the enhancer blocking and barrier functions of the chicken beta-globin HS4 were separable, indicating that the boundary function of this element was independent of CTCF (Recillas-Targa et al. 2002).

Fig. 2

Heterochromatic (inactive) chromatin regions are insulated from euchromatic (active) regions. Active domains, as exemplified by active genes, histone acetylation, or histone H3 lysine 4 methylation (H3K4me2), are separated from inactive domains, which are identified by repressed genes, histone H3 lysine 27 methylation (H3K27me3), lamin B1, or polycomb binding. These heterochromatic regions are often associated with the nuclear lamina (gray arc). Several chromatin features have been identified at the border position between domains. These are CpG islands and active promoters, loss or high turnover rate of nucleosomes, and CTCF/cohesin binding. A border function may be mediated by chromatin activation at these regions to counteract any spreading of inactive chromatin marks into the active domains.

Recent advances in high-throughput genomics have resulted in genome-wide binding profiles of CTCF in different organisms. These studies demonstrated a significant association of CTCF with boundary elements defining the borders between adjacent chromatin domains of opposing activity as determined by the association with specific histone modification marks (Barski et al. 2007; Bartkuhn et al. 2009; Cuddapah et al. 2009). Binding to boundaries between active and repressed chromatin was seen especially in the context of H3K27me3, a histone modification that occurs in large chromosomal domains ranging from several kb up to several hundred kb in Drosophila (Fig. 2). H3K27me3 modification is regarded as a hallmark of Polycomb-repressed chromatin. These regions are marked by low gene density. Furthermore, genes within them show lower activity than genes at other genomic locations (Schwartz et al. 2006).

Similarly, CTCF was identified to bind to the margins of lamin-associated domains (LADs). Association of chromatin with the nuclear lamina is believed to negatively influence gene expression through recruitment of chromatin to the nuclear periphery. Interestingly, high levels of H3K27me3 are also characteristic of those domains, suggesting that LADs are similar to H3K27me3 domains (Guelen et al. 2008). Additionally, LAD borders were marked by active transcription from divergent promoters transcribing away from the repressed domains or by CpG islands, which again are indicative of active promoters (Fig. 2).

CTCF-dependent heterochromatin boundaries have recently been suggested to play a major role in the regulation of tumor suppressor genes (Witcher and Emerson 2009). Binding of CTCF to sites upstream of the promoters of the p16, CDH1, and RASSF1A genes is correlated with activation of the downstream genes, whereas the regions upstream of the CTSs are marked by repressive chromatin modifications, with the CTS demarcating the transition zones between heterochromatic and euchromatic histone modifications. Consequently, loss of CTCF binding leads to repressive chromatin marks spreading into the p16 promoter as well as to a loss of gene expression.

The mechanism of barrier function could be a physical block generated by CTCF binding. Furthermore, the presence of active promoters and/or the association of CTCF with RNA polymerase II (Chernukhin et al. 2007) and with active promoters (Bartkuhn et al. 2009) may provide a local active chromatin region counteracting any heterochromatic spreading.

Outlook

There are many important aspects concerning chromatin organization in general and the role of CTCF in particular that need addressing in the future. To start with, CTCF binding sites are known to be flanked by an array of phased nucleosomes. But what is the cause and effect of nucleosome positioning and CTCF binding? Did the nucleosome positioning feature evolve before or after the emergence of CTCF binding sites? As nucleosome positioning is determined by the underlying sequence and independent of CTCF binding sites within at least the H19 ICR, it is reasonable to assume that nucleosome positioning evolved prior to CTCF binding.

The insulator function has in some cases been shown to involve long-range chromatin interactions and CTCF and/or cohesin binding. But how does this mechanistically interfere with promoter/enhancer interaction? Could it operate by stabilizing transient interactions in cis, thereby allowing more time for epigenetic factors, such as Suz12 (Li et al. 2008), to silence the interacting region? Or does this interaction serve to repress the interacting region by contracting chromatin conformations?

ICRs display not only CTCF-dependent chromatin insulator function but also a CTCF-dependent ability to delay replication timing, which is strikingly different between the alleles (Sandhu et al. 2009). How do CTCF binding sites within such ICRs regulate both insulation and replication timing? And how can the H19 ICR regulate replication timing in trans in a CTCF-dependent manner? Is there a division of labor between different CTCF binding sites, such that some govern insulation and others replication timing patterns?

Furthermore, it is not clear whether CTCF has the ability to interconnect its binding sites on a genome-wide scale. This is hinted at by the demonstrations that CTCF appears to contract chromatin structures by recruiting cohesin (Hadjur et al. 2009). However, there is currently no documentation that this occurs in trans. Even if this were the case, how are the specificities in interactions between CTCF binding sites on different chromosomes achieved?

Finally, besides a striking colocalization of CTCF with chromatin domain boundaries, there is functional evidence of CTCF mediating a chromatin barrier function. Whether barrier sites are mechanistically different from sites mediating enhancer blocking remains to be shown. If so, what are the decisive hallmarks discriminating CTCF binding sites governing insulation and barrier functions, respectively? This question can be extended to all of the CTCF-mediated features: Are all of these functions realized at each of the 30,000 genomic binding sites (Cuddapah et al. 2009), or to what extent are binding site-specific functions mediated by the DNA sequence or by neighboring factors? By extrapolating the timeline of CTCF discoveries, we have here identified what we believe are key issues within this research area. By formulating specific questions, we hope to stimulate discussions among colleagues dedicated to research on CTCF, which is truly a remarkable factor.