1 Structural Biology in the Post-Genomics Era

1.1 Introduction

The past decade has seen an explosion of genome sequences, thanks to the many advances in sequencing technology. These global sequencing efforts have provided us with genetic blueprints for a myriad of organisms in all kingdoms of life. The approach to biomedical research has therefore undergone a radical transformation in the post-genomics era. In the emerging era of genomic medicine, it is now possible to sequence all 3 × 10⁹ base pairs of the human genome for individual patients. We are now tasked with annotating and describing this plethora of genomic data with regard to biological function. Although the protein-coding genomic space (exome) is small, with protein-coding exons accounting for only 1% of the human genome, it represents a majority of the targets for drug development, and 85% of Mendelian diseases are caused by genetic variations in the exomic space. A protein is not merely an “alphabetical” sequence of amino acids, but a macromolecule with three-dimensional (3D) shape and form, capable of performing specialized biological functions in the cell via dynamic interactions with other proteins, small ligands and cellular components. These functional properties depend on a protein’s three-dimensional structure, and the field of structural biology is instrumental in directing research towards an understanding of protein function and disease. Substantial resources have now been put in place, at the disposal of the broad community of non-structural biologists in biomedical research, to exploit the wealth of protein structure information.

In this chapter we aim to provide a brief overview of the current status in protein structure determination, and summarize how protein structure analysis is integral to two active and growing areas of biomedical research, namely understanding genetic variations at a protein level to help disease diagnosis and guiding the development of small molecule therapeutics. Due to the broad subject matter, it is beyond the scope of this chapter to provide an extensive discussion of all significant developments in the ever expanding applications of structural biology. We, however, refer the interested reader to some excellent articles in the relevant sections for more in-depth reviews. We also apologize to all those colleagues whose important work could not be cited, or was cited indirectly, because of space consideration and reference limits.

1.2 Methods of Obtaining Structural Information

1.2.1 Experimental Approaches

As of December 2011, the Protein Data Bank (PDB) contained ~77,700 protein structures in the public domain (http://www.pdb.org). These 3D structures are experimentally derived by methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and electron microscopy (EM). Among them, X-ray crystallography is the dominant structure provider (Fig. 1a), contributing ~87% of total PDB entries. Since the first protein crystal structure in the 1960s (that of myoglobin [1]), the field of protein crystallography has made tremendous technological advances in all stages of the structure determination process. Examples of such developments include the use of heterologous systems (e.g. bacteria, baculovirus-infected insect cells, yeast) to recombinantly express proteins in milligram quantities [2]; the use of fusion tags and automated chromatography platforms to purify proteins; the use of robotics to perform nanolitre-scale crystallization experiments [3]; improvements in synchrotron and in-house X-ray sources that reduce data collection time and extend resolution limits [4]; and software development to accelerate the in silico data processing steps [5]. At present, high-resolution crystal structures can often be determined within days of obtaining diffraction-grade crystals.

Fig. 1

Methods of protein structure determination. Experimentally, protein structures can be determined by protein crystallography (PX), nuclear magnetic resonance (NMR) spectroscopy and electron microscopy (EM). These methods take advantage of recombinant technology that facilitates the heterologous expression and purification of protein domains or complexes. Structure models can also be constructed by homology modeling, if a structure template homologous to the protein of interest is available

A second method of structure determination, solution NMR, derives structural restraints, such as short-range inter-proton distances, from the assigned resonances of a protein (Fig. 1b). Compared to crystallography, which requires the protein in a crystalline state, solution NMR benefits from studying the protein in solution, allowing the observation of protein conformational dynamics and flexibility [6]. NMR provides an alternative route to structure determination, especially for proteins that are difficult to crystallize, contributing ~11% of total PDB entries. It is also very informative in mapping ligand binding residues, by titrating the ligand onto the protein and analyzing chemical shifts in a heteronuclear single-quantum correlation (HSQC) spectrum. However, solution NMR consumes a considerable amount of isotope-labelled protein sample, and resonance assignment is time-consuming. There is also a size limit for proteins amenable to solution NMR measurement (<30 kDa), although this limit will continue to be pushed back by technological improvements [7].

Electron microscopy (EM) can determine macromolecular structures at medium to low resolution, using single particle analysis where individual protein molecules are imaged in solution and the 3D structure is reconstructed by back projection of the 2D images (Fig. 1c) [8]. EM is useful in studying multiprotein supramolecular complexes, particularly when combined with crystallography studies of the protein components. This allows the individual proteins, whose crystal structures have been determined, to be fitted into the molecular envelope of the intact complex as determined by EM, to understand their relative orientations within the complex. However, the use of EM in small molecule development and in understanding genetic variations is currently limited by the attainable resolution.

1.2.2 Homology Modelling

Of the ~30,000 or so gene products predicted for the human genome, only around 15% have been structurally characterized by the experimental methods outlined above. For the many remaining proteins in human and other organisms, computational modeling continues to bridge the gap between known sequences and available structures. The method of comparative or homology modeling allows a structural model to be constructed for a target protein based on its similarity to one or more known structures [9], on the premise that proteins sharing similar sequences fold into similar 3D structures [10]. Today, a number of modeling programs are available (e.g. MODELLER, salilab.org/modeller), some developed as online servers where a sequence-to-structure process can be performed simply by a few clicks. Their popular usage can in part be reflected by the number of available protein models in online repositories such as SWISS-MODEL (swissmodel.expasy.org), Modbase (modbase.compbio.ucsf.edu) and Protein Model Portal (www.proteinmodelportal.org). Due to its popularity, homology modeling has played an influential role in functional annotation and drug discovery for many protein families, e.g. kinases and GPCRs (see examples in [11]).

Common to all modeling tools and servers is an overall four-step procedure (Fig. 1d): (1) given the sequence of the target protein, homologues with known 3D structures are identified; (2) a sequence alignment between the target protein and homologues assigns residue correspondence between sequences; (3) the alignment guides the model building of the target protein, using the homologue structure as template; (4) finally, the constructed model is subjected to refinement and validation of its stereo-chemical properties. In general, the accuracy of homology models depends heavily on the suitability of the template, with higher sequence identity between target and template resulting in fewer positional errors (as measured by root-mean-square deviations, rmsd, between their corresponding main-chain atoms). In practice, sequence identity cut-offs between 40% [12] and 70% [13] have been used to produce reliable models for understanding protein function and drug discovery (at a 60% identity level rmsd is usually <1 Å). Models derived from lower identity templates (<30%) often have higher main-chain and side-chain errors due to a poor quality sequence alignment with too many position gaps [14].
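The two quantities used above to judge a template, sequence identity and rmsd, can be sketched in a few lines of Python. This is a minimal illustration and not the procedure of any particular modeling server: the function names and the confidence labels are hypothetical, identity is computed over aligned (non-gap) columns only (other denominators are in use), and the rmsd function assumes the two coordinate sets are already superposed and matched atom for atom.

```python
import math


def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over the non-gap columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue ('-' marks a gap).
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between matched (x, y, z) coordinates.

    Assumes both lists are in the same order and already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def template_confidence(identity: float) -> str:
    """Coarse, illustrative label based on the identity ranges quoted in the text."""
    if identity < 30.0:
        return "low"         # alignment errors likely
    if identity < 40.0:
        return "borderline"  # below the cut-offs cited above
    return "usable"


identity = percent_identity("ACDE-G", "ACDFHG")
print(identity, template_confidence(identity))
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```

In practice the alignment and the superposition would come from dedicated tools; the sketch only shows how the two summary numbers are defined.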

2 Protein Structure Analysis in Understanding Genetic Variations

2.1 Studying Diseases in the Next Generation Sequencing Era

The recent advent of next generation sequencing (NGS; also known as massively parallel sequencing) has replaced time-consuming and costly Sanger sequencing with much quicker and cheaper methods [15], and has revolutionized our approaches to studying the relationship between genotype and disease [16]. Exome sequencing (i.e. sequencing all exons in a genome) has made a particular impact in investigating the genetic bases of rare Mendelian disorders with low and sporadic incidence in the population [17]. Its success stems partly from not being technically limited by small patient sample size, a major hurdle with conventional methods of disease gene discovery such as linkage analysis and homozygosity mapping. Today, exome sequencing has led to the discovery of new pathogenic variants and candidate genes for a number of genetic disorders (e.g. Miller syndrome [18], Freeman–Sheldon syndrome [19], Kabuki syndrome [20]), and has also offered opportunities to study complex polygenic diseases (e.g. diabetes, Alzheimer’s and heart disease) where susceptibility is affected by multiple genes with complex inheritance patterns.

NGS has therefore accelerated the rate of identifying variants in the human genome. An increasing emphasis is now placed on the effects of these variations on health and disease, although sieving through this huge volume of variant data is a laborious task. Most genetic variations occur at the single nucleotide level, represented as either single nucleotide polymorphisms (SNPs) if they have an incidence of >1% in the population [21], or as rare variants with <1% occurrence. Rare variants, like SNPs, can be pathogenic (i.e. disease linked; often conveniently termed mutations) or benign (i.e. not disease linked). Of particular importance to disease diagnostics are those SNPs and rare variants that lead to amino acid substitutions (missense variants), for two reasons. First, the contribution of missense variations to disease is much higher than that of all other variant types combined (e.g. frameshifts, insertions, deletions, splicing, nonsense), with 60–75% of Mendelian disorders caused by amino acid substitutions [22, 23]. Second, while the consequences of most nonsense, frameshift and insertion/deletion variants are self-evident (e.g. resulting in truncated proteins), the effects of missense variations on protein function and stability are more subtle and difficult to predict. Structural information at the protein level is therefore needed to understand fully their molecular effects.

2.2 Structural Characterization of Missense Variations

While traditionally not a front-line method of analysis, protein structural information has increasingly been incorporated into bioinformatics and in silico methods to characterize missense variants and predict their pathogenicity at the molecular level. In the following subsections we outline several approaches of structure-guided investigation of missense variations and the lessons learnt from these studies.

2.2.1 Bioinformatics Predictors

Following the identification of genetic variants, the next indispensable step is to discriminate between pathogenic and benign variations. The sheer volume of genomic data, however, makes it too time-consuming and expensive to characterize every missense variant experimentally. To this end, numerous bioinformatics methods have been developed over the past decade to predict their molecular effects, and thus help prioritize a set of variants to be studied functionally. A number of excellent reviews on the available computational tools have been published recently ([24–26] and references therein for programs described below). Many prediction tools are implemented as online servers, taking an input sequence, and applying various algorithms to sort and score mutations by their pathogenicity. Structure-based algorithms, which identify a structural match to the input sequence and analyze the contributions of the variant amino acid to protein structural properties such as electrostatics, inter-residue contacts, and steric effects [27], are increasingly incorporated into prediction servers. They serve as complementary approaches to the sequence-comparison programs (e.g. SIFT, Panther and PhD-SNP) that are based on the premise that disease-causing mutations are generally concentrated at conserved amino acids with critical roles in protein structure and function [28]. Nowadays, many prediction methods combine both structural information and sequence conservation to improve their prediction performance and accuracy (e.g. nsSNP Analyzer, PolyPhen-1/2, SNAP, SNP&GO and SNPs3D). An emerging trend is to utilize multiple sets of prediction programs and servers to increase confidence in interpreting the predictions, since different algorithms use different information and have their own strengths and weaknesses. Currently there is an urgent need for standard classification as well as unbiased and statistically-relevant comparisons among the various programs, an active area of bioinformatics research [29, 30].

2.2.2 In Silico Structural Analysis

Protein structure analysis offers a promising avenue to study the molecular consequences of missense variants, by revealing the atomic environment surrounding the mutation site in silico. The most direct approach is by experimental structure determination of the protein in its mutant form if it can be expressed recombinantly and purified. This, however, often proves difficult, in part due to unstable conformations of the mutant proteins that lead to their intracellular degradation. Indeed, three-quarters of disease-associated missense mutations are postulated to destabilize the protein as their primary functional defect [12, 31]. Therefore, although thousands of proteins involved in different biological pathways and functions have been structurally characterized, only a very small proportion of these structures represent proteins inclusive of a disease associated mutation. Recent structure examples falling into this category include the ryanodine receptors RyR1 and RyR2 [32], FGFR2 tyrosine kinase domain [33] and glycogenin GYG1 [34]. As structural determination of mutant proteins often proves intractable, the alternative is to “model” the missense variation onto the wild-type structural environment, by fitting the new amino acid side-chain into the substitution site. The modeled amino acid is inspected visually using molecular graphics software, such as PyMOL (Schrödinger, LLC.), Swiss-PDB viewer (Swiss Institute of Bioinformatics, Basel) and ICM (Molsoft, La Jolla), with particular attention paid to identifying the most acceptable side-chain conformation, from a library of allowed side-chain rotamers, that results in minimal steric clashes. This structure model is then subjected to refinement and energy minimization to yield an overall stabilized conformation.

The available mutant model, either from experimental methods or mutation modeling, can then be analyzed in silico to assess the impact of the amino acid substitution on a number of structural properties. These include possible changes in secondary structure elements, solvent accessibility, packing of neighbouring atoms and inter-atomic/inter-protein contacts, many of which can be examined using online tools ([24] and references therein). The in silico observations allow hypotheses about the molecular nature of the mutational defects to be made, and subsequently tested using a variety of biophysical and biochemical assay methods. Oligomeric state of the protein, for example, may be assessed by native gel electrophoresis, size-exclusion chromatography (SEC), analytical ultracentrifugation, or dynamic light scattering [35, 36]. Secondary structure content may be assessed by far-UV circular dichroism (CD) [37]. Protein unfolding may be monitored by chemical or thermal denaturation detected with far-UV CD or fluorescence [38]. Functional interactions with protein partners can be determined using co-immunoprecipitation followed by Western blot and SEC [39]; thermodynamics of protein binding to ligand or peptide can be determined via isothermal titration calorimetry (ITC) or surface plasmon resonance (SPR) [40]. Enzymatic catalysis and Michaelis–Menten kinetics can be measured if an assay specific to the protein of interest is available [33, 41]. The above list is non-exhaustive, as there are many options to examine every aspect of a protein’s functional properties in the laboratory. Regardless of the approach(es) chosen, however, it is important to compare the results obtained with the mutant protein against those of the wild-type before interpretations are made. It is also important to complement in vitro observations with in vivo studies to comprehend fully the physiological consequences, for example by introducing the variants into the relevant cell lines or genetically engineered animal models.

2.3 The Structural “Rule-Book” Governing Missense Variations

In the following section we review examples illustrating how a structural analysis of the atomic environment surrounding the variant residues, complemented with biochemical and biophysical studies, can be used to attribute deleterious phenotypes to different molecular effects. Together, these examples allow us to formulate a set of “structural rules” to help predict the likely deleterious effects of a missense variation, and can serve as an important toolkit for clinicians and geneticists who need to assess the disease relevance for any newly-identified variations.

2.3.1 Disrupting Protein Fold and Architecture

2.3.1.1 Phenylketonuria as a Paradigm of Misfolding Diseases

Computational analysis of disease causing variations predicts that ~75% of mutations lead to protein destabilization, while only 7% directly affect biochemical function, suggesting that, for many monogenic diseases, a change in protein stability is the major contributor to disease pathology [12, 31, 42] and giving rise to the concept of misfolding diseases [43]. A classic example of a misfolding disease is phenylketonuria (PKU; OMIM 261600) caused by destabilizing mutations in phenylalanine hydroxylase (PAH). PKU is the most common inborn error of amino acid metabolism (incidence of ~1 in 15,000) with more than 500 deleterious mutations reported, 60% of which are missense mutations scattered across the polypeptide [44]. The majority of mutations result in enzyme forms with reduced stability and a propensity to aggregate, resulting in protein degradation and turnover [45]. To understand how these mutations lead to a misfolded state, the available crystal structures have served as excellent tools to scrutinize the atomic environment of the missense mutation sites [46, 47] and to correlate between genotypes and phenotypes [48–50].

A common cause of destabilizing mutations is a structural perturbation to the protein core by a number of molecular mechanisms, depending on the nature of the original wild type and mutant residues. (1) Mutation of a large buried residue to a small one will create an unfavourable solvent cavity within the core, with larger cavities resulting in greater destabilization [51]. In PAH, mutations of buried phenylalanines (F39L, F55L, F372L), valines (V177A, V190A, V245A) and leucines (L255V, L348V) to smaller residues are commonly found (Fig. 2a). (2) The reverse of the above is also true. Mutations of small residues to large ones require the protein to accommodate bulky side-chains by disturbing the surrounding packing and secondary structure arrangements. Examples in PAH include a number of alanine-to-valine substitutions (A47V, A246V, A259V, A403V) (Fig. 2b). (3) Mutations of non-polar residues within a hydrophobic environment to polar residues may also destabilize a protein because of the thermodynamic penalties incurred on the unbonded polar group. These include mutations of isoleucine to serine (I94S) or threonine (I164T, I174T) or mutations of leucine to serine (L48S, L255S) (Fig. 2c). (4) Finally, mutations of polar and charged side-chains to hydrophobic ones may remove important stabilizing contacts (e.g. electrostatic or hydrogen-bonding interactions). This is especially true of arginines, such as Arg241 (R241C, R241H) and Arg252 (R252G, R252Q, R252W) in PAH (Fig. 2d).
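The four mechanisms above can be turned into a rough first-pass check for a substitution at a buried position. The sketch below is an assumption-laden illustration rather than a published predictor: the residue volumes are approximate literature values in Å³, the 20 Å³ threshold and the nonpolar set are arbitrary choices made here, the flag names are hypothetical, and the function assumes the caller has already established from a structure that the residue is buried.

```python
import re

# Approximate side-chain-inclusive residue volumes in cubic angstroms
# (illustrative values; exact numbers differ between published scales).
VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1, "P": 112.7,
    "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0, "Q": 143.8, "H": 153.2,
    "M": 162.9, "I": 166.7, "L": 166.7, "K": 168.6, "R": 173.4, "F": 189.9,
    "Y": 193.6, "W": 227.8,
}

# Residues treated as nonpolar here (a simplifying assumption).
NONPOLAR = set("AVLIMFWCGP")


def parse_mutation(label: str):
    """Split a label such as 'F39L' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in VOLUME or mutant not in VOLUME:
        raise ValueError(f"unknown amino acid in {label!r}")
    return wild_type, position, mutant


def core_flags(label: str, min_volume_change: float = 20.0):
    """Return the mechanisms (1)-(4) that a buried substitution may trigger."""
    wild_type, _, mutant = parse_mutation(label)
    delta = VOLUME[mutant] - VOLUME[wild_type]
    flags = []
    if delta <= -min_volume_change:
        flags.append("cavity")               # (1) large to small
    elif delta >= min_volume_change:
        flags.append("overpacking")          # (2) small to large
    if wild_type in NONPOLAR and mutant not in NONPOLAR:
        flags.append("buried_polar_group")   # (3) nonpolar to polar
    elif wild_type not in NONPOLAR and mutant in NONPOLAR:
        flags.append("lost_polar_contacts")  # (4) polar or charged to nonpolar
    return flags


for label in ("F39L", "A47V", "I94S", "R252W"):
    print(label, core_flags(label))
```

A substitution can raise more than one flag (I94S both shrinks the side-chain and buries a hydroxyl), which is why the function returns a list rather than a single class.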

Fig. 2

Structure of human phenylalanine hydroxylase PAH. The tetrameric architecture of PAH (PDB code 2PAH) is shown with one of its monomer subunits coloured in green. Six regions of the PAH monomer are highlighted in panels a–f to illustrate the different molecular mechanisms that can govern a destabilizing missense mutation. These include (a) mutation of larger to smaller residues; (b) mutation of smaller to larger residues; (c) mutation of nonpolar to polar residues in hydrophobic core; (d) mutations of polar to nonpolar residues; (e) mutation of surface polar residues; and (f) mutations of residues involved in the oligomerization interface

In contrast to core residues, very few protein destabilizing mutations reside on the protein surface, as surface residues can often be substituted with little effect [51]. However, there are exceptions if the mutation disrupts a hydrogen bond or electrostatic interaction at the surface (e.g. D84Y, R176L, R413P mutations in PAH) (Fig. 2e), or if the mutation affects the functional oligomeric state. PAH forms a tetramer, and mutations that interfere with its tetramerization cause improper oligomeric assembly and hence reduce stability [47]. An example is the single most common PKU mutation, R408W, which results in the loss of an inter-subunit hydrogen bond (Fig. 2f).

2.3.1.2 “Special” Residues: Glycine, Proline and Cysteine

Amino acids such as glycine, proline and cysteine often impart certain structural constraints on the protein, and their substitutions can be deleterious. Proline, with its cyclic side-chain, restricts the protein backbone conformations. Therefore, mutations to proline often distort the native backbone conformation, and interrupt the α-helix or β-sheet in which the mutated amino acid resides. The L166P mutation in the DJ-1 protein, located in the middle of helix α7 in its crystal structure (Fig. 3a), is one of the most deleterious missense mutations linked with early onset Parkinson’s disease. A combination of NMR, CD and molecular dynamics studies have shown that the L166P substitution causes DJ-1 to lose α-helical content and leads to global structural destabilization. Since helices α7 and α8 engage in numerous inter-molecular contacts, the mutant is also incapable of functional dimer formation [37].

Fig. 3

Defective protein functions due to missense mutations. Where applicable, the site of mutation described in the text is coloured red. (a) Human DJ-1 protein (PDB code 1PDV). Two monomeric subunits (green, yellow) are shown. (b) Collagen-like peptide (1CAG) in a triple helix conformation. Glycine residues are shown in sticks. (c) Structure of Factor VIII C2 domain (1IQD) that is homologous to retinoschisin highlights the highly-conserved disulphide bond (Cys63–Cys219 in retinoschisin). (d) Structure of FGFR2 tyrosine kinase domain in wild-type (1GJO, white) and A628T mutant (3B2T, green). (e) Structures of human glycogenin-1 show that the conformational movement of lid α4 in the wild-type (3T7O, top) is forbidden in the T83M mutant protein (3RMW, below). (f) Aldolase B in the wild-type (1QO5, white) and A149P mutant protein (1XDM, green). (g) Spermine synthase (3C6K). The G56S mutation is located at the dimer interface (yellow, green). (h) Myosin MyoVIIa (green) in complex with Sans protein (yellow) (3PVL). (i) SEDLIN protein (1H3Q) with the site of missense mutations disrupting transcription factor binding shown in red sticks

With the absence of a side-chain, glycine is the smallest of all amino acids and possesses conformational properties and freedoms inaccessible to other amino acids. Therefore, substitutions from glycine can be debilitating to protein stability and folding. The major structural component of skin, bone and tendons is type I collagen, where two α1 and one α2 protein chains are tightly packed in a heterotrimer. The intermolecular interface is mediated by many Gly-x-y sequence repeats from the three chains, forming a triple helix conformation that is essential to collagen structure and function (Fig. 3b). Many missense mutations substituting a single glycine to larger residues are known to cause the brittle bone disease osteogenesis imperfecta (OMIM 166200), with disease severity dependent upon the size of the mutant amino acid [52]. Another example involves a Gly-to-Asp mutation at the hairpin turn of glycogen phosphorylase, which causes glycogen storage disorder type VI [53].

Cysteine is unique among amino acids in its ability to form inter-residue disulphide bonds that are often critical to maintaining the protein fold. Therefore substitution of a cysteine involved in disulphide bond formation, or to a cysteine that yields an unnatural disulphide bond, may disrupt protein structure. Retinoschisin (RS), a photoreceptor and bipolar cell secreted protein, forms a large disulphide-linked multisubunit complex. At least 25% of the >125 known RS mutations result in the loss or gain of a cysteine and cause X-linked juvenile retinoschisis (OMIM 312700). A combined biochemical and modeling study showed that among the disease causing mutations, C142W and C219R resulted in the breakage of intra-subunit disulphide bonds (Cys110–Cys142 and Cys63–Cys219, respectively) (Fig. 3c), while C59S and C223R abolished an inter-subunit disulphide bond (Cys59–Cys223) [54], hence providing a molecular explanation to how these mutations lead to misfolded protein, defective subunit assembly and aberrant subcellular localization.

2.3.2 Disrupting Protein Functions

While less prevalent than destabilizing mutations, an amino acid substitution can lead to the specific loss, or diminishing, of a protein functional property, such as catalysis, protein–protein interactions, and oligomerization. A number of recent structure examples that are complemented with functional studies are described below.

2.3.2.1 Affecting Enzyme Catalysis

Mutations in the tyrosine kinase domain (e.g. A628T) of fibroblast growth factor receptor 2 (FGFR2) cause lacrimo-auriculo-dento-digital syndrome (OMIM 149730). Ala628 is a highly conserved residue in the active site catalytic loop. The crystal structure of the FGFR2 A628T mutant protein reveals that substitution of Ala628 with a more polar and bulky threonine residue alters the configuration of key residues in the active site that are involved in tyrosine substrate binding [33]. For example, the side-chain of Arg630 is rotated 160° away (Fig. 3d) and cannot coordinate the substrate. This observation is supported by activity assays showing weakened substrate binding and severely impaired tyrosine kinase activity [33].

A new form of glycogen storage disorder (GSD15; OMIM 613507) has recently been identified with genetic defects in glycogenin (GYG1), a glycosyltransferase that catalyzes the initiation of glycogen synthesis. The complete structural snapshots of GYG1 along its catalytic cycle have been provided by X-ray crystallography and show a substantial “lid” movement that closes the active site for catalysis [34]. The disease-linked mutation T83M incorporates a bulky Met side-chain into the mobile “lid” region and prevents the essential movement, as revealed in the mutant protein structure (Fig. 3e). As a result, the glycosyltransferase activity of the GYG1 T83M mutant is completely abolished.

2.3.2.2 Disruption of Quaternary Structure

Hereditary fructose intolerance (OMIM 229600) is caused by mutations in aldolase B, the most prevalent being A149P. The mutant protein structure shows that the A149P substitution disrupts the β-strand element at the mutation site, abolishes a salt-bridge at the adjacent Glu148 residue (Fig. 3f) and also produces a distal effect causing disorder in the 110–129 loop at the dimer–dimer interface [55]. This offers an explanation as to why the mutant protein exists as a dimer in solution, and cannot form the homotetramer essential for its catalysis [35]. This study also nicely illustrates the long-range structural perturbations that a single amino acid substitution can cause, an observation that would not have been revealed by modeling a mutant side-chain onto the wild-type structure.

Genetic defects in spermine synthase (SMS), an enzyme converting spermidine to spermine, cause the X-linked disorder Snyder–Robinson Syndrome (OMIM 309583). The crystal structure of SMS reveals that the protein is a homodimer, with the G56S disease mutation lying close to the dimeric interface (Fig. 3g). Any side-chain incorporated at this position is postulated to protrude towards the opposite subunit and disrupt dimer stability, a hypothesis supported by native gel analysis showing the absence of dimer formation in the mutant protein [36].

2.3.2.3 Disruption of Protein–Protein Interaction

Mutations in the myosin protein MyoVIIa, part of a complex network of proteins in the stereocilia of the inner ear, cause syndromic deaf-blindness (OMIM 276900). A recent structural determination of the MyTH4-FERM tandem domain of MyoVIIa in complex with its protein binding partner Sans reveals that the Glu1349 mutation site on MyoVIIa forms direct interaction with Sans (Fig. 3h), and as a result a single E1349K substitution is responsible for a 20-fold reduction in binding affinity towards Sans, as measured by ITC [40].
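To put a fold change in binding affinity on an energy scale, it can be converted into a change in binding free energy with the standard relation ΔΔG = RT ln(fold change). The snippet below is a sketch under stated assumptions: the 20-fold reduction in affinity is read as a 20-fold increase in the dissociation constant, the temperature of 25 °C is assumed because it is not given in the text, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def ddg_from_fold_change(fold_change: float, temperature_k: float = 298.15) -> float:
    """Change in binding free energy (kcal/mol) for a fold change in Kd.

    A positive value means the mutant binds more weakly than the wild-type.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


# 20-fold weaker binding, assuming 25 degrees C.
print(round(ddg_from_fold_change(20.0), 2))
```

Under these assumptions a 20-fold loss of affinity corresponds to a penalty of under 2 kcal/mol, comparable in size to the loss of a single favourable contact at the interface.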

Four missense mutations in the SEDLIN protein, a subunit of the endoplasmic reticulum Transport Protein Particle complex, are known to cause the X-linked rare bone disorder spondyloepiphyseal dysplasia tarda (OMIM 313400). Three of these mutations (S73L, F83S and V130D) are located in a hydrophobic pocket (Fig. 3i) that is proposed to function as a binding site for transcription factors such as MBP1, PITX1 and SF1, on the basis of the SEDLIN crystal structure. Yeast two-hybrid studies have confirmed that these three mutations indeed result in a loss of protein–protein interactions [56].

2.3.3 Hot Spot Regions

In addition to visualizing the atomic environment of individual mutation sites, as detailed in Sects. 2.3.1 and 2.3.2, structure analysis can also be employed at the whole protein level, for instance, to map all known variations onto the protein 3D structure and identify “hot spot” regions that harbour a high frequency of missense variations. Hot spot mapping can provide insight into phenotype–genotype relationship of mutations in a 3D structural context, and assist in disease diagnosis, for example, by focusing screening efforts on selected mutation-prone regions instead of over an entire gene, most of which may harbour no known mutations. Hot spot mapping can also help generate new conclusions about protein functions and evolutionary mechanisms such as mutability and selection pressure of different mutations by illustrating which regions of a protein can tolerate amino acid variations and which regions are intolerant. A classic example of hot spot identification is with the most commonly mutated cancer gene, TP53 (p53). In p53, an overwhelming majority of its somatic missense mutations are clustered into a loop-sheet-helix region of the DNA-binding domain (Fig. 4a) [57]. These mutations generally disrupt the DNA binding interface and hence mutant proteins are defective in sequence-specific DNA binding [58].
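Hot spot mapping as described above can be approximated by counting, for each mutated residue, how many other mutated residues lie nearby in 3D space. The sketch below is illustrative only: the function name, the use of Cα coordinates and the 10 Å radius are assumptions made here, and the coordinates are toy values rather than data from any of the structures discussed.

```python
import math


def hot_spot_scores(ca_coords, mutated_positions, radius=10.0):
    """Count mutated residues within `radius` angstroms of each mutated residue.

    ca_coords maps residue number -> (x, y, z) of its C-alpha atom.
    Each count includes the residue itself, so an isolated site scores 1.
    """
    scores = {}
    for i in mutated_positions:
        scores[i] = sum(
            1 for j in mutated_positions
            if math.dist(ca_coords[i], ca_coords[j]) <= radius
        )
    return scores


# Toy coordinates: residue 120 is distant in sequence from residues 2 and 3
# but close to them in space, while residue 50 sits far away on its own.
toy_coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.8, 0.0, 0.0),
    3: (7.6, 0.0, 0.0),
    50: (40.0, 0.0, 0.0),
    51: (43.8, 0.0, 0.0),
    120: (6.0, 4.0, 0.0),
}
print(hot_spot_scores(toy_coords, [2, 3, 120, 50]))
```

Working in 3D rather than along the sequence is the point of the exercise: residues 2, 3 and 120 form one cluster here even though a sequence-window count would separate residue 120 from the other two.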

Fig. 4

Structure mapping of mutation hot spots. Mutation sites are shown in either red sticks or spheres. (a) p53 central domain in complex with DNA (PDB code 1TSR); (b) ryanodine receptor type 1 N-terminal domain (3HSM); and (c) yeast Cbf5 structure that is homologous to human dyskerin (3U28). The eukaryote-unique N-terminal extension (loops 1 and 3) is coloured blue

More recent examples of hot spot mapping can also be found in the literature. Mutations on the ryanodine receptors RyR1 and RyR2 (cf. Sect. 2.2.2) that lead to skeletal muscle disorders are concentrated in a highly basic loop (Fig. 4b) and have been found not to affect protein stability, but rather to disrupt the protein–protein or domain–domain interface [32, 59]. In another example, 15 missense mutation sites on dyskerin, the catalytic subunit of the Box H/ACA ribonucleoprotein particles, have been identified as causing a bone marrow failure syndrome, X-linked dyskeratosis congenita (OMIM 305000). The recently determined structure of the yeast homologue Cbf5 reveals that these mutations are all located in a 32-residue N-terminal extension (Fig. 4c) that forms an additional layer to the well-characterized RNA-binding PUA fold, a structural feature not found in archaea, and may function in protein–protein binding [60]. Within our group, we have mapped 55 known missense mutations causing fumarate hydratase deficiency (OMIM 606812) onto its human protein structure [61] and identified two hot spot regions, one clustering around the active site and the other affecting intra- and inter-subunit interactions. To aid further investigation by interested clinicians and researchers, the online version of this article is accompanied by a web-based molecular viewer, allowing the reader to navigate the hot spot regions and each individual mutation along the protein landscape in an interactive manner [62].

2.3.4 Lessons from Large-Scale Structural “Catalogues”

Taking advantage of the rapidly growing genomic and structural data, there are now efforts being made to catalogue missense variants on a large scale using vast protein datasets. Some of these efforts have focused on disease relevant protein families, such as kinases. For example, Lahiry et al. [63] used structural analysis of kinase mutations to correlate their locations on the protein to disease states. They observed that: (1) neutral mutations/polymorphisms, those that did not tend to cause disease, generally clustered in the C-terminal regions of the catalytic core, a region thought to have a basic structural role; (2) germline disease causing mutations, which cause metabolic disorders or loss-of-function developmental disorders, tended to cluster in the catalytic core in sites involved in regulation and substrate binding, as well as in protein–protein and allosteric interactions; (3) cancer causing somatic mutations were concentrated around the ATP binding and catalytic residues, directly influencing catalysis and resulting in the activation of oncogenes or deactivation of tumour suppressors.

Other studies have employed large datasets of missense variations spanning different protein families in order to detect any trends, consensus or “rules” which dictate whether certain types of amino acid changes will result in neutral polymorphisms or pathogenic mutations [64, 65]. These large-scale studies have generally arrived at the conclusion that pathogenic variants are more likely located in solvent-buried core regions, conserved residue positions, residues that contribute hydrogen bonds and those that alter more dramatically the physico-chemical properties of amino acids [64, 65]. Khan et al. [66] also looked at the distribution and frequency of pathogenic variations and found that arginine and glycine are the most mutated residue types, while overall mutability (i.e. the likelihood of being introduced in missense variations) is highest for cysteine and tryptophan. Using similar approaches, Hurst et al. [65] found that mutations of glycine, cysteine and tryptophan were more likely to be pathogenic than others, confirming results from previous small-scale studies. They also provided online access to their large database of structurally-mapped missense variations (www.bioinf.org.uk/saap/db). Taken together, these proteome-wide structure-based mutation analyses will continue to help us formulate better rules for our prediction of whether an uncharacterized amino acid variation will be pathogenic or not, thereby improving disease diagnostics in the future.

3 Protein Structure Analysis in Drug Development

3.1 Structural Biology and Target-Centric Drug Development

The opportunities presented by the post-genome era have also rapidly transformed the field of drug development. We are now aware of an unprecedented number of potential therapeutic proteins, estimated in one study as reaching 10% of the predicted coding regions in the human genome (i.e. ~3,000 proteins) [67]. On the other hand, the current FDA-approved drugs target only a small number (~300) of human proteins or proteins from other pathogenic organisms [68–70]. This has had a fundamental impact on the direction of biomedical research, steering it towards a more target-centric approach to bridge this gap. The main focus in this approach is to identify therapeutically-relevant drug targets that meet the double criteria of being disease-linked, i.e. having a causative role in the onset and/or progression of a disease, and being druggable, i.e. capable of being bound and modulated by a small molecule.

At the same time, the current field of drug development is facing tremendous challenges with ever-increasing research and development costs (reaching in some estimates up to $2 billion per drug [71]) and high attrition rates along the entire pipeline [72], where many potential projects fail through the early stages of hit identification and optimization to lead. As a result, the pharmaceutical industry is under continuous pressure to look for novel, high confidence disease targets and alternative drug design approaches. Being amenable to the target-centric approach and having the potential to address some of the challenges in the pharmaceutical industry, the field of structural biology has been playing an increasing role in drug development, particularly at the early stages (“drug discovery”). Today, using structures to identify new lead compounds and as a basis for rational drug design is an integral part of many a drug development project.

3.2 Early Structural Applications in Lead Optimization

Before the technological advances in the past decade that have made protein structure determination faster and more cost-effective, the use of structure information in drug development in the 1980s and early 1990s was confined to the lead optimization stage, in directing the chemical alterations of initial compound hits to improve their affinity, potency and selectivity. In this process, the protein structure of interest is determined in complexes with lead compounds identified from a high throughput screening (HTS) campaign, accomplished either by co-crystallizing the protein solution pre-incubated with the lead molecule, or by soaking pre-formed crystals of the apo protein with the ligand solution. The determined structure of the protein–ligand complex reveals the modes of interaction between the protein and ligand at the atomic level, e.g. short-range interactions such as hydrogen bonds, salt bridges, and hydrophobic contacts, the distances between the various interacting groups and atoms, and the presence of water molecules at the protein–ligand interaction site. This information is used to guide further iterative rounds of chemistry optimization and protein–ligand structure determination to establish a structure–activity relationship.

The first marketed drug developed via this structure-based approach was captopril, an inhibitor of angiotensin converting enzyme for the treatment of hypertension and congestive heart failure. This drug was designed in the mid-1970s on the basis of the homologous carboxypeptidase A protein which had been structurally characterized at the time [73]. Today, structure-based design approaches have delivered drugs to the market for a wide range of diseases, including retroviral infection [74, 75], glaucoma [76], influenza [77, 78] as well as cancer [79, 80] (Fig. 5). With advances, particularly in crystallography, the timeframe of protein structure determination is now sufficiently short to be amenable for many other stages of the drug discovery pipeline. As a result, the tools of structural analysis that were traditionally used in lead optimization are now being exploited to assist the processes of target identification, assessment of target druggability, and hit identification (Fig. 6), as outlined below.

Fig. 5

Examples of structure-based drug design. FDA-approved drugs that have been derived from structure-based approaches. For each drug, its generic name, chemical structure, protein targeted and disease area applied is shown (references in the main text). CML chronic myeloid leukaemia; EGFR epidermal growth factor receptor

Fig. 6

Modern day structure-guided drug discovery. Protein structure analysis is nowadays incorporated into all early stages of drug development, including (a) target identification; (b) assessing binding site druggability; (c) hit identification, and (d) lead optimization. Example shown is from the chemical probe program at the Structural Genomics Consortium to develop small molecule binders for the family of histone-binding bromodomains (cf. [93] in main text)

3.3 Use of Structures in Target Identification

An early consideration in the target-centric approach of drug discovery is to identify and prioritize therapeutically-important proteins in the genome. Obtaining structural information at this stage, in the apo- or relevant liganded states of the protein, is an important milestone in target identification. High resolution atomic structures of many therapeutic targets are now available in the public domain, including kinases (e.g. AMPK [81]), viral proteins (influenza polymerase [82]), cytochrome P450 [83], metabolic enzymes (acetyl-CoA carboxylase [84]) and G-protein coupled receptors (β1-adrenergic receptor [85]). This unprecedented wealth of structure information helps to establish sequence–structure–function relationships and to assess potential ligand-binding capabilities, and is now part of the essential toolkit to complement in vivo target validation experiments (e.g. RNA interference screens, animal models, gene knockouts). The increase in available structures in the PDB also spurs the development of computational methods combining sequence and structural information to probe biological functions [86].

It is with the technological advances in structural biology that the field of “structural genomics” (SG) was born, to determine systematically 3D structures of proteins encoded in a genome primarily by crystallography and NMR. The overall objectives are to provide a structure coverage of the “protein universe” [87], to help define protein functions that cannot be predicted from sequences alone [88], and to facilitate the discovery, as well as selection, of genomic targets for drug therapy [89]. A number of large-scale SG efforts have emerged over the past 10 years, including RIKEN in Japan (www.riken.co.jp), Structural Proteomics in Europe (SPINE, www.spineurope.org), the Structural Genomics Consortium based in UK, Sweden and Canada (SGC, www.thesgc.com), as well as the Protein Structure Initiative in USA (PSI; www.nigms.nih.gov/psi). While sharing similar high-throughput methodologies and open access policy to their data, these SG initiatives differ in their scope and criteria for their target selection. A number of SG initiatives (e.g. PSI, RIKEN) aim to explore the novel protein folds that cannot be predicted from sequence [90], and subsequently leverage structure completeness of a genome by homology modeling of the remaining homologous proteins. Other SG programs take a biology-driven avenue, placing the emphasis more on medical relevance. For example, the Tuberculosis Structural Genomics Consortium [91] adopts an organism-based approach focusing on the obligate human pathogen Mycobacterium tuberculosis. To date nearly 10% of all proteins from the pathogen have been structurally characterized [92], revealing a number of previously unannotated proteins as potential anti-tuberculosis targets. The human proteome-focused SGC studies protein families with therapeutic importance such as kinases, phosphatases and metabolic enzymes [93, 94]. These studies reveal structure–function relationships between family members with regards to active site and substrate specificity (Fig. 6a), and emphasize their application to develop member/family-specific chemical probes and inhibitors [95].

3.4 Use of Structures in Assessing Druggability

3.4.1 Binding Site Detection

With the structural information of potential therapeutic targets made available, the next step in drug discovery is the identification of binding sites that are receptive to small molecule binding (Fig. 6b). The large repertoire of protein–ligand complexes in the PDB has provided a structural view of a ligand binding site as a small pocket or invagination on the protein, accessible to the surface exterior, where ligands can fit to mediate a biological function. This pocket should harbour amino acid side-chains that contribute to hydrogen bonds and hydrophobic contacts. Based on these concepts, a number of pocket identification programs have been developed to detect binding sites on the protein structure, adopting two general approaches (see [96] and references therein). The geometry-based methods (e.g. SURFNET, LIGSITE) look for geometrically-complex regions on the protein, as natural binding sites tend to be concave surface invaginations. The probe/energy-based methods (e.g. GRID, AutoLigand, ICM) calculate the interaction energy between a probe molecule and protein at different point locations to define regions with favourable interaction energies.
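
The geometry-based idea can be illustrated with a deliberately simplified buriedness test. The sketch below is not the SURFNET or LIGSITE implementation; it only borrows the general notion of scanning outward from a solvent point and counting the axes along which protein is met on both sides. The function name `buriedness`, the single atom radius and the toy atom positions are all assumptions made for illustration.

```python
import math

def buriedness(point, atoms, atom_radius=2.0, step=1.0, max_steps=10):
    """Return how many of the x, y, z axes are enclosed by protein on both
    sides of `point` (0 to 3), or None if the point lies inside an atom."""
    def occupied(p):
        return any(math.dist(p, a) <= atom_radius for a in atoms)

    if occupied(point):
        return None  # not a solvent point

    enclosed_axes = 0
    for axis in range(3):
        sides_hit = 0
        for sign in (1, -1):
            for k in range(1, max_steps + 1):
                p = list(point)
                p[axis] += sign * step * k
                if occupied(p):
                    sides_hit += 1
                    break
        if sides_hit == 2:
            enclosed_axes += 1
    return enclosed_axes

# Toy "protein" (hypothetical): atoms enclose the origin along x and y,
# but only from below along z, like a cup that is open at the top.
atoms = [(4, 0, 0), (-4, 0, 0), (0, 4, 0), (0, -4, 0), (0, 0, -4)]

print(buriedness((0, 0, 0), atoms))    # point inside the cup
print(buriedness((0, 0, 20), atoms))   # point far out in bulk solvent
```

Grid points with high buriedness would be clustered into candidate pockets. Production tools typically scan more directions and use atom-specific radii, which this sketch omits.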

A thorough understanding of the binding pocket space helps not only to assess its potential for drug binding but also to annotate functionally under-characterized proteins (i.e. de-orphanization). For example, delineating residues involved at the ligand binding site can stimulate site-directed mutagenesis experiments to probe their catalytic or regulatory roles. Structural characterization of binding sites also reveals the ligand-induced conformational changes on the protein target, which can range from small side-chain adjustment to whole-domain rearrangement [97]. Binding pockets are therefore not static sites as revealed in a structural snapshot, but dynamic regions important for the protein function. The conformational plasticity of the ligand binding sites needs to be addressed during structural analysis and drug design.

3.4.2 Druggability Index

The next step following pocket detection is an evaluation of whether it has the shape and chemical complementarity to accommodate high-affinity, drug-like molecules. This likelihood prediction of drug binding (“druggability”) is crucial to target selection in drug discovery, with the hope of screening out unlikely candidates at an early stage. The emerging concept of protein druggability [98] is an extension to the “drug-likeness” rule-of-five for small molecules that attributes good oral bioavailability of drug compounds to certain favourable physico-chemical parameters [99]. Research groups are developing tools to predict druggability and quantify it in a “druggability index” using different structure-based metrics. Some correlate druggability with hit rates obtained from NMR screening of small fragments [100], whereas others base their predictions on binding affinity calculations [101] or on comparison of binding sites between different proteins/families that bind the same ligand to identify hot spot residues [102]. Druggability indices are especially useful in identifying non-native small molecule binding sites such as between protein–protein interaction surfaces [103].
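
For readers unfamiliar with the small-molecule rule-of-five that the druggability concept extends, a minimal check is sketched below. It uses the commonly quoted thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and the common convention of tolerating one violation. It illustrates the ligand-side rule only, not any of the protein druggability indices cited above, and the descriptor values are hypothetical rather than taken from real compounds.

```python
def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """List which of the four rule-of-five thresholds a compound exceeds."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if logp > 5:
        violations.append("logP > 5")
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations

def is_drug_like(descriptors, max_violations=1):
    """Apply the usual convention that at most one violation is tolerated."""
    return len(rule_of_five_violations(**descriptors)) <= max_violations

# Hypothetical descriptor sets, for illustration only.
compound_a = {"mol_weight": 350.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5}
compound_b = {"mol_weight": 720.9, "logp": 6.3, "h_donors": 6, "h_acceptors": 12}

print(is_drug_like(compound_a), rule_of_five_violations(**compound_a))
print(is_drug_like(compound_b), rule_of_five_violations(**compound_b))
```

A druggability index asks the complementary question from the protein side: whether a given pocket is likely to accommodate compounds that pass filters of this kind with high affinity.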

3.5 Use of Structures in Hit Identification

A therapeutic protein that satisfies the criteria of disease linkage and druggability can enter the pipeline of a drug discovery program to identify hit compounds that bind the target and exert an effect. Traditionally this has been achieved by HTS [104]. In this approach a vast library collection of physically available compounds, accumulated by large pharmaceutical companies over many years of research, isolated from natural sources or synthesized from combinatorial chemistry, is experimentally tested on the protein target using a high-density assay that measures either binding to or biochemical modulation of a protein. The aim is to identify compounds with IC50 values better than, e.g. 10 μM for further hit-to-lead optimization. The power of HTS relies on the implementation of a robust and sensitive assay and the interrogation of a vast compound collection, both requirements consuming significant resources in materials, time and manpower. Its success in generating hits also depends upon target classes, robustness of the assay and propensity to deliver false positives [105]. With these challenges under consideration, novel approaches continue to be explored as a complement to the HTS method in hit discovery. In particular, in silico methods exploiting structural information of the binding pocket space are being widely explored. To this end, three structure-based approaches, namely virtual screening, de novo design and fragment-based screening, are gaining promise and are nowadays incorporated into almost every drug discovery project (Fig. 6c).

3.5.1 Virtual Screening

Virtual screening (VS) is often considered as the computational alternative to the classic HTS, hence its alias “virtual HTS” (see [106] and references therein). VS interrogates large chemical libraries in silico, often available as public compound databases, to predict their binding mode and affinity towards the protein structure. The prediction is based on docking calculations and generally involves two steps. First, every compound in the library is individually placed onto the protein pocket to generate different conformations and orientations (“poses”) by sampling through the pocket space, taking into account ligand and protein flexibility at the pocket. Second, the binding modes between target and the ligand in its different poses are evaluated by a scoring function, and subsequently ranked to identify binding hits from the highest-scoring ligands and poses.
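
The two-step logic described above (generate poses, then score and rank) can be sketched schematically as follows. This is not the workflow of any particular docking package: pose generation is assumed to have happened already, the scoring function is passed in as a placeholder, and the convention that a lower (more negative) score is better is an assumption. The names `rank_by_best_pose`, `toy_library` and `toy_score` are hypothetical.

```python
def rank_by_best_pose(library, score_pose):
    """Keep the best-scoring pose of each compound and rank the compounds.

    library    : dict mapping compound name -> list of poses
    score_pose : callable returning a score for one pose (lower is better)
    """
    ranked = []
    for name, poses in library.items():
        best_pose = min(poses, key=score_pose)
        ranked.append((name, score_pose(best_pose)))
    ranked.sort(key=lambda item: item[1])
    return ranked

# Toy library (hypothetical): each pose carries a precomputed placeholder energy.
toy_library = {
    "cmpd_A": [{"energy": -6.2}, {"energy": -7.9}],
    "cmpd_B": [{"energy": -5.1}],
    "cmpd_C": [{"energy": -9.4}, {"energy": -3.0}],
}

def toy_score(pose):
    # Stand-in for a real scoring function.
    return pose["energy"]

hits = rank_by_best_pose(toy_library, toy_score)
for name, score in hits:
    print(name, score)
```

In a real campaign only the top fraction of the ranked list would be carried forward for experimental testing, so the quality of the ranking depends entirely on the scoring function supplied.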

A rigorous scoring function is crucial to a VS campaign, so that it allows proper enrichment of true compound hits among the top ranking scores. Many scoring functions have been developed, taking into account the interaction energies between ligand and protein (“force field-based”) or statistical observations from experimentally derived protein–ligand structures with the basic premise that true hits share common protein–ligand interactions (“knowledge-based”). Nowadays a variety of docking software is available (e.g. DOCK, GOLD, AutoDock; see [106, 107] and references therein), each incorporated with different scoring functions. Current challenges in the docking tools include the need to improve scoring function accuracy, to take into account the various protonation, tautomerization and ionization states of compounds, and to predict ligand-induced protein conformations [107].
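
To show what a force field-based score looks like in its simplest form, the sketch below sums a 12-6 Lennard-Jones term and a Coulomb term over all ligand–protein atom pairs. It is a toy under strong assumptions, not the scoring function of DOCK, GOLD or AutoDock: one well depth and one optimal distance are used for every atom pair, the dielectric constant is fixed, and solvation and entropy are ignored. All parameter values and names are hypothetical.

```python
import math

# Coulomb constant in kcal * angstrom / (mol * e^2)
COULOMB_KCAL = 332.06

def pair_energy(r, qi, qj, eps=0.2, r_min=3.8, dielectric=4.0):
    """Lennard-Jones (12-6) plus Coulomb energy for one atom pair, in kcal/mol."""
    ratio = r_min / r
    lennard_jones = eps * (ratio ** 12 - 2.0 * ratio ** 6)
    coulomb = COULOMB_KCAL * qi * qj / (dielectric * r)
    return lennard_jones + coulomb

def score_pose(ligand_atoms, protein_atoms, cutoff=12.0):
    """Sum pair energies over all ligand-protein pairs within `cutoff` angstroms.

    Each atom is given as ((x, y, z), partial_charge). Lower is better.
    """
    total = 0.0
    for l_xyz, l_q in ligand_atoms:
        for p_xyz, p_q in protein_atoms:
            r = math.dist(l_xyz, p_xyz)
            if 0.0 < r <= cutoff:
                total += pair_energy(r, l_q, p_q)
    return total

# Toy system (hypothetical): a single protein atom and three one-atom "poses".
protein = [((0.0, 0.0, 0.0), 0.0)]
contact_pose = [((3.8, 0.0, 0.0), 0.0)]   # at the optimal distance
clash_pose = [((1.9, 0.0, 0.0), 0.0)]     # far too close

print(score_pose(contact_pose, protein))
print(score_pose(clash_pose, protein))
```

The contact pose sits at the bottom of the Lennard-Jones well, whereas the clashing pose is heavily penalized, which is the behaviour that allows such a function to discriminate between poses.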

The strength of the structure-based VS approach is attributable to its capability to screen large databases (e.g. millions) of compounds with minimal computational power, more quickly and less expensively than HTS. Successful VS examples in hit identification include the development of EGFR inhibitors towards cancer cells [108], cysteine protease inhibitors of the SARS virus [109], and dihydroorotate dehydrogenase (DHODH) inhibitors towards rheumatoid arthritis [110]. The approaches of VS and HTS, with their mechanistic parallels, can also complement each other and have been applied side-by-side on the same drug development project [111] to facilitate hit identification.

3.5.2 De Novo Ligand Design

Structural knowledge of the binding pocket space can also guide the building of novel lead compounds from scratch [112, 113]. This de novo approach of drug design is not constrained by the known chemical structures from existing compounds, opening up the possibility of developing novel chemotypes [114]. The most common strategy is receptor/target-based de novo design, using a priori structural information of the target protein and its binding pocket. In this strategy, small building blocks (known as seeds or fragments) are positioned onto key interaction regions within the pocket, either by computational docking (as in Sect. 3.5.1), or recently by experimental methods such as crystallography and NMR (see Sect. 3.5.3). Each fragment can then be extended towards the neighbouring available space to build a lead compound that matches the binding pocket sterically and electrostatically (“growing” approach). Alternatively, multiple fragments bound independently at different but proximal regions of the pocket can be assembled into a lead compound using linker scaffolds (“linking” approach). A number of de novo drug design projects have yielded potential compound hits. For example, Heikkila et al. [115] exploited a species-specific hydrophobic ligand pocket on the Plasmodium falciparum DHODH protein to design potent parasite-specific compounds with an IC50 value of 43 μM. Ni et al. [116] developed inhibitors for the peptidyl-prolyl isomerase cyclophilin A with IC50 values of 31.6 nM, with potential as immunosuppressive agents. An important caveat of de novo design is that it often generates complex ligands with poor synthetic accessibility and pharmacokinetic properties. This is being addressed by software development that places emphasis on generating drug-like, synthetically feasible compounds.

3.5.3 Structure Based Fragment Screening

The X-ray and NMR methods of structure determination have also played a crucial role in the fragment-based screening paradigm [117]. Its premise is to screen experimentally hundreds to thousands of small compounds (usually 100–300 Da in size) in order to identify low-affinity fragments (Kd in the high μM range) that bind to different regions of the binding pocket, as a starting point for hit optimization. The subsequent optimization of fragment hits into a single hit compound can be rationalized, as in the de novo method, by the “growing” and “linking” processes. The concept of starting with small fragments is an appealing alternative to the conventional HTS attempts, with a number of merits. Fragments with their relatively small sizes and low complexity have been shown to provide higher hit rates than larger drug-like compounds from conventional screens [118], and can be optimized more efficiently. Fragments also allow a broader, more efficient sampling of the chemical space using a much smaller set of compounds (e.g. 100 fragments are equivalent to a combinatorial library of 1,000,000 compounds) [114].
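
The equivalence quoted above can be read as simple combinatorics. As an illustrative assumption (how the figure was derived is not stated here), suppose every compound in a combinatorial library is assembled from k = 3 fragment positions, each filled from the same set of N = 100 fragments:

```latex
N^{k} = 100^{3} = 1{,}000{,}000
```

Under that assumption, screening the N fragments individually samples the same building blocks that would otherwise require enumerating and testing all N^k assembled compounds.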

The relatively weak binding of fragments (e.g. ~100 μM to 10 mM against target protein), which may be missed by a conventional HTS assay, can be experimentally determined by crystallography, NMR and other biophysical methods such as surface plasmon resonance [119, 120]. With inherently higher hit rates and the likelihood of multiple binding modes for a fragment hit, it is necessary to have its binding mode characterized by crystallography or NMR to allow hit-to-lead compound design. In particular, crystallography with its low-cost, high-throughput implementation is well attuned to fragment-based screening, allowing fast structure determination of protein-fragment complexes (Fig. 6d). A recent survey showed that 15 selective and potent inhibitors generated from fragment-based screening have entered phase I or II clinical trials. Examples include inhibitors for matrix metalloproteinase [121], aurora kinase [122], cyclin-dependent kinase 2 [123] and peroxisome proliferator-activated receptor [124]. An excellent update on fragment screening success examples across industry and academia was recently published [125].

4 Conclusion, Challenges and Future Perspectives

Over the past decade, the field of protein structural biology has responded to the challenging demands presented in the post-sequencing era by two revolutionary accomplishments. It has attained technological advances in the methods of structure determination in order to streamline the gene-to-structure process in a parallel, automated and miniaturized platform. Protein structures are now being solved by numerous academic and industrial research groups worldwide, on a daily or weekly basis. Structural biology has also broadened its scientific impact, successfully transforming itself from mere providers of structure information into an essential toolkit for molecular geneticists in the characterization and understanding of diseases, and for medicinal chemists to assist all stages of the drug discovery process. With its continuing scientific contribution and technical improvements, structural biology is more ready now than ever to offer promise in some of the biological areas that have so far proven difficult (Sects. 4.1 and 4.2), and to open up new exciting avenues for its applications (Sect. 4.3).

4.1 Studying Protein–Protein Interactions

A myriad of cellular processes are mediated by protein–protein interactions (e.g. in signaling, metabolism, cellular structure and transport), often requiring the formation of multiprotein macromolecular machineries. A mechanistic understanding of these biological processes therefore requires an examination of the protein complexes at the molecular level. Experimental methods such as X-ray crystallography, NMR and EM are now being used to complement biochemical and biophysical methods such as yeast two-hybrid, immuno-precipitation and fluorescence resonance energy transfer, to understand these interactions better. However, complex structure determination remains challenging as compared to its single protein counterpart, and often requires systematic mapping and delineation of the interacting region to obtain co-purified and co-crystallized complexes [126, 127]. This substantial investment in time and effort is reflected by the number of protein complex structures in the PDB being only one-sixth of single protein structures. In the absence of co-crystal structures, in silico methods serve as a promising alternative to generate complex structure models by protein–protein docking and homology modeling, and will continue to attract considerable attention and research due to their comparative ease of use [128]. The identification of druggable protein–protein interactions that participate in diseases also represents an exciting avenue in drug discovery. Targeting a protein–protein interface for small molecule modulation is often considered less tractable than conventional single protein targets, due to the large interacting surface and less pocket-like features. Nevertheless, over the years a number of protein–protein interaction inhibitors have been developed, assisted by available structural information of both protein–protein complexes and of individual proteins (e.g. interaction partners of interleukin IL-2, B-cell lymphoma 2 Bcl-XL and human papilloma virus transcription factor E2; [129] and references therein), and some are now entering clinical trials.

4.2 The High-Hanging Fruits of Membrane Protein Structures

In addition to multiprotein complexes, many classes of disease-associated and therapeutically important proteins remain refractory to the current methods of structure determination. Particularly in mind are the integral membrane proteins, such as the family of G-protein coupled receptors (GPCR) that are predicted targets for ~30–50% of marketed drugs [68], and hence a major focus in pharmaceutical research. However, due to intrinsic difficulties with membrane protein crystallization, understanding GPCR structure and function has largely been achieved by homology modeling approaches. Recent structural breakthroughs, e.g. in the use of heterologous expression systems and in engineering mutations to stabilize proteins for crystallization [130], have brought the current number of available GPCR structures to six, an important increase, yet still in stark contrast to the total number of GPCRs predicted in the human genome (>900). Nevertheless, the structure determination over the past few years of a few highly-relevant GPCR drug targets (e.g. β1- and β2-adrenergic receptors [85, 131], A2A adenosine receptor [132], chemokine receptor CXCR4 [133] and dopamine D3 receptor [134]) has provided hope for structure-based methods to be applied routinely in GPCR drug discovery. These new structures offer promising opportunities for in silico compound screening and docking, and provide a diversity of available templates for homology modeling which, until the day that routine membrane protein crystallization has arrived, will continue to play a key role in leveraging structural coverage for this protein family.

4.3 Combining Mutation Analysis and Drug Design: Pharmacological Chaperones

An excellent example of combining the structural applications in mutation analysis and small molecule design is found in the emerging field of pharmacological chaperone therapy (PCT), a new approach to treating inherited diseases that affect enzyme stability and function, such as phenylketonuria (cf. Sect. 2.3.1) and lysosomal storage disorders [135]. PCT involves the use of small molecules, often active site inhibitors or substrate mimics of the native protein, to stabilize mutant enzymes suffering from folding and trafficking defects. A great deal of ground work and many proof-of-principle studies have incorporated structural information in order to establish the molecular basis of disease mutations and to identify those chaperone-responsive mutations with potential for PCT. To this end, a small-molecule screening effort to identify stabilizing therapeutic agents to treat PKU has already yielded two promising compounds for PAH stabilization [136]. Recently, crystal structures for a number of lysosomal hydrolases (e.g. β-hexosaminidase B [137] and acid β-glucosidase [138]) have been determined in complexes with pharmacological chaperones identified from chemical screening, to provide atomic insights into their modes of stabilization. The structure determination itself of these lysosomal enzymes is no small feat due to their heavily-glycosylated nature. The stage is now set for a systematic, structure-assisted approach in developing the next generation of chaperone compounds into clinical applications. The current work additionally reveals the potential of PCT as a general strategy to treat a wide range of rare genetic diseases, more of which are being unraveled each year, and illustrates how structural biology has suitably positioned itself within the translational approach from bench to clinic.