Introduction

Autosomal dominant congenital cataract (ADCC) is the most common form of inherited cataract.1 Despite advances in surgical management and effective visual rehabilitation for most cataract patients, early diagnosis remains important for optimizing visual outcomes after cataract surgery.2

Linkage and candidate gene studies have revealed high genetic heterogeneity in ADCC,3, 4, 5, 6 for which at least 18 associated genes have been identified. Excluding known mutations and mapping novel mutations in all ADCC disease-causing genes with a conventional gene-by-gene screening strategy requires laborious laboratory work and is impractical for genetic diagnosis and testing in clinical practice. Exome sequencing has recently been applied successfully to map disease-causing genes and mutations in Mendelian diseases and tumors.7, 8 By using exome capture chips followed by next-generation sequencing, the technique sequences the exome comprehensively and identifies potential structural or functional variants within it.

In the current study, we evaluated a new approach to the genetic study of ADCC based on exome sequencing. The whole exome was sequenced for one proband from a pedigree with autosomal dominant congenital nuclear cataract. By successfully identifying a recurrent mutation in the proband, the study demonstrates a new, rapid, and cost-effective approach to identifying disease-causing mutations in ADCC.

Materials and methods

Sample selection and clinical examination

This study was approved by the Ethics Committee of Joint Shantou International Eye Center and was conducted in accordance with the Declaration of Helsinki. Written consent was obtained from each participating subject after explanation of the nature of the study.

A Han Chinese family with autosomal dominant congenital nuclear cataract (Figure 1) and 200 unrelated controls with mild senile cataracts were recruited at the Joint Shantou International Eye Center, Shantou, China. Comprehensive ophthalmic examinations were performed on all participants. Anterior segment photographs were taken of the ADCC patients.

Figure 1

A Chinese pedigree with congenital nuclear cataract. Filled squares and circles denote affected males and females, respectively. Unaffected individuals are shown as empty symbols. The arrow denotes the proband, whose slit-lamp photograph is shown below.

Exome sequencing of the proband

Peripheral blood was collected from the proband (III-8), other family members, and all senile cataract controls. Genomic DNA was extracted using the QIAamp Blood kit (Qiagen, Hilden, Germany). Exome sequencing was performed on 3 μg of genomic DNA through the service provided by Axeq Technologies, Inc. (Rockville, MD, USA). The whole exome of the proband was captured with the Illumina TruSeq Exome Enrichment Kit (Illumina, San Diego, CA, USA), and the resulting libraries were clustered and sequenced on an Illumina HiSeq 2000 in a paired-end configuration with 100-base reads.

Bioinformatics analysis of exome data

Reads were mapped against the UCSC hg19 human reference genome (http://genome.ucsc.edu/) using BWA (http://bio-bwa.sourceforge.net/). Single-nucleotide variations (SNVs) and indels were detected with SAMtools (http://samtools.sourceforge.net/), and previously known and reported variants were identified and filtered using dbSNP 135 (http://www.ncbi.nlm.nih.gov/snp/) and 1000 Genomes Project (http://www.1000genomes.org) data. A step-by-step filtering strategy was used to select candidate disease-causing mutations. Functional annotation was performed using ANNOVAR (http://www.openbioinformatics.org/annovar/), and the coding consequences of nonsynonymous SNVs and indels were predicted using Sorting Intolerant From Tolerant (SIFT; http://sift.bii.a-star.edu.sg/index.html), PolyPhen (http://genetics.bwh.harvard.edu/pph/), and Condel (http://bg.upf.edu/fannsdb/). A total of 18 previously reported ADCC disease-causing genes were selected for analysis: nine crystallin genes, namely αA-crystallin (CRYAA),9 αB-crystallin (CRYAB),10 βA1-crystallin (CRYBA1),11 βA4-crystallin (CRYBA4),12 βB1-crystallin (CRYBB1),13 βB2-crystallin (CRYBB2),14 γC-crystallin (CRYGC),15 γD-crystallin (CRYGD),16 and γS-crystallin (CRYGS);17 one cytoskeletal protein gene, beaded filament structural protein 2 (BFSP2);18 three membrane protein genes, gap junction protein alpha 3 (GJA3),19 gap junction protein alpha 8 (GJA8),20 and major intrinsic protein of lens fiber (MIP);21 three growth and transcription factor genes, heat-shock transcription factor 4 (HSF4),22 paired-like homeodomain 3 (PITX3),23 and v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog (MAF);24 and two other genes, chromatin modifying protein-4B (CHMP4B)25 and Ephrin receptor A2 (EPHA2).26 For splice site variants, indels, and nonsynonymous SNVs predicted to be functionally deleterious, a comprehensive literature search was conducted to identify known disease-causing variants of ADCC.
Candidate mutations in the 18 ADCC genes identified by bioinformatic analysis were selected for further validation. The same strategy was applied to an additional 212 genes recorded in the Cat-Map database (http://cat-map.wustl.edu/),27 which contained >20 known genes identified for non-syndromic cataract, >130 genes and loci for syndromic forms of cataract, and >70 mouse mutants with cataract.
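The step-by-step filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, the effect labels, and the rule for combining the filters are all hypothetical, and a real analysis would read annotated calls from the annotation tool's output rather than hand-written records.

```python
from dataclasses import dataclass

# The 18 ADCC genes named in the text.
ADCC_GENES = frozenset({
    "CRYAA", "CRYAB", "CRYBA1", "CRYBA4", "CRYBB1", "CRYBB2",
    "CRYGC", "CRYGD", "CRYGS", "BFSP2", "GJA3", "GJA8", "MIP",
    "HSF4", "PITX3", "MAF", "CHMP4B", "EPHA2",
})

# Variant classes kept for follow-up; these labels are illustrative.
CANDIDATE_EFFECTS = frozenset({"nonsynonymous", "splice_site", "indel"})


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one annotated variant call."""
    gene: str
    change: str
    effect: str
    in_public_db: bool          # seen in dbSNP or 1000 Genomes data
    reported_pathogenic: bool   # found in the literature search
    predicted_deleterious: bool


def filter_candidates(variants, panel=ADCC_GENES):
    """Return the variants that survive every filtering step, in input order."""
    kept = []
    for v in variants:
        if v.gene not in panel:
            continue  # step 1: restrict to the gene panel
        if v.effect not in CANDIDATE_EFFECTS:
            continue  # step 2: drop synonymous and other silent changes
        if v.reported_pathogenic:
            kept.append(v)  # step 3a: known disease-causing variants are kept
        elif not v.in_public_db and v.predicted_deleterious:
            kept.append(v)  # step 3b: novel variants need a deleterious prediction
    return kept


# Toy input. Only the GJA8 change is taken from the text; the other records
# and all flag values are made up for illustration.
calls = [
    Variant("GJA8", "c.773C>T", "nonsynonymous", False, True, True),
    Variant("CRYAA", "toy-synonymous", "synonymous", True, False, False),
    Variant("MIP", "toy-common-missense", "nonsynonymous", True, False, False),
    Variant("TTN", "toy-off-panel", "nonsynonymous", False, False, True),
]

for v in filter_candidates(calls):
    print(v.gene, v.change)
```

Passing a larger gene set as `panel` would apply the same steps to an extended list such as the Cat-Map genes.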

Variant validation using Sanger sequencing

Genomic sequences of the genes containing candidate variants selected by bioinformatic analysis were obtained from the NCBI reference sequence database (http://www.ncbi.nlm.nih.gov/refseq). Primers were designed with Primer 3 and used to amplify the coding exons containing the candidate variants and their flanking regions (GJA8-1F 5′-AATGTGGTGGACTGCTTCGT-3′ and GJA8-1R 5′-GGCAGTGTCTCTTGGTAGCC-3′). PCR amplification was performed as previously described.28, 29 Bidirectional sequencing of PCR products was performed using the BigDye Terminator Cycle Sequencing v3.1 kit (ABI, Foster City, CA, USA) and the 3130xl Genetic Analyzer (ABI) following the manufacturer's protocol. Sequence alignment and analysis of variations were performed using NovoSNP.30
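As a quick sanity check on the primer pair quoted above, length, GC content, and a rough melting temperature can be computed directly from the sequences. The sketch below uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) °C, which is an assumption made here for illustration; it is not the thermodynamic model that primer design software applies, and the function name `primer_stats` is hypothetical.

```python
def primer_stats(seq):
    """Return length, GC percentage, and Wallace-rule Tm for a DNA primer."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return {
        "length": len(seq),
        "gc_percent": 100.0 * gc / len(seq),
        # Wallace rule: 2 degC per A/T and 4 degC per G/C (rough estimate only).
        "wallace_tm_c": 2 * at + 4 * gc,
    }


# Primer sequences as given in the text.
PRIMERS = {
    "GJA8-1F": "AATGTGGTGGACTGCTTCGT",
    "GJA8-1R": "GGCAGTGTCTCTTGGTAGCC",
}

for name, seq in PRIMERS.items():
    print(name, primer_stats(seq))
```

Both primers are 20 nucleotides long; the forward primer has 50% GC (estimated Tm 60 °C) and the reverse primer 60% GC (estimated Tm 64 °C) under this rule.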

Results

Statistics of exome sequencing data

The total sequencing depth in the proband is shown in Table 1. In all, 87.6% of the target region was covered at ≥10× sequencing depth, and 94.20% was covered at ≥1× sequencing depth. The median and mean read depths of the target regions were 51.0× and 54.9×, respectively, which should ensure high-quality SNV and indel calling.
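The coverage figures quoted above are summaries of per-base read depth over the target regions. A minimal sketch of how such summaries are derived is given below; the function name `coverage_summary` and the toy depth values are hypothetical and do not reproduce the study's data.

```python
from statistics import mean, median


def coverage_summary(depths, thresholds=(1, 10)):
    """Summarize per-base depths: mean, median, and percent of bases at >= each threshold."""
    n = len(depths)
    summary = {"mean_depth": mean(depths), "median_depth": median(depths)}
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        summary[f"pct_ge_{t}x"] = 100.0 * covered / n
    return summary


# Ten made-up per-base depths standing in for a target region.
toy_depths = [0, 5, 12, 40, 51, 60, 75, 90, 110, 120]
print(coverage_summary(toy_depths))
```

In practice the depth list would come from a per-position depth report over the capture targets rather than from a literal list.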

Table 1 Summary of exome sequencing data in the proband

SNVs and indels in known ADCC disease-causing genes

Exome sequencing identified a total of 19 347 coding SNVs and 475 coding indels in the proband (Table 2). All exons of the 18 ADCC disease-causing genes were covered in the proband. A total of 48 coding SNVs and no coding indels were observed in the known ADCC genes. Among these variants, a previously reported heterozygous missense mutation, c.773C>T (p.S258F), was observed in exon 2 of GJA8 and was predicted by bioinformatics analysis to be functionally deleterious.20 The mutation was confirmed in the proband by direct sequencing. It was also found in all other affected family members but was absent from both the unaffected family members and the 200 unrelated senile cataract controls. It thus co-segregated completely with the ADCC phenotype (Figure 2). Another nonsynonymous SNV, c.49G>A (p.V17M), in exon 3 of HSF4 was also observed in the proband. However, it was not detected in the other affected family members and hence did not co-segregate with the disease phenotype. We also analyzed an additional 212 genes from the Cat-Map database that have been reported in the literature to be associated with a cataract phenotype in humans or animals. No other variant was found to co-segregate with the ADCC phenotype (Supplementary Table S1).
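The co-segregation criterion applied above (variant present in every affected member and absent from every unaffected member) can be expressed as a simple check. The sketch below is illustrative only: apart from the proband III-8, the member IDs, affection statuses, and carrier statuses are hypothetical, and full penetrance is assumed.

```python
def cosegregates(carrier, affected):
    """True if the variant's carriers are exactly the affected individuals.

    Both arguments map a family-member ID to a boolean. Full penetrance
    is assumed, so an unaffected carrier counts as a failure.
    """
    return all(carrier[member] == affected[member] for member in affected)


# Toy pedigree. III-8 is the proband named in the text; all other IDs and
# statuses are invented for illustration.
affected = {"II-1": True, "II-2": False, "III-7": False, "III-8": True}

# Pattern like the c.773C>T variant: carried by every affected member only.
shared_by_affected = {"II-1": True, "II-2": False, "III-7": False, "III-8": True}

# Pattern like the c.49G>A variant: seen in the proband but not in the
# other affected member.
proband_only = {"II-1": False, "II-2": False, "III-7": False, "III-8": True}

print(cosegregates(shared_by_affected, affected))  # True
print(cosegregates(proband_only, affected))        # False
```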

Table 2 Filtering of SNVs and indels in the proband by bioinformatics analysis and experimental validation
Figure 2

Sanger sequencing confirms the disease-causing mutation in GJA8 that was identified by exome sequencing. The upper panel shows the chromatogram from unaffected family members and senile cataract controls; the lower panel shows the chromatogram of the heterozygous mutation c.773C>T (p.S258F) in exon 2 of GJA8.

Discussion

In this study, we used an exome sequencing approach to screen for mutations in a pedigree with ADCC. We performed exome sequencing in the proband of the pedigree and successfully identified a previously reported recurrent disease-causing mutation, demonstrating the robustness and potential application of exome sequencing in the clinical molecular diagnosis of ADCC.

By sequencing the whole exome of a single proband per ADCC pedigree, the approach can screen all coding variants in the 18 ADCC disease-causing genes rapidly and cost-effectively. In the current study a heterozygous missense mutation in GJA8 was identified, demonstrating the potential application of exome sequencing in the molecular diagnosis of ADCC. One important limiting factor in the clinical application of molecular diagnosis is cost. Exons constitute approximately 1% of the human genome, and coding variants in these regions are more likely to be functionally relevant and to affect phenotypes or disease, especially for Mendelian diseases such as ADCC. By targeting these specific regions of the human genome, exome sequencing provides a more cost-effective method than whole-genome sequencing for mapping genetic variants with large effect. The cost of next-generation sequencing, and of exome sequencing in particular, has fallen dramatically in recent years, which makes applications in molecular diagnosis possible. The cost of exome sequencing at 100× average sequencing depth can be less than one tenth of that of whole-genome sequencing at 30× average sequencing depth.31 Therefore, with a fixed budget, exome sequencing can achieve higher sequencing depth in functionally relevant genomic regions and thus increases the power to detect disease-causing mutations. Another limiting factor of molecular diagnosis methods is turnaround time. In our study, a single sequencer (Illumina HiSeq 2000) generated 600 million paired-end reads of 100 bases in about 11 days.32 This sequencing process can probably be shortened to about 17 h when using the newer Illumina HiSeq 2500 for whole-exome sequencing.33 By focusing on the 18 ADCC genes, the bioinformatics analysis could be completed within 5 h. The total turnaround time is therefore feasible for clinical molecular diagnosis.
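The depth argument above can be made concrete with a back-of-the-envelope calculation of raw sequencing output. The sketch below is a rough illustration, not a cost model: the genome size of about 3.1 Gb and the 50% on-target fraction are assumptions introduced here, the function name `required_bases` is hypothetical, and capture-kit and library costs are ignored.

```python
GENOME_BP = 3.1e9               # approximate human genome size (assumption)
EXOME_BP = 0.01 * GENOME_BP     # exons are roughly 1% of the genome, per the text


def required_bases(target_bp, mean_depth, on_target_fraction=1.0):
    """Raw bases needed to reach mean_depth over target_bp.

    on_target_fraction models capture efficiency: only that share of the
    sequenced bases lands on the target.
    """
    return target_bp * mean_depth / on_target_fraction


exome_100x = required_bases(EXOME_BP, 100, on_target_fraction=0.5)  # assumed 50% on target
genome_30x = required_bases(GENOME_BP, 30)

print(f"exome at 100x: {exome_100x / 1e9:.1f} Gb of raw sequence")
print(f"genome at 30x: {genome_30x / 1e9:.1f} Gb of raw sequence")
print(f"ratio: {exome_100x / genome_30x:.3f}")
```

Under these assumptions the exome run needs about one fifteenth of the raw output of the whole-genome run, which is in line with the order of magnitude of the cost difference cited above.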

Congenital cataract is genetically highly heterogeneous. Excluding known mutations and mapping novel mutations in all ADCC disease-causing genes with a conventional gene-by-gene screening strategy requires extensive laboratory work and budget, and is especially impractical for genetic diagnosis and testing in clinical practice. In the current study we focused on 18 genes previously reported to cause ADCC; the panel can be extended as new ADCC disease-causing genes are identified. Our findings also showed that, with a high-sequencing-depth strategy, exome sequencing of only one proband or affected family member was able to detect the underlying disease-causing mutation in an ADCC pedigree. The coverage of target regions (at ≥10× sequencing depth) was 87.6%; the average sequencing depth was 111×; and the median sequencing depth on target regions was 51×. In general, an average sequencing depth ≥100× could ensure high coverage of target regions and high-quality SNV and indel calling. Our data showed that all 18 ADCC genes were covered by the exome sequencing. Bioinformatics analysis can be very powerful in identifying candidate variants in these genes, which can then be quickly validated in ADCC pedigrees to confirm co-segregation with the disease phenotype. The approach can also be useful in the search for unknown ADCC genes: if no disease-causing mutation is found among these genes in the affected members of a pedigree, the result would implicate a novel disease-causing gene in that pedigree.

In the current study, we proposed and evaluated an approach to the genetic study of ADCC in which the whole exome of one proband from a pedigree with autosomal dominant congenital nuclear cataract is sequenced. By successfully identifying a known mutation in the proband, the study demonstrated a rapid and cost-effective approach to mapping disease-causing mutations in ADCC.