Abstract
Genetic susceptibility to colorectal cancer is caused by rare pathogenic mutations and common genetic variants that contribute to familial risk. Here we report the results of a two-stage association study with 18,299 cases of colorectal cancer and 19,656 controls, with follow-up of the most statistically significant genetic loci in 4,725 cases and 9,969 controls from two Asian consortia. We describe six new susceptibility loci reaching a genome-wide threshold of P<5.0E−08. These findings provide additional insight into the underlying biological mechanisms of colorectal cancer and demonstrate the scientific value of large consortia-based genetic epidemiology studies.
Introduction
The estimated lifetime risk of colorectal cancer (CRC) is 5.2% for men and 4.8% for women in the United States1. The narrow-sense heritability estimates based on twin and family studies of CRC range from 12 to 35% (refs 2, 3). Although several genome-wide association studies (GWAS) of CRC have successfully identified common single-nucleotide polymorphisms (SNPs) associated with CRC risk4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, a large fraction of the heritability still remains elusive22. Our GWAS combines data from four large CRC consortia, the Colorectal Cancer Transdisciplinary (CORECT) Study, the Colon Cancer Family Registry (CFR), the Molecular Epidemiology of Colorectal Cancer (MECC) Study and the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) to elucidate previously undiscovered susceptibility loci for CRC. The current meta-analysis identifies novel genome-wide significant risk regions at 3p14.1, 3p22.1, 10q24.2, 12q24.12, 12q24.22 and 20q13.13.
Results
Study Populations and Population Stratification
Data for this discovery analysis focus on individuals of European ancestral heritage from North America, Australia and Europe. Our discovery analysis includes 19 observational studies genotyped with high-density SNP arrays and imputed to the 1,000 Genomes Project March 2012 reference panel23,24 (Supplementary Table 1). We employ an inverse-variance-weighted fixed-effects meta-analysis of study-specific logistic regression results after filtering data for quality control (QC). Quantile–quantile plots show no appreciable evidence of population stratification for the meta-analysis (Supplementary Fig. 1) or for the individual discovery studies (Supplementary Fig. 2) before and after adjustment for principal components (PCs), and the sample size-corrected marginal lambda (equivalent to 1,000 cases and 1,000 controls) measures 1.003 in the discovery meta-analysis. The PC plots for ancestry indicate no difference between cases and controls in the respective discovery GWAS studies (Supplementary Fig. 3).
Confirmation of Prior Studies and Discovery
We evaluate the quality and effectiveness of our study design and analytic methods by assessing previously reported CRC susceptibility loci. We replicate 41 of the 47 published autosomal susceptibility variants for CRC at P<0.05, unadjusted for multiple testing (Supplementary Table 2). We then turn to the discovery of new susceptibility loci (Supplementary Fig. 4) by investigating the top 200 independent loci detected in the European discovery phase (Supplementary Table 3) in two separate East Asian consortia. Overall, our combined meta-analysis across European and Asian studies discovers six new susceptibility loci reaching a statistical threshold of P<5.0E−08: at chromosome 3p22.1 (rs35360328), 3p14.1 (rs812481), 10q24.2 (rs11190164), 12q24.12 (rs7137828), 12q24.22 (rs73208120) and 20q13.13 (rs6066825; Table 1; Supplementary Table 4). The odds ratios (ORs) across these six loci indicate a 9 to 16% increase in the odds of developing CRC per risk allele, similar to previously reported CRC susceptibility loci. A seventh susceptibility locus tagged by rs4946260 at 6q22.1 approaches genome-wide significance (P=6.27E−08). The ORs are consistent across populations and genotyping platforms as shown by chi-square tests for heterogeneity, with only one locus showing marginally significant heterogeneity (rs6066825/20q13.13, Phet=0.04).
Replication in Asian Populations
Forest plots for the six genome-wide significant loci show that the risk alleles identified in the European populations replicate broadly across the Asian populations even though allele frequencies differ substantially (Fig. 1). Two SNPs were not available for replication in the Asian studies because they are rare in Asians.
Genomic Location and Candidate Genes
Several of the six susceptibility SNPs fall within regions harbouring genes known to be involved in the pathogenesis of CRC (Supplementary Fig. 5). Rs35360328 and a corresponding tagSNP at 3p22.1 (rs35364139, r2=0.8, P=1.7E−07) lie in an intergenic region within ∼300 kb of CTNNB1, the gene that encodes β-catenin. β-Catenin is a key member of the WNT signalling pathway and is commonly mutated in CRC development25,26. There are no histone marks in the vicinity of either rs35360328 or rs35364139 in any colon-derived cells in the publicly available ENCODE chromatin immunoprecipitation (ChIP)-seq tracks, making these unlikely to be the functional SNPs in this region (Supplementary Fig. 5). However, there are 26 other SNPs in linkage disequilibrium (LD) with rs35364139 (r2>0.5, CEU population), which may disrupt biofeatures or regulatory elements resulting in the observed CRC risk. Together, the physical proximity of this newly identified susceptibility locus, relevant functional biology and adjacent regulatory marks suggest that CTNNB1 is an intriguing candidate target gene of a putative enhancer.
The second locus on chromosome 3 is located at 3p14.1 (rs812481) and is intronic of LRIG1, a gene encoding a transmembrane protein that interacts with epidermal growth factor receptor-family tyrosine kinases27,28,29. LRIG1 has recently been described as a marker of quiescent colon crypt stem cells activated to proliferate following injury30. No histone marks are found in the vicinity of rs812481, making it unlikely to be the functional SNP. Notably, rs3856595 (P=2.4E−07), in LD (r2>0.5, CEU population) with rs812481, is located in a LRIG1 intronic active enhancer peak (H3K27ac) in sigmoid colon epithelium. A second SNP in LD with rs812481 is rs231276 (P=2.0E−06), which resides in an H3K4me1 enhancer peak in a CRC cell line (HCT-116). This peak is intronic of SLC25A26, which encodes a mitochondrial transport protein.
The SNP at 10q24.2 (rs11190164) lies in a genomic region containing multiple genes including SLC25A28, ENTPD7, COX15, CUTC and ABCC2. Several SNPs in high LD with rs11190164 map to putative enhancers, promoters or 3′ UTRs of genes within the region. A recent study identified rs1035209, 6.3 kb upstream from rs11190164 (CEU r2=0.4), to be significantly associated with CRC risk17. In addition, rs3740078 (distance to rs11190164=93,887 bp, r2=0.71, CEU population; P=3.2E−05) causes a synonymous change in the coding sequence of ENTPD7. While ENTPD7 has been linked to intestinal epithelial inflammation in mice and is expressed in normal colonic epithelium31, a role in CRC has not been previously reported.
Rs3184504 at 12q24.12 implicates SH2B3 as a putative target gene for CRC susceptibility. SH2B3 is an adaptor protein involved in cytokine signalling and functions as a classic tumour suppressor gene in B-precursor acute lymphoblastic leukaemia that increases STAT3 phosphorylation32. Less is known about its signalling roles in the colon, but rs3184504 is a missense variant (Trp262Arg) that is a known risk allele for coeliac disease and other immune-related disorders33 and is a well-established risk factor for type 1 diabetes34 and hypertension35. Several other SNPs in LD with rs3184504 also map to putative regulatory regions, but further work is needed to functionally characterize this missense variant or these other SNPs. Other genes within this region, including CUX2, BRAP and ACAD10 are also potential candidate genes.
The SNP at 12q24.22 (rs73208120) is independent of rs3184504 at 12q24.12 (r2=0.002, CEU population) and lies intronic of NOS1. NOS1 encodes neuronal nitric oxide synthase 1, which generates nitric oxide, a reactive free radical involved in several biologic processes, including inflammation, infection and antimicrobial and antitumoral activities36. There are several SNPs in LD with rs73208120, but none map to the candidate enhancer regions.
The SNP at 20q13.13 (rs6066825) lies within an intron of the PREX1 gene that encodes the Rac-guanine nucleotide exchange factor P-Rex1, a signalling protein involved in cell migration and invasion in some cell types37. There are 35 SNPs in LD with rs6066825 (r2>0.5, CEU population), all intronic or immediately downstream of PREX1. The most promising functional candidates are three SNPs, rs2092492 (r2=0.62, CEU population), rs6066823 (r2=0.62, CEU population) and rs6066825 itself that lie within a putative active enhancer marked by an H3K27ac ChIP-seq peak in sigmoid colon tissue.
Discussion
In conclusion, the combined meta-analysis of 52,649 individuals facilitated the discovery of six new susceptibility loci for CRC. Additional CRC loci remain to be discovered despite the large sample sizes included in our discovery meta-analysis. Although replication of suggestive loci from the discovery phase in similar ancestral populations would be more powerful due to LD and effect allele frequency differences, this study identified six novel CRC risk loci. This study identified opportunities to explore new biologic mechanisms for predisposition to CRC and the potential for translation into improved risk prediction for populations of diverse ancestral heritage.
Methods
Our initial GWAS combined data from four large CRC consortia, the CORECT Study, the CFR, the MECC and the GECCO to elucidate previously undiscovered susceptibility loci for CRC. Data for this discovery analysis focused on individuals of European ancestral heritage from North America, Australia and Europe. Detailed methods are described in the Supplementary Methods. In brief, samples from 19 observational studies genotyped with high-density SNP arrays and imputed to the 1,000 Genomes Project March 2012 reference panel24 contributed to the discovery meta-analysis. Replication of the top 200 independent SNPs was performed in two additional consortium studies from Asian populations. The studies included in the discovery and replication phases are listed in Supplementary Table 1.
Discovery phase genotyping and QC
The details on study design and characteristics for each study and substudy in the discovery phase are provided in the Supplementary Methods. In brief, the discovery phase consisted of four CRC consortia. The CORECT consortium coordinated genotyping and analysis of six observational studies of CRC for the present analysis: (1) MECC2, (2) CFR2, (3) Kentucky case–control study, (4) American Cancer Society CPS II nested case–control study, (5) Melbourne nested case–control study and (6) Newfoundland case–control study. Genotyping as part of CORECT was conducted using a custom Affymetrix genome-wide platform (the Axiom CORECT Set) with ∼1.3 million SNPs and insertions and deletions (indels) on two physical genotyping chips (pegs). In the MECC1 study, germline DNA was extracted from peripheral blood samples and genotyped in two batches using the Illumina HumanOmni 2.5–8 BeadChip, which measures nearly 2.4 million SNPs and indels. Batch 1 (414 cases and 155 controls) was run at the Case Western Reserve University and batch 2 (104 cases and 376 controls) was run at the University of Michigan. Germline DNA for the CFR1 study was extracted from peripheral blood samples and genotyped in two batches using three different platforms, the Illumina Human1M or Human1M-Duo (CFR1-Set1) and the Illumina HumanOmni1-Quad (CFR1-Set 2), each containing ∼1.2 million SNPs and indels. Genotype data were cleaned based on QC metrics at the individual subject and SNP levels. Samples with <95% call rate, sex mismatches (between self-reported and genotypic predicted sex), low concordance with previous genotype data, duplicate samples, unanticipated genotype concordance, identity-by-descent with another sample or ethnic outliers as identified by visual inspection of PCA cluster plots were removed. Before imputation, SNPs with <95% call rate, concordance <95% with 1000 Genomes in samples genotyped for QC, or Hardy–Weinberg equilibrium P<1.0E−04 in controls were excluded. All SNPs overlapping 1000 Genomes were matched to the forward strand.
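The SNP-level exclusions described above (call rate and Hardy–Weinberg equilibrium in controls) can be illustrated with a minimal sketch. The text does not state which Hardy–Weinberg test was used, so the 1-d.f. Pearson chi-square test below is an assumption, as are the function names; the concordance filter against 1000 Genomes is omitted because it needs paired genotype data.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Pearson chi-square (1 d.f.) P value for Hardy-Weinberg equilibrium.

    Assumption: the article does not specify its HWE test; an exact test
    may have been used instead.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(call_rate, control_counts,
                  min_call_rate=0.95, hwe_threshold=1e-4):
    """Keep a SNP if call rate >= 95% and HWE P >= 1e-4 in controls."""
    return (call_rate >= min_call_rate
            and hwe_chi2_pvalue(*control_counts) >= hwe_threshold)
```

Here `control_counts` is a hypothetical tuple of genotype counts (AA, AB, BB) observed in controls only, matching the restriction of the equilibrium test to controls.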
The GECCO consortium consists of 13 studies. Details are provided in Supplementary Methods and Supplementary Table 1. In brief, DNA was extracted from blood samples or from buccal cells, using conventional methods. Phase one genotyping was done using either Illumina HumanHap 550K, 610K or combined Illumina 300 and 240K, Affymetrix platforms18, Illumina HumanCytoSNP or Illumina HumanOmniExpress. All studies included 1 to 6% blinded duplicates to monitor quality of the genotyping. All individual-level genotype data were managed and underwent QA/QC at the Ontario Institute for Cancer Research, the University of Washington or at the Fred Hutchinson Cancer Research Center. Details on the QA/QC have previously been described12. In brief, samples were excluded based on call rate, heterozygosity, unexpected duplicates, gender discrepancy and unexpectedly high identity-by-descent or unexpected genotype concordance (>65%) with another individual. All analyses were restricted to samples clustering with the Utah residents with Northern and Western European ancestry from the CEPH collection (CEU) population in PC analysis, including the HapMap II populations as reference. SNPs were excluded if they were triallelic, not assigned an rs number, or were reported or observed as not performing consistently across platforms. In addition, genotyped SNPs were excluded based on call rate (<98%), lack of Hardy–Weinberg equilibrium in controls (P<1.0E−04) and minor allele frequency (MAF) (<5% in Set 1 for PLCO, WHI, DALS and OFCCR; minor allele count<10 for remaining studies).
Imputation
To meta-analyse genotype data generated from multiple platforms and to increase the coverage of variation that is measurable across the genome, imputation of genotypes was performed for both autosomal (all consortia) and X chromosome (excluding GECCO consortium) markers. Imputing missing genotypes for study samples based on the cosmopolitan panel of reference haplotypes from Phase I of the 1,000 Genomes Project (March 2012 release; n=1,092; (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521))23,24 helps improve imputation accuracy of low-frequency variants38. The target panel was phased using Beagle39 (GECCO) or SHAPE-IT40 (CORECT, MECC1 and CFR1) and the phased target panel was imputed to the 1000 Genomes reference panel using either Minimac41 (GECCO) or IMPUTE2 (ref. 42) (CORECT, MECC1 and CFR1). Genetic markers retained following imputation had to pass stringent imputation quality and accuracy filters before entering the analysis phase. For GECCO, Rsq was used as the imputation quality measure for imputed SNPs43, and SNPs were excluded at different Rsq thresholds based on their MAF: for SNPs with MAF>0.01, we excluded those with Rsq≤0.3; for MAFs of 0.005–0.01, we excluded Rsq<0.5; and for MAF<0.005, we excluded Rsq<0.99. In the remaining studies (CORECT, MECC1 and CFR1) stringent imputation quality and accuracy filters (info ≥0.7, certainty ≥0.9, concordance ≥0.9) were applied between directly measured and imputed genotypes after masking input genotypes (for genotyped markers only) to enter the analysis phase. Further, we restricted the SNP list to those with study-specific MAF≥1%.
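The post-imputation filters above reduce to a few threshold checks, sketched here for illustration only. The function names are hypothetical, and the handling of a SNP with MAF exactly equal to 0.01 (assigned here to the 0.005–0.01 tier) is an assumption, since the text does not say which tier that boundary belongs to.

```python
def passes_gecco_rsq_filter(maf, rsq):
    """MAF-dependent Rsq filter described for GECCO.

    Excluded: Rsq <= 0.3 when MAF > 0.01; Rsq < 0.5 when MAF is
    0.005-0.01; Rsq < 0.99 when MAF < 0.005.
    """
    if maf > 0.01:
        return rsq > 0.3
    if maf >= 0.005:  # assumption: MAF == 0.01 falls in this tier
        return rsq >= 0.5
    return rsq >= 0.99


def passes_corect_filter(info, certainty, concordance):
    """Filter described for CORECT, MECC1 and CFR1 (IMPUTE2 metrics)."""
    return info >= 0.7 and certainty >= 0.9 and concordance >= 0.9


def passes_maf_filter(study_maf, min_maf=0.01):
    """Final restriction to SNPs with study-specific MAF >= 1%."""
    return study_maf >= min_maf
```

Because the final restriction keeps only SNPs with study-specific MAF≥1%, the two lower-frequency tiers of the GECCO filter matter mainly for markers whose frequency sits close to that cut-off.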
Statistical analysis
We utilized PC analysis to assess correspondence between self-reported and genotypic classification of ancestry including unrelated HapMap CEU, YRI and ASN samples as population controls. Ancestral outliers were identified by visual inspection of PC plots for each study and removed. PCs were computed and used for ancestry adjustment. Study-specific association estimates (OR and 95% CI) were obtained employing logistic regression of CRC on allelic dosage adjusting for ancestry and potential confounding variables (for example, age, sex and study site) as defined by the individual studies (Supplementary Methods). The genomic control factor (λ) was estimated by dividing the median χ2-statistic by 0.456. A sample size-corrected marginal λ, equivalent to studying 1,000 cases and 1,000 controls, was also calculated. Heterogeneity of genetic effects by study was assessed using Cochran’s Q test for heterogeneity (Phet).
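As an illustration of the genomic control calculation, the sketch below divides the median χ2-statistic by 0.456, as stated above, and then rescales λ to 1,000 cases and 1,000 controls. The rescaling formula is not given in the text; the one used here is the commonly applied form and is an assumption about what was computed, and the function names are hypothetical.

```python
from statistics import median


def genomic_control_lambda(chi2_stats, expected_median=0.456):
    """Genomic control factor: median chi-square statistic / 0.456."""
    return median(chi2_stats) / expected_median


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to a study of 1,000 cases and 1,000 controls.

    Assumed formula (not stated in the article):
    1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale
```

For example, with a purely illustrative uncorrected λ of 1.05 and the discovery sample sizes (18,299 cases and 19,656 controls), `lambda_1000(1.05, 18299, 19656)` is roughly 1.0026, which shows how strongly the correction shrinks λ in a sample of this size.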
Replication phase
The replication phase was conducted in two Asian consortia (Asian 1 and Asian 2). The Asian Colorectal Cancer Consortium (ACC), Asian 1, consisted of five studies with genome-wide scan data: Shanghai CRC study 1 (Shanghai-1); Shanghai CRC Study 2 (Shanghai-2); Guangzhou CRC Study (Guangzhou); Aichi CRC Study 1 (Aichi-1), and the Korean Cancer Prevention Study-II CRC (KCPS-II). Samples in these studies were genotyped using Affymetrix and Illumina SNP arrays for GWAS (Supplementary Methods)10,44,45,46,47,48. A uniform QC protocol (call rates, concordance rates, cryptic relatedness, sex misidentification and ancestry) to filter samples and SNPs was applied10. Imputation was performed with the GIANT ALL data panel from the 1,000 Genomes Project phase 1 release v3 as the reference using program MACH v1.0 (ref. 43) and minimac41. SNPs with imputation R2>0.7 in each of the five studies were included in the final analysis. Associations between SNPs and CRC risk were evaluated based on the log-additive model using mach2dat43. Per-allele ORs and 95% confidence intervals (CIs) were derived from logistic regression models, adjusting for age, sex and the first ten PCs when appropriate. Association analysis was conducted for each participating study separately and a fixed-effects meta-analysis was conducted to obtain summary results with the inverse-variance method using program METAL49.
The Asian 2 consortium was genotyped using the Illumina 1M-duo Array and consisted of studies from the Multiethnic Cohort (MEC; N=3,094), CFR (N=285), Colorectal cancer study on Oahu, Hawaii (CR2 & 3; N=134), Fukuoka, Japan (N=1,411), Nagano, Japan (N=207) and the Japan Public Health Center-based prospective study (JPHC; N=1,293) after QC filtering50,51,52,53,54. In general, all genotyped samples were examined and excluded according to the following: (1) call rates <90, 95 or 97% depending on the batches, (2) missing on basic covariates (age, sex or disease status), (3) gender mismatch, (4) ethnicity outliers and (5) relatedness (≥2nd degree). Prediction of untyped or partly genotyped SNPs was performed with BEAGLE 3.3 (ref. 39) using the 1,000 Genomes Project (phase 1, release 3) East Asians as reference panels. Imputation was performed with all cases and controls combined. Markers with MAF<0.005 in reference panels were excluded from imputation. Study-specific association statistics were obtained using logistic regression models adjusted for ancestry and potential confounding variables (Supplementary Methods). A fixed-effects meta-analysis was conducted to obtain summary results with the inverse-variance method using programme METAL49.
All study samples were collected with written informed consent, and procedures were approved by the Human Research institutional review boards (IRBs) of the respective institutions. Specifically, the University of Southern California Health Sciences IRB approved all elements of the CORECT, CFR and MECC studies. The MECC study protocol was also approved by the IRBs at the University of Southern California, University of Michigan, and Carmel Medical Center (Haifa). The Fred Hutchinson Cancer Research Center IRB approved the GECCO contribution. The Asian 1 consortia study protocols were approved by the review board of the Vanderbilt University Medical Center and informed consent was obtained from all study participants. Study protocols of the Asian 2 consortia were approved by the University of Hawaii Human Studies Program and University of Southern California IRB, the IRB in the National Cancer Center, Japan and the Ethics Committee of Kyushu University Faculty of Medical Sciences.
Meta-analysis
A consortia-wide meta-analysis for the discovery and replication phases using fixed-effect models with inverse variance weighting was implemented in METAL. Heterogeneity was evaluated using Cochran’s Q test for heterogeneity and the measure I2. Forest plots present the effect estimates and CIs by study and consortium.
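The fixed-effect, inverse-variance calculation can be sketched in a few lines. This is a standalone illustration of the standard formulas, not METAL's code or interface; the function name and the returned keys are hypothetical. Inputs are per-study log odds ratios and their standard errors.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns the pooled estimate, Cochran's Q and I2 (in per cent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided P value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "beta": beta,
        "se": se,
        "or": math.exp(beta),
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "p": p_value,
        "q": q,
        "df": df,
        "i2": i2,
    }
```

Phet is obtained by comparing Q with a chi-square distribution on `df` degrees of freedom; that step is left out here because it needs an incomplete gamma function that the Python standard library does not provide.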
Additional information
How to cite this article: Schumacher, F. R. et al. Genome-wide association study of colorectal cancer identifies six new susceptibility loci. Nat. Commun. 6:7138 doi: 10.1038/ncomms8138 (2015).
Change history
26 October 2015
The original version of this Article contained typographical errors in the spelling of the authors Sébastien Küry, Edward L. Giovannucci and Mathieu Lemire, which were incorrectly given as Sebastian Kury, Edward L.. Giocannucci and Mathiew Lemire. This has now been corrected in both the PDF and HTML versions of the Article.
References
Siegel, R., Ma, J., Zou, Z. & Jemal, A. Cancer statistics, 2014. CA Cancer J. Clin. 64, 9–29 (2014).
Czene, K., Lichtenstein, P. & Hemminki, K. Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish Family-Cancer Database. Int. J. Cancer. 99, 260–266 (2002).
Lichtenstein, P. et al. Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med. 343, 78–85 (2000).
Broderick, P. et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat. Genet. 39, 1315–1317 (2007).
Cui, R. et al. Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut 60, 799–805 (2011).
Dunlop, M. G. et al. Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat. Genet. 44, 770–776 (2012).
Houlston, R. S. et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat. Genet. 42, 973–977 (2010).
Houlston, R. S. et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat. Genet. 40, 1426–1435 (2008).
Jaeger, E. et al. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nat. Genet. 40, 26–28 (2008).
Jia, W. H. et al. Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer. Nat. Genet. 45, 191–196 (2013).
Peters, U. et al. Meta-analysis of new genome-wide association studies of colorectal cancer risk. Hum. Genet. 131, 217–234 (2012).
Peters, U. et al. Identification of genetic susceptibility loci for colorectal tumors in a genome-wide meta-analysis. Gastroenterology 144, 799–807 e724 (2013).
Tenesa, A. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 40, 631–637 (2008).
Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat. Genet. 39, 984–988 (2007).
Tomlinson, I. P. et al. Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer. PLoS Genet. 7, e1002105 (2011).
Tomlinson, I. P. et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat. Genet. 40, 623–630 (2008).
Whiffin, N. et al. Identification of susceptibility loci for colorectal cancer in a genome-wide meta-analysis. Hum. Mol. Genet. 23, 4729–4737 (2014).
Zanke, B. W. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat. Genet. 39, 989–994 (2007).
Zhang, B. et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat. Genet. 46, 533–542 (2014).
Schmit, S. L. et al. A novel colorectal cancer risk locus at 4q32.2 identified from an international genome-wide association study. Carcinogenesis 35, 2512–2519 (2014).
Wang, H. et al. Trans-ethnic genome-wide association study of colorectal cancer identifies a new susceptibility locus in VTI1A. Nat. Commun. 5, 4613 (2014).
Jiao, S. et al. Estimating the heritability of colorectal cancer. Hum. Mol. Genet. 23, 3898–3905 (2014).
Abecasis, G. R. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Abecasis, G. R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Laederich, M. B. et al. The leucine-rich repeat protein LRIG1 is a negative regulator of ErbB family receptor tyrosine kinases. J. Biol. Chem. 279, 47050–47056 (2004).
Miller, J. K. et al. Suppression of the negative regulator LRIG1 contributes to ErbB2 overexpression in breast cancer. Cancer Res. 68, 8286–8294 (2008).
Shattuck, D. L. et al. LRIG1 is a novel negative regulator of the Met receptor and opposes Met and Her2 synergy. Mol. Cell. Biol. 27, 1934–1946 (2007).
Powell, A. E. et al. The pan-ErbB negative regulator Lrig1 is an intestinal stem cell marker that functions as a tumor suppressor. Cell 149, 146–158 (2012).
Kusu, T. et al. Ecto-nucleoside triphosphate diphosphohydrolase 7 controls Th17 cell responses through regulation of luminal ATP in the small intestine. J. Immunol. 190, 774–783 (2013).
Perez-Garcia, A. et al. Genetic loss of SH2B3 in acute lymphoblastic leukemia. Blood 122, 2425–2432 (2013).
Zhernakova, A. et al. Evolutionary and functional analysis of celiac risk loci reveals SH2B3 as a protective factor against bacterial infection. Am. J. Hum. Genet. 86, 970–977 (2010).
Barrett, J. C. et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat. Genet. 41, 703–707 (2009).
Levy, D. et al. Genome-wide association study of blood pressure and hypertension. Nat. Genet. 41, 677–687 (2009).
Lirk, P., Hoffmann, G. & Rieder, J. Inducible nitric oxide synthase—time for reappraisal. Curr. Drug Targets Inflamm. Allergy 1, 89–108 (2002).
Campbell, A. D. et al. P-Rex1 cooperates with PDGFRbeta to drive cellular migration in 3D microenvironments. PLoS ONE 8, e53982 (2013).
Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 1, 457–470 (2011).
Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Delaneau, O., Howie, B., Cox, A. J., Zagury, J. F. & Marchini, J. Haplotype estimation using sequencing reads. Am. J. Hum. Genet. 93, 687–696 (2013).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Li, Y., Willer, C. J., Ding, J., Scheet, P. & Abecasis, G. R. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34, 816–834 (2010).
Abnet, C. C. et al. A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat. Genet. 42, 764–767 (2010).
Amundadottir, L. et al. Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat. Genet. 41, 986–990 (2009).
Bei, J. X. et al. A genome-wide association study of nasopharyngeal carcinoma identifies three new susceptibility loci. Nat. Genet. 42, 599–603 (2010).
Nakata, I. et al. Association between the SERPING1 gene and age-related macular degeneration and polypoidal choroidal vasculopathy in Japanese. PLoS ONE 6, e19108 (2011).
Jee, S. H. et al. Adiponectin concentrations: a genome-wide association study. Am. J. Hum. Genet. 87, 545–552 (2010).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
National Cancer Institute, Division of Cancer Epidemiology and Genetics. Cancer Genetic Markers of Susceptibility (CGEMS) Project: Executive Summary. Available at <http://dceg.cancer.gov/research/how-we-study/genomic-studies/cgems-summary> (2009).
Yeager, M. et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat. Genet. 39, 645–649 (2007).
Petersen, G. M. et al. A genome-wide association study identifies pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1 and 5p15.33. Nat. Genet. 42, 224–228 (2010).
Landi, M. T. et al. A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. Am. J. Hum. Genet. 85, 679–691 (2009).
Lan, Q. et al. Genome-wide association analysis identifies new lung cancer susceptibility loci in never-smoking women in Asia. Nat. Genet. 44, 1330–1335 (2012).
Acknowledgements
CORECT: this work was supported by the National Cancer Institute, National Institutes of Health under RFA # CA-09-002, NIH/NCI U19 CA148107. The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centres in the CORECT consortium, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government or the CORECT Consortium. ASTERISK: we are very grateful to Dr Bruno Buecher without whom this project would not have existed. We also thank all those who agreed to participate in this study, including the patients and the healthy control persons, as well as all the physicians, technicians and students. DACHS: we thank all participants and cooperating clinicians, and Ute Handte-Daub, Renate Hettler-Jensen, Utz Benscheid, Muhabbet Celik and Ursula Eilber for excellent technical assistance. GECCO: we thank all those at the GECCO Coordinating Center for helping bring together the data and people that made this project possible. HPFS, NHS and PHS: we acknowledge Patrice Soule and Hardeep Ranu of the Dana-Farber Harvard Cancer Center High-Throughput Polymorphism Core who assisted in the genotyping for NHS, HPFS and PHS under the supervision of Dr Immaculata Devivo and Dr David Hunter, Qin (Carolyn) Guo and Lixue Zhu who assisted in programming for NHS and HPFS and Haiyan Zhang who assisted in programming for the PHS. We thank the participants and staff of the Nurses' Health Study and the Health Professionals Follow-Up Study, for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY. In addition, this study was approved by the Connecticut Department of Public Health (DPH) Human Investigations Committee. Certain data used in this publication were obtained from the DPH. 
We assume full responsibility for analyses and interpretation of these data. PLCO: we thank Drs Christine Berg and Philip Prorok, Division of Cancer Prevention, National Cancer Institute, the Screening Center investigators and staff of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, Mr. Tom Riley and staff, Information Management Services Inc., Ms Barbara O’Brien and staff, Westat Inc. and Drs Bill Kopp, Wen Shao and staff, SAIC-Frederick. Most importantly, we acknowledge the study participants for their contributions for making this study possible. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by NCI. PMH: we thank the study participants and staff of the Hormones and Colon Cancer study. WHI: we thank the WHI investigators and staff for their dedication, and the study participants for making the program possible. A full listing of WHI investigators can be found at https://cleo.whi.org/researchers/Documents%20%20Write%20a%20Paper/WHI%20Investigator%20Short%20List.pdf. ACC: we thank all study participants and research staff of all studies for their contributions and commitment to this project, Regina Courtney for DNA preparation and Jing He for data processing.
Author information
Contributions
S.B.G., G.C. and U.P. contributed to the study concept and design. L.L.M. and W.Z. organized the Asian 1 and Asian 2 consortia. D.J.V.D.B. supervised the genotyping of samples at USC and C.K.E. led the quality control of the CORECT, MECC and CFR GWAS data. F.R.S., S.L.S., S.J., H.W., B.Z. and D.V.C. contributed to the statistical analysis. F.R.S., S.L.S., G.C., U.P. and S.B.G. drafted the manuscript. E.L.G, B.H., R.B.H., L.N.K., R.G., R.H., S.Küry, M.I., P.A.N., D.W.W., S.I.B., B.W.Z., N.M.L., M.J., S.J.G., S.T., W.H.-J., K.M., X.O.S., Y.B.X., S.H.J., G.G., J.L.H., E.J., J.D.P., G.S., W.Z., L.L.M., S.B.G., G.R. and U.P. conducted the epidemiological studies for sample collection. All authors contributed to the writing of the manuscript, interpretation, and discussion of the findings. All authors approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information
Supplementary Figures 1-5, Supplementary Tables 1-4, Supplementary Note 1, Supplementary Methods and Supplementary References (PDF 10196 kb)