Review (Free Access)

DNA methylation markers for early detection of women’s cancer: promise and challenges

    Timo Wittenberger

    Genedata AG, Margarethenstrasse 38, 4053 Basel, Switzerland

    Authors contributed equally

    ,
    Sara Sleigh

    Euram Ltd, 6 Musters Road, Nottingham, UK

    Authors contributed equally

    ,
    Daniel Reisel

    Department of Women’s Cancer, UCL Elizabeth Garrett Anderson Institute for Women’s Health, University College London, London, UK

    Authors contributed equally

    ,
    Michal Zikan

    Gynaecologic Oncology Center, Department of Obstetrics & Gynaecology, First Faculty of Medicine & General University Hospital, Charles University Prague, Prague, Czech Republic

    Authors contributed equally

    ,
    Benjamin Wahl

    GATC Biotech AG, Jakob-Stadler-Platz 7, 78467 Konstanz, Germany

    Authors contributed equally

    ,
    Marianna Alunni-Fabbroni

    Department of Gynecology & Obstetrics, Ludwig-Maximilians-University Hospital, Munich, Germany

    Authors contributed equally

    ,
    Allison Jones

    Department of Women’s Cancer, UCL Elizabeth Garrett Anderson Institute for Women’s Health, University College London, London, UK

    ,
    Iona Evans

    Department of Women’s Cancer, UCL Elizabeth Garrett Anderson Institute for Women’s Health, University College London, London, UK

    ,
    Julian Koch

    Department of Gynecology & Obstetrics, Ludwig-Maximilians-University Hospital, Munich, Germany

    ,
    Tobias Paprotka

    GATC Biotech AG, Jakob-Stadler-Platz 7, 78467 Konstanz, Germany

    ,
    Harri Lempiäinen

    Genedata AG, Margarethenstrasse 38, 4053 Basel, Switzerland

    ,
    Tamas Rujan

    Genedata AG, Margarethenstrasse 38, 4053 Basel, Switzerland

    ,
    Brigitte Rack

    Department of Gynecology & Obstetrics, Ludwig-Maximilians-University Hospital, Munich, Germany

    ,
    David Cibula

    Gynaecologic Oncology Center, Department of Obstetrics & Gynaecology, First Faculty of Medicine & General University Hospital, Charles University Prague, Prague, Czech Republic

    &
    Martin Widschwendter,‡

    *Author for correspondence:

    E-mail Address: m.widschwendter@ucl.ac.uk

    Department of Women’s Cancer, UCL Elizabeth Garrett Anderson Institute for Women’s Health, University College London, London, UK

    Authors contributed equally

    Published Online: https://doi.org/10.2217/epi.14.20

    Abstract

    Breast, ovarian and endometrial cancers cause significant morbidity and mortality. Despite the presence of existing screening, diagnostic and treatment modalities, they continue to pose considerable unsolved challenges. Overdiagnosis is a growing problem in breast cancer screening and neither screening nor early diagnosis of ovarian or endometrial cancer is currently possible. Moreover, treatment of the diversity of these cancers presenting in the clinic is not sufficiently personalized at present. Recent technological advances, including reduced representation bisulfite sequencing, methylation arrays, digital PCR, next-generation sequencing and advanced statistical data analysis, enable the analysis of methylation patterns in cell-free tumor DNA in serum/plasma. Ongoing work is bringing these methods together for the analysis of samples from large clinical trials, which have been collected well in advance of cancer diagnosis. These efforts pave the way for the development of a noninvasive method that would enable us to overcome existing challenges to personalized medicine.

    Figure 1. Effect of specificity and prevalence on assay outcomes.

    Shown are assay outcomes for a given sensitivity of 0.8 and two values of disease prevalence (0.1 and 0.01, respectively) plotted against assay specificity. Assay outcomes are TN’ (i.e., the number of TNs divided by the total number tested), FP’, TP’, FN’, PPV (i.e., the number of TPs divided by the total number of positive tests, which corresponds to the probability of having the disease if tested positive), and NPV (i.e., the number of TNs divided by the total number of negative tests). The graph shows clearly the strong dependency of PPV on prevalence, and its rapid decline with decreasing specificity.

    FN’: False-negative proportion; FP’: False-positive proportion; NPV: Negative predictive value; PPV: Positive predictive value; Prev: Prevalence; Sens: Sensitivity; TN’: True-negative proportion; TP’: True-positive proportion.
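
    The quantities plotted in Figure 1 follow directly from sensitivity, specificity and prevalence. The sketch below is a minimal illustration of those definitions; the function name `assay_outcomes` and the specificity values in the loop are our own choices and are not taken from the figure.

```python
def assay_outcomes(sens, spec, prev):
    """Return outcome proportions and predictive values for a binary assay.

    sens, spec and prev are fractions between 0 and 1. The proportions are
    relative to the total number of individuals tested.
    """
    tp = sens * prev                  # TP': diseased and test-positive
    fn = (1.0 - sens) * prev          # FN': diseased but test-negative
    tn = spec * (1.0 - prev)          # TN': healthy and test-negative
    fp = (1.0 - spec) * (1.0 - prev)  # FP': healthy but test-positive
    ppv = tp / (tp + fp)              # undefined if nobody tests positive
    npv = tn / (tn + fn)              # undefined if nobody tests negative
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp, "PPV": ppv, "NPV": npv}


if __name__ == "__main__":
    # Sensitivity fixed at 0.8, as in Figure 1; specificities are illustrative.
    for prev in (0.1, 0.01):
        for spec in (0.99, 0.95, 0.90):
            out = assay_outcomes(sens=0.8, spec=spec, prev=prev)
            print(f"prev={prev} spec={spec:.2f} "
                  f"PPV={out['PPV']:.3f} NPV={out['NPV']:.3f}")
```

    With a sensitivity of 0.8 and a specificity of 0.95, for instance, the PPV is 0.64 at a prevalence of 0.1 but only about 0.14 at a prevalence of 0.01.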

    Figure 2. Approximating clinical assay characteristics for prevalence <<1.

    PPV can be described as a function of specificity, Sens and Prev as shown in the figure. However, for small Prev (Prev <<1) the PPV may be approximated as a function of specificity and a constant (Sens*Prev). Hence, for a given (constant) Prev, the curves each correspond to a particular level of Sens. During assay development, such a graph can be used to define the target specificity and Sens of the biomarker panel when a certain PPV value has to be reached (in order to confer clinical utility). For example, one could first define a range of Sens deemed realistic (e.g., 0.2–1.0; to also cover the most optimistic case). These values, multiplied by the Prev, result in the set of (Sens*Prev) curves, which delimit the specificity range one has to aim for. A horizontal line may be drawn between those curves at the desired PPV, and the intersection between the line and the curves is used to look up the specificity needed for different Sens. Importantly, the horizontal line becomes much shorter and moves to the right with increasing PPV (since it always has to connect the same set of [Sens*Prev] curves for a given Prev and specificity range; e.g., see B & C). This illustrates the strong dependency upon Prev (i.e., how the required specificity increases drastically with lower prevalence, even when a broad range of Sens is allowed). Three examples for this approach are given (A, B & C); the estimated and exact specificity values for these examples are shown in the table. (The Prev numbers given for ovarian and breast cancer are arbitrary and only for illustration; see text for discussion on cancer epidemiology.)

    Estimated assuming prevalence <<1.

    PPV: Positive predictive value; Prev: Prevalence; Sens: Sensitivity.
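
    The look-up described in the Figure 2 legend can also be done numerically. The sketch below assumes that the small-prevalence approximation amounts to setting (1 - Prev) to 1 in the exact PPV formula; the figure itself is not reproduced here, so this reading, the function names and the example values are assumptions made for illustration.

```python
def required_specificity_exact(ppv, sens, prev):
    """Specificity needed to reach a target PPV.

    Obtained by solving
    PPV = Sens*Prev / (Sens*Prev + (1 - Spec)*(1 - Prev)) for Spec.
    """
    return 1.0 - sens * prev * (1.0 - ppv) / (ppv * (1.0 - prev))


def required_specificity_approx(ppv, sens, prev):
    """Same as above, with (1 - Prev) approximated by 1 (valid for Prev << 1)."""
    return 1.0 - sens * prev * (1.0 - ppv) / ppv


if __name__ == "__main__":
    # Arbitrary illustrative targets, not the examples A, B & C of Figure 2.
    for sens in (0.2, 0.5, 0.8, 1.0):
        exact = required_specificity_exact(ppv=0.2, sens=sens, prev=0.01)
        approx = required_specificity_approx(ppv=0.2, sens=sens, prev=0.01)
        print(f"Sens={sens:.1f} exact={exact:.4f} approx={approx:.4f}")
```

    Because the approximation depends on Sens and Prev only through their product, each (Sens*Prev) value yields one curve, which is why the curves in the figure can be labeled by that constant.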

    Figure 3. True-positive versus false-positive rates needed for different positive predictive values, given a disease prevalence of 0.05.

    The characteristics of any biomarker assay can be drawn into this plot with a receiver-operating characteristic curve, and then used to determine the PPV that can be reached with this assay, and the sensitivity and specificity to which the assay has to be set; only when the receiver-operating characteristic curve is partially located to the left of a particular PPV threshold can that PPV be achieved.

    PPV: Positive predictive value.
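
    The PPV thresholds in Figure 3 are straight lines through the origin of the receiver-operating characteristic plot: rearranging the PPV definition shows that a target PPV is reached when the true-positive rate is at least the false-positive rate multiplied by PPV(1 - Prev) / ((1 - PPV) Prev). The sketch below applies this test to a list of operating points; the function names and the three example points are hypothetical.

```python
def min_tpr_slope(ppv, prev):
    """Slope of the line TPR = slope * FPR on which the PPV equals `ppv`."""
    return ppv * (1.0 - prev) / ((1.0 - ppv) * prev)


def operating_points_reaching(roc_points, ppv, prev):
    """Return the (FPR, TPR) points at which the target PPV is met or exceeded.

    Points with TPR == 0 are skipped because no true positives are found there.
    """
    slope = min_tpr_slope(ppv, prev)
    return [(fpr, tpr) for fpr, tpr in roc_points
            if tpr > 0 and tpr >= slope * fpr]


if __name__ == "__main__":
    # Hypothetical ROC operating points (FPR, TPR) of an imaginary assay.
    roc = [(0.01, 0.30), (0.05, 0.60), (0.20, 0.90)]
    print(min_tpr_slope(ppv=0.5, prev=0.05))
    print(operating_points_reaching(roc, ppv=0.5, prev=0.05))
```

    At a prevalence of 0.05, a PPV of 0.5 requires the true-positive rate to be at least 19 times the false-positive rate, so only the first of the three example points qualifies.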

    Figure 4. Differentially methylated DNA biomarkers for early cancer detection.

    (A) Cell-free circulating tumor DNA from a blood sample for detection of breast and ovarian cancers from which tissue cannot be obtained readily; and (B) tumor DNA from vaginal fluid for the detection of endometrial and cervical cancers.

    Figure 5. High-throughput methods including the Illumina Methyl 450K array and reduced representation bisulfite sequencing are used for DNA methylation discovery.

    (A) For the Illumina Methyl 450K array, DNA is treated with bisulfite to convert all unmethylated cytosines into uracil. During subsequent whole-genome amplification, uracil is replaced by thymine. DNA is randomly sheared; fragmented DNA is loaded onto the chip and hybridizes to either methylated (M) or unmethylated (U) probes. Binding of methylated loci to methylated probes leads to single-base extension with fluorescently tagged nucleotides. Analogously, unmethylated loci bind to unmethylated probes. Finally, the proportion of incorporated fluorescent nucleotides is quantified for each probe. (B) Reduced representation bisulfite sequencing enriches DNA for CpG-rich regions by cleavage using a restriction enzyme (most commonly MspI). After enzymatic digestion, the ends of DNA fragments are repaired (filled with nucleotides) and sequencing adapters are ligated. Subsequently, the adapter-ligated DNA is bisulfite-converted and amplified by PCR. Size selection by agarose gel electrophoresis then ensures that only fragments having the restriction site (‘CCGG’ for MspI) on both ends within a small nucleotide range are sequenced. The DNA is extracted from the gel, its quality is checked and the libraries are sequenced. By comparison of the sequence reads to a genomic reference, the methylation rate for each cytosine present in the libraries is determined.

    WGA: Whole-genome amplification.
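
    The logic of bisulfite-based methylation calling described in the Figure 5 legend can be shown with a toy example: unmethylated cytosines read as thymine after conversion and amplification, so the methylation rate at a reference cytosine is the fraction of reads that still show a cytosine there. This is a deliberately simplified sketch; it assumes same-strand reads that are already aligned to the reference and span it completely, and all names and sequences are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion followed by amplification.

    Every cytosine that is not listed as methylated is read as thymine.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def methylation_rates(reference, aligned_reads):
    """Fraction of reads showing C (methylated) at each reference cytosine."""
    rates = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        calls = [read[pos] for read in aligned_reads if read[pos] in ("C", "T")]
        if calls:
            rates[pos] = calls.count("C") / len(calls)
    return rates


if __name__ == "__main__":
    reference = "ACGTCCGGA"  # hypothetical fragment containing an MspI site
    reads = [
        bisulfite_convert(reference, {1}),
        bisulfite_convert(reference, {1}),
        bisulfite_convert(reference, {1, 5}),
        bisulfite_convert(reference, set()),
    ]
    print(methylation_rates(reference, reads))
```

    Real pipelines additionally have to handle both strands, sequencing errors, incomplete conversion and alignment against a converted reference, none of which is modeled here.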

    Figure 6. Clinical assay formats to analyze aberrantly methylated cell-free DNA targets in cancer samples.

    (A) Analysis of bisulfite-modified cell-free DNA by digital PCR. Each individual sample is divided into thousands of nanoliter-sized partitions via oil emulsion and dispersion. Low-abundance, cancer-specific, differentially methylated targets are sequestered away from wild-type background DNA to increase the signal-to-noise ratio. The absolute number of target molecules is counted at end-point PCR and expressed as the number of molecules per volume of serum, allowing patient samples to be compared. (B) Bisulfite sequencing of cell-free DNA derived from serum. Following bisulfite conversion, specific DNA regions that are aberrantly methylated in cancer samples can be analyzed at single base-pair resolution.
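
    Counting molecules by digital PCR, as in panel A, is commonly done with a Poisson correction, because a positive partition may contain more than one target molecule. The legend does not state which correction is used, so the sketch below is an assumption based on that standard approach; the function names and the example numbers are hypothetical.

```python
import math


def dpcr_target_copies(n_positive, n_partitions):
    """Estimate the number of target molecules loaded into a digital PCR run.

    Assumes molecules are distributed over partitions at random (Poisson),
    so the mean number of copies per partition is -ln(1 - fraction positive).
    """
    if not 0 <= n_positive < n_partitions:
        raise ValueError("need 0 <= n_positive < n_partitions")
    fraction_positive = n_positive / n_partitions
    mean_copies_per_partition = -math.log(1.0 - fraction_positive)
    return mean_copies_per_partition * n_partitions


def copies_per_ml(n_positive, n_partitions, serum_volume_ml):
    """Express the estimate per ml of serum represented in the reaction."""
    return dpcr_target_copies(n_positive, n_partitions) / serum_volume_ml


if __name__ == "__main__":
    # Hypothetical run: 100 positive partitions out of 20,000, 2 ml of serum.
    print(copies_per_ml(100, 20000, serum_volume_ml=2.0))
```

    At low occupancy the correction is small (100 positive partitions out of 20,000 correspond to slightly more than 100 molecules), but it grows as a larger fraction of partitions turns positive.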

    Figure 7. Pipeline overview for cancer differentially methylated region detection from Illumina 450K data.

    After the preprocessing and QC, DMR detection is carried out by comparing the methylation of cancer and WBC samples. After DMR detection, several steps and criteria are applied for filtering and ranking.

    DMR: Differentially methylated region; QC: Quality control; WBC: White blood cell.

    Figure 8. Example of potential differentially methylated region biomarker for early detection of ovarian cancer.

    Data from Illumina 450K arrays are shown. (A) Normalized M-value profiles (arithmetic mean of the probes within the range) for all the relevant samples. (B) Normalized M-values of individual probes in the genomic context for ovarian cancer and WBC samples; a 482 bp genomic region with four probes is shown.

    WBC: White blood cell.
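
    The legend does not define the M-value. A widely used definition for methylation arrays is the log2 ratio of methylated to unmethylated signal, which for a methylation fraction (beta value) equals log2(beta / (1 - beta)). The sketch below uses that definition as an assumption, omits any normalization step, and averages the probes of a region arithmetically as described for panel A; the function names and beta values are hypothetical.

```python
import math


def m_value(beta):
    """Convert a methylation fraction (beta value) into an M-value."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def region_m_value(betas):
    """Arithmetic mean of the M-values of all probes within a region."""
    values = [m_value(b) for b in betas]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical beta values for four probes in one region.
    cancer = [0.80, 0.75, 0.85, 0.70]
    wbc = [0.05, 0.10, 0.08, 0.04]
    print(region_m_value(cancer), region_m_value(wbc))
```

    An M-value of 0 corresponds to 50% methylation; positive values indicate predominantly methylated loci and negative values predominantly unmethylated loci.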

    Screening & diagnosis

    In the EU and the USA, one in every three individuals faces a cancer diagnosis in their lifetime. In women, approximately half of all cancers originate in the female reproductive organs (i.e., breast, ovary, endometrium and cervix) [1,2]. Breast cancer (BC) is the most common cancer in women, affecting as many as one in eight women. The high mortality rate is related to its tendency to spread: a third of axillary node-negative BC patients develop local or distant metastases, even in the absence of tumor spread at the time of primary diagnosis [3,4].

    Ovarian cancer (OC) has a low prevalence but is the most common cause of death from gynecological malignancies. In the early stages, women are mostly asymptomatic or present with nonspecific symptoms, making diagnosis difficult. While patients with stage I disease have a 5‐year survival of >90%, diagnosis is most often made at stage IIIC, when 5‐year survival is less than 40% [5]. The majority of patients with advanced-stage disease experience recurrence despite aggressive surgery and chemotherapy with platinum and taxanes [6]. Endometrial cancer (EC) is the most common gynecological malignancy and, despite an overall 5‐year survival rate of 83%, it ranks second in mortality among genital tract cancers, with the death rate steadily increasing over the past 20 years [7]. EC is commonly diagnosed early, and is therefore often curable. However, survival in advanced stages falls dramatically, to below 30% for stage III and 5% for stage IV. Additionally, advanced disease, as well as serous papillary or clear cell cancer of the endometrium, behaves similarly to OC, meaning that it is highly responsive to therapy prior to drug-resistant relapse [8,9].

    Cervical cancer incidence and associated deaths have been reduced by up to 80% through introduction and widespread uptake of triennial cervical screening with the Papanicolaou test (or Pap smear) [10]. Key factors in this success are the accessibility of the uterine cervix and availability of early stage markers – visualization of abnormal cells and/or HPV status.

    By contrast, breast, endometrial and ovarian cancers continue to pose considerable challenges in terms of prediction and early detection. Unlike cervical cancer, the abnormal cells are not directly accessible. Furthermore, in the case of OC, the cell of origin is not well defined.

    Ovarian cancer

    No effective screening method for OC is available at present. The US PLCO study comprised 78,216 women aged 55–74 years, who were screened with CA125 and transvaginal sonography (TVS) annually or underwent usual clinical care. It did not find any difference in OC mortality between the screening and conventional care arms [11]. The UKCTOCS, the largest prospective randomized trial in the world, has enrolled over 200,000 postmenopausal women. It compared TVS alone with combined TVS and CA125 screening. The trial demonstrated 89.5% sensitivity, 99.8% specificity and a positive predictive value (PPV) of 35.1% for the combination of methods (CA125 + TVS). For TVS alone, the values were 75.0% sensitivity, 98.2% specificity and 2.8% PPV [12]. Despite these promising results, the impact of screening on mortality remains to be determined, especially for individual subtypes of OC with different biological behavior and prognosis.

    Beyond screening, once a pelvic mass is identified in a patient, a noninvasive test that could help to triage patient care would be highly desirable. Biochemical markers, TVS, scoring systems and models are widely used to discriminate between benign and malignant disease. However, CA125 is also expressed in numerous benign conditions, and it is positive in only approximately 50% of early-stage OCs [13]. The Risk of Malignancy Algorithm, based on assessment of CA125 together with HE4, was initially thought to be superior to CA125 alone. However, other groups have suggested that further validation is required [14]. The Risk of Malignancy Index, a score based on ultrasound variables as well as on menopausal status and CA125, is widely used at present, mainly in the UK [15]. Although the Risk of Malignancy Index allows the referring gynecologist to send a patient to experts based on an objective assessment, its sensitivity is as low as 78% [16]. TVS performed by an expert operator using a formal scoring model is a highly sensitive and ideal second-stage diagnostic method; however, it is highly dependent on individual expertise [17]. As a result, discrimination between benign and malignant ovarian tumors remains a significant challenge in clinical practice.

    Availability of a biomarker or a panel of biomarkers that can detect OC in its earliest stages with both high sensitivity and high specificity would improve outcomes. Biomarkers may also be employed to identify some types of EC. In addition, they may be used as prognostic or predictive indicators, or as novel targets for cancer treatment.

    Breast cancer

    The lack of diagnostic markers detectable in early BC is a critical issue in patient management. In clinical practice, since early BC does not cause symptoms, mammography or ultrasound imaging are used for screening. Mammography screening, which detects characteristic masses and/or microcalcifications, is routinely performed in patients older than 45–50 years of age. Intensified breast screening can also be used for younger women when other factors, such as genetic profile or family history, indicate higher risk of disease.

    Although mammography screening for BC does save some lives, there is a continuing debate about its usefulness. Recent data demonstrate an improved treatment outcome for only 3–13% of women in whom BC was detected by screening mammography. Consequently, it would seem that 87–97% of women with a screening-detected BC do not receive a clear treatment benefit, indicating that these cancers are still not being detected early enough [18]. In addition, there is convincing evidence that women diagnosed with BC not picked up during a screening program – so-called ‘interval breast cancers’ – have a significantly poorer prognosis [19]. Furthermore, overdiagnosis is a growing concern. Screening can detect cancers that would otherwise have regressed naturally, or that grow so slowly that the woman would die of other causes well before the cancer produced any symptoms. The natural life course of BC was recently addressed in a cohort of 650,000 Swedish women, and it was found that BC incidence increased with biannual screening [20]. Based on these data and those from previous studies [21], the extent of overdiagnosis in BC screening programs is estimated at approximately 35%. By comparison, the absolute risk reduction is only of the order of one in 1000 to one in 2000 women attending the screening programs [22]. A total of 92% of European women overestimated the true benefit of screening mammography by at least one order of magnitude (tenfold), or reported that they did not know [23]. In short, concerns about the potential risks and harms of the current breast screening program continue, as should the development of novel methods for early detection.

    Endometrial cancer

    While EC is the most common malignancy of the female genital tract, routine screening is not recommended [24], the rationale being that symptoms due to these malignancies develop at an early stage in 85% of cases [25]. Moreover, the screening methods have not yet been evaluated for their impact on cancer mortality. Endometrioid EC has a good prognosis, with 5‐year survival (depending on grade) of over 80% at early stage. However, up to 25% of ECs are of nonendometrioid type, meaning serous, clear-cell or mixed. Their prognosis is dramatically worse, similar to that of high-grade OC. These cancers typically spread in the pelvis very early, and angioinvasion can be detected even in the absence of superficial invasion of the myometrium [26]. Patients suffering from this type of EC would greatly benefit from improved screening.

    Possible screening modalities for EC include measuring endometrial thickness with TVS and endometrial sampling with cytological examination. If the same cut-offs were applied to the definition of abnormal endometrial thickness in asymptomatic women [27] as used in symptomatic women [28], large numbers of false-positive results and unnecessary referrals for histological evaluations would follow. The Pap smear, successfully employed in the screening for cervical cancer, is not sensitive enough to reliably detect EC, although occasionally the Pap smear may identify endometrial abnormalities including EC.

    Defining screening/diagnostic test characteristics

    The development of a successful biomarker requires the precise definition of the clinical utility and the performance of the diagnostic test. Clinical utility defines the exact test populations, the follow-up procedures and decision diagrams. Depending on the clinical utility, specific thresholds for the test performance need to be set, including sensitivity, specificity and PPV (Figure 1). The experimental set-up for biomarker discovery, including sample composition, number of samples and type of critical control, as well as data analysis strategy, will critically depend on these predefined test requirements.

    A major challenge when developing any diagnostic test is to define the specific thresholds for test performance. If the follow-up diagnostic method is noninvasive and highly specific, a screening test with high sensitivity is preferred. If the follow-up procedure is invasive, the test needs to be optimized for high specificity, even at the cost of lower sensitivity.

    The goal of an early detection test for breast, ovarian or endometrial cancers would be to diagnose disease in women who are symptom-free. Early diagnosis of OC would significantly improve chances of curing the disease. For BC, the aim would be to improve on the current screening method, either through improvement in diagnostic accuracy or via reduction of costs. A screening test of this type has to be easily administered, minimally invasive, convenient for the patient and affordable. In addition, it should not deliver incorrect results, in particular false-positive results.

    The low prevalence of OC – between 0.4 and 0.6% of women aged above 50 years – poses a considerable challenge for screening test development. A PPV of 10% and above could be regarded as acceptable in the case of OC screening. Given the low prevalence, a screening test would need to have a specificity of 97.8% or above, with a sensitivity of above 50% to achieve this PPV (Figures 2 & 3). A screening test should also detect high-risk ECs, which are comparable in terms of incidence and clinical behavior to OC.
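
    To make these figures concrete, the sketch below computes the expected numbers of true and false positives when a hypothetical cohort of 100,000 women is screened with a test at the thresholds quoted above (sensitivity 50%, specificity 97.8%) across the stated prevalence range. The cohort size and function name are our own illustrative choices.

```python
def expected_screen_results(n_screened, sens, spec, prev):
    """Return (true positives, false positives, PPV) for one screening round."""
    cases = n_screened * prev
    true_pos = cases * sens
    false_pos = (n_screened - cases) * (1.0 - spec)
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv


if __name__ == "__main__":
    for prev in (0.004, 0.005, 0.006):
        tp, fp, ppv = expected_screen_results(100_000, sens=0.5,
                                              spec=0.978, prev=prev)
        print(f"prev={prev:.3f} TP={tp:.0f} FP={fp:.0f} PPV={ppv:.3f}")
```

    Under these assumptions, roughly 2200 women without OC would test positive in each case, and the PPV ranges from about 8.4% at a prevalence of 0.4% to about 12.1% at 0.6%, crossing the 10% mark close to 0.5%.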

    Mammography screening can lead to early diagnosis of BC, resulting in less invasive surgery and the need for toxic systemic treatment in fewer patients. However, detection of benign or clinically irrelevant tumors causes harm to women through overtreatment. Regrettably, there is a high likelihood of side effects associated with the treatment, as well as psychological, social and economic costs. Furthermore, after BC diagnosis, it remains unclear in a considerable number of cases whether the individual patient will benefit from chemotherapy, other systemic treatment or radiotherapy. The development of a method for early detection of primary or secondary BC with high sensitivity and specificity would help to avoid overtreatment and psychological distress [29]. During systemic treatment, early information about treatment efficacy would be invaluable and would help to prevent unnecessary side effects. The ideal test would be able to distinguish between benign and malignant lesions and, most importantly, to characterize the type of malignancy. It should be technically reliable and reproducible, while minimizing intra- and inter-observer variability. Thus, the testing system should be automated, or at least semi-automated. Finally, invasiveness should be reduced to a simple blood sample, making screening practicable in a wide variety of settings.

    With these qualifiers in mind, it will be possible to advance biomarker discovery and development. However, this process is not without its challenges, as discussed in more detail below.

    Past obstacles to biomarker development in women’s cancers

    Strikingly, despite major advances in biomedical science and analytical technologies, no novel biomarker has been approved for screening, diagnosis or monitoring response to adjuvant treatment in the field of women’s cancer in the last two decades. The US FDA last approved a biomarker in 2009, when HE4 protein analysis was approved for monitoring recurrence of OC. By contrast, serum biomarkers are routinely employed in other areas of medicine (e.g., cardiac troponin for myocardial infarction, creatinine as an indicator of renal function, and choriogonadotropin for confirming early pregnancy).

    In 2001, Pepe and co-workers proposed a model consisting of five phases of biomarker development, in many ways similar to the recognized phases of drug development [30]. Below, we use this model to explain why past attempts to develop biomarkers for cancer screening have failed.

    In Phase 1, potentially useful biomarkers are identified through comparison of tumor with non-neoplastic tissue/samples. The approach of comparing normal and cancerous tissue has not been feasible for proteomic analyses, which have relied on testing of serum/plasma. RNA is considered to be both analytically and biologically less stable. Moreover, tools for genome-wide, and especially epigenome-wide, analyses have only become available recently. Consequently, progress in this field has been impeded by technical difficulties and the lack of appropriate analysis platforms.

    In Phase 2, clinical assays are developed based on a specimen (e.g., serum) that can be obtained noninvasively to differentiate subjects with cancer from those without, and provide insight into the true-positive rate and false-positive rate. As stated above, some approaches, including proteomics, are instigated at this second phase via direct analysis of serum or plasma from cancer patients at the time of diagnosis. Although this appeared successful at first, the majority of markers identified were in fact reflective of an inflammatory response, which is both nonspecific and common in patients with an active cancer. To date, none of these markers has proven successful in diagnosing disease prior to the onset of clinical symptoms. An illustrative example was provided recently in our own work [31]. A DNA methylation (DNAme) signature identified in serum cell-free DNA from patients diagnosed with OC and age-matched controls demonstrated a receiver-operating characteristic area under the curve of 0.8. It was subsequently validated in an independent set of cases and controls. Analysis of the genes involved in the signature, however, found that they were significantly enriched for markers that indicated an altered granulocyte/lymphocyte ratio in peripheral blood. Hence, it is extremely unlikely that this signature would indicate disease with the necessary level of sensitivity/specificity in a clinical setting.

    Furthermore, although numerous clinical trials in BC and OC treatment have been performed in the past two decades, very few of the clinical trials currently reporting outcome data have collected serial samples prior to and after systemic treatment in order to validate biomarkers. Consequently, a major hurdle for the identification of early detection cancer biomarkers has been the lack of appropriately collected and readily available sample sets for discovery and validation.

    In Phases 3–5, clinical assays are validated in retrospective longitudinal repository studies, in prospective screening studies (where the screen is applied to individuals and concurrent definitive diagnostic procedures applied to those who screen positive) and, eventually, in cancer control studies to address whether screening reduces the burden of cancer on the population. Sample collections made over significantly long timeframes from sufficiently large numbers to include cancers with low prevalence and with appropriate sample processing are extremely rare. For these reasons, progress at Phase 3 has been largely thwarted. Additionally, studies examining the two serum markers PSA and CA125 (currently employed in prostate and OC screening) demonstrate inconsistencies with regard to their relative clinical value [12,32–34].

    In summary, the key obstacles for successful cancer biomarker development have been the lack of appropriate sample collections and available technologies.

    Moving forward: DNAme-based cancer biomarkers

    Over the past decade, we have seen intensified efforts to validate the involvement of epigenetic changes in cancer detection and monitoring. The best characterized epigenetic modification occurring during carcinogenesis is de novo methylation of CpG islands, correlating with transcriptional repression of the affected genes [35]. Recent studies identify 10–15 tumor suppressor genes in epithelial tumors that are silenced by mutations. By comparison, the silencing of several hundred genes by DNAme demonstrates the important contribution of these modifications to tumor development [36,37].

    These discoveries in the field of epigenetics highlighting the important role of DNAme in carcinogenesis create new opportunities to identify biomarkers for early detection and personalized treatment of cancer (Figure 4). Numerous reports have shown that methylation signatures can be detected in virtually any body fluid (serum/plasma, smears, nipple fluid aspirate and vaginal fluid, among others). Blood samples can be obtained through a minimally invasive procedure and provide an ideal substrate for DNAme analysis. Circulating cell-free DNA (cfDNA) is present in healthy subjects at average concentrations of 30 ng/ml. If we are to assume that the DNA content of a normal cell amounts to 6.6 pg, this translates to an average of 5000 genome equivalents per ml of blood. In cancer patients, tumor DNA is released into the blood by dying cancer cells and the average concentration of cfDNA in the serum is higher, approximately 180 ng/ml [38].
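
    The conversion from cfDNA concentration to genome equivalents used above is a simple unit calculation, sketched below with the values quoted in the text (6.6 pg of DNA per cell; 30 and 180 ng/ml). The function name is our own.

```python
def genome_equivalents_per_ml(cfdna_ng_per_ml, pg_per_genome=6.6):
    """Convert a cfDNA concentration (ng/ml) into genome equivalents per ml.

    pg_per_genome is the assumed DNA content of one normal cell in picograms.
    """
    return cfdna_ng_per_ml * 1000.0 / pg_per_genome  # 1 ng = 1000 pg


if __name__ == "__main__":
    for label, conc in (("healthy subjects", 30.0), ("cancer patients", 180.0)):
        ge = genome_equivalents_per_ml(conc)
        print(f"{label}: {conc:.0f} ng/ml is about {ge:.0f} genome equivalents/ml")
```

    This gives roughly 4500 genome equivalents per ml at 30 ng/ml, in line with the rounded figure of 5000 given in the text, and roughly 27,000 per ml at 180 ng/ml.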

    Analysis of tumor-specific DNAme in serum has a number of advantages compared with competing strategies:

    • Improved sensitivity: cfDNA is easily amplified by PCR, with potential improvements in sensitivity;

    • Fewer false positives: after acquiring methylation at a specific gene, the methylation pattern is generally conserved throughout disease progression;

    • DNAme is a positively detectable signal: rather than a loss of signal, as in chromosomal deletions;

    • Stable during sample collection and transportation: abnormal DNAme is chemically and biologically stable and is relatively unaffected by physiological state and sample collection conditions;

    • Increased technical sensitivity and specificity: gene-specific assays using real-time PCR are easily adapted to commercial platforms present in diagnostic laboratories;

    • Assay design advantages: selection of gene promoter CpG island hypermethylation offers advantages over genetic alterations that may be interspersed throughout a given gene.

    As a result of the technical difficulties of DNAme analysis, the scant numbers of DNAme markers identified to date apply to only a fraction of breast, ovarian and endometrial cancers. They are nonspecific, lack sensitivity, and are based upon labor-intensive, non-quantitative techniques. Two separate technological challenges arise. The first is the detection of tumor-specific DNAme patterns that are present at particularly low abundance. This requires a high signal-to-noise ratio and often uses methylation-specific PCR priming and, in some cases, methylation-specific probing. The second is determination of the consecutive sequence of methylation sites in individual DNA molecules at single base-pair resolution, which requires low sensitivity, methylation-independent priming and combined PCR product resolution for sequence analysis. Until now, the high signal-to-noise-ratio required to detect scarcely abundant alleles within high background levels of nontarget molecules has proved a major problem in applying DNAme assays for the clinical detection of cancer. However, the advent of digital MethyLight [39], discussed in more detail below, coupled with rapid advances in next-generation sequencing (NGS)-based technologies, such as those applied for the development of the PraenaTest™ (LifeCodexx, Germany), have provided novel ways to overcome this issue.

    Technologies for whole-genome DNAme biomarker discovery

    Recent advances in DNAme assay technology have the potential to enhance DNAme marker discovery by simultaneous analysis of many thousands of genomic loci. Additionally, they allow ultra-sensitive, quantitative detection of very small amounts of methylated DNA. DNAme profiling approaches can be divided into two main strategies: bisulfite conversion-based methods and affinity enrichment methods, both of which can be analyzed either by microarrays or NGS.

    Affinity enrichment technologies

    Methylated DNA immunoprecipitation (MeDIP) uses antibodies specifically binding to 5-methylcytosine [40]. Methylated DNA is enriched compared with unmethylated DNA, and genomic location can be determined using peak-calling algorithms. Accumulation of sequence reads indicates a high rate of cytosine methylation. A similar approach, methylated DNA capture by affinity purification sequencing (MeCAP), enriches DNA using methylated cytosine-binding proteins [41–43]. MeCAP enables the fractionation of bound DNA according to its GC content, providing detailed analysis of GC-rich regions [41].

    These affinity enrichment methods have been extremely valuable in understanding the principles of DNAme on a genome-scale in both model systems and tumors. However, while these methods are useful in detecting qualitative differences in methylation, they have reduced resolution (they cannot measure and differentiate the methylation status of single CpGs) and reduced sensitivity (they cannot reliably differentiate between low methylation and absence of methylation in a given region). They are also especially sensitive to experimental batch effects, making cross-study comparisons challenging [44,45].

    Bisulfite conversion-based strategies

    In order to discover differentially methylated regions (DMRs; methylation present in tumor, absent in control DNA) from heterogeneous tissue samples, the focus has moved to bisulfite conversion-based methods. Here, two approaches dominate: the Illumina Infinium Human Methylation 450 BeadChip arrays (Illumina, Inc., CA, USA) and NGS (Figure 5).

    Microarrays

    The Illumina 450K array, which measures the methylation status of approximately 480,000 cytosines in the human genome, enables DNAme profiling in population-wide studies. It offers moderate running costs and rapid data processing and analysis time. Of particular relevance to cancer biomarker studies, the 450K array has been the method of choice for DNAme analysis by The Cancer Genome Atlas consortium [46]. This offers added value due to the availability of reference data for control tissues and pan-cancer analysis.

    Next-generation sequencing

    The current gold standard for DNAme analysis is the bisulfite conversion of DNA with subsequent sequencing [47,48]. NGS, with decreasing cost and increasing throughput, may seem an obvious choice for an unbiased genome-wide discovery approach. NGS provides single nucleotide resolution and larger numbers of CpG sites can be analyzed compared with the microarray approach. Furthermore, repeat regions containing about half of all CpGs within the human genome can be analyzed in greater detail [49]. However, whole-genome bisulfite sequencing (WGBS) using NGS is more costly and although it provides the means for cross-study comparisons, fewer reference data sets are available in the public domain.
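
    The principle behind bisulfite sequencing can be illustrated with a minimal in silico sketch. It assumes complete conversion and error-free reads of a single strand, which real data do not guarantee; the function names and the toy sequence are hypothetical and not taken from any published pipeline.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate the read-out of one strand after bisulfite conversion and PCR.

    Unmethylated cytosines are deaminated and read as T;
    methylated cytosines are protected and still read as C.
    Assumes 100% conversion efficiency (an idealization).
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, read):
    """Compare a converted read with the reference sequence.

    Returns {position: True/False} for every reference cytosine,
    where True means the cytosine was read as C (methylated).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


# Toy example: only the cytosine at index 1 (a CpG) is methylated.
reference = "ACGTCCGAC"
read = bisulfite_convert(reference, methylated_positions=[1])
calls = call_methylation(reference, read)
```

    Because most cytosines are read as thymine after conversion, the reads are drawn from an effectively three-letter alphabet, which is why alignment to the reference genome is harder than for untreated DNA.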

    A newly developed method, reduced representation bisulfite sequencing (RRBS) [50], utilizes the same approach as WGBS, but its libraries are enriched for CpG-containing motifs (Figure 5). RRBS enhances sequencing coverage of CpG dinucleotides and, by removing CpG-poor, constitutively methylated, intergenic regions, provides data at single base-pair resolution for CpG islands, promoters and enhancer elements. The protocol includes digestion of DNA with restriction endonuclease MspI leading to high sequencing coverage for the areas subsequently represented – the reduced representation genome. This is a particular advantage for comparative DNAme profiling as most fragments from this reduced representation genome will be sequenced in all RRBS analyses of a given species [51].
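
    The reduced representation step can be sketched as an in silico MspI digest followed by size selection. MspI recognizes CCGG and cleaves after the first C, so every internal fragment starts and ends within a CpG-containing site. The size window and the toy sequence below are illustrative assumptions only; actual RRBS protocols choose their own fragment range.

```python
def mspi_fragments(seq):
    """Cut a sequence at every MspI site (C^CGG) and return the fragments in order."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find("CCGG")
    while pos != -1:
        cut = pos + 1  # MspI cleaves between the first C and CGG
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find("CCGG", pos + 1)
    fragments.append(seq[start:])
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls inside the selected window (hypothetical limits)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with two MspI sites.
toy = "AACCGGTTTTCCGGAA"
frags = mspi_fragments(toy)             # ['AAC', 'CGGTTTTC', 'CGGAA']
selected = size_select(frags, 5, 8)     # ['CGGTTTTC', 'CGGAA']
```

    Because the cut sites are fixed by the genome sequence, the same fragments are recovered from every sample of a species, which is what makes RRBS data sets directly comparable.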

    Another important difference between the discovery tools discussed is the amount of input DNA required. Whereas microarrays and the enrichment methods such as MeDIP and MeCAP need DNA in the microgram range, RRBS as well as WGBS require far smaller amounts of DNA (∼50–100 ng) [52], which is a key consideration for a blood-based screening test. Moreover, circulating DNA in serum contains only small amounts of tumor DNA, whereas the majority of DNA derives from other cells [53]. Thus, the method for discovery of DMRs in serum/plasma DNA needs to be highly sensitive in order to discriminate between minimal differences in methylation levels. MeDIP and MeCAP detect regions of high methylation efficiently, but repetitive genomic elements can be detected as highly methylated in error and these methods are not as quantitative as RRBS [44,49].

    Similar to the variety of different library preparation methods, many different NGS technologies exist and can be used for analysis of DNAme. The commercial sequencing platforms available differ greatly with regard to read length, sequence reads per run, accuracy and costs [54,55]. The most widely used sequencers are from Illumina and have been used in 5278 publications (correct as of 20 January 2014 [56]). The HiSeq 2000/2500 has by far the highest output of all sequencers (∼300 Gb), generating enough data to cover each base within the whole human genome about 100-times on average with a single flow cell. Furthermore, it has the lowest costs per base and has been used for whole-genome bisulfite sequencing, as well as all other sequencing methods described above. 454 sequencing has an output of approximately 700 Mb using Roche’s GS FLX+, but costs per base are currently above those of the Illumina system. Nevertheless, the mode read length of up to 700 bp has made it the gold standard for amplicon sequencing and it is used for targeted approaches analyzing DNAme. Life Technologies offers an NGS technology that measures changes in electric charge instead of detecting fluorescent-labeled nucleotides, thereby avoiding the time-consuming image analysis step. Therefore, the run-time is significantly reduced compared with sequencers from Roche and Illumina. The sequencing output (10 Gb with the Ion PI chip) is currently about an order of magnitude higher than that of the GS FLX+, but still about 30-times lower than that of the HiSeq 2000/2500.

    All three companies offer smaller-size, bench-top versions of their sequencers. These have about ten-times less sequencing output and operate at higher costs per base. Nevertheless, they contribute to the spread of NGS technologies in universities and hospitals. The PacBio RS II (Pacific Biosciences) does not require amplified DNA as input. Furthermore, it is able to assess DNAme (as well as other DNA modifications) without bisulfite treatment and has the longest read length (∼3.5 kb with P4-C3 chemistry). However, the sequencing output is only about 200 Mb and for the detection of CpG methylation an average coverage of 500× is needed. In addition, the PacBio RS II has high error rates per base (∼15–20%) and high costs per base. Therefore, it is not suitable for genome-wide analysis of human samples (although it can be used for high quality DNA modification analysis of bacteria).
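
    The coverage figures quoted for the different platforms follow from simple arithmetic, sketched below. The genome size of 3.0 Gb is a rounded assumption, and the calculation ignores mapping efficiency, duplicate reads and uneven coverage, so it gives an upper bound rather than a realistic yield.

```python
HUMAN_GENOME_BP = 3.0e9  # assumption: rounded haploid human genome size


def mean_coverage(output_bases, target_bases):
    """Average number of times each target base is sequenced."""
    return output_bases / target_bases


def covered_target_size(output_bases, required_coverage):
    """Largest target (in bases) that can be sequenced at the required mean coverage."""
    return output_bases / required_coverage


# HiSeq 2000/2500: ~300 Gb of output spread over the whole genome.
hiseq_coverage = mean_coverage(300e9, HUMAN_GENOME_BP)   # about 100-fold

# PacBio RS II: ~200 Mb of output at the 500x needed for CpG methylation calls.
pacbio_target_bp = covered_target_size(200e6, 500)       # about 0.4 Mb
```

    Under these assumptions a single PacBio RS II run reaches the required 500× coverage only for a target of roughly 0.4 Mb, which is consistent with its use for bacterial rather than human genome-wide analysis.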

    Clinical DNAme assay platforms

    In addition to whole-genome DNAme analysis for discovery of DMRs, there are targeted methods for DNAme analysis. Both targeted sequencing of DMRs and PCR-based methods are available, which enable DNAme biomarkers to be translated into clinical practice (Figure 6).

    Methylation-specific PCR (MSP) is a relatively simple method that is more qualitative than quantitative [57]. Two sets of primers for a target region are designed, one that amplifies the methylated template and one that amplifies the unmethylated template after bisulfite conversion. The products of the two PCRs are then analyzed by agarose gel electrophoresis. To quantify the ratio of methylated and unmethylated DNA accurately, MethyLight combines MSP with real-time PCR [58]. An even more sensitive approach combines digital PCR with MSP in a method called digital MethyLight [59]. For digital PCR, the DNA template is extensively diluted and distributed across many smaller PCR reactions. Each sub-reaction will subsequently contain no, one or more than one template molecule. Reactions containing more than one template molecule are corrected for using Poisson statistics [60]. Currently available commercial digital PCR machines use oil emulsions or microfluidics to generate hundreds to millions of reaction chambers. As digital PCR is usually not influenced by differences in PCR efficiency and offers absolute quantification of the DNA template, it offers superior sensitivity and specificity compared with real-time PCR [61].
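
    The Poisson correction used in digital PCR can be sketched as follows. If templates are distributed randomly, the fraction of negative partitions equals exp(−λ), where λ is the mean number of templates per partition, so λ can be estimated from the observed fraction of positive partitions. The function names are hypothetical, and the sketch assumes equally sized partitions and no false-positive or false-negative reactions.

```python
import math


def copies_per_partition(positive, total):
    """Estimate the mean template copies per partition (lambda) from positive counts.

    Uses lambda = -ln(1 - p), where p is the fraction of positive partitions.
    """
    if total <= 0 or not 0 <= positive < total:
        # With every partition positive, the sample is saturated and lambda is undefined.
        raise ValueError("need 0 <= positive < total")
    return -math.log(1 - positive / total)


def total_copies(positive, total):
    """Estimated number of template molecules loaded across all partitions."""
    return copies_per_partition(positive, total) * total


# Toy example: 500 of 1000 partitions are positive.
estimate = total_copies(500, 1000)  # about 693 copies, not 500
```

    At low occupancy the correction is small (10 positives in 1000 partitions correspond to about 10.05 copies), whereas at high occupancy simply counting positive partitions would substantially underestimate the template number.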

    Finally, ultra-deep sequencing of amplicons of bisulfite-converted DNA is used for gene-specific determination of DNAme [62]. The sensitivity and specificity of this approach can be increased by error correction techniques such as SafeSeqS, which adds unique random barcodes to every DNA template [63]. Comparison of all sequence reads with the same barcode allows correction of errors introduced by PCR and sequencing and thus reduces the error rate remarkably.
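
    The idea of barcode-based error correction can be illustrated with a simplified consensus sketch. This is not the published SafeSeqS pipeline: the family-size and agreement thresholds are hypothetical, and reads sharing a barcode are assumed to be pre-aligned and of equal length.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads that share a barcode into one consensus sequence.

    reads: iterable of (barcode, sequence) pairs.
    Families with fewer than min_family_size reads are discarded.
    Positions where the most common base falls below min_agreement are masked as 'N'.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):  # iterate position by position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus


# Toy example: barcode AAT carries one read with a C>T error at position 1.
reads = [
    ("AAT", "TCGA"), ("AAT", "TCGA"), ("AAT", "TTGA"),
    ("GGC", "TTGA"), ("GGC", "TTGA"), ("GGC", "TTGA"),
    ("CCA", "TCGA"),  # singleton family, discarded
]
result = consensus_by_barcode(reads, min_family_size=3, min_agreement=0.6)
```

    In bisulfite data a C/T difference is exactly the signal of interest, so distinguishing a true unmethylated template (family GGC above) from a PCR or sequencing error within a family (the third AAT read) is what makes this kind of correction valuable.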

    Bioinformatics in DNAme biomarker discovery

    The task of bioinformatics in cancer biomarker discovery is to provide prioritized lists of marker candidates with the desired sensitivity and specificity. A DMR identification and ranking pipeline has to be established that includes threshold cut-offs for quantitative inclusion/exclusion criteria, and parameters for scoring and ranking of DMRs passing the thresholds (Figure 7). Criteria for identification and prioritization of marker candidates need to be carefully selected to take into account clinical relevance, as well as technical feasibility. Appropriate case–control discovery data sets have to be defined, covering not only the disease phenotypes in question, but all clinically relevant control specimens that could interfere with marker specificity (examples are given below).

    A particular challenge for bioinformatics analysis in this field is that the clinical and discovery assays often rely on different technology platforms. The analysis needs to take into account technology-specific signal distribution, dynamic range and background signal. Performing discovery in a tissue (e.g., solid tumor) that differs from the sample type used for diagnosis in the clinical routine (e.g., blood serum) increases the likelihood of failure in the validation phase. This problem can only be addressed by selecting a much higher number of discovery marker candidates for validation and by prioritizing those markers that may be identified in serum (e.g., DNA markers from amplified genomic regions) [64].

    Achieving specificity

    The bioinformatics workflow for biomarker discovery must be informed by the clinical application anticipated. A particularly important aspect is the reduction of false-positive results. Accordingly, likely sources of contaminating (DNA) material have to be identified and the respective control tissues included, in order to score for marker specificity (Figure 8). If the goal is to deliver clinical assays based on cfDNA, the most likely source of contamination is cellular DNA in the circulation, namely white blood cells (WBCs). For this reason, WBCs should be considered the most stringent filter when searching for cancer-specific methylation patterns in the serum. Large organs with high cell turnover, including the liver, lung and intestine, also release significant amounts of DNA into circulation. Therefore the availability and use of methylation data from these tissues in the bioinformatics analysis will improve the specificity of cancer DMRs and the accompanying clinical assay.

    Clinical assay specificity with regard to cancer type and clinical stage also needs to be considered carefully. In the case of BC, where overdiagnosis from mammography screening is a concern, biomarker discovery should identify cancers with poor prognosis and contrast these against cancers with good prognosis as well as benign tumors. By contrast, the clinical assay for OC should detect all classes of the disease. During the discovery phase, histological classification is important to account for variability of the methylation pattern, thereby improving statistical power. To date, cancer biomarker discovery has focused on specific cancer types, often contrasting results against other cancer types. Recent literature has shown that different cancer types share several molecular features [65]. A search for shared methylation patterns rather than contrasting ones may be a more effective strategy.

    Data analysis

    Analyses of 450K array and bisulfite conversion NGS data present unique opportunities and challenges. As with any microarray, the 450K array relies on probe hybridization and is thereby sensitive to differences in probe affinity to methylated versus unmethylated DNA. An additional challenge for analysis of 450K array data is the presence of two different types of probe sets (type I and II) on the same array, and data from the two probe set types showing different distributions [66]. Robust normalization methods have been developed for correction of probe type differences [66–69]. However, it should be noted that use of different normalization methods here may lead to slightly incommensurate results and conclusions.

    Another challenge, not generally acknowledged, is individual probe-to-probe variation in affinity/sensitivity to unmethylated versus methylated DNA. This variation has little impact on prioritization of DMRs as long as no absolute thresholds for methylation levels (e.g., methylation in control sample should be below β-value 0.1) are applied and data analysis is done solely on the single probe level. However, if the goal of the analysis is to identify DMRs of several neighboring CpGs (see below), this background noise from probe-to-probe variation can have a significant effect. It may lead to false positives and false negatives when looking for homogeneously methylated or unmethylated genomic regions, respectively. Such an analysis therefore requires data that is corrected for a probe-specific background.

    Compared with 450K array data, the NGS data has two main advantages: it does not suffer from probe-to-probe variation and it is fully quantitative, thus providing superior sensitivity. However, the costs, data volume and analysis throughput prevent the application of NGS to large sample numbers. A specific challenge in the bioinformatic processing of bisulfite sequencing data is mapping the data to the reference genome, because of the reduced DNA sequence complexity. However, many bisulfite data-specific mappers perform well [70,71]. After mapping, it is straightforward to calculate the percentage of methylation from the base calls. Another question that has not been addressed fully is the sensitive detection of few copies of methylated DNA within a high background of unmethylated DNA (and vice versa). Specific algorithms will need to be developed for effective detection of tumor-specific DMRs using heterogeneous tumor samples or samples containing a background of nontumor DNA, especially serum.
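
    The calculation of methylation percentage from mapped base calls can be sketched as below. At each reference cytosine, reads showing C are counted as methylated and reads showing T as unmethylated. The minimum-coverage cut-off is a hypothetical parameter, and the sketch ignores incomplete conversion, sequencing errors and C/T polymorphisms.

```python
def methylation_percent(c_reads, t_reads, min_coverage=10):
    """Percentage methylation at one cytosine from bisulfite base calls.

    c_reads: reads with C at the position (protected, i.e., methylated)
    t_reads: reads with T at the position (converted, i.e., unmethylated)
    Returns None when coverage is below min_coverage (a hypothetical threshold).
    """
    coverage = c_reads + t_reads
    if coverage < min_coverage:
        return None
    return 100.0 * c_reads / coverage


# Toy example: 30 methylated and 70 unmethylated reads at one CpG.
level = methylation_percent(30, 70)   # 30.0
too_shallow = methylation_percent(3, 2)  # None, only 5 reads
```

    The coverage threshold matters for the low-abundance problem raised above: at 10-fold coverage a methylated fraction of 1% will usually not be observed at all, so detection of rare tumor-derived molecules depends on far deeper sequencing.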

    The search for DMRs as biomarker candidates has, in most cases, been performed at the single CpG level, especially for Illumina Infinium array data. For the older Infinium 27K array with low probe density, this was the only option. However, the newer 450K array has relatively high probe density in several genomic regions. Therefore, for both the 450K array and the NGS data, an analysis taking into account information from neighboring probes has a solid rationale and offers several benefits. First, the DNAme status of neighboring CpGs is in general similar within genomic windows of a certain size and/or genomic features (e.g., transcription start site and enhancer, among others). Analysis of DNAme within such units with measurements of several CpGs improves the power to detect DMRs [44], and reduces possible false positives and false negatives that may arise from technical or biological noise. Second, several clinical assays suitable for blood-based tests require more than one CpG for sensitive measurement (e.g., PCR-based MethyLight, described above, requires 5–6 CpGs). By looking for these ‘co-methylated’ neighboring CpGs, the feasibility of the biomarker candidate DMR to perform in a clinical test is addressed early on, and identified DMRs are more likely to survive the rigorous clinical assay development process. However, algorithms making use of the information on neighboring CpGs should do so based on the actual data rather than on annotation of genomic features, as there still might be considerable variation of methylation within such features.

    DMR detection at the single CpG level is straightforward; it is generally done by applying an appropriate two-group statistical test, chosen according to the data transformation that has been applied, to cancer versus healthy tissue. The selection of the appropriate healthy control tissue is critical. If the goal is to develop a blood-based assay, we would recommend the use of WBCs. The effect size calculated from the two-group test can be used for ranking the DMRs. For detection of genomic regions containing multiple differentially methylated CpGs, genome-wide bump-hunting [72] or a sliding-window approach can be used. Both methods identify DMRs with preferred window size and p (or q) value cut-offs, and an additional effect size can be calculated and used for ranking.
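
    A sliding-window search for multi-CpG DMRs might look like the following sketch. For brevity it ranks windows by effect size only (mean difference in methylation between tumor and control) and omits the statistical test and p/q-value cut-offs that a real analysis would add. The window size, the minimum number of CpGs and the minimum difference are hypothetical parameters, and this is not the bump-hunting method of [72].

```python
from statistics import mean


def sliding_window_dmrs(positions, tumor, control,
                        window_bp=200, min_cpgs=3, min_delta=0.3):
    """Find windows in which every CpG is hypermethylated in tumor versus control.

    positions: sorted CpG coordinates on one chromosome
    tumor, control: per-CpG lists of methylation values (0..1), one value per sample
    Returns (start, end, n_cpgs, mean_delta) tuples, largest effect size first.
    Overlapping windows are reported separately and are not merged.
    """
    deltas = [mean(t) - mean(c) for t, c in zip(tumor, control)]
    candidates = []
    for i, start in enumerate(positions):
        j = i
        while j < len(positions) and positions[j] - start <= window_bp:
            j += 1
        window = deltas[i:j]
        # Require consistent ('co-methylated') differences across all CpGs in the window.
        if len(window) >= min_cpgs and min(window) >= min_delta:
            candidates.append((start, positions[j - 1], len(window), mean(window)))
    return sorted(candidates, key=lambda d: d[3], reverse=True)


# Toy data: three neighboring hypermethylated CpGs, one isolated CpG, one unchanged CpG.
positions = [100, 150, 220, 900, 5000]
tumor = [[0.8, 0.9], [0.7, 0.8], [0.9, 0.8], [0.9, 0.9], [0.1, 0.2]]
control = [[0.1, 0.1], [0.1, 0.2], [0.0, 0.1], [0.1, 0.1], [0.1, 0.1]]
dmrs = sliding_window_dmrs(positions, tumor, control)
```

    In this toy data only the window spanning positions 100 to 220 is reported: the strongly hypermethylated CpG at position 900 is excluded because it has no neighbors within the window, which mirrors the preference for regions that can support a multi-CpG clinical assay.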

    Once the DMRs from the primary tumor versus control tissue statistical test have been selected/ranked, further available control tissue can be used for additional filtering and ranking. For both the single probe and multi-CpG range approaches, this can be carried out by applying two-group tests with p-value and effect size cut-offs. Once the information from all desired control tissues has been taken into account, further ranking can be done based on feasibility for the design of clinical assays. Principal criteria for feasibility are the number of CpGs within the DMR’s range or in the neighborhood of the single CpG, depending on the strategy used. Further filtering can be performed based on biological annotation; for example, whether the range/CpGs is within a CpG island or CpG shore, and this information may be used to establish hypothesis-derived test criteria.

    Multiplexing

    Even under optimal conditions, it is unlikely that a single marker would fulfil the high specificity requirements for population-wide, early detection of OC. A possible solution is to develop an assay with multiple read-out parameters, entailing multiple methylation sites or a combination of methylation and genomic markers. Here, the algorithm by which the measurements are combined to yield a diagnostic result critically influences the sensitivity and specificity of the test. For example, the requirement for all parameters to reach a certain threshold (i.e., a Boolean AND combination) increases the specificity in a multiplicative way, while decreasing sensitivity. Similarly, allowing parameters to compensate for each other (OR combination) increases sensitivity at the cost of specificity.

    For the early detection of OC, a combined or sequential readout (Boolean AND) of DNAme markers with specific genomic features (cancer-specific mutations), and protein markers (CA125), would likely be required to reach 99.7% specificity. The measurement methods would ideally be independent (not correlated) and individually have a specificity of above 90%. It is also important that each individual method is able to detect the cancer at an early stage. Introducing a method that cannot detect OC in Stage 1/2 would make it fail in clinical practice. The combination of markers and test methods adds significant complexity to test development. All additions and combinations need to be planned for and tested carefully in the assay discovery and development process.
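
    The effect of Boolean combinations on test performance can be made concrete with a short sketch. It assumes that the individual tests are statistically independent within both diseased and healthy individuals, which real markers rarely are, so the results should be read as best-case bounds. The function names and the example values are hypothetical.

```python
from math import prod


def combine_and(sensitivities, specificities):
    """All tests must be positive: false-positive rates multiply, sensitivity drops."""
    sensitivity = prod(sensitivities)
    specificity = 1 - prod(1 - s for s in specificities)
    return sensitivity, specificity


def combine_or(sensitivities, specificities):
    """Any positive test counts: false-negative rates multiply, specificity drops."""
    sensitivity = 1 - prod(1 - s for s in sensitivities)
    specificity = prod(specificities)
    return sensitivity, specificity


# Three independent tests, each with 90% sensitivity and 90% specificity.
and_sens, and_spec = combine_and([0.9] * 3, [0.9] * 3)  # about 0.729 and 0.999
or_sens, or_spec = combine_or([0.9] * 3, [0.9] * 3)     # about 0.999 and 0.729
```

    Under these idealized assumptions, three independent tests with 90% specificity each would exceed the 99.7% target when combined with AND, whereas two would reach only 99%. The price is sensitivity, which falls to about 73% in this example, and any correlation between the markers reduces the specificity gain.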

    Conclusion & future perspective

    The disappointingly slow progress in the field of biomarker development has been a major issue to date. Two of the major challenges are a lack of appropriate technologies and a lack of in-depth knowledge of disease processes. Recent technological advances are now enabling us to finally progress in this field. Biomarker discovery has been aided by the availability of microarrays and bisulfite sequencing, and it is anticipated that the key biomarkers identified will both represent early disease and lead to improved understanding of tumorigenesis. Equally, technologies that are clinically suitable, such as digital MethyLight and ultra-deep sequencing of bisulfite-converted DNA, will allow us to translate biomarker discovery into clinically viable serum-based screening and diagnostic tests.

    Newly identified serum DNAme biomarkers of female cancers will require validation in the clinical setting. Nevertheless, ongoing research in this field is likely to enable improved diagnosis and treatment monitoring for all major women’s cancers. Furthermore, the methods described will be suitable for wider application, and offer the vision of a single serum-based screening test for multiple cancers.

    Executive summary

    Screening & diagnosis of breast, ovarian & endometrial cancers

    • Breast cancer screening by mammography is offered routinely to women over 45–50 years of age, as well as to younger women considered to be at higher risk, but is complicated by overdiagnosis leading to unnecessary diagnostic procedures and treatment.

    • The lack of an early screening or a diagnostic method for ovarian cancers often leads to diagnosis at a late stage of the disease, with associated poor survival for these women.

    • Early detection of endometrial cancers that have poor prognostic features (e.g., clear cell and serous cancers) is likely to reduce mortality from these cancers.

    Clinical sampling & sample collections

    • A cancer screening marker has to be validated in a population-based setting and in samples predating diagnosis.

    DNA methylation & epigenetic analysis

    • Analysis of circulating tumor DNA and its epigenetic modification (methylation) in serum/plasma or vaginal fluid is a novel method for identification of cancer biomarkers.

    • Reduced representation bisulfite sequencing offers a cost-effective alternative to whole-genome bisulfite sequencing for the identification of differentially methylated regions in DNA samples.

    • In comparison to microarray-based methods, next-generation sequencing enables single nucleotide resolution, analysis of more CpG sites and analysis of samples containing as little as 30–50 ng DNA.

    • Digital PCR is a targeted method that provides absolute quantification of a DNA template and can be used for sensitive and specific DNA methylation analysis with an increased signal-to-noise ratio compared with conventional methylation-specific PCR or MethyLight.

    Bioinformatics

    • Discovering cancer-specific methylated DNA regions requires novel bioinformatic tools to identify the most relevant and most promising regions that show a consistent methylation pattern across all linked CpGs of a particular region in the cancer sample and not in any normal sample (e.g., in white blood cells).

    Future perspective

    • Detection of aberrant DNA methylation in serum/plasma or vaginal fluid will enable: early identification of individuals before the cancer becomes symptomatic and poses serious risk to well-being; and monitoring and personalization of cancer treatment.

    Acknowledgements

    The authors would like to thank M Burnell (UCL) for his valuable contributions during drafting of the review and S Oeschger (Genedata) for preparing Figures 7 and 8.

    Financial & competing interests disclosure

    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007–2013) under grant agreement No. 305428 (EpiFemCare). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

    No writing assistance was utilized in the production of this manuscript.

    Open Access

    This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

    Papers of special note have been highlighted as: • of interest

    References

    • 1 American Cancer Society, GA, USA: Cancer Facts and Figures 2013. www.cancer.org/acs/groups/content/@epidemiologysurveilance/documents/document/acspc-036845.pdf
    • 2 European Commission's Eurostat, Luxembourg: Health Statistics. http://epp.eurostat.ec.europa.eu/
    • 3 Weaver DL, Ashikaga T, Krag DN et al. Effect of occult metastases on survival in node-negative breast cancer. N. Engl. J. Med. 364(5), 412–421 (2011).
    • 4 Rosner D, Lane WW. Predicting recurrence in axillary-node negative breast cancer patients. Breast Cancer Res. Treat. 25(2), 127–139 (1993).
    • 5 Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ. Cancer statistics, 2009. CA Cancer J. Clin. 59(4), 225–249 (2009).
    • 6 Ozols RF. Systemic therapy for ovarian cancer: current status and new treatments. Semin. Oncol. 33(2 Suppl. 6), S3–S11 (2006).
    • 7 Sorosky JI. Endometrial cancer. Obstet. Gynecol. 111(2 Pt 1), 436–447 (2008).
    • 8 Carey MS, Gawlik C, Fung-Kee-Fung M, Chambers A, Oliver T. Cancer Care Ontario Practice Guidelines Initiative Gynecology Cancer Disease Site Group. Systematic review of systemic therapy for advanced or recurrent endometrial cancer. Gynecol. Oncol. 101(1), 158–167 (2006).
    • 9 Chaudhry P, Asselin E. Resistance to chemotherapy and hormone therapy in endometrial cancer. Endocr. Relat. Cancer 16(2), 363–380 (2009).
    • 10 Arbyn M, Anttila A, Jordan J et al. European Guidelines for Quality Assurance in Cervical Cancer Screening. Second edition – summary document. Ann. Oncol. 21, 448–458 (2010).
    • 11 Buys SS, Partridge E, Black A et al. Effect of screening on ovarian cancer mortality: the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Randomized Controlled Trial. JAMA 305(22), 2295–2303 (2011).
    • 12 Menon U, Gentry-Maharaj A, Hallett R et al. Sensitivity and specificity of multimodal and ultrasound screening for ovarian cancer, and stage distribution of detected cancers: results of the prevalence screen of the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS). Lancet Oncol. 10(4), 327–340 (2009). • UKCTOCS demonstrated that CA125 combined with transvaginal sonography can detect ovarian cancer with 89.5% sensitivity, 99.8% specificity and 35.1% positive predictive value.
    • 13 Jacobs I, Bast RC Jr. The CA 125 tumour-associated antigen: a review of the literature. Hum. Reprod. 4(1), 1–12 (1989).
    • 14 Moore RG, McMeekin DS, Brown AK et al. A novel multiple marker bioassay utilizing HE4 and CA125 for the prediction of ovarian cancer in patients with a pelvic mass. Gynecol. Oncol. 112(1), 40–46 (2009).
    • 15 Jacobs I, Oram D, Fairbanks J, Turner J, Frost C, Grudzinskas JG. A risk of malignancy index incorporating CA125, ultrasound and menopausal status for the accurate preoperative diagnosis of ovarian cancer. Br. J. Obstet. Gynaecol. 97(10), 922–929 (1990).
    • 16 Geomini P, Kruitwagen R, Bremer GL, Cnossen J, Mol BW. The accuracy of risk scores in predicting ovarian malignancy: a systematic review. Obstet. Gynecol. 113(2 Pt 1), 384–394 (2009).
    • 17 Yazbek J, Raju SK, Ben-Nagi J, Holland TK, Hillaby K, Jurkovic D. Effect of quality of gynaecological ultrasonography on management of patients with suspected ovarian cancer: a randomised controlled trial. Lancet Oncol. 9(2), 124–131 (2008).
    • 18 Welch HG, Frankel BA. Likelihood that a woman with screen-detected breast cancer has had her ‘life saved’ by that screening. Arch. Intern. Med. 171, 2043–2046 (2011).
    • 19 Mook S, Van’t Veer LJ, Rutgers EJ et al. Independent prognostic value of screen detection in invasive breast cancer. J. Natl Cancer Inst. 103, 585–597 (2011).
    • 20 Zahl PH, Gotzsche PC, Maehlen J. Natural history of breast cancers detected in the Swedish mammography screening programme: a cohort study. Lancet Oncol. 12, 1118–1124 (2011).
    • 21 Jorgensen KJ, Gotzsche PC. Overdiagnosis in publicly organised mammography screening programmes: systematic review of incidence trends. BMJ 339, b2587 (2009). • A meta-analysis of public breast screening programs in five countries showing that one in three breast cancers detected in a population offered organized screening is overdiagnosed.
    • 22 Warner E. Clinical practice. Breast-cancer screening. N. Engl. J. Med. 365, 1025–1032 (2011).
    • 23 Gigerenzer G, Mata J, Frank R. Public knowledge of benefits of breast and prostate cancer screening in Europe. J. Natl Cancer Inst. 101, 1216–1220 (2009).
    • 24 Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J. Clin. 63(1), 11–30 (2013).
    • 25 Pessoa JN, Freitas AC, Guimaraes RA, Lima J, Dos Reis HL, Filho AC. Endometrial assessment: when is it necessary? J. Clin. Med. Res. 6(1), 21–25 (2014).
    • 26 Tavassoli FA, Devilee P. WHO Classification of Tumours. Pathology and Genetics of Tumours of the Breast and Female Genital Organs. IARC Press, Lyon, France (2004).
    • 27 Langer RD, Pierce JJ, O’Hanlan KA et al. Transvaginal ultrasonography compared with endometrial biopsy for the detection of endometrial disease. Postmenopausal Estrogen/Progestin Interventions Trial. N. Engl. J. Med. 337(25), 1792–1798 (1997).
    • 28 Fleischer AC, Wheeler JE, Lindsay I et al. An assessment of the value of ultrasonographic screening for endometrial disease in postmenopausal women without symptoms. Am. J. Obstet. Gynecol. 184(2), 70–75 (2001).
    • 29 Henry NL, Hayes DF. Uses and abuses of tumor markers in the diagnosis, monitoring, and treatment of primary and metastatic breast cancer. Oncologist 11(6), 541–552 (2006).
    • 30 Pepe MS, Etzioni R, Feng Z et al. Phases of biomarker development for early detection of cancer. J. Natl Cancer Inst. 93, 1054–1061 (2001). • Describes a five-phase model for biomarker development, in many ways similar to the recognized phases of drug development.
    • 31 Teschendorff AE, Menon U, Gentry-Maharaj A et al. An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS ONE 4, e8274 (2009).
    • 32 Andriole GL, Crawford ED, Grubb RL et al. Mortality results from a randomized prostate-cancer screening trial. N. Engl. J. Med. 360, 1310–1319 (2009).
    • 33 Buys SS, Partridge E, Black A et al. Effect of screening on ovarian cancer mortality: the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Randomized Controlled Trial. JAMA 305, 2295–2303 (2011).
    • 34 Schroder FH, Hugosson J, Roobol MJ et al. Screening and prostate-cancer mortality in a randomized European study. N. Engl. J. Med. 360, 1320–1328 (2009).
    • 35 Jones PA, Baylin SB. The epigenomics of cancer. Cell 128, 683–692 (2007).
    • 36 Rodriguez-Paredes M, Esteller M. Cancer epigenetics reaches mainstream oncology. Nat. Med. 17, 330–339 (2011).
    • 37 Widschwendter M, Fiegl H, Egle D et al. Epigenetic stem cell signature in cancer. Nat. Genet. 39(2), 157–158 (2007). • A model for the progression of epigenetic marks from reversible repression in stem cells via polycomb repressor complex 2 to aberrant DNA methylation in cancer precursor cells and persistent gene silencing in cancer cells.
    • 38 Gormally E, Caboux E, Vineis P, Hainaut P. Circulating free DNA in plasma or serum as biomarker of carcinogenesis: practical aspects and biological significance. Mutat. Res. 635, 105–117 (2007).
    • 39 Weisenberger DJ, Trinh BN, Campan M et al. DNA methylation analysis by digital bisulfite genomic sequencing and digital MethyLight. Nucleic Acids Res. 36, 4689–4698 (2008).
    • 40 Weber M, Davies JJ, Wittig D et al. Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat. Genet. 37, 853–862 (2005).
    • 41 Brinkman AB, Simmer F, Ma K, Kaan A, Zhu J, Stunnenberg HG. Whole-genome DNA methylation profiling using MethylCap-seq. Methods 52, 232–236 (2010).
    • 42 Rauch T, Pfeifer GP. Methylated-CpG island recovery assay: a new technique for the rapid detection of methylated-CpG islands in cancer. Lab. Invest. 85(9), 1172–1180 (2005).
    • 43 Serre D, Lee BH, Ting AH. MBD-isolated genome sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 38(2), 391–399 (2010).
    • 44 Bock C, Tomazou EM, Brinkman AB et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat. Biotechnol. 28(10), 1106–1114 (2010).
    • 45 Bock C. Analysing and interpreting DNA methylation data. Nat. Rev. Genet. 13(10), 705–719 (2012).
    • 46 The Cancer Genome Atlas. http://cancergenome.nih.gov/ 
    • 47 Clark SJ, Harrison J, Paul CL, Frommer M. High sensitivity mapping of methylated cytosines. Nucleic Acids Res. 22, 2990–2997 (1994).
    • 48 Frommer M, McDonald LE, Millar DS et al. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc. Natl Acad. Sci. USA 89, 1827–1831 (1992).
    • 49 Harris RA, Wang T, Coarfa C et al. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nat. Biotechnol. 28(10), 1097–1105 (2010).
    • 50 Gu H, Smith ZD, Bock C, Boyle P, Gnirke A, Meissner A. Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nat. Protoc. 6, 468–481 (2011). • Reduced representation bisulfite sequencing provides single-nucleotide resolution, is highly sensitive and provides quantitative DNA methylation measurements.
    • 51 Chatterjee A, Rodger EJ, Stockwell PA, Weeks RJ, Morison IM. Technical considerations for reduced representation bisulfite sequencing with multiplexed libraries. J. Biomed. Biotechnol. 1–8 (2012).
    • 52 Gu H, Bock C, Mikkelsen TS. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat. Methods 7, 133–136 (2010).
    • 53 Ellinger J, El Kassem N, Heukamp LC et al. Hypermethylation of cell-free serum DNA indicates worse outcome in patients with bladder cancer. J. Urol. 179, 346–352 (2008).
    • 54 Quail MA, Smith M, Coupland P et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 13, 341 (2012).
    • 55 Mardis ER. A decade’s perspective on DNA sequencing technology. Nature 470, 198–203 (2011).
    • 56 Illumina. www.illumina.com 
    • 57 Herman JG, Graff JR, Myöhänen S, Nelkin BD, Baylin SB. Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc. Natl Acad. Sci. USA 93, 9821–9826 (1996).
    • 58 Eads CA, Danenberg KD, Kawakami K et al. MethyLight: a high-throughput assay to measure DNA methylation. Nucleic Acids Res. 28, E32 (2000).
    • 59 Weisenberger DJ, Trinh BN, Campan M et al. DNA methylation analysis by digital bisulfite genomic sequencing and digital MethyLight. Nucleic Acids Res. 36, 4689–4698 (2008).
    • 60 Zimmermann B, Grill S, Holzgreve W, Zhong XY, Jackson LG, Hahn S. Digital PCR: a powerful new tool for noninvasive prenatal diagnosis? Prenat. Diagn. 28, 1087–1093 (2008).
    • 61 Hindson CM, Chevillet JR, Briggs HA et al. Absolute quantification by droplet digital PCR versus analog real-time PCR. Nat. Methods 10, 1003–1005 (2013). • A comparison of miRNA quantification by droplet digital PCR and real-time PCR revealing the greater precision and improved day-to-day reproducibility of droplet digital PCR but with comparable sensitivity.
    • 62 Varley KE, Mitra RD. Bisulfite Patch PCR enables multiplexed sequencing of promoter methylation across cancer samples. Genome Res. 20, 1279–1287 (2010).
    • 63 Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA 108, 9530–9535 (2011).
    • 64 Lee da E, Kim SY, Lim JH, Park SY, Ryu HM. Non-invasive prenatal testing of trisomy 18 by an epigenetic marker in first trimester maternal plasma. PLoS ONE 8(11), e78136 (2013).
    • 65 Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA et al. The Cancer Genome Atlas Pan-Cancer Analysis Project. Nat. Genet. 45(10), 1113–1120 (2013). • The Cancer Genome Atlas Research Network has analyzed molecular aberrations at the DNA, RNA, protein and epigenetic levels in a variety of human tumors.
    • 66 Dedeurwaerder S, Defrance M, Calonne E, Denis H, Sotiriou C, Fuks F. Evaluation of the Infinium Methylation 450K technology. Epigenomics 3(6), 771–784 (2011).
    • 67 Maksimovic J, Gordon L, Oshlack A. SWAN: Subset-quantile within array normalization for Illumina Infinium Human Methylation450 BeadChips. Genome Biol. 13(6), R44 (2012).
    • 68 Marabita F, Almgren M, Lindholm ME et al. An evaluation of analysis pipelines for DNA methylation profiling using the Illumina Human Methylation450 BeadChip platform. Epigenetics 8(3), 333–346 (2013).
    • 69 Teschendorff AE, Marabita F, Lechner M et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics 29(2), 189–196 (2013).
    • 70 Chatterjee A, Stockwell PA, Rodger EJ, Morison IM. Comparison of alignment software for genome-wide bisulphite sequence data. Nucleic Acids Res. 40(10), e79 (2012).
    • 71 Chen PY, Cokus SJ, Pellegrini M. BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics 11, 203 (2010).
    • 72 Jaffe AE, Murakami P, Lee H et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int. J. Epidemiol. 41(1), 200–209 (2012).