Introduction

Hepatocellular carcinoma (HCC) is the sixth most common cancer and the third leading cause of cancer-related death worldwide, with an estimated 500,000 deaths per year1,2,3. Although surgical resection is regarded as a potentially curable treatment for HCC patients, majority of HCC patients are not medically suitable for this treatment and HCC recurrence after surgery remains a frequent event, leading to poor patient survival4,5. Liver transplantation is well established as a therapeutic treatment for patients with unresectable HCC but the scarcity of donors limits this treatment6. Many nonsurgical therapeutic options are also available for HCC patients, whereas these treatments exhibit limited survival benefit7,8. Therefore, early diagnosis and treatment of HCC remain a key goal for improving patient survival.

HCC development is intimately associated with chronic hepatitis B virus (HBV) infection, which accounts for over 50% of total cases worldwide9,10. Our previous studies have well demonstrated that pre-S mutants, which contain in-frame deletions in the pre-S1 and/or pre-S2 gene segments of HBV large surface antigen (LHBs) (Fig. 1), are HBV oncoproteins and can initiate multiple endoplasmic reticulum (ER) stress-related signaling pathways, contributing to growth advantages of hepatocytes and eventually HCC formation11,12,13. The presence of pre-S mutants in liver tissues or blood carries a 5-fold higher risk of HCC development in patients with chronic HBV infection14,15. The prevalence of pre-S mutants is as high as 60% in HCC patients and significantly associated with HCC recurrence after surgical resection16,17,18. As a result, pre-S mutants have emerged as important predictive markers for HCC risk in chronic HBV carriers as well as powerful prognostic markers for recurrence risk in HBV-related HCC patients following surgical resection14,15,18,19,20.

Figure 1
figure 1

Schematic representation of the wild-type and pre-S mutant LHBs gene. The wild-type LHBs gene contains three intact gene segments, pre-S1, pre-S2, and S. The numbers on the bottom of the gene indicate the nucleotide positions of each gene segment in the HBV genome. The arrow at the top of the diagram indicates the start (nucleotide 1) of the circular genome, and the numbers go clockwise and end at nucleotide 3221 (not shown). Here only the LHBs gene of the genome is shown. The pre-S1, pre-S2, and pre-S1 + pre-S2 mutants contain deletion mutations in the pre-S1, pre-S2, and both pre-S1 and pre-S2 gene segments, respectively. The grey and hatched boxes represent the regions deleted in the pre-S1 and pre-S2 gene segments, respectively.

Several methods have been applied to detect the presence of pre-S mutants in chronic HBV carriers and HBV-related HCC patients, including the methods based on immunohistochemistry (IHC) staining of HBV surface antigen (HBsAg)13,19, TA cloning and DNA sequencing following polymerase chain reaction (PCR) amplification of pre-S gene (including both pre-S1 and pre-S2 regions)14,15,16, and the usage of Pre-S Gene Chip17,20. Hepatocytes harboring pre-S1 and pre-S2 mutants have been shown to manifest themselves histologically as two different types of ground glass hepatocytes (GGHs, designated types I and II, respectively) in liver tissues13,19. Following IHC staining, types I and II GGHs express distinct inclusion-like and marginal patterns of HBsAg, respectively, and thus can be distinguished from hepatocytes harboring wild-type LHBs, which express a diffuse and homogeneous pattern of HBsAg13,19. In addition, pre-S gene can be specifically amplified from virus DNA extracted from patient’s liver tissue or blood by using nested PCR14,15,16. To detect pre-S deletion, the resulting pre-S gene PCR products either are directly subjected to DNA sequencing (when only a single PCR band is seen in agarose gel) or are first cloned into a TA vector followed by DNA sequencing (when two or more PCR bands are revealed)14,15,16. Alternatively, the pre-S gene PCR products can be analyzed by Pre-S Gene Chip17,20. In the cases of only a single PCR band, the pre-S gene PCR products are directly subjected to chip hybridization; in the cases of multiple PCR bands, TA cloning is first performed, followed by PCR amplification of pre-S insert-DNA, chip hybridization and eventually signal development17,20.

Although the current methods are successfully used to detect pre-S mutants, they can still only provide qualitative and semi-quantitative detection results. Moreover, since TA cloning is majorly dependent on PCR bands that are clearly separated and visible in agarose gel, the TA cloning-based detection of pre-S mutants may have the risk to omit unseparated or invisible PCR bands, resulting in lower detection sensitivity. Therefore, development of a method that can quantitatively detect pre-S mutants with higher sensitivity and fidelity may improve the clinical application of pre-S mutants in serving as predictive and prognostic markers for HBV-related HCC. In this study, we established a next-generation sequencing (NGS)-based platform for quantitative detection of pre-S mutants in plasma of HBV-related HCC patients. Our data showed that the NGS-based analysis was more sensitive than the TA cloning-based analysis in detection of HBV pre-S mutants.

Results

Patient profile and clinicopathological data

The clinicopathological characteristics of the 49 HBV-related HCC patients enrolled in this study are summarized in Table 1. There were 43 (88%) men and 6 (12%) women, and the median age of all patients was 57 years (range, 33 to 78). Forty-three (88%) patients were HBV e antigen (HBeAg) negative. HBV DNA was detected in 27 (55%) patients at a median of 1.4 × 105 copies/mL (range, 30.1 to 1.5 × 108). Tumor size was recorded for 48 (98%) patients (median: 3.5 cm; range: 1.5 to 35 cm).

Table 1 Clinicopathological characteristics of the 49 HBV-related HCC patients enrolled in this study (continued).

Pre-S genotype and patient grouping

According to the pre-S gene segments in which deletion mutations took place, the pre-S deletion fell into three types, the pre-S1, pre-S2, and pre-S1 + pre-S2 deletion, which spanned the pre-S1, pre-S2, and both pre-S1 and pre-S2 gene segments, respectively (Fig. 1). Therefore, each patient could be classified into one of the following pre-S genotype groups: the patients without pre-S deletion (wild-type), and the patients with pre-S deletion, including the patients with any one or two or all three types of pre-S deletion. Alternatively, the patients with pre-S deletion could be classified into two groups, simply depending on either pre-S1 or pre-S2 gene segment containing deletion mutations.

TA cloning-based pre-S genotyping in plasma of HBV-related HCC patients

By using TA cloning-based pre-S genotyping analysis, we first detected pre-S deletion in plasma of the 49 HBV-related HCC patients. As shown in Supplementary Table S1 and Table 2, pre-S deletion was detected in 17 out of 49 (35%) patients, among whom 9 (19%) patients had only pre-S1 deletion, 4 (8%) patients had only pre-S2 deletion, 3 (6%) patients had only pre-S1 + pre-S2 deletion, and 1 (2%) patient had both pre-S1 and pre-S1 + pre-S2 deletion. Additionally, 13 (27%) and 8 (16%) patients had deletion spanning the pre-S1 and pre-S2 gene segments, respectively. Each pre-S deletion region was also defined.

Table 2 Summary of the pre-S genotyping results by TA cloning- and NGS-based analyses in 49 HBV-related HCC patients.

NGS-based analysis detected pre-S deletion quantitatively

Next, we developed a NGS-based platform to detect pre-S deletion in plasma of the 49 HBV-related HCC patients. As shown in Supplementary Table S1, the same as the TA cloning result, the NGS result provided information on not only the pre-S deletion type but also the detailed pre-S deletion region. Importantly, in contrast to the TA cloning-based analysis, NGS-based analysis detected pre-S deletion quantitatively so that the frequency of each pre-S gene DNA within the pre-S gene PCR products could be determined. As a result, all three types of pre-S deletion could be detected in plasma of each patient although the frequency was as low as one copy of DNA in the analyzed PCR product. Notably, when the pre-S gene DNA with the highest frequency in each type of pre-S deletion was selected, its deletion region perfectly coincided with that of the same type of pre-S deletion clone detected by TA cloning in all analyzed patients.

By matching the result of TA cloning with that of NGS, the corresponding pre-S gene DNA detected by both analyses had the lowest frequency of 5.049%, as shown by the wild-type pre-S gene DNA in the patient No. 37 (Supplementary Table S1). Therefore, we set the frequency of 5.049% as the cut-off percentage value to divide the patients analyzed by NGS into one of the pre-S genotype groups. As shown in Table 2, up to 37 of 49 (75%) patients had pre-S deletion, among whom 10 (20%) patients had only pre-S1 deletion, 5 (10%) patients had only pre-S2 deletion, 6 (12%) patients had both pre-S1 and pre-S2 deletion, 1 (2%) patient had both pre-S2 and pre-S1 + pre-S2 deletion, and 15 (31%) patients had all three types of pre-S deletion. Moreover, up to 32 (65%) and 27 (55%) patients had deletion spanning the pre-S1 and pre-S2 gene segments, respectively.

NGS was more sensitive than TA cloning for detection of pre-S deletion in plasma of HBV-related HCC patients

By comparing the results between TA cloning and NGS in total 49 patients, pre-S deletion was detected in 17 (35%) and 37 (75%) patients by TA cloning- and NGS-based analyses, respectively (Table 2). Up to 20 (40%) patients were negative for pre-S deletion by TA cloning-based analysis but conversely were positive for pre-S deletion by NGS-based analysis. The detection of pre-S deletion was significantly more sensitive by NGS- than by TA cloning-based analysis (McNemar’s paired proportion test, P value  < 0.0001) (Table 3). Also, the simple kappa value for the agreement between both analyses in detection of pre-S deletion was fair at 0.29 (95% confidence interval (CI), 0.12 to 0.46) (Table 3). Similar result could be observed when comparing the concordance between both analyses in detection of deletion spanning the pre-S1 (McNemar’s paired proportion test, P value  < 0.0001; simple kappa coefficient, kappa value (κ) = 0.32 (95% CI, 0.15 to 0.49)) or pre-S2 (McNemar’s paired proportion test, P value  < 0.0001; simple kappa coefficient, κ = 0.27 (95% CI, 0.09 to 0.45)) gene segment (Table 3).

Table 3 Comparison of the pre-S genotyping results by TA cloning- and NGS-based analyses in 49 HBV-related HCC patients.

Because the PCR bands that are too close in size are difficult to be clearly separated by agarose gel electrophoresis, these bands might be visualized together as a single band in agarose gel. In such cases, when the TA cloning-based pre-S genotyping was performed, the individual pre-S gene PCR bands that remained unseparated as a single band in agarose gel might be possibly randomly omitted from analysis during the cloning process, resulting in the lower detection sensitivity of pre-S deletion. To clarify this possibility, we selected the patient No. 11 from the 50 HBV-related HCC patients for further analysis. Based on the result of TA cloning, this patient was negative for pre-S deletion (Supplementary Table S1). However, the result of NGS showed that this patient had pre-S2 deletion, and especially, the deletion pre-S gene DNA with the highest frequency was very close in size (only 15 base pairs in difference) with the wild-type pre-S gene DNA (Supplementary Table S1). As shown in Fig. 2, we carried out an additional TA cloning-based pre-S genotyping analysis for this patient, in which the single pre-S gene PCR band visualized in agarose gel was cloned into the TA vector, and 10 plasmid clones containing the pre-S insert-DNA were then randomly selected for Sanger DNA sequencing analysis. We observed that the 10 randomly selected plasmid clones of this patient contained either wild-type or deletion pre-S gene DNA, and the pre-S deletion region coincided with that by NGS analysis. This data might partly explain the lower detection sensitivity of pre-S deletion by TA cloning- than by NGS-based analysis.

Figure 2
figure 2

The pre-S genotyping result by additional TA cloning-based analysis in selected HBV-related HCC patient. (A) The pre-S gene PCR products prepared from the plasma DNA of the selected patient No. 11 were resolved by agarose gel electrophoresis with the 100-bp DNA size marker loaded on both sides. The single pre-S gene PCR band was indicated by arrow. The size of the marker was shown in base pairs. (B) Following TA cloning of the single pre-S gene PCR band, 10 plasmid clones containing the pre-S insert-DNA were randomly selected for Sanger DNA sequencing analysis. The result was shown as the ratio of the number of clones containing either wild-type or deletion pre-S gene DNA to the total number of clones sequenced. The NGS result was shown for comparison. For simplicity, only the pre-S wild-type or deletion whose total frequency was above the cut-off percentage value was shown. The pre-S gene DNA with the highest frequency in the indicated type of pre-S deletion and its deletion region was also shown. Abbreviations: bp, base pairs; del, deletion; nt, nucleotide.

Discussion

Despite remarkable progress in developing therapies for HCC, the high morbidity and mortality rates of HCC in patients with chronic HBV infection remain a big challenge21,22. Overall, liver resection and transplantation are still the best therapeutic options for patients with HCC5. However, the former is suitable for the patients with single or smaller HCC, and the latter is limited by the shortage of donor livers4,6. Early diagnosis of HCC is thus urgently needed for early treatment of HCC by surgical resection or allowing enough time for the patients waiting for suitable donor livers, improving the patient survival. Pre-S mutants, which harbor deletion mutation in the pre-S gene segments of LHBs, have been demonstrated as HBV oncoproteins that disturb a variety of ER stress-related signaling pathways in hepatocytes, contributing to genomic instability and HCC development11,12,13. It has been well documented that the presence of pre-S mutants in plasma represents a predictive marker for HCC risk in chronic HBV carriers as well as a prognostic marker for recurrence risk in HBV-related HCC patients following surgical resection14,15,18,19,20. Therefore, development of a method that can detect pre-S mutants with high sensitivity and accuracy may improve the clinical application of pre-S mutants in serving as predictive and prognostic markers for HBV-related HCC, providing survival benefit for the patients.

To date, besides the IHC-based method13,19, the methods applied to detect pre-S mutants in blood or liver tissues of patients with chronic HBV infection or HBV-related HCC are mainly based on the TA cloning approach14,15,16,17,20. Following PCR amplification of pre-S gene from the patient specimen, TA cloning approach is well established for isolation of specific pre-S gene DNA from the PCR products for further analysis. However, this approach has the following drawbacks in practical operation. First, the cloning procedure is somewhat time-consuming, especially in cases where multiple PCR bands of variable intensity are visualized in agarose gel. Moreover, the methodology is largely dependent on the visualization of PCR bands in agarose gel. The PCR bands with intensity below the detection limit cannot be visualized in agarose gel for the cloning. Last but not least, there is a difficult in clearly separating the PCR bands that are too close in size by agarose gel electrophoresis. As shown in this study, we demonstrated that when the wild-type and deletion pre-S gene DNA were too close in size to be clearly separated in agarose gel, the deletion pre-S gene DNA could be possibly omitted from detection during the cloning process, thus decreasing the detection sensitivity and increasing the false-negative rate by TA cloning-based pre-S genotyping analysis.

Over the past decade, the invention of NGS technology has revolutionized almost all fields of genomic studies23,24,25,26,27. In contrast to traditional Sanger DNA sequencing28,29, NGS technology performs massively parallel sequencing, during which millions of fragments of DNA from a single sample are sequenced in unison, providing qualitative and quantitative information on the sequence and number of individual DNA fragments at an incredible throughput30,31. As a result, for NGS there is no need to carry out TA cloning for individual DNA molecules before the sequencing analysis. Therefore, in this study, based on the NGS technology we developed a new method for detection of pre-S mutants in plasma of HBV-related HCC patients. Compared to the current TA cloning-based methods, our data showed that the NGS-based method exhibited several advantages. First, the sample preparation before analysis was more simple and easy. The pre-S gene PCR products could be directly subjected to NGS analysis without the steps of agarose gel electrophoresis and TA cloning. Additionally, individual pre-S gene DNA molecules were detected qualitatively and quantitatively so that their deletion region and frequency within the PCR products could be well determined. Most important of all, the detection of pre-S deletion was more sensitive. There were no concerns on the detection and resolution limitations by agarose gel electrophoresis. As low as one copy of pre-S gene DNA in the PCR products could be detected without the risk of omitting due to either the DNA intensity being too low to be visualized or the DNA size being too close to others to be clearly separated.

In conclusion, in this study we developed a NGS-based platform for detection of HBV pre-S mutants in plasma of HCC patients. Compared to the current TA cloning-based methods, our data showed that the NGS-based method could detect pre-S deletion quantitatively with a higher sensitivity, suggesting that the NGS-based platform may provide better accuracy for detection of pre-S deletion, potentially improving the outcome prediction of patients with HBV-related HCC.

Methods

Human plasma samples

In this study, the plasma samples were obtained retrospectively from 49 HBV-related HCC patients at China Medical University Hospital (Taichung, Taiwan), from July 2008 to May 2017, under the approval of the China Medical University & Hospital Research Ethics Committee. All plasma samples were stored at −80 °C for TA cloning- and NGS-based pre-S genotyping. The clinicopathological records of the patients were also collected. All research was performed in accordance with relevant guidelines and regulations and the informed consent was obtained from all participants.

TA cloning-based pre-S genotyping analysis

For TA-cloning-based detection of pre-S deletion, 200 μL of plasma of each patient was used for DNA extraction with the DNeasy Blood Kit (Qiagen, Valencia, CA) according to the manufacturer’s instructions. The pre-S gene was amplified from the plasma DNA, which contains virus DNA, by using nested PCR with primers for two successive rounds of reactions following the program shown in Table 4. The nested PCR was performed with the high-fidelity Platinum SuperFi DNA polymerase (Invitrogen, Carlsbad, CA), and the resulting pre-S gene PCR products were resolved by agarose gel electrophoresis. As shown in Fig. 3, in cases where only a single PCR band was visualized, the pre-S gene PCR products were directly subjected to Sanger DNA sequencing analysis. In cases where multiple PCR bands (approximately ranging from 350 to 600 base pairs in size) were visualized, individual bands in agarose gel were separately excised, purified, and cloned using the TOPO TA Cloning Kit (Invitrogen) after conducting A-tailing procedure, and the resulting plasmid clones containing the pre-S insert-DNA were then directed to Sanger DNA sequencing analysis.

Table 4 The primers and program of nested PCR for amplification of HBV pre-S gene.
Figure 3
figure 3

Flowchart of the pre-S genotyping by TA cloning- and NGS-based analysis. The virus DNA is extracted from patient’s plasma. The pre-S gene is amplified from virus DNA following first and second rounds of nested PCR. For TA cloning-based analysis, the pre-S gene PCR products are visualized by using agarose gel electrophoresis. When only a single PCR band is seen in agarose gel, the products are directly subjected to DNA sequencing analysis. However, when two or more PCR bands are revealed, individual bands are first cloned into the TA vector for further DNA sequencing analysis. In contrast, for NGS-based analysis, the entire pre-S gene PCR products, including both visible and invisible bands, are directly subjected to NGS sequencing analysis without the complex procedures needed for TA cloning-based analysis.

NGS-based pre-S genotyping analysis

For NGS-based analysis of pre-S deletion, the pre-S gene was amplified from the plasma DNA of each patient by using the same condition of nested PCR as the TA cloning-based analysis (Table 4). Unlike TA cloning-based analysis, the pre-S gene PCR products, regardless of containing single or multiple visible, even invisible bands in agarose gel, were directly subjected to NGS sequencing analysis without the need for gel electrophoresis, band excision, purification, and cloning (Fig. 3). For NGS sequencing analysis, DNA libraries were prepared using standard KAPA LTP Library Preparation Kit for Illumina Platforms (Kapa Biosystems, Boston, MA). Base calling and quality scoring were performed by an updated implementation of Real-Time Analysis on the NextSeq 500 system (Illumina, San Diego, CA) according to the manufacturer’s instructions. The bcl2fastq Conversion Software v2.20 (Illumina) was used to demultiplex data and convert BCL files to standard FASTQ file formats. After trimming the adaptor sequence and masking the low-complexity sequence with a quality score under 20, the sequence reads were compared with the master sequences from the reference sets of HBV genotypes, which were available on the NCBI website (https://www.ncbi.nlm.nih.gov/projects/genotyping/view.cgi?db=2), by using BLAST. The E-value cutoff was set as 1e-5 to reduce false-positive result. Finally, the pre-S deletion type, region, and frequency were determined by using our customized scripts. The deletion frequency was defined as the counts of reads with deletion divided by the total number of reads and then multiplied by 100.

Statistical analysis

The McNemar’s paired proportion test was used to compare the difference between TA cloning- and NGS-based analyses for detection of pre-S deletion, where a P value < 0.05 was considered to indicate statistical significance. Also, the agreement between both analyses in pre-S genotyping was evaluated by calculating the simple kappa coefficient with 95% CI, where κ = 1 denotes perfect agreement; κ > 0.80, excellent; κ > 0.60, good; κ > 0.40, moderate; κ > 0.20, fair; and κ > 0 indicates poor agreement32.