Introduction

Although the algorithm for managing nodular thyroid disease is well established, diagnosis remains challenging; as a result, only 10–40% of surgically resected thyroid nodules are malignant1.

Thyroid lesions occur frequently in the adult population, with up to 7% having palpable nodules and 50–67% having nodules that are detected on ultrasound examination or at autopsy2,3,4,5. Although less than 10% of thyroid lesions are cancerous, the prevalence of individual nodules or multinodular goiters makes thyroid cancer the most common endocrine malignancy6,7,8,9. The American Cancer Society estimated that the total number of new thyroid cancer cases in the United States would exceed 64,000 in 201610, and an increasing number of new cancers detected each year is observed worldwide11.

The method of choice for the diagnosis of thyroid nodules is fine-needle aspiration (FNA) biopsy under ultrasound guidance followed by cytological evaluation12,13,14. Based on the recent Bethesda System for Reporting Thyroid Cytopathology, FNA results are divided into six categories: nondiagnostic or unsatisfactory (I), benign (II), atypia of undetermined significance (AUS) or follicular lesion of undetermined significance (FLUS) (III), follicular neoplasm (FN) or suspicious for FN (IV), suspicious for malignancy (V), and malignant (VI)15,16,17. Nondiagnostic or unsatisfactory results of thyroid biopsy (Bethesda category I) should ideally be limited to no more than 10% of cases; however, according to some studies, they occur in up to 20% of specimens6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21. FNA needs to be repeated in such situations6, 15, 21. Repeat FNA is also recommended in patients with an FNA finding of AUS/FLUS (Bethesda category III), although surgery is also a management option here6, 15, 21,22,23. The treatment of choice for Bethesda categories IV–VI is surgery, although the risk of malignancy is above 97% only in category VI, while it is 60–75% in category V and only 15–30% in category IV6, 15, 21. Considering the frequency of nondiagnostic or unsatisfactory thyroid biopsy results, and the fact that AUS/FLUS accounts for approximately 10% of FNA cases, FN or suspicious for FN for a further 10%, and the suspicious for malignancy category for almost 3%21, it must be clearly stated that many invasive procedures in nodular thyroid disease are performed because no other diagnostic method can exclude malignancy. Therefore, there is a strong need to develop new diagnostic methods that could provide clinically useful information regarding thyroid nodular lesions in a non-invasive way. The discovery of such a method could prevent unnecessary repeated biopsies and surgical procedures such as thyroid lobectomy or thyroidectomy.

Years of experience have shown both the usefulness and the limitations of immunocytochemical analyses and genetic tests of material obtained by FNA as a supplement to cytological analysis. The evaluated markers have included galectin-3, E-cadherin, fibronectin, CD44v6, thyroid transcription factor 1, Cbp/p300-interacting transactivator 1, thyroglobulin, calcitonin, CEA, p27, cyclin D1, cytokeratin 19, thyroid peroxidase, HBME-1, beta-catenin, and p53, as well as testing for BRAF or RAS mutations and RET/PTC or PAX8/PPARγ translocations24,25,26,27,28. The last four appear to have the greatest potential in differential diagnosis29,30,31,32. At the same time, blood analysis using some of the aforementioned markers has not produced a breakthrough in the diagnosis of thyroid malignancy. The evaluated blood markers included, for example, galectin-3 and cytokeratin 19, as well as others such as metalloproteinase-1, chitinase 3-like 1 and angiopoietin-133. Currently, expectations are especially high for the evaluation of microRNA (miRNA) serum profiles, although further studies are necessary34,35,36. It should be noted that, to date, none of the described methods has revolutionized the diagnostics of nodular thyroid disease.

Metabolomic analysis of tissue samples has demonstrated potential for discriminating between nodular lesions and healthy thyroid, as well as for grading malignant and benign lesions37, 38. Unfortunately, an invasive procedure is necessary to obtain the material for analysis, and thus other biomarkers need to be discovered. In this study, we describe a 1H NMR spectroscopy-based metabolomic analysis of biofluids, which could improve the diagnostics of nodular thyroid disease in a non-invasive manner. The data were obtained by measuring paired serum and urine samples from patients classified into the following four groups: HC – healthy control, NN – non-neoplastic nodules, FA – follicular adenoma and TC – papillary thyroid cancer.

Methods

Sample collection

Serum and urine samples were collected from patients operated on at the First Department and Clinic of General, Gastroenterological and Endocrinological Surgery of Wroclaw Medical University. The protocol for this study was approved by the Commission of Bioethics at Wroclaw Medical University (Approval no. KB-248/2010), and written informed consent was obtained from all of the patients before enrollment in the study. All of the patients were euthyroid and had a normal level of thyroid stimulating hormone (TSH). They were not treated with any drugs before the surgery. Serum and urine were sampled from the participating 67 subjects (Table 1).

Table 1 Patient demographic data. N – number of patients, f – female, m – male, age including standard deviation.

Serum was sampled from the peripheral vein from all of the participants after overnight fasting, and it was collected using serum tubes that were then centrifuged at 1000 x rpm for 15 minutes at 4 °C. The samples were stored in Eppendorf type tubes and kept at −80 °C until the analysis.

Urine samples (morning, first pass) were collected in urine test cups. They were centrifuged at 4000 rpm for 10 min, and then, the urine samples were stored in 2-mL aliquots in falcon tubes at −80 °C until further use.

Sample preparation for NMR spectroscopy

Prior to the metabolomic experiment, the serum samples were thawed at room temperature and vortexed, and 200 μL of serum was mixed with 400 μL of saline solution (0.9% NaCl, 15% D2O, containing 3 mM TSP). After centrifugation (12 000 x rpm, 10 min), 550 μL of supernatant was transferred to a 5-mm NMR tube. The samples were kept at 4 °C until measurement.

All of the urine samples were thawed at room temperature and mixed using a vortex mixer. The samples were centrifuged (12 000 x rpm, 10 min) and 400 μL of supernatant was mixed with 200 μL of PBS (0.5 M, pH = 7.00, 33% D2O, containing 3 mM NaN3 and 3 mM TSP). The samples were mixed again, and finally, an aliquot of 550 μL was transferred into a 5-mm NMR tube.

1H NMR measurements

The NMR spectra of the serum and urine samples were recorded at 300 K using an Avance II spectrometer (Bruker GmbH, Germany) operating at a proton frequency of 600.58 MHz. The NMR spectra of the serum samples were recorded using a CPMG pulse sequence with water presaturation (cpmgpr1d in Bruker notation). For each sample, 128 scans were collected with a spin-echo delay of 400 μs, 80 loops, a relaxation delay of 3.5 s, an acquisition time of 2.73 s, a time domain of 64 k, and a spectral width of 20.01 ppm.

The NMR spectra of the urine samples were recorded using a NOESY pulse sequence with water presaturation (noesy1dpr in Bruker notation), with a relaxation delay of 3.5 s, an acquisition time of 1.36 s, 128 transients, a time domain of 64 k, and a spectral width of 20.01 ppm.

The spectra were processed with a line broadening of 0.3 Hz and were manually phased and baseline corrected using Topspin 1.3 software (Bruker GmbH, Germany). Serum spectra were referenced to the α-glucose signal (δ = 5.225 ppm), and urine spectra to the TSP resonance (δ = 0.000 ppm). Signal alignment was carried out using the correlation optimized warping (COW) algorithm39 and the icoshift algorithm implemented in Matlab (v 8.3, Mathworks Inc.)40. The spectral region containing the water signal was excluded from the calculations. All of the spectra were normalized using the Probabilistic Quotient Normalization (PQN) method41, 42.
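
The PQN step can be illustrated with a minimal sketch. It follows the commonly described procedure (integral normalization, a reference spectrum, and division by the median quotient). The function name pqn_normalize, the use of the variable-wise median as the reference spectrum, and the toy intensities are assumptions made for illustration; they are not taken from the Matlab implementation used in this study.

```python
from statistics import median


def pqn_normalize(spectra):
    """Probabilistic Quotient Normalization of spectra given as rows of intensities."""
    n_vars = len(spectra[0])
    # Step 1: scale every spectrum to unit total intensity (integral normalization).
    scaled = [[value / sum(row) for value in row] for row in spectra]
    # Step 2: reference spectrum = variable-wise median across samples (assumption).
    reference = [median(row[j] for row in scaled) for j in range(n_vars)]
    normalized = []
    for row in scaled:
        # Step 3: quotients against the reference, skipping empty variables.
        quotients = [row[j] / reference[j] for j in range(n_vars)
                     if reference[j] > 0 and row[j] > 0]
        # Step 4: the median quotient is taken as the most probable dilution factor.
        dilution = median(quotients)
        normalized.append([value / dilution for value in row])
    return normalized


# Toy data (hypothetical): rows are samples, columns are spectral variables.
spectra = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # same profile, twice as concentrated
    [1.0, 2.0, 3.0, 10.0],  # same profile with one strongly elevated signal
]
normalized = pqn_normalize(spectra)
```

In this toy case the third spectrum keeps its elevated signal after normalization, while its remaining variables line up with the other two samples, which is the behavior that distinguishes PQN from plain integral normalization.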

Preprocessing of variables prior to analysis

The metabolite resonances were identified according to assignments published in the literature and in on-line databases (Biological Magnetic Resonance Data Bank and Human Metabolome Database). For quantification, integrals of the non-overlapping signal fragments were used. All of the variables (originating from both fluids) were scaled to unit variance.
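
Unit variance scaling can be sketched as follows. The sketch assumes that each variable is mean-centered and then divided by its sample standard deviation, which is the usual meaning of the term; the function name and the example matrix are hypothetical.

```python
from statistics import mean, stdev


def unit_variance_scale(matrix):
    """Mean-center each column and divide it by its sample standard deviation."""
    n_vars = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_vars)]
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    # Constant columns (sd == 0) carry no information and are set to zero.
    return [[(row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0
             for j in range(n_vars)]
            for row in matrix]


# Two variables on very different scales (hypothetical relative integrals).
integrals = [[1, 10], [2, 20], [3, 30]]
scaled = unit_variance_scale(integrals)
```

After scaling, both columns have the same spread, so a high-intensity metabolite no longer dominates a low-intensity one in the model.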

Data fusion

The relative integrals of the resonance signals from the paired serum and urine data matrices were combined into one fused data matrix. For the model calculations, metabolite names were rewritten using the pattern metabolite_[s] for serum and metabolite_[u] for urine, so that metabolites detected in both fluids would not share a name. Metabolites present in both serum and urine were treated as separate variables.
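
The fusion step amounts to a column-wise concatenation of the two matrices for paired samples, with a fluid suffix on every variable name. The following sketch shows the idea on dictionaries; the sample identifiers and values are hypothetical, and dropping unpaired samples is an assumption about how pairing was enforced.

```python
def fuse(serum, urine):
    """Combine per-sample serum and urine variables into one row per paired sample."""
    fused = {}
    for sample_id, serum_row in serum.items():
        if sample_id not in urine:
            continue  # keep only samples measured in both fluids (assumption)
        # Suffix every metabolite with its fluid so shared names stay distinct.
        row = {f"{name}_[s]": value for name, value in serum_row.items()}
        row.update({f"{name}_[u]": value for name, value in urine[sample_id].items()})
        fused[sample_id] = row
    return fused


# Hypothetical relative integrals for illustration only.
serum = {"P01": {"Lactate": 1.2, "Citrate": 0.4}, "P02": {"Lactate": 0.9, "Citrate": 0.5}}
urine = {"P01": {"Citrate": 2.0, "Acetone": 0.1}}
fused = fuse(serum, urine)
```

Here citrate appears twice in the fused row, once as Citrate_[s] and once as Citrate_[u], so the model treats the two measurements as separate variables.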

Multivariate data analysis

Multivariate data analysis was performed using SIMCA software (v 14.0, Umetrics). The order of the samples in the dataset was randomized. The discriminant version of the Partial Least Squares regression (PLS-DA) with a default k-fold cross validation procedure was used to determine the differences between the groups.

Samples were split into two datasets (model and test) based on the Kennard and Stone algorithm and randomized.
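
A minimal sketch of the classic Kennard and Stone selection is given below: it starts from the two most distant samples and then repeatedly adds the sample that lies farthest from everything already selected. Euclidean distance, the function name and the one-dimensional toy data are assumptions made for illustration; the distance metric and the model/test proportion used in this study are not stated in the text.

```python
from math import dist


def kennard_stone(samples, n_select):
    """Return (selected, remaining) sample indices; n_select must be at least 2."""
    n = len(samples)
    # Start with the pair of samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(samples[pair[0]], samples[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select and remaining:
        # Add the sample whose nearest selected neighbor is farthest away.
        candidate = max(
            remaining,
            key=lambda i: min(dist(samples[i], samples[s]) for s in selected),
        )
        selected.append(candidate)
        remaining.remove(candidate)
    return selected, remaining


# One-dimensional toy data (hypothetical).
samples = [[0.0], [1.0], [2.0], [10.0], [5.0]]
model_idx, test_idx = kennard_stone(samples, n_select=3)
```

The selected indices form the model set and cover the extremes and the middle of the data, while the leftover samples form the test set.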

To improve the obtained models, variable selection was conducted using VIP plots with jack-knifed confidence intervals at a confidence level of 0.95. Variables with a VIP value below 0.8 were removed from subsequent analysis for as long as their removal did not have a negative influence on the R2 and Q2 parameters of the model. New models were then built on the selected variables, and their reliability was tested with CV-ANOVA at a significance level of p < 0.05.
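
For orientation, the VIP value of a variable can be computed from a fitted single-response PLS model as sketched below. The sketch uses the widely cited textbook definition, which weights each squared normalized PLS weight by the Y variance explained by its component; whether SIMCA uses exactly this variant is an assumption, and the jack-knifed confidence intervals are not reproduced. The function name and all numbers are hypothetical.

```python
from math import sqrt


def vip_scores(weights, scores, y_loadings):
    """VIP per variable.

    weights:    p x A PLS weights (one row per variable, one column per component)
    scores:     n x A X-scores (one row per sample)
    y_loadings: length-A Y-loadings
    """
    n_vars = len(weights)
    n_comp = len(y_loadings)
    # Y variance explained by component a: q_a^2 * (t_a . t_a)
    explained = [y_loadings[a] ** 2 * sum(row[a] ** 2 for row in scores)
                 for a in range(n_comp)]
    # Squared norm of each weight vector, used to normalize the weights.
    norms = [sum(weights[j][a] ** 2 for j in range(n_vars)) for a in range(n_comp)]
    total = sum(explained)
    return [
        sqrt(n_vars * sum(explained[a] * weights[j][a] ** 2 / norms[a]
                          for a in range(n_comp)) / total)
        for j in range(n_vars)
    ]


# Hypothetical one-component model with three variables and two samples.
vip = vip_scores(weights=[[0.1], [0.7], [0.7]], scores=[[1.0], [-1.0]], y_loadings=[2.0])
kept = [j for j, value in enumerate(vip) if value >= 0.8]  # 0.8 cut-off from the text
```

By construction the squared VIP values sum to the number of variables, so a value of 1 marks a variable of average importance and the 0.8 cut-off removes those clearly below average.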

The prediction performance of the VIP-PLS-DA models was estimated based on receiver operating characteristic (ROC) curves and area under the curve (AUC) values. For this purpose, the perfcurve function from the Matlab Statistics Toolbox (Matlab v. 8.3, Mathworks, Inc.) was used. Specificity and sensitivity were determined from the sample class predictions, using the 7-fold cross-validated predicted Y values (Y-predcv, as implemented in SIMCA-14 software) for the observations in the model.
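
The quantities reported from the ROC analysis can be sketched without Matlab. The AUC below is computed through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one), which is not the perfcurve implementation itself. The 0.5 decision threshold on the predicted Y values, the function names and the data are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for label, s in zip(labels, scores) if label == 1 and s >= threshold)
    fn = sum(1 for label, s in zip(labels, scores) if label == 1 and s < threshold)
    tn = sum(1 for label, s in zip(labels, scores) if label == 0 and s < threshold)
    fp = sum(1 for label, s in zip(labels, scores) if label == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical class labels (1 = lesion, 0 = HC) and cross-validated predictions.
labels = [1, 1, 0, 0]
y_pred_cv = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, y_pred_cv)
sens, spec = sensitivity_specificity(labels, y_pred_cv)
```

In this toy case three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75, while the fixed threshold misclassifies one sample in each class.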

Statistical data analysis

For each metabolite of the serum and urine samples, the percentage difference (PD) and relative standard deviation (RSD) were calculated using STATISTICA 12. The percentage difference was calculated based on the mean values of relative signal integrals in each group. The calculations were performed from left to right. For the chosen metabolites, the statistical significance based on the Mann–Whitney–Wilcoxon (p < 0.05) or Student t test (p < 0.05) was calculated.
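
The two descriptive statistics can be sketched as follows. The text does not spell out the formula for the percentage difference, so the sketch assumes the symmetric form, in which the difference between the group means is divided by their average, with the first-named group of a comparison "A vs B" taken first. This definition, the function names and the numbers are assumptions, not the STATISTICA procedure. The RSD is the sample standard deviation expressed as a percentage of the mean.

```python
from statistics import mean, stdev


def percentage_difference(group_a, group_b):
    """Symmetric percentage difference between group means (assumed definition)."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    return (mean_a - mean_b) / ((mean_a + mean_b) / 2) * 100


def relative_standard_deviation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


# Hypothetical relative integrals of one metabolite in two groups.
lesion = [2.0, 4.0]
control = [1.0, 1.0]
pd_value = percentage_difference(lesion, control)
rsd_value = relative_standard_deviation(lesion)
```

A positive value then means a higher mean relative integral in the first-named group, and a large RSD signals that the group mean is poorly defined.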

Results and Discussion

Multivariate analysis in the diagnosis of thyroid lesions

Each pathological state, even at the cellular level, should be reflected in the body fluids, at least in the most abundant ones, urine and blood, which can be collected less invasively43. The changes that occur on a molecular basis in the thyroid tissue at the genomic and proteomic levels ought to be reflected in a variation in the metabolome profile of the biofluids. There, a specific variation in the homeostatic concentration of low-molecular-weight compounds is expected to occur as a characteristic of the existing pathological condition of the thyroid gland43. In this work, for the first time, we have investigated paired urine and blood serum samples using NMR methodology for healthy controls (HC), patients with benign changes and patients with advanced carcinogenesis.

Representative 1H NMR spectra of serum (36 assigned metabolites) and urine (44 assigned metabolites) from an HC subject are presented in Figs 1 and 2. The metabolite resonances were identified according to assignments published in the literature and on-line databases (http://hmdb.ca and http://www.bmrb.wisc.edu).

Figure 1

Representative 1H NMR spectrum obtained from serum samples of HC subjects. The following metabolites are identified: 1, L_1; 2, L_2; 3, Isoleucine; 4, Leucine; 5, Valine; 6, Unk_1; 7, 3-Hydroxybutyrate; 8, L_3; 9, L_4; 10, L_5; 11, Lactate; 12, Alanine; 13, L_6; 14, Acetate; 15, L_7; 16, L_8; 17, NAC; 18, Acetone; 19, Acetoacetate; 20, Pyruvate; 21, Glutamine; 22, Citrate; 23, Unk_2; 24, Creatine; 25, Dimethyl sulfone; 26, Chol+GPC+APC; 27, Glucose; 28, Betaine; 29, Methanol; 30, Glycerol; 31, Creatinine; 32, L_9; 33, Tyrosine; 34, π-Methylhistidine; 35, Phenylalanine; 36, Formate.

Figure 2

Representative 1H NMR spectrum obtained from the urine samples of HC subjects. The following metabolites are identified: 1, Unk_1; 2, 3-Hydroxyisobutyrate; 3, 3-Methyl-2-oxovalerate; 4, Isopropanol; 5, 3-Hydroxybutyrate; 6, Methylmalonate; 7, Fucose; 8, 3-Hydroxyisovalerate; 9, Lactate; 10, 2-Hydroxyisobutyrate; 11, 2-Phenylpropionate; 12, Unk_2; 13, Alanine; 14, Acetate; 15, Unk_3; 16, Unk_4; 17, N-Isovaleroylglycine; 18, Acetone; 19, 2-Aminoadipate; 20, Unk_5; 21, Unk_6; 22, Citrate; 23, Dimethylamine; 24, Unk_7; 25, Creatine; 26, Creatinine; 27, Carnitine; 28, TMAO; 29, Unk_8; 30, Unk_9; 31, Glycine; 32, Glycylproline; 33, Hippurate; 34, Trigonelline; 35, Ascorbate; 36, 2-Furoylglycine; 37, 3-Hydroxyphenylacetate; 38, Tyrosine; 39, N-Phenylacetylglycine; 40, 3-Indoxylsulfate; 41, Unk_10; 42, Formate; 43, Unk_11; 44, 1-Methylnicotinamide.

Discrimination between controls and thyroid lesions

Initially, all of the obtained NMR data were subjected to calculations of seven discriminatory PLS models, for each type of thyroid lesion and each type of collected biological material. Metabolites were selected using the VIP scores, based on the best parameters of separation between the groups, and were used for further VIP-PLS-DA model calculations (Figs 3 and 4). The parameters of the resulting models, including those derived from the ROC curves, are compiled in Table 2.

Figure 3

The VIP-PLS-DA models and ROC curves obtained from serum samples. (A) – NN vs HC; (B) – FA vs HC; (C) – TC vs HC; (D) – TC vs NN; (E) – FA vs NN; (F) – FA vs TC; (G) – P vs HC. Red triangles – healthy control; blue boxes – non-neoplastic nodules; yellow diamonds – follicular adenoma; black hexagons – papillary thyroid cancer; gold circles – patients. Solid symbols: training set; empty symbols: predicted test set.

Figure 4

The VIP-PLS-DA models and ROC curves obtained from the urine samples. (A) – NN vs HC; (B) – FA vs HC; (C) – TC vs HC; (D) – TC vs NN; (E) – FA vs NN; (F) – FA vs TC; (G) – P vs HC. Red triangles – healthy control; blue boxes – non-neoplastic nodules; yellow diamonds – follicular adenoma; black hexagons – papillary thyroid cancer; gold circles – patients. Solid symbols: training set; empty symbols: predicted test set.

Table 2 The parameters of PLS-DA models obtained from 1H NMR analysis of serum, urine samples and fusion data (*latent variables).

Each of the analyzed biofluids exhibited a different discrimination potential (Table 2). Using serum, the best separation between healthy subjects and patients was obtained for the NN vs HC comparison (Q2 = 0.478, AUC test set = 0.83). The two other models, FA vs HC and TC vs HC, showed a similar level of discrimination (AUC test set of 0.71 and 0.73, respectively) but did not pass the test of model significance. The urine models for these groups also had p values higher than 0.05, and their predictive potential followed the order FA vs HC (AUC test set = 1) > TC vs HC (AUC test set = 0.73) > NN vs HC (AUC test set = 0.61). Interestingly, when healthy subjects were compared with all of the collected thyroid lesions together (all patients, P, vs HC), the biofluids differed only slightly, with blood serum (AUC test set = 0.84) appearing to be a more appropriate diagnostic material than urine (AUC test set = 0.76). In contrast, the pairwise comparisons between different thyroid lesions revealed that only one model (FA vs TC based on urine) provided satisfactory predictive power (AUC = 0.76).

For the models built on the fusion data, the basic model parameters were markedly better (Table 2). In support of this finding, the ROC curves (Fig. 5) and AUC training values (Table 2) also differed for all of the comparisons, indicating a better fit of the models built on the selected samples than of the models constructed separately on serum or urine. As with the models obtained for each biofluid, the models based on data fusion performed best in the comparisons of HC vs NN/FA/TC and HC vs the total number of patients (P) with pathological changes (Table 2). Across all of the models, the obtained values ranged from 0.83 (FA vs TC) to 0.99 (TC vs HC). Predictive ability was also higher for the comparisons of HC with pathological changes of the thyroid (Table 2). The highest predictive value was found for FA vs HC, which reached an AUC test of 1.00, while the lowest was obtained for NN vs HC, with an AUC test of 0.82.

Figure 5

The VIP-PLS-DA models and ROC curves obtained from the fusion of data from urine and serum samples. (A) – NN vs HC; (B) – FA vs HC; (C) – TC vs HC; (D) – TC vs NN; (E) – FA vs NN; (F) – FA vs TC; (G) – P vs HC. Red triangles – healthy controls; blue boxes – non-neoplastic nodules; yellow diamonds – follicular adenoma; black hexagons – papillary thyroid cancer; gold circles – patients. Solid symbols: training set; empty symbols: predicted test set. Small violet circles – metabolites that influence group differentiations in PLS-DA models.

The fusion of data from the serum and urine NMR measurements clearly strengthened the models calculated between the NN, FA, TC and P groups and the HC group. This is attributed to the increased number of variables, which in theory can add complementary information to the models. The AUC values were between 0.82 and 1, which shows a high predictive potential based on combined information from both biofluids. Moreover, comparisons between different thyroid lesions were enhanced when data fusion was applied. Surprisingly, more advanced lesion development was not accompanied by better predictive ability of the models, in contrast to what was previously found in metabolomic investigations of thyroid tissue37, 38, 44. This might be explained by the vascularization of the tumor tissue: the size of the tumor could influence the type of vascularization45 and thus result in weaker information on the biochemical changes being distributed across the biological system.

Metabolic differences in thyroid lesions

In our previous study, we showed that differentiation of tumor type was possible using aqueous tissue extracts37. Based on these findings, we decided to investigate whether similar discrimination can be obtained using biofluids (individually or in combination), which, unlike tissue biopsies, can be easily collected. Of all the low-molecular-weight compounds identified in the NN vs HC comparison, only lactate and formate were statistically important in both the serum and the tissue extract studies, and both followed the same trend of an increasing relative integral in NN. However, tissue lactate increased systematically from NN through FA to TC, which is the reverse of the trend in blood serum. In tissue, lactate upregulation is associated with increasing levels of alanine and glucose, its two main sources. In blood serum, these metabolites changed only slightly (alanine decreased and glucose increased), and the changes were not statistically significant. This might be evidence of the fast utilization of lactate from circulating blood in response to, e.g., energy demand. Similarly, strong upregulation of formate was observed in both blood serum and tissue; however, because of its high RSD, it is difficult at this stage to consider this molecule a potential biomarker.

Between urine and tissue extract, only two metabolites overlapped: 3-hydroxybutyrate (3-HB) and acetone. Both were statistically important and followed the same direction of change, decreasing in NN. Collectively, in the serum and urine samples, 10 metabolites were statistically important in the NN vs HC comparison (Tables 3, 4).

Table 3 Significantly changed serum metabolites (# - VIP plot selected metabolites; * - statistically significant metabolites).
Table 4 Significantly changed urine metabolites (# - VIP plot selected metabolites; * - statistically significant metabolites).

In contrast, in the tissue study, 16 metabolites with significant changes were found37. In the FA vs HC comparison, four serum metabolites and three urine metabolites matched the tissue study results, namely valine, citrate, lactate and tyrosine for serum, and citrate, acetone and 3-hydroxybutyrate for urine. Among the serum metabolites that were statistically important in both studies, valine and tyrosine were decreased in FA, in contrast to the trend in the aqueous tissue extract study. Moreover, in tissue extracts citrate (decreasing) and lactate (increasing) changed in opposite directions, while in blood serum both metabolites were increased in comparison to the HC group. These data may highlight the difference between changes occurring at the local (tissue) level and changes in whole-body metabolism in response to the pathological state. Another example is the shifted balance of valine, for which a decreased level is observed in blood serum and an increased level in tissue extract. Additionally, the decreased level of amino acids, especially the relative integral of serum tyrosine, could be related not only to protein biosynthesis but also to the synthesis of catecholamines46.

In the urine samples, the decreasing trend in citrate, acetone and 3-hydroxybutyrate levels for the FA group was reported in both studies and points to metabolism directed towards energy demand. In the tissue extract study, a total of 15 metabolites were statistically important, while 17 compounds were found overall for serum and urine, the majority of which (12) belonged to urine (Supplementary Tables S1 and S2).

The third comparison was between TC and HC subjects. These most distant groups would be expected to show the largest differences in molecular information. In the tissue extract study, 22 of the 26 identified metabolites were statistically important, while, surprisingly, only 17 were significantly changed in serum and urine collectively. Only four of the identified metabolites from serum and two from urine matched the results from the previous study37. Of these serum metabolites (valine, alanine, creatine and tyrosine), most were decreased in TC, whereas the aqueous tissue extract showed the opposite trend; only creatine, a metabolite that replenishes energy in the ADP-ATP cycle, showed an increased level in the TC group that matched the changes from the previous study. In urine, citrate and acetone were statistically important and followed the same decreasing trend as in the tissue extract. Distinguishing HC from all of the other investigated thyroid lesions therefore appears to be possible based on serum and urine samples. However, the molecular composition outcome was not as good as that from the aqueous tissue extract in the previous study. As a consequence of the lower number of statistically important metabolites obtained in this study, discrimination between the types of thyroid nodule was more difficult. Across all of the comparisons between types of thyroid lesion, only two metabolites identified in serum were shared with the results obtained from the aqueous tissue extracts. In the FA vs NN comparison, the valine relative integral was decreased for the FA group in both studies; in TC vs NN, valine and lactate were decreased in serum for the TC group, with the opposite trend in the tissue extract study. In the FA vs TC comparison, only serum lactate was decreased in the TC group, which also contradicts the previous study37. An increased level of lactate is a common feature of carcinogenesis, so it was unexpected that the level was highest in the NN group.

In conclusion, no significant changes were observed in the total lipid profile. The acetone level identified in the urine samples followed exactly the same trend as that recognized in the tissue extract study. Moreover, in most cases it is positively correlated with its precursor 3-hydroxybutyrate (except for tissue TC vs HC). This may indicate that the level of acetone in urine could be a prognostic factor for thyroid nodules. The increased level of lactate in blood and tissue, with the opposite direction of change in urine, may be related to the influence of the hypoxic microenvironment47 in tissue and its direct translation to blood serum48 (comparisons NN vs HC and FA vs HC). The lack of change in lactate level in the TC vs HC comparison may be caused by strong local tissue changes and/or limited flow between these two compartments.

Clearly, changes were much more pronounced in the tissue extracts than in serum or urine. This is, however, to be expected, as tissue collected in situ should reflect the metabolic state of the lesion more directly than any of the biofluids, which report on whole-organism metabolism.

All of our findings appear to be consistent when the three biological compartments are considered together: the data from the tissue extract samples were obtained directly at the site of the major pathological disturbances, whereas in serum and urine samples, the visible changes in the relative integrals of the metabolites could be caused by the general response of the biological system to the disturbance of homeostasis, which could be more subtle than the changes directly in the pathological tissue37.

Conclusions

The models based on the fusion data have higher parameters and predictive potential compared with most of the models that were calculated separately for each body fluid. That finding indicates that combined datasets exhibit synergy that increases model stability and enhances diagnostic potential. However, it should also be noted that stratification of tumor types and their differentiation in relation to each other could not be obtained.

The VIP-PLS-DA method allows us to identify metabolites that are biomarker candidates and should be investigated in detail in the future.

Our study also allowed us to obtain a model with 100% prediction for the FA vs HC comparison. The models calculated for comparisons between disease entities were of low quality, which could reflect similar changes in the distribution of metabolites in the organism. This similarity could negatively affect the creation of high-quality diagnostic models based on the proton NMR technique.

Despite the relatively good results in some comparisons, studies must be conducted on a larger cohort of patients in order to confirm the predictive potential of the selected metabolites in diagnosing thyroid lesions.

Declarations

Ethics approval and consent to participate

The study was carried out in accordance with the Declaration of Helsinki. Serum and urine samples were collected from patients who were operated on at the First Department and Clinic of General, Gastroenterological and Endocrinological Surgery of Wroclaw Medical University. The protocol for this study was approved by the Commission of Bioethics at Wroclaw Medical University (Approval no. KB-248/2010).

Consent for publication

All subjects read and signed a written informed consent form prior to enrollment in the study.

Availability of data and material

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon request.