Introduction

Triple-negative breast cancer (TNBC) is a tumor subtype that lacks expression of the hormonal and HER2 receptors1, however, it is a genomically heterogeneous disease2. TNBC comprises 15% of all invasive breast cancers and tends to be detected when it is already large in size and high grade1,3. Systemic recurrence occurs more often in TNBC than non-TNBC and mostly within 5 years of the TNBC diagnosis, while non-TNBC shows constant recurrence risk during follow-up4. The overall 4-year survival rate of TNBC is 77.0%, which is lower than the overall survival rates of other subtypes which range from 82.7 to 92.5%5. Systemic recurrence of TNBC leads to poorer survival, and therefore, it is important to predict the systemic recurrence risk of TNBC to tailor treatment to the individual patient4. Tumor size, lymph node, androgen receptor, Ki-67, treatment options such as adjuvant chemotherapy, radiotherapy, and pathologic complete response (pCR) after neoadjuvant chemotherapy have been associated with TNBC prognosis3,6,7,8,9,10. Several studies have used breast magnetic resonance imaging (MRI) features to predict the prognosis of invasive breast cancer including TNBC11,12,13, and in other studies, peritumoral enhancement and perfusion parameters on MRI have been used for TNBC only14,15.

Radiomics shows promise for predicting cancer prognosis because it provides a comprehensive quantification of imaging phenotypes and allows an objective assessment of tumor phenotypes16,17. Tumors with higher heterogeneity are related with poorer survival, and because radiomics reflect tumor heterogeneity, radiomics of breast MRI have been reported as prognostic factors for invasive breast cancer18,19,20,21. To our knowledge, radiomics features using dynamic contrast-enhanced (DCE) MRI have not been evaluated to predict prognosis in TNBC, especially with 3-dimensional images from the whole tumor. One thing to note is that radiomics features of breast MRI are affected by the chosen imaging scanner22. Thus, radiomics features have to be validated with different scanners.

Therefore, the purpose of our study was to evaluate 3-dimensional radiomics features of breast MRI as prognostic factors for predicting systemic recurrence in TNBC. Furthermore, the results were externally validated with a different MRI scanner.

Methods

Study population

The Institutional Review Board approved this retrospective study and required neither patient approval nor informed consent for our review of patient images and medical records (Institutional review board, Severance Hospital Yonsei University College of Medicine, 4-2018-0520).

From January 2012 to December 2015, among 3,166 patients who underwent breast surgery due to breast cancer, 272 (8.6%) patients were confirmed with invasive TNBC. TNBC was defined when both estrogen receptors and progesterone receptors were <1% of tumor cell nuclei28 and also when HER2 staining scores were 1+ or 0, or 2+ with a fluorescence in situ hybridization amplification ratio of <2.029. Forty-one patients were excluded because they had undergone MRI with different vendors for the training and validation sets (n = 19), had not undergone MRI before treatment (n = 8), had undergone MRI at an outside hospital (n = 6), had experienced problems when downloading the Digital Imaging and Communications in Medicine (DICOM) files (n = 5), had undergone vacuum-assisted biopsy (n = 2), or was a recurred case (n = 1). Finally, 231 TNBCs in 231 patients were included. Mean age of the patients was 51.1 ± 12.0 years (range, 23 to 88). The mean follow-up period was 43.2 ± 15.8 months.

MRI acquisition

Breast MRI examinations were performed before any treatment using two 3-T MRI scanners (Discovery MR750w; GE Medical Systems, Milwaukee, Wisconsin, USA/Philips Achieva; Philips Medical Systems, Best, The Netherlands), and patients were imaged with the scanner used being decided arbitrarily according to the hospital’s daily schedule (182 for the GE scanner, 49 for the Philips scanner). Scans were performed with an 8-channel breast receiver coil with the patient in the prone position. Breast MRI consisted of T2-weighted axial images, fat-suppressed T2-weighted axial images, diffusion-weighted images, T1-weighted non-fat-suppressed pre-contrast images, 6 sets of 3D fat-suppressed DCE axial images, and T1-weighted fat-suppressed contrast-enhanced sagittal images. DCE-MRI was performed after injecting 20 mL of gadopentetate dimeglumine (Magnevist, Bayer HealthCare) over 15 seconds. Acquisition time of each DCE axial image was 79 seconds for the GE scanner (repetition time (TR)/echo time (TE), 6.2/1.3 ms; section thickness (ST), 3 mm; field-of-view (FOV), 32 cm) and 65 seconds for the Phillips scanner (TR/TE, 3.9/1.4 ms; ST, 1.5 mm; FOV, 32 cm). For each set of DCE axial images, subtraction images were generated.

Radiomics analysis

The DICOM files of the second phase of fat-suppressed DCE-MRI for the GE scanner and second subtracted DCE-MRI images for the Philips scanner were downloaded to draw regions-of-interest (ROIs). One radiologist (H.J.M.) drew a ROI on every slice of MRI that demonstrated the index tumor along the tumor border while excluding normal breast parenchyma with the semi-automated process provided by the Medical Image Processing, Analysis, and Visualization (MIPAV) application (http://mipav.cit.nih.gov/index.php), and captured images within the new frame were saved. Radiomics analysis was performed on the ROI by one of the authors (E. L.). In order to reduce data redundancy and inconsistency, once a ROI was located, a box enclosing the ROI was extracted from the 3D image and the voxel values in the box were scaled to normalize the intensity so that the data read the same way across all images. Then, information on the exact position of the ROI boundary was gathered and applied to the same image without ROI segmentation to extract the ROI only image (Supplementary Fig. S1). This process was necessary to remove interference from drawing the ROI boundary with a colored marker when analyzing features. To normalize the images, we used min-max scaling so that the voxel values of each image were within a range from 0 to 255. As a pre-processing step, all images were normalized to have similar data distribution. The histogram analysis consisted of 14 features (Supplementary Fig. S2). Shape- and size-based features were made up of 8 features. Textural features were calculated from the gray-level co-occurrence matrix (GLCM) and gray-level run-length matrix (GLRLM). The 3-dimensional GLCM and GLRLM were calculated in 13 directions. GLCM features were analyzed for 22 parameters and GLRLM features were analyzed for 11 parameters. The 3-dimensional wavelet transform was then applied with 8 decompositions, by directional low-pass and high-pass filtering (XLLL, XLHL, XLLH, XHLL, XLHH, XHLH, XHHL and XHHH). Matlab R2018a was used to generate the corresponding values of each parameter (The MathWorks, Inc. Natick, MA, US).

Establishing the Rad score

The total number of radiomic features was 3995. We used the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation to identify the optimal number of features to predict systemic recurrence in the training set. Cross-validation was repeated 100 times to minimize bias from random partition and coefficients of the features were calculated. Relative standard deviation (standard deviation/mean) was calculated from the coefficients to select features that predicted systemic recurrence. The equation for the Rad score is as follows.

Rad score = M1X1 + M2X2 + .. + MnXn (Mn: coefficients of regression, Xn: selected features)

Data analysis

Clinical and pathologic data were reviewed. Patient age, neoadjuvant chemotherapy, surgery type, adjuvant chemotherapy, and radiotherapy were recorded. On the pathologic report, presence of pCR after neoadjuvant chemotherapy, pathologic invasive tumor size, nuclear grade, histologic grade, lymphovascular invasion, and the number of metastatic axillary lymph nodes after surgery were recorded. Interval from initial diagnosis to systemic recurrence or the date of the last follow-up was recorded. Patients were assigned to the training and validation sets according to the MRI scanner used. Patients who underwent MRI examinations with the GE scanner were classified as the training set (n = 182) and those who underwent MRI with the Philips scanner were classified as the validation set (n = 49), respectively.

Statistical analysis

Continuous variables of the training and validation sets were compared with the Mann-Whitney U test or independent two-sample t-test, and categorical variables were compared with Fisher’s exact test or the chi-square test, respectively. The Rad score was compared according to systemic recurrence with the Mann-Whitney U test. Univariable Cox proportional hazard regression was performed for the features predicting recurrence. Multivariable Cox proportional hazard regression was performed with the best subset selection method over 6 models to select the appropriate combination of features for external validation because the number of variables was larger than the number of events (Supplementary Table S3). For all 6 models, the Rad score was compared one-on-one with six variables which were significantly associated with systemic recurrence on univariable analysis. The concordance-index (C-index) for predicting systemic recurrence was calculated in the training set to evaluate the discriminative ability of each model30. From the six examined models, features that showed a high C-index for predicting systemic recurrence in the training set were selected to create the Clinical and Rad models. The Rad score consistently remained significant for the 6 models examined. We excluded pCR and histologic grade, because the data was completely separable and also showed low C-indexes. Moreover, pCR could be replaced by invasive tumor size on pathology and histologic grade was only significant between grade 1 and not available categories which were derived from pCR cases on the univariable analysis. The Clinical model was built up with only clinicopathologic variables. The Rad model was composed with the Rad score added to the Clinical model. External validation was performed with the features selected through multivariable analysis, and the C-index was calculated. We compared the C-indexes of the Clinical and Rad models. Statistical analyses were performed by R Statistical Software (version 3.5.1.; R Foundation for Statistical Computing, Vienna, Austria).

Results

Patients’ characteristics and recurrence events

The clinical and pathologic characteristics were compared between the training and validation sets and the results are presented in Table 1. Interval from initial diagnosis to systemic recurrence or the date of the last follow-up was significantly different between the two groups (P < 0.001). All other characteristics were not significantly different. Systemic recurrence was observed in 22 (9.5%) cases (training set, n = 19; validation set, n = 3). The median time to recurrence was 18.8 months (mean, 22.9 months; range, 7.0 to 61.6 months). The organs with tumor recurrence were the lungs (n = 14), liver (n = 10), bones (n = 8), distant lymph nodes (n = 8), brain (n = 7), peritoneum (n = 1), and adrenal gland (n = 1) with 19 cases showed multiple sites of recurrence.

Table 1 Comparison of clinical and pathologic characteristics between the training and validation sets.

Radiomic features and Rad score

The mean number of features selected from 100 repeats of the 10-fold cross-validation for LASSO was 32.23 (range 3 to 54). We chose mean number of 32 features to generate the Rad score (Table 2). Among them, 7 were selected from GLCM-related features, 24 from GLRLM-related features, and 1 from histogram analysis. The most frequently selected feature was difference entropy (n = 6). Twenty-five selected features were from wavelet decomposition. In the training set, the Rad score was significantly higher in the group with systemic recurrence (median, −8.430; interquartile range (IQR), −8.800 to −8.259) than the group without (median, −9.873; IQR, −10.226 to −9.468, P < 0.001) (Fig. 1).

Table 2 Selected 32 radiomics features of the Rad score.
Figure 1
figure 1

Distribution of the Rad score according to systemic recurrence. The Rad score was significantly higher in the group with systemic recurrence (Label 1, median, −8.430; interquartile range (IQR), −8.800 to −8.259) than the group without (Label 0, median, −9.873; IQR, −10.226 to −9.468, P < 0.001).

Clinical and Rad model for predicting systemic recurrence

On univariable analysis, pCR status, pathologic invasive cancer size, histologic grade, lymphovascular invasion, surgery type, number of metastatic axillary lymph nodes after surgery, and the Rad score were significantly associated with systemic recurrence (Table 3). Through the best subset methods, the Clinical model was built with pathologic invasive cancer size, lymphovascular invasion, surgery type, and number of metastatic axillary lymph nodes after surgery. The Clinical model for predicting systemic recurrence showed that lymphovascular invasion (hazard ratio (HR), 7.875; 95% confidence interval (CI), 2.679, 23.153; P < 0.001), surgery type (HR, 0.231; 95% CI, 0.078, 0.679; P = 0.008), and the number of metastatic axillary lymph nodes after surgery (HR, 1.06; 95% CI, 1.002, 1.121; P = 0.043) were statistically significant on multivariable analysis (Table 4). The C-index of the Rad score for predicting systemic recurrence in the training set was 0.964 (95% CI, 0.829, 1), which was significantly different from that of the Clinical model (0.879; 95% CI, 0.744, 1; P = 0.009). When the models were externally validated in the validation set, the C-index was 0.939 (95% CI, 0.604, 1) for the Clinical model and 0.765 (95% CI, 0.430, 1) for the Rad score. The C-index of the Rad score was significantly lower than that of the Clinical model (P = 0.001). The Rad model was composed with the Rad score added to the Clinical model. In the Rad model, lymphovascular invasion (HR, 4.414; 95% CI, 1.331, 14.637; P = 0.015) and the Rad score (HR, 39.302; 95% CI, 11.351, 136.078; P < 0.001) remained statistically significant. The C-index of the Rad model for predicting systemic recurrence in the training set was 0.97 (95% CI, 0.835, 1), which was significantly higher than the C-index of the Clinical model (0.879; 95% CI, 0.744, 1; P = 0.009).

Table 3 Univariable analysis of variables to predict systemic recurrence in TNBC.
Table 4 Multivariable analysis of variables to predict systemic recurrence in TNBC.

External validation

When the models were externally validated in the validation set, the C-index was 0.939 (95% CI, 0.604, 1) for the Clinical model and 0.848 (95% CI, 0.513, 1) for the Rad model (Table 4). The C-index of the Rad model was lower than the Clinical model, even though there was no statistical difference (P = 0.100).

Discussion

We tried to build up a model to predict systemic recurrence in TNBCs using the 3-dimensional radiomics features of pretreatment breast MRI. When the Rad score was combined with clinicopathologic variables (the Rad model), the new model was able to predict systemic recurrence better than the model comprised only with clinicopathologic variables (the Clinical model). However, the Rad model was not superior over the Clinical model in an external validation which used another MRI scanner with a different protocol.

The Rad score was extracted from a massive volume of radiomics features on MRI and was significantly higher in patients with systemic recurrence. Most features were selected from the GLCM and GLRLM, and the most frequently selected feature was the difference entropy of GLCM which was selected 6 times with different angles and wavelet combinations. Difference entropy is the measure of randomness and variability in neighborhood intensity value differences23. GLCM and GLRLM are calculated from the interaction between image pixels; thus, tumor heterogeneity can be better reflected with these parameters than histogram analysis or shape- and size-based features. Park et al. also found that GLCM had more valuable features for predicting the prognosis of invasive breast cancer although their study population was not TNBC-specific18.

Systemic recurrence occurred in 9.5% of our study population, which was within the range of a previous study3,24,25. The median interval to recurrence was 18.8 months and this finding was also seen in TNBC patients who relapsed and died within 5 years of diagnosis4. We created the Clinical model which was comprised of the pathologic invasive cancer size, lymphovascular invasion, surgery type, and number of metastatic axillary lymph nodes after surgery to predict systemic recurrence. The Clinical model in the training set showed a high C-index of 0.879 to predict systemic recurrence and the C-index of the Rad score itself was 0.964 (95% CI, 0.829, 1), which was significantly higher than the C-index of the Clinical model. The Rad model consisted of the Clinical model plus the Rad score. In the training set, the Rad model showed a significantly higher C-index of 0.97 than the Clinical model (P = 0.009) for predicting systemic recurrence in TNBC. While there have been several radiomics studies on invasive breast cancers and prognosis, they have not been TNBC-specific. Kim and colleagues evaluated entropy and uniformity only and they associated higher entropy on T2-weighted images and lower entropy on contrast-enhanced T1-weighted subtraction images with poorer recurrence-free survival in invasive ductal carcinoma20. Yamamoto and colleagues investigated only eight radiomics features and an enhancing rim fraction was related with metastasis-free survival19. Li and colleagues showed that there was a relationship between radiomics and multigene assays, but their study did not evaluate recurrence nor survival21. Park and colleagues studied four radiomics features and reported that the combination of radiomics and clinicopathologic features could predict disease-free survival18. The strength of our study is that we analyzed 3-dimensional images from the whole tumor with 3995 derived radiomics features, and thereby took full advantage of the massive volume of radiomic features. The Rad model which added the Rad score to the Clinical model showed better performance than the Clinical model to predict systemic recurrence in TNBC patients.

We validated the Rad model with a different MRI scanner and different protocol. The Rad model of the validation set also showed a high C-index of 0.848. However, the value of 0.848 was lower than the 0.939 of the Clinical model even though there was no statistical difference. In a study on prediction of local tumor control after radio-chemotherapy for head and neck cancer, computed tomography (CT) radiomics were investigated and the results were validated with a different CT model and protocols. A past study experienced lower performance with the clinical plus radiomics models in the validation set than the training set, and they assumed that different CT protocols or parameters would not affect the results26. Another prior study used positron emission tomography–computed tomography radiomics with different scanners to predict the recurrence of cervix cancer and a higher C-index was observed in the training set than in the validation set27. We also used different MRI scanners with the GE scanner being used for the training set and the Philips scanner being used for the validation set, and different protocols were used for the fat-suppressed enhanced images of the training set and subtracted images of the validation set. The Rad model showed lower performance in the validation set. We could not find studies on MRI radiomics that had validated their findings with a different scanner; however, according to an experimental study, the type of MRI machine used and slice thickness can affect the radiomics data22. We assumed that a difference between the two scanners might have caused inconsistent results in the training set. The time of acquisition for the second phase of DCE-MRI was different between the two scanners. ROIs were drawn on the fat-suppressed native images for the training set, but were drawn on the subtracted images for the validation set. Therefore, we had inconsistent results between the training and validation sets.

We acknowledge that there were several limitations in this study. First, the number of included patients was relatively small and the study was performed retrospectively. However, TNBC only comprises a small portion of invasive breast cancer. Second, this study was performed in a single institution. External validation at other institutions is necessary although we validated our results with a different MRI scanner and protocol to minimize overfitting. Third, we only analyzed the second phase of the DCE T1-weighted images. While it would be ideal to include all sequences of the dynamic images for evaluation, we chose one sequence from the DCE-MRI to simplify the methods and results, as the second phase is the most important phase in breast cancer evaluation. Early enhancement is a representative character of breast cancer which is why we chose this sequence. T2-weighted images were valuable in a previous radiomics study18,20 and, therefore, this study should be followed with further studies that include the delayed phase of DCE T1-weighted images or T2-weighted images. Fourth, ROIs were drawn by one radiologist, and inter- and intra-observer variabilities were not evaluated when drawing ROs for the radiomics analysis. However, we used the MIPAV application which drew ROIs along the tumor border that excluded the normal breast parenchyma with a semi-automated process. Thus, this would not have a great effect on the results. In addition, according to Park et al., drawing ROIs between the two observers showed good agreement18.

In conclusion, the Rad model for predicting systemic recurrence in TNBC showed a significantly higher C-index than the Clinical model. However, external validation with a different MRI scanner did not show the Rad model to be superior over the Clinical model.