Computer-aided detection of brain metastasis on 3D MR imaging: Observer performance study

  • Leonard Sunwoo ,

    Contributed equally to this work with: Leonard Sunwoo, Young Jae Kim

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Korea

  • Young Jae Kim ,

    Contributed equally to this work with: Leonard Sunwoo, Young Jae Kim

    Affiliations Department of Biomedical Engineering, Gachon University, Incheon, Korea, Department of Plasma Bio Display, Kwangwoon University, Seoul, Korea

  • Seung Hong Choi ,

    verocay@snuh.org (SHC); kimkg@gachon.ac.kr (K-GK)

    ‡ These authors also contributed equally to this work.

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Hospital, Seoul, Korea

  • Kwang-Gi Kim ,

    verocay@snuh.org (SHC); kimkg@gachon.ac.kr (K-GK)

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Biomedical Engineering, Gachon University, Incheon, Korea

  • Ji Hee Kang,

    Affiliation Department of Radiology, Seoul National University Hospital, Seoul, Korea

  • Yeonah Kang,

    Affiliation Department of Radiology, Seoul Metropolitan Government - Seoul National University Boramae Medical Center, Seoul, Korea

  • Yun Jung Bae,

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Korea

  • Roh-Eul Yoo,

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Hospital, Seoul, Korea

  • Jihang Kim,

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Korea

  • Kyong Joon Lee,

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Korea

  • Seung Hyun Lee,

    Affiliation Department of Plasma Bio Display, Kwangwoon University, Seoul, Korea

  • Byung Se Choi,

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Korea

  • Cheolkyu Jung,

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Korea

  • Chul-Ho Sohn,

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Hospital, Seoul, Korea

  • Jae Hyoung Kim

    Affiliations Department of Radiology, Seoul National University College of Medicine, Seoul, Korea, Department of Radiology, Seoul National University Bundang Hospital, Seongnam, Korea

Abstract

Purpose

To assess the effect of computer-aided detection (CAD) of brain metastasis (BM) on radiologists’ diagnostic performance in interpreting three-dimensional brain magnetic resonance (MR) imaging using follow-up imaging and consensus as the reference standard.

Materials and methods

The institutional review board approved this retrospective study. The study cohort consisted of 110 consecutive patients with BM and 30 patients without BM. The training data set included MR images of 80 patients with 450 BM nodules. The test set included MR images of 30 patients with 134 BM nodules and 30 patients without BM. We developed a CAD system for BM detection using template-matching and K-means clustering algorithms for candidate detection and an artificial neural network for false-positive reduction. Four reviewers (two neuroradiologists and two radiology residents) interpreted the test set images before and after the use of CAD in a sequential manner. The sensitivity, false positive (FP) per case, and reading time were analyzed. A jackknife free-response receiver operating characteristic (JAFROC) method was used to determine the improvement in the diagnostic accuracy.

Results

The sensitivity of CAD was 87.3% with an FP per case of 302.4. CAD significantly improved the diagnostic performance of the four reviewers with a figure-of-merit (FOM) of 0.874 (without CAD) vs. 0.898 (with CAD) according to JAFROC analysis (p < 0.01). Statistically significant improvement was noted only for less-experienced reviewers (FOM without vs. with CAD, 0.834 vs. 0.877, p < 0.01). The additional time required to review the CAD results was approximately 72 sec (40% of the total review time).

Conclusion

CAD as a second reader helps radiologists improve their diagnostic performance in the detection of BM on MR imaging, particularly for less-experienced reviewers.

Introduction

Metastatic brain tumors are the most common brain tumors in adults [1]. Unfortunately, brain metastasis (BM) carries a dismal prognosis, with a median survival of only 1 month if left untreated [2]. Even with the use of whole-brain radiation therapy (WBRT), which has been the primary treatment modality of BM for over 50 years [3], the prognosis of patients with BM remains poor, with a median survival of 4 to 6 months [4]. Because WBRT may induce neurocognitive function impairment in some patients [5, 6], stereotactic radiosurgery alone has been increasingly considered the first-line treatment for patients with limited BM [7, 8]. Additionally, growing evidence suggests that stereotactic radiosurgery can be safely used for patients with up to 10 BM nodules [9, 10]. Thus, the accurate determination of the number, size, and location of metastatic lesions on brain imaging has become crucial for selecting the most appropriate treatment method.

The introduction of three-dimensional (3D) sequences in magnetic resonance (MR) imaging, which allow the acquisition of thin-section images in a reasonable time, has significantly enhanced the sensitivity of BM detection, particularly for small nodules [11]. However, reading these images demands more time and effort from radiologists because of the increased number of images, which can be on the order of hundreds for a single patient. In addition, the enhancement of small vessels may occasionally be confused with a small metastatic nodule on magnetization-prepared rapid-gradient-echo (MP-RAGE) imaging [12, 13], which is currently the most widely used 3D T1-weighted imaging (T1WI) sequence.

Computer-aided detection (CAD) was developed to assist radiologists by providing a second opinion. Previous studies have found that CAD increases the sensitivity of detecting lesions in the breast [14–16], lung [17–19], and colon [20–23]. While CAD has also been applied for the detection of BM on MR imaging [24–27], to our knowledge, no studies have yet attempted to validate its usefulness in clinical practice. In this study, we developed CAD software for the detection of BM and conducted an observer performance study. We aimed to assess the effect of CAD of BM on radiologists’ diagnostic performance in interpreting 3D brain MR imaging using follow-up imaging and consensus as the reference standard.

Materials and methods

Observer study cohort

The institutional review board waived the need for written informed consent from the participants because this was a retrospective study, and the patient records and information were anonymized and de-identified prior to analysis. From January 2015 through March 2016, 1751 consecutive MR imaging studies collected using a ‘BM work-up’ protocol from 1435 patients who had confirmed systemic malignancy were selected from the radiology database of Seoul National University Bundang Hospital. Two non-observer neuroradiologists (S.H.C. and B.S.C., with 16 and 18 years of clinical experience, respectively), who had access to the patients’ histories and follow-up imaging studies, determined the reference standard of BM nodules based on consensus. Among these, 353 patients were excluded using the following criteria: (a) presence of metastasis involving bone, dura, or skin, or suspicious lesions for leptomeningeal seeding (n = 129); (b) presence of other pathological conditions, such as meningioma, vestibular schwannoma, pituitary adenoma, cavernous malformation, or hemorrhagic infarction (n = 64); (c) presence of equivocal nodule(s) determined to be BM (n = 99); (d) presence of excessive artifacts or poor image quality (n = 31); and (e) presence of more than 50 metastatic nodules (n = 30). For patients who underwent multiple MR imaging studies during the period, one study was chosen. After the initial selection, 80 patients with the presence of BM according to studies performed in 2015 were designated as the training set. Next, 30 patients with the presence of BM according to studies performed in 2016 were designated as the test set. Among the 236 patients without evidence of BM on MR studies performed in the same period, 30 patients were randomly chosen after age and sex matching and included in the test set (Fig 1).

Fig 1. Flow diagram for patient selection.

The diagram shows the initial case selection and final distribution of study cases into the training set and test set. Jan = January, Mar = March.

https://doi.org/10.1371/journal.pone.0178265.g001

Image acquisition

MR images were obtained with a 1.5-T (Intera; Philips Healthcare, Best, the Netherlands) or 3-T (Achieva or Ingenia; Philips Healthcare) MR scanner with an 8- or 32-channel head coil. MR imaging parameters for the 3D gradient-echo sequence (GRE) were as follows: field-of-view, 240 × 240 mm²; acquisition matrix, 240 × 240; slice thickness, 1 mm; number of excitations, 1; repetition time (TR), 8–10.6 msec; echo time (TE), 3.7–5.7 msec; and flip angle, 8°. For contrast enhancement, gadobutrol (Gadovist®, Bayer Schering Pharma AG, Berlin, Germany; 0.1 mmol/kg) was injected as a bolus intravenously. While CAD analyzed the 3D GRE contrast-enhanced T1WI only, non-observer reviewers (S.H.C. and B.S.C.) also assessed other imaging sequences in the routine protocol, including pre-contrast T1WI, T2-weighted images (T2WI), and fluid-attenuated inversion recovery (FLAIR) images.

Development of CAD software

The algorithms of the developed CAD software are classified into brain segmentation-phase, BM candidate detection-phase, and BM discrimination-phase algorithms. Fig 2 shows the complete flowchart of the proposed algorithms.

Fig 2. Flow diagram of our proposed CAD algorithms.

TP = true positive, FP = false positive, ANN = artificial neural network.

https://doi.org/10.1371/journal.pone.0178265.g002

Normalization.

While the attenuation values of CT are absolute values, the signal intensity of MR imaging is a relative value. Therefore, the range of signal intensity differs depending on the scanning parameters. To solve this problem, we normalized the image by resampling the signal of the whole image to the same range based on the signal intensity at the initial seed position manually selected in the gray matter.
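The idea of seed-based normalization can be illustrated with a small sketch. It assumes a purely linear rescaling in which the intensity at the gray-matter seed is mapped to a fixed reference value; the text does not specify the exact mapping, so the function name `normalize_to_seed` and the `reference` parameter are hypothetical.

```python
def normalize_to_seed(intensities, seed_value, reference=100.0):
    """Linearly rescale intensities so that seed_value maps to reference.

    Assumption: a single multiplicative factor is enough to bring scans
    acquired with different settings onto a common intensity range.
    """
    if seed_value <= 0:
        raise ValueError("seed intensity must be positive")
    scale = reference / seed_value
    return [v * scale for v in intensities]


# Two scans of the same tissue with different overall gain (made-up values).
scan_a = [50.0, 100.0, 200.0]   # gray-matter seed intensity = 100
scan_b = [150.0, 300.0, 600.0]  # gray-matter seed intensity = 300
norm_a = normalize_to_seed(scan_a, seed_value=100.0)
norm_b = normalize_to_seed(scan_b, seed_value=300.0)
```

After rescaling, both scans occupy the same range, so a single set of intensity thresholds can be applied to either one.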

Brain segmentation.

We attempted to limit the region of interest to the brain by extracting the brain tissue from the source MR images. Restricting the algorithm to the brain region may reduce the potential false-positive (FP) nodules in anatomical structures outside the brain region.

A 3D spherical-based seed region growing (SSRG) algorithm was used for brain segmentation based on the manually determined seed position in the gray matter. Seed region growing (SRG) is a general method of segmenting a homogeneous region by 3D expansion from a seed position (x, y, z). The SRG algorithm expands the region pixel by pixel [28, 29]. Therefore, when the signal intensity of a brain region is similar to those of neighboring structures, a connection as narrow as a single pixel can let the region leak into those structures, and the brain segmentation fails. To resolve this problem, we developed the SSRG algorithm, which expands the region only when all pixels within a sphere around the candidate position comply with the expansion conditions.
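The difference between pixel-by-pixel growing and sphere-based growing can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the volume is a dictionary of voxel coordinates, the expansion condition is a simple intensity window `[lo, hi]`, growth is 6-connected, and all names are hypothetical.

```python
from collections import deque
from itertools import product


def sphere_offsets(radius):
    """Integer offsets (dx, dy, dz) lying inside a sphere of the given voxel radius."""
    r = range(-radius, radius + 1)
    return [(dx, dy, dz) for dx, dy, dz in product(r, r, r)
            if dx * dx + dy * dy + dz * dz <= radius * radius]


def grow_region(volume, seed, lo, hi, radius=0):
    """Grow a 6-connected region from seed.

    volume maps (x, y, z) -> intensity. A voxel is accepted only if every
    voxel of the sphere centred on it exists and lies in [lo, hi].
    radius=0 reduces to ordinary voxel-by-voxel region growing.
    """
    offsets = sphere_offsets(radius)

    def accepted(p):
        return all(
            lo <= volume.get((p[0] + dx, p[1] + dy, p[2] + dz), float("-inf")) <= hi
            for dx, dy, dz in offsets
        )

    if not accepted(seed):
        return set()
    region, queue = {seed}, deque([seed])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            q = (x + dx, y + dy, z + dz)
            if q not in region and q in volume and accepted(q):
                region.add(q)
                queue.append(q)
    return region


# Toy volume: two bright 5x5x5 blocks joined by a one-voxel-wide bridge.
volume = {}
for x, y, z in product(range(13), range(5), range(5)):
    in_block = x <= 4 or x >= 8
    in_bridge = 5 <= x <= 7 and (y, z) == (2, 2)
    volume[(x, y, z)] = 100.0 if in_block or in_bridge else 0.0

seed = (2, 2, 2)
plain = grow_region(volume, seed, 90, 110, radius=0)      # leaks across the bridge
spherical = grow_region(volume, seed, 90, 110, radius=1)  # stops at the bridge
```

In this toy volume the plain version crosses the thin bridge and absorbs the second block, whereas the sphere-based version cannot pass through a connection narrower than the sphere. The price is that the grown region is eroded by roughly one sphere radius at its boundary.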

BM candidate detection.

BM typically has a spheroid-like structure and shows contrast enhancement on T1WI. Thus, BMs usually have well-defined borders with the surrounding anatomical tissue [30, 31]. However, large BMs tend to have irregular shapes. In addition, when internal necrosis is present, BM may appear as a peripheral rim-enhancing lesion. We proposed two types of algorithms according to the size of the nodules based on the characteristics of typical BM morphologies.

First, we used a 3D template-matching algorithm for BM detection with a small spheroid-like structure. Specifically, we used two spherical template models (a solid type and an inner-hole type) to compensate for the internal necrosis. The size of the voxel was determined by considering the ratio between the in-plane pixel spacing and slice thickness. Three templates were created for each of the two models and had diameters of 2 mm, 3 mm, and 4 mm. The size of the inner hole was determined to be 50% of each template. Fig 3 shows the various templates created for each size and type.

Fig 3. Six spherical templates by sizes (2, 3, and 4 mm) and types (solid and inner-hole).

https://doi.org/10.1371/journal.pone.0178265.g003
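A minimal sketch of how solid and inner-hole spherical templates might be rasterized on a voxel grid is given below. It assumes isotropic 1-mm voxels, binary template values, and a hole radius equal to 50% of the outer radius; the function name and the rasterization rule are illustrative assumptions rather than the authors' construction.

```python
def spherical_template(diameter_vox, hollow=False, hole_fraction=0.5):
    """Return a cubic nested list: 1 inside the sphere (or shell), 0 elsewhere."""
    n = diameter_vox
    c = (n - 1) / 2.0                 # centre of the cube in voxel coordinates
    r_outer = n / 2.0
    r_inner = r_outer * hole_fraction if hollow else 0.0
    template = []
    for z in range(n):
        plane = []
        for y in range(n):
            row = []
            for x in range(n):
                d = ((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2) ** 0.5
                inside = d <= r_outer and (not hollow or d > r_inner)
                row.append(1 if inside else 0)
            plane.append(row)
        template.append(plane)
    return template


def voxel_count(template):
    """Number of voxels set to 1 in a template."""
    return sum(v for plane in template for row in plane for v in row)


# Six templates: diameters 2, 3 and 4 voxels, solid and inner-hole types.
templates = {
    (d, kind): spherical_template(d, hollow=(kind == "hole"))
    for d in (2, 3, 4)
    for kind in ("solid", "hole")
}
```

With this simple rule, the 2-voxel inner-hole template is identical to the solid one because no voxel centre falls inside the hole, which illustrates how coarse such templates are at 1-mm resolution.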

Within the extracted brain volume, we performed a convolution of the brain volume using the template models. We detected BM candidates by evaluating the similarity in each position in the brain volume. The normalized cross correlation (NCC) was selected as the similarity measure because it is independent of the voxel attenuation, as defined in Eq 1 [32, 33]:

$$\mathrm{NCC} = \frac{1}{n} \sum_{x,y,z} \frac{\left(f(x,y,z) - \bar{f}\right)\left(t(x,y,z) - \bar{t}\right)}{\sigma_f \, \sigma_t} \tag{1}$$

where $n$ is the count of pixels, $f(x, y, z)$ is the brain image, $t(x, y, z)$ is the template, and $\bar{f}$ and $\bar{t}$ are the means of the brain image and template, respectively. $\sigma_f$ and $\sigma_t$ are the standard deviations of the brain image and template, respectively.
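The normalized cross correlation described above can be computed for one image patch and one template as in the following sketch. It assumes that both are given as flat lists of equal length and that population standard deviations are used; the handling of a constant patch (returning 0) is an added convention, not something stated in the text.

```python
from math import sqrt


def ncc(patch, template):
    """Normalized cross correlation between two equally sized flat voxel lists."""
    n = len(patch)
    if n == 0 or n != len(template):
        raise ValueError("patch and template must be non-empty and equal in size")
    mean_f = sum(patch) / n
    mean_t = sum(template) / n
    sd_f = sqrt(sum((v - mean_f) ** 2 for v in patch) / n)
    sd_t = sqrt(sum((v - mean_t) ** 2 for v in template) / n)
    if sd_f == 0 or sd_t == 0:
        return 0.0  # flat patch or template: correlation undefined, treat as no match
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(patch, template)) / n
    return cov / (sd_f * sd_t)


template = [0, 1, 1, 0, 1, 1, 0, 0]
bright = [10 + 50 * t for t in template]    # same shape, different gain and offset
inverted = [60 - 50 * t for t in template]  # opposite contrast

score_match = ncc(bright, template)
score_inverted = ncc(inverted, template)
```

Because the mean is subtracted and the result is divided by both standard deviations, a patch with the same shape as the template scores close to 1 regardless of its absolute intensity, and an inverted patch scores close to -1.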

We initially detected image coordinates that exceeded the experimentally determined threshold value of similarity measured by NCC in the brain volume. Then, labelling was performed for the detected coordinates, and a 3D spherical region was created using the center position of each label and the radius of the template. Finally, 3D spherical regions were considered as potential candidates.

Next, we used a K-means clustering algorithm for the detection of large BM nodules with irregular shapes. K-means clustering is one of the simplest and most widely used unsupervised classification techniques. It groups data into k clusters by assigning each data point to the nearest cluster, as determined by the Euclidean distance between the data point and the center of each cluster [34, 35].

We defined seven clusters (i.e., attenuation of enhanced tissues, ambiguous attenuation between enhanced tissues and white matter, attenuation of white matter, ambiguous attenuation between white matter and gray matter, attenuation of gray matter, ambiguous attenuation between gray matter and necrotic tissue, and attenuation of necrotic tissue) and then performed K-means clustering on the attenuation of all coordinates in the brain images. Next, we aligned each cluster to a mean value of attenuation. On the aligned clusters, the ends had the highest or lowest attenuation. In other words, there is a high probability that clusters at both ends represent enhanced BM or BM including necrotic tissue. We performed 3D labelling on the coordinates of clusters at both ends. Morphological features were calculated for each label and used for the discrimination of BM. Finally, the labels with the feature values greater than the experimentally defined thresholds were considered as potential candidates. Other labels were defined as FP results and deleted.
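The clustering step above can be sketched on scalar intensities as follows. This is a simplified illustration, assuming a deterministic quantile-based initialization and three clusters for brevity (the text uses seven); the function names are hypothetical, and the subsequent 3D labelling and morphological filtering are not shown.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalar intensities; returns (centres, labels)."""
    ordered = sorted(values)
    # Deterministic initialisation: k centres spread evenly over the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to the nearest centre (1D Euclidean distance).
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


def extreme_cluster_indices(values, k):
    """Indices of values that fall in the lowest-mean or highest-mean cluster."""
    centres, labels = kmeans_1d(values, k)
    order = sorted(range(k), key=lambda j: centres[j])
    ends = {order[0], order[-1]}
    return [i for i, lab in enumerate(labels) if lab in ends]


# Toy intensities: dark (necrosis-like), intermediate, and bright (enhancing-like).
intensities = [5, 6, 50, 52, 54, 200, 210]
candidate_idx = extreme_cluster_indices(intensities, k=3)
```

Only the voxels in the two end clusters are passed on, which mirrors the reasoning that enhanced tissue and necrotic tissue sit at the extremes of the intensity ordering.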

BM discrimination from the candidates using machine learning.

We removed the FP nodules from the BM candidates to improve the accuracy. For the discrimination of the nodule candidates, we used the artificial neural network (ANN) algorithm, which is a machine learning technique. ANNs are mathematical models based on biological neural networks [36]. They consist of interconnected groups of artificial neurons organized into layers. We used three layers: the input, output and hidden layers (Fig 4). The input layer consisted of 30 neurons, and we used 30 features measured from the BM candidate images as input neurons.

Fig 4. Example of an ANN for FP reduction of BM candidates using computer features.

https://doi.org/10.1371/journal.pone.0178265.g004

We initially selected 272 features based on the histogram, morphology, and texture [37–39]. From among these, the following 30 features were chosen using logistic regression analysis: volume, min, max, mean, standard deviation, variance, skewness, kurtosis, energy, entropy, fractal dimension (box counting), gray level co-occurrence matrix (GLCM)-contrast, GLCM-dissimilarity, GLCM-homogeneity, GLCM-angular second moment (ASM), GLCM-energy, GLCM-probability max, GLCM-entropy, GLCM-correlation, GLCM-mean reference, GLCM-mean neighbor, GLCM-variance reference, GLCM-variance neighbor, GLCM-standard deviation reference, GLCM-standard deviation neighbor, gray level run length matrix (GLRLM)-long run emphasis (LRE), GLRLM-gray level non-uniformity (GLN), GLRLM-run length non-uniformity (RLN), GLRLM-low gray level run emphasis (LGRE), and GLRLM-high gray level run emphasis (HGRE). The output layer consisted of two neurons representing BM and non-BM.

The ANN model used in our study had a feed-forward architecture and was trained by using the back-propagation algorithm with the hyperbolic tangent activation function, f(x) = 1.7159 tanh(2x/3) [40]. The result of an output node represents the likelihood that a nodule may be classified into each corresponding class. Thus, in this study, the output was interpreted as the probability that a BM candidate is a true-positive (TP) nodule.
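A forward pass through a three-layer network of this kind can be sketched as follows. The scaled hyperbolic tangent is taken from the text; everything else is an assumption made for illustration: the hidden-layer size of four, the softmax used to turn the two outputs into probabilities, and the hand-picked weights, which are not trained values.

```python
from math import exp, tanh


def scaled_tanh(x):
    """Activation quoted in the text: 1.7159 * tanh(2x / 3)."""
    return 1.7159 * tanh(2.0 * x / 3.0)


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer followed by a softmax over two outputs (BM, non-BM).

    The softmax is an assumption; the text only says the output is read
    as a probability.
    """
    hidden = [scaled_tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    m = max(logits)                       # subtract the max for numerical stability
    exps = [exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical input: 30 feature values for one BM candidate, 4 hidden neurons.
features = [0.1] * 30
w_hidden = [[0.05] * 30, [-0.05] * 30, [0.02] * 30, [0.0] * 30]
b_hidden = [0.0, 0.0, 0.1, 0.0]
w_out = [[1.0, -1.0, 0.5, 0.0], [-1.0, 1.0, -0.5, 0.0]]
b_out = [0.0, 0.0]

p_bm, p_non_bm = forward(features, w_hidden, b_hidden, w_out, b_out)
```

The constants 1.7159 and 2/3 are chosen so that the activation returns approximately 1 for an input of 1 and approximately -1 for an input of -1, which keeps unit-scale inputs in the near-linear part of the curve.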

Thresholds of nodule detection.

The main algorithms we used in our CAD software were template-matching and K-means clustering. These algorithms use a threshold value to determine the BM candidates, and the detection result depends on the threshold value. Lower threshold values provide higher sensitivity and more FP results (algorithm A). In contrast, higher threshold values provide lower sensitivity and fewer FP results (algorithm B). Thus, we developed two versions of the CAD software using algorithm A and algorithm B and applied them in the experiments.

Clinically, it is important to detect as many BM nodules as possible. Therefore, we selected algorithm A as the main algorithm, and observer performance was evaluated using the CAD software with algorithm A. In addition, the stand-alone performances were evaluated using both algorithm A and algorithm B.

Observer performance study

Four radiologists who were blinded to the patients’ histories and pathological data independently reviewed MR image sets in a random order. Reviewers 1 and 2 were radiology residents (Y.K. and J.H.K.; in the fourth year and second year of training, respectively), and reviewers 3 and 4 (L.S. and R-E.Y.) were board-certified neuroradiologists with 7 years of clinical experience. Review sessions were performed in a sequential manner [17, 21]. First, a reviewer searched for potential nodules on each study without the use of CAD marking (referred to as without CAD). The reviewers were encouraged to identify all BM candidates regardless of their size and to record their confidence score based on the likelihood that the candidate was a true BM using a five-point scale (1 = definitely not a BM, 2 = probably not a BM, 3 = indeterminate, 4 = probably a BM, 5 = definitely a BM). When the reviewer completed nodule detection for each case, the reading time was automatically recorded. Then, the reviewer reviewed each marked nodule to assign a confidence score.

Second, once score assignment was complete, pre-processed CAD markings with probability scores determined using the CAD algorithm with maximized sensitivity were displayed. The reviewer was then allowed to add any new nodules or remove previously marked nodules. The reviewer was also allowed to modify the confidence scores. The additional reading time was automatically recorded separately. This second reading session was referred to as with CAD. A video clip of a sample sequential reading session in our study can be found in S1 Video.

Statistical analysis

To determine the improvement in the diagnostic accuracy using CAD as a second reader, a jackknife free-response receiver operating characteristic (JAFROC) analysis was performed [41, 42]. JAFROC software (version 4.2.1; http://www.devchakraborty.com) was used to compute a figure-of-merit (FOM), which is defined as the probability that lesions, including unmarked lesions, are rated higher than non-lesion marks (analogous to the area under the receiver operating characteristic curve).
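The figure-of-merit definition above can be illustrated with a simplified Wilcoxon-type calculation. This sketch is not the JAFROC software's implementation and omits the jackknife significance testing entirely; comparing each lesion rating with the highest-rated non-lesion mark on each non-diseased case is an assumption based on a common formulation of the method, and the ratings are made up.

```python
NEG_INF = float("-inf")


def psi(false_rating, lesion_rating):
    """Kernel: 1 if the lesion is rated higher, 0.5 for a tie, 0 otherwise."""
    if lesion_rating > false_rating:
        return 1.0
    if lesion_rating == false_rating:
        return 0.5
    return 0.0


def fom(lesion_ratings, normal_case_marks):
    """Simplified free-response figure of merit.

    lesion_ratings: one rating per true lesion, None if the lesion was not marked.
    normal_case_marks: one list of false-positive ratings per non-diseased case.
    Unmarked lesions and unmarked cases are assigned a rating of minus infinity.
    """
    lesions = [NEG_INF if r is None else r for r in lesion_ratings]
    highest_fp = [max(marks) if marks else NEG_INF for marks in normal_case_marks]
    total = sum(psi(x, y) for x in highest_fp for y in lesions)
    return total / (len(highest_fp) * len(lesions))


# Four lesions (one missed) and four non-diseased cases (two with FP marks).
value = fom(lesion_ratings=[5, 4, None, 3],
            normal_case_marks=[[], [2], [4], []])
```

A missed lesion lowers the value because it ties at best with an unmarked normal case and loses to every marked one, which is how unmarked lesions enter the figure of merit.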

The sensitivities and FP markings per patient of the reviewers and the CAD algorithms were evaluated. Among the nodules marked by the reviewers, those with confidence scores equal to or higher than 3 were considered positive, whereas those with confidence scores of 1 and 2 were considered negative. Subgroup analysis on a patient-by-patient basis was also performed, in which a reviewer’s assessment was assumed to be correct when at least one lesion was correctly detected for patients with BM or when no lesion was marked for control studies. If no lesion was correctly marked in a study with BM, or if an FP nodule was marked in a control study, then the assessment was considered incorrect.
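The scoring rules in this paragraph can be written out as a short sketch. It assumes that each reviewer mark is stored as a pair of confidence score and a flag indicating whether it corresponds to a reference-standard lesion, and that each lesion is marked at most once; the data layout and function names are hypothetical.

```python
def lesion_level_metrics(marks, n_lesions, n_patients, threshold=3):
    """Sensitivity and FP per patient from (confidence, is_true_lesion) marks.

    Marks with confidence >= threshold count as positive calls.
    """
    positives = [is_true for conf, is_true in marks if conf >= threshold]
    tp = sum(1 for is_true in positives if is_true)
    fp = sum(1 for is_true in positives if not is_true)
    return tp / n_lesions, fp / n_patients


def patient_correct(has_bm, marks, threshold=3):
    """Patient-level rule: at least one true lesion found, or no positive mark in a control."""
    positives = [is_true for conf, is_true in marks if conf >= threshold]
    if has_bm:
        return any(positives)
    return not positives


# Made-up marks pooled over two patients with four true lesions in total.
marks = [(5, True), (4, True), (2, True), (3, False), (1, False)]
sensitivity, fp_per_patient = lesion_level_metrics(marks, n_lesions=4, n_patients=2)
```

In this example the lesion marked with a score of 2 does not count as detected, and the non-lesion mark with a score of 3 counts as one FP, so the sensitivity is 2/4 and the FP per patient is 1/2.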

Fisher’s exact test, the Mann-Whitney U test, the Wilcoxon test, and Pearson’s correlation were used to analyze the demographic data of the subjects and the reading time of the reviewers. Statistical analyses were performed with SPSS (version 24.0 for Windows, SPSS, Chicago, IL, USA) or MedCalc (version 16.8.4, MedCalc Software, Mariakerke, Belgium). P values of less than 0.05 were considered to be statistically significant.

Results

Patient demographics

The clinical characteristics of the subjects are summarized in Table 1. The primary malignancies that the patients harbored included lung cancer (n = 112), breast cancer (n = 13), colorectal cancer (n = 5), renal cell carcinoma (n = 3), melanoma (n = 1), ovarian cancer (n = 1), hepatocellular carcinoma (n = 1), gastric cancer (n = 1), follicular thyroid carcinoma (n = 1), cutaneous squamous cell carcinoma (n = 1), osteosarcoma (n = 1), and synovial sarcoma (n = 1). One patient with lung cancer was also diagnosed with advanced gastric cancer.

The training set consisted of 80 patients with 450 metastatic nodules, and the test set included 134 metastatic nodules from 30 patients with BM. The distribution of the nodule sizes is shown in Fig 5. No significant difference was found in the median size of the nodules between the two sets. However, the proportion of small nodules (1 to 3 mm in diameter) was significantly larger in the test set than in the training set (p = 0.01).

Fig 5. Bar graph of the nodule size distributions in the training and test sets.

The relative frequency of nodules with diameters of 1 to 3 mm differed significantly between the two groups (p = 0.01).

https://doi.org/10.1371/journal.pone.0178265.g005

Stand-alone performance of CAD

Two CAD algorithms were independently analyzed (Table 2). Algorithm A exhibited a sensitivity of 87.3% (117/134 nodules) and an FP per patient of 302.4. In contrast, algorithm B showed a sensitivity of 75.4% (101/134 nodules) and an FP per patient of 35.5. For algorithm A, Fig 6 shows examples of TP and FP nodules identified using CAD. No significant difference was found in the median processing time between the two algorithms (264.7 sec vs. 268.6 sec, p = 0.52). For both algorithms, the probability score was significantly higher in the metastasis group than in the non-metastasis group (p < 0.01 and p < 0.01, respectively). When tiny nodules less than or equal to 2 mm in diameter were excluded, the sensitivity was increased to 92.7% (89/96 nodules) for algorithm A and 82.3% for algorithm B (79/96 nodules).

Fig 6. Examples of CAD results using algorithm A.

A–D: Examples of the correct detection of BM by CAD software. E–H: Examples of the incorrect detection (FPs) by CAD software. Common sources of FPs included the cortical vessel (F), dural sinus (G), and choroid plexus (H).

https://doi.org/10.1371/journal.pone.0178265.g006

Table 2. Comparison of the nodule detection performances of algorithm A and algorithm B.

https://doi.org/10.1371/journal.pone.0178265.t002
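The stand-alone sensitivities quoted in this section follow directly from the reported nodule counts, as the short check below shows. Only the counts given in the text are used; the helper name `pct` is hypothetical.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


sens_a_all = pct(117, 134)    # algorithm A, all nodules
sens_b_all = pct(101, 134)    # algorithm B, all nodules
sens_a_gt2mm = pct(89, 96)    # algorithm A, nodules larger than 2 mm
sens_b_gt2mm = pct(79, 96)    # algorithm B, nodules larger than 2 mm

# Nodules of 2 mm or smaller in the test set, and how many algorithm A found.
tiny_total = 134 - 96
tiny_found_a = 117 - 89
```

The same counts imply that 38 of the 134 test nodules were 2 mm or smaller and that algorithm A detected 28 of them.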

Observer performance study

The performances of the reviewers before and after the application of CAD are summarized in Table 3. The average sensitivity and FP per patient for BM detection without CAD by the four reviewers were 77.6% and 0.18, respectively. With CAD, the sensitivity and FP per patient were 81.9% and 0.18, respectively. According to JAFROC analysis, the FOM value was significantly increased by the use of CAD (0.87 without CAD vs. 0.90 with CAD, p < 0.01).

Table 3. Comparison of the reviewers’ nodule detection performances.

https://doi.org/10.1371/journal.pone.0178265.t003

For the radiology residents (reviewers 1 and 2), the sensitivity and FP per patient without CAD were 67.9% and 0.10, respectively. With CAD, the sensitivity was improved to 76.1%, while the FP per patient was slightly elevated to 0.12. For the neuroradiologists (reviewers 3 and 4), the sensitivity and FP per patient without CAD were 87.3% and 0.25, respectively. After reviewing the CAD results, the sensitivity and FP per patient changed to 88.7% and 0.25, respectively. Specifically, the two residents found 22 TP nodules and five FP nodules upon reviewing the CAD results. However, they were also able to remove three FP nodules with the aid of CAD. The experienced reviewers detected two additional TP nodules and three additional FP nodules with CAD but discarded one TP nodule and three FP nodules. Overall, the use of CAD led to the detection of 23 TP nodules at the cost of 2 additional FP nodules by the four reviewers. Per-reviewer JAFROC analysis revealed that both reviewers 1 and 2 showed significant improvement in their nodule detection performance (p = 0.01 and p < 0.01, respectively), whereas neither reviewers 3 nor 4 exhibited a statistically significant improvement (p = 0.19 and p = 0.67, respectively). A representative case is shown in Fig 7.

Fig 7. 3D gradient-echo contrast-enhanced T1-weighted MR images in an 81-year-old female patient with metastatic lung cancer.

A and B: Axial (A) and coronal (B) images show a tiny enhancing nodule at the left inferior temporal gyrus (arrowhead). This nodule was missed by all four reviewers but was successfully detected by CAD. C: On the navigation MR image for a gamma-knife surgery performed 2 days after (A) and (B), the nodule showed no interval changes. D: On the follow-up MR image taken after 3 months, the nodule disappeared.

https://doi.org/10.1371/journal.pone.0178265.g007

When tiny nodules with diameters less than or equal to 2 mm were excluded, the average sensitivities for less-experienced reviewers were 85.4% without CAD and 90.1% with CAD. For experienced reviewers, the average sensitivities were 93.2% without CAD and 93.8% with CAD.

Among the 30 patients with BM, reviewers failed to detect any TP nodule in 6.7% (8/120) of the cases. Notably, CAD successfully detected all of the missed nodules. With the aid of CAD, the reviewers detected three initially missed nodules; thus, the reviewers detected at least one TP nodule in 95.8% (115/120) of the cases. Among the 30 patients without BM, reviewers detected at least one FP nodule in 5.0% (6/120) of the cases. After reviewing the CAD results, one reviewer successfully removed one FP nodule; thus, the reviewers found at least one FP nodule in 4.2% (5/120) of the cases. Overall, the reviewers correctly classified patients without CAD and with CAD in 94.2% (226/240) and 95.8% (230/240) of the cases, respectively.
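The patient-level figures in this paragraph can be reproduced from the reported counts. The sketch below only restates the arithmetic, treating each of the four reviewers' readings of the 30 patients in each group as one case.

```python
reviewers = 4
cases_per_group = reviewers * 30          # 120 readings with BM, 120 without

# Without CAD: 8 BM readings with no TP found, 6 control readings with an FP.
correct_without = (cases_per_group - 8) + (cases_per_group - 6)

# With CAD: 3 of the 8 misses were recovered and 1 of the 6 FP readings was cleared.
correct_with = (cases_per_group - (8 - 3)) + (cases_per_group - (6 - 1))

total = 2 * cases_per_group
accuracy_without = round(100.0 * correct_without / total, 1)
accuracy_with = round(100.0 * correct_with / total, 1)
```

The net gain of four correctly classified readings (226 to 230) comes from three recovered BM cases and one cleared control case.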

The median reading times without and with CAD were 114.4 sec and 72.1 sec, respectively. No significant difference was found in the overall reading time between less-experienced and experienced reviewers (178.5 sec vs. 174.3 sec, p = 0.13). However, less-experienced reviewers spent significantly less time than experienced reviewers in reviewing the images without CAD (98.5 sec vs. 121.5 sec, p < 0.01). In contrast, less-experienced reviewers spent significantly more time than experienced reviewers on reviewing the CAD results (74.3 sec vs. 58.3 sec, p < 0.01). We found only a weak positive trend between the number of total nodules detected by CAD and the additional reading time with CAD (r = 0.24, p = 0.06).

The total reading time for patients with BM was significantly longer than that for patients without BM (202.8 sec vs. 161.3 sec, p < 0.01). Although the reading time without CAD differed significantly between patients with BM and without BM (144.5 sec vs. 94.4 sec, p < 0.01), the reading time with CAD was not significantly different between the two groups (59.4 sec vs. 76.0 sec, p = 0.38).

Discussion

In the present study, we developed CAD software, evaluated its stand-alone performance, and conducted an observer performance study. The sensitivity of the CAD software itself was between that of the experienced neuroradiologists and the radiologists in training. CAD significantly improved the diagnostic performances of the four reviewers, as indicated by the FOM determined by JAFROC analysis (without CAD vs. with CAD, 0.874 vs. 0.898, p < 0.01). The median time required to review the CAD results was approximately 72 sec (40% of the total review time). Notably, the two trainees detected 22 additional TP nodules after reviewing the CAD results. Although CAD significantly improved the overall performance of the reviewers, a statistically significant improvement was noted only for less-experienced reviewers (FOM without vs. with CAD, 0.834 vs. 0.877, p < 0.01).

Technical advances in 3D MR imaging have significantly improved the sensitivity of BM detection. However, the concomitant increase in the number of images per study has raised the burden of reading and the risk of detection failure. Missed BM nodules may cause the cancer stage to be underestimated, lead to inappropriate treatment, and negatively affect the prognosis. To address this issue, efforts have been increasingly focused on improving the diagnostic accuracy using CAD. CAD does not overlook a lesion because of exhaustion or other extrinsic factors. Thus, when used as a second reader, CAD may be well suited to time-consuming tasks, such as detecting BM nodules.

The sensitivities of BM detection in previous CAD studies ranged from 30.2% to 93.5% [24–27], which are comparable to that of our study. However, the FP per patient in previous studies ranged from 5.18 to 34.8 [24–26], which are lower than that of our study. In contrast to all but one of these studies [25], we enrolled consecutive patients to minimize selection bias. However, whereas the other study [25] enrolled a small cohort of patients in a prospective manner, we enrolled a relatively large cohort in a retrospective manner. Our data contained a relatively high proportion of nodules equal to or smaller than 3 mm in diameter. Additionally, this proportion was higher in the test set than in the training set (Fig 5, 43.3% vs. 31.1%, p < 0.01). Therefore, the inclusion of a larger proportion of small or less-conspicuous nodules (i.e., nodules that are relatively difficult to detect), at least partially due to consecutive enrolment, might have affected the overall performance observed in our study. When nodules 2 mm or smaller in diameter were excluded, the sensitivity was improved (from 87.3% to 92.7% for algorithm A).

When unassisted, neuroradiologists showed higher sensitivity for BM detection than the radiology residents at the cost of slightly more FPs. However, the less-experienced reviewers seem to have benefited more from the aid provided by CAD than the experienced reviewers. This finding is consistent with previous studies of CAD for computed tomography (CT) colonography [20–22]. While the reviewers detected a total of 23 additional TP nodules after reviewing the CAD results, the use of CAD also resulted in the detection of two additional FP nodules. This increase in the FP per case was minimal given the large number of FP nodules identified by CAD. Indeed, most of the FP nodules detected by CAD were easily rejected by human reviewers because of their typical locations (Fig 6). The weak correlation between the number of nodules marked by CAD and the time spent on reviewing the CAD results also supports this observation. In addition, the significant improvement in FOM with the use of CAD suggests that the increased FP was more than offset by the increased sensitivity.

The strategy of our proposed algorithm was to first detect the BM candidates as sensitively as possible and then discriminate TP nodules from FP nodules. We used a template-matching algorithm to find small BMs. While other similar studies used larger templates with a minimum diameter of 3.4 mm, we were able to find smaller nodules by using smaller templates. In addition, other studies used only one type of template model [24, 26], whereas we used two types of spherical template models (solid and inner-hole) to detect necrotic nodules. In our data, the actual size of one voxel was 1.0 × 1.0 × 1.0 mm³. Hence, a 1-mm template would cover only one voxel, which is too small for accurate BM detection. Thus, we set the minimum template size to 2 mm. Interestingly, we were able to detect some BM nodules that were 1 mm in size using a 2-mm-diameter template. We speculate that the difference in size between the template and the BM is one cause of the increased FPs. We expect to reduce the number of FPs by using a 1-mm template on higher-resolution images.
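The idea of matching solid and inner-hole spherical templates can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: it assumes isotropic 1-mm voxels, uses normalized cross-correlation as the similarity measure (the measure is not restated in this section), and arbitrarily sets the radius of the hollow core to half the sphere radius. All function names and the toy volume are hypothetical.

```python
import math


def sphere_template(diameter_vox, hollow=False):
    """Cubic 3D template (nested lists) holding a sphere of 1.0 on 0.0.

    With hollow=True, voxels closer to the centre than half the radius
    (an arbitrary choice for this sketch) are set to 0.0.
    """
    n = diameter_vox + 2              # one-voxel background margin per side
    c = (n - 1) / 2.0                 # centre of the cube
    r = diameter_vox / 2.0
    vol = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for z in range(n):
        for y in range(n):
            for x in range(n):
                d = math.sqrt((z - c) ** 2 + (y - c) ** 2 + (x - c) ** 2)
                if d <= r and not (hollow and d < r / 2):
                    vol[z][y][x] = 1.0
    return vol


def flatten(vol):
    """Flatten a nested 3D list into a flat list in z, y, x order."""
    return [v for plane in vol for row in plane for v in row]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a)
                    * sum((q - mean_b) ** 2 for q in b))
    return num / den if den > 0 else 0.0   # constant patches score 0


def match_at(volume, template, z0, y0, x0):
    """Similarity between the template and the patch starting at (z0, y0, x0)."""
    n = len(template)
    patch = [volume[z0 + z][y0 + y][x0 + x]
             for z in range(n) for y in range(n) for x in range(n)]
    return ncc(patch, flatten(template))


# Toy volume: uniform background with one bright 2-voxel-diameter sphere.
template = sphere_template(2)
n, size = len(template), 8
volume = [[[1.0] * size for _ in range(size)] for _ in range(size)]
oz, oy, ox = 2, 3, 1                       # hypothetical nodule position
for z in range(n):
    for y in range(n):
        for x in range(n):
            volume[oz + z][oy + y][ox + x] += 5.0 * template[z][y][x]

positions = [(z, y, x) for z in range(size - n + 1)
             for y in range(size - n + 1) for x in range(size - n + 1)]
best = max(positions, key=lambda p: match_at(volume, template, *p))
```

With 1-mm voxels, a 2-voxel sphere occupies only eight voxels, so a hollow variant is indistinguishable from the solid one at that size; the inner-hole model becomes distinct only for larger diameters.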

We removed the FPs using an ANN algorithm, which is a type of machine learning technique. We selected 30 out of 272 features using logistic regression analysis to effectively reduce the FPs. The ANN algorithm was superior to other machine learning classifiers in our training data, for which the support vector machine (SVM) algorithm [43] showed an accuracy of 57.9%, the Bayes classifier algorithm [44] showed an accuracy of 83.2%, and the boosting algorithm [45] showed an accuracy of 83.1%; the accuracy of the ANN algorithm was 87.7%. Despite the use of the ANN algorithm, approximately 12% of the correctly detected nodules were removed during the FP-removal process. To reduce the chance of removing a correctly detected nodule, the amount of training data should be increased, and BMs of various sizes and shapes should be included. In addition, the features used in the ANN model should be further optimized.

Our proposed method required approximately 4 min to process the MR images. This is much shorter than the processing times reported in other studies [24, 26], which ranged from 15 to 50 min. In addition, the time needed to review the CAD results was, on average, approximately 72 sec. Therefore, provided that the CAD results from our proposed method are made available to the radiologists before reading, this strategy could be applied to clinical practice with an acceptable amount of extra time.

In addition to its retrospective nature, our study has other limitations. First, most of the subjects with BM did not undergo pathologic confirmation of the brain lesions. To address this problem, two independent reviewers determined the ground truth based on consensus with access to clinical information and follow-up imaging studies. Second, although we included a relatively large number of subjects compared to previous studies, the sample size was still too small to train the algorithm sufficiently. In the future, we believe that the performance could be improved by using a larger amount of data and more recent algorithms, such as convolutional neural networks.

Conclusions

In conclusion, using CAD as a second reader helps radiologists improve their diagnostic performance in the detection of BM on MR imaging, particularly for less-experienced reviewers.

Supporting information

S1 Dataset. Dataset for subjects in the training and test sets.

https://doi.org/10.1371/journal.pone.0178265.s001

(XLSX)

S1 Video. A video clip of a sample sequential reading session using CAD.

On brain MR images of a 74-year-old male patient with lung cancer, the reviewer initially detects four metastatic nodules without CAD and then detects one additional metastatic nodule with the aid of CAD.

https://doi.org/10.1371/journal.pone.0178265.s002

(ZIP)

Acknowledgments

The authors acknowledge the assistance of Jungmi Jo. The authors are also grateful to the Medical Research Collaborating Center of Seoul National University Bundang Hospital for assisting with the statistical analysis.

Author Contributions

  1. Conceptualization: LS YJK SHC K-GK.
  2. Data curation: LS YJK SHC Ji Hee Kang YK YJB BSC.
  3. Formal analysis: LS YJK.
  4. Funding acquisition: LS SHC.
  5. Investigation: LS YJK CJ.
  6. Methodology: LS YJK SHC K-GK.
  7. Project administration: SHC K-GK.
  8. Resources: LS YJK SHC K-GK.
  9. Software: YJK K-GK SHL.
  10. Supervision: SHC K-GK JK KJL C-HS Jae Hyoung Kim.
  11. Validation: Ji Hee Kang YK YJB R-EY.
  12. Visualization: LS YJK.
  13. Writing – original draft: LS YJK.
  14. Writing – review & editing: SHC K-GK JK C-HS.

References

  1. Gavrilovic IT, Posner JB. Brain metastases: Epidemiology and pathophysiology. J Neurooncol. 2005;75: 5–14. pmid:16215811
  2. Richards P, Mckissock W. Intracranial Metastases. Br Med J. 1963;1: 15–18. pmid:13982100
  3. Chao J-H, Phillips R, Nickson JJ. Roentgen-ray therapy of cerebral metastases. Cancer. 1954;7: 682–689. pmid:13172684
  4. Sundstrom JT, Minn H, Lertola KK, Nordman E. Prognosis of patients treated for intracranial metastases with whole-brain irradiation. Ann Med. 1998;30: 296–299. pmid:9677016
  5. Chang EL, Wefel JS, Hess KR, Allen PK, Lang FF, Kornguth DG, et al. Neurocognition in patients with brain metastases treated with radiosurgery or radiosurgery plus whole-brain irradiation: a randomised controlled trial. Lancet Oncol. 2009;10: 1037–1044. pmid:19801201
  6. Tallet AV, Azria D, Barlesi F, Spano J-P, Carpentier AF, Gonçalves A, et al. Neurocognitive function impairment after whole brain radiotherapy for brain metastases: actual assessment. Radiat Oncol. 2012;7: 77. pmid:22640600
  7. Gantery El MM, Baky El HMA, Hossieny El HA, Mahmoud M, Youssef O. Management of brain metastases with stereotactic radiosurgery alone versus whole brain irradiation alone versus both. Radiat Oncol. 2014;9: 116. pmid:24884624
  8. Aoyama H, Hiroki S, Tago M, Nakagawa K, Toyoda T, Hatano K, et al. Stereotactic radiosurgery plus whole-brain radiation therapy vs stereotactic radiosurgery alone for treatment of brain metastases. JAMA. 2006;295: 2483–2491. pmid:16757720
  9. Yamamoto M, Serizawa T, Shuto T, Akabane A, Higuchi Y, Kawagishi J, et al. Stereotactic radiosurgery for patients with multiple brain metastases (JLGK0901): A multi-institutional prospective observational study. Lancet Oncol. 2014;15: 387–395. pmid:24621620
  10. Chang WS, Kim HY, Chang JW, Park YG, Chang JH. Analysis of radiosurgical results in patients with brain metastases according to the number of brain lesions: is stereotactic radiosurgery effective for multiple brain metastases? J Neurosurg. 2010;113: 73–8. pmid:21121789
  11. Kakeda S, Korogi Y, Hiai Y, Ohnari N, Moriya J, Kamada K, et al. Detection of brain metastasis at 3T: Comparison among SE, IR-FSE and 3D-GRE sequences. Eur Radiol. 2007;17: 2345–2351. pmid:17318603
  12. Kato Y, Higano S, Tamura H, Mugikura S, Umetsu A, Murata T, et al. Usefulness of contrast-enhanced T1-weighted sampling perfection with application-optimized contrasts by using different flip angle evolutions in detection of small brain metastasis at 3T MR imaging: Comparison with magnetization-prepared rapid acquisition. AJNR Am J Neuroradiol. 2009;30: 923–929. pmid:19213825
  13. Park J, Kim EY. Contrast-enhanced, three-dimensional, whole-brain, black-blood imaging: Application to small brain metastases. Magn Reson Med. 2010;63: 553–561. pmid:20187162
  14. Chan H-P, Kunio D, Vybrony CJ, Schmidt RA, Metz CE, Lam KL, et al. Improvement in radiologists’ detection of clustered microcalcifications on mammograms: the potential of computer-aided diagnosis. Invest Radiol. 1990;25: 1102–1110. pmid:2079409
  15. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology. 2001;220: 781–6. pmid:11526282
  16. Cupples TE, Cunningham JE, Reynolds JC. Impact of computer-aided detection in a regional screening mammography program. Am J Roentgenol. 2005;185: 944–950. pmid:16177413
  17. Kobayashi T, Xu X-W, MacMahon H, Metz CE, Doi K. Effect of a computer-aided diagnosis scheme on radiologists’ performance in detection of lung nodules on radiograph. Radiology. 1996;199: 843–848. pmid:8638015
  18. Xu Y, Ma D, He W. Assessing the use of digital radiography and a real-time interactive pulmonary nodule analysis system for large population lung cancer screening. Eur J Radiol. 2012;81: e451–e456. pmid:21621935
  19. Sahiner B, Chan HP, Hadjiiski LM, Cascade PN, Kazerooni EA, Chughtai AR, et al. Effect of CAD on radiologists’ detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size. Acad Radiol. 2009;16: 1518–1530. pmid:19896069
  20. Baker ME, Bogoni L, Obuchowski NA, Dass C, Kendzierski RM, Remer EM, et al. Computer-aided detection of colorectal polyps: can it improve sensitivity of less-experienced readers? preliminary findings. Radiology. 2007;245: 140–149. pmid:17885187
  21. Petrick N, Haider M, Summers RM, Yeshwant SC, Brown L, Iuliano EM, et al. CT colonography with computer-aided detection as a second reader: observer performance study. Radiology. 2008;246: 148–156. pmid:18096536
  22. Taylor SA, Charman SC, Lefere P, Mcfarland EG, Paulson EK, Yee J, et al. CT colonography: Investigation of the optimum reader paradigm by using computer-aided detection software. Radiology. 2008;246: 463–471. pmid:18094263
  23. Dachman AH, Obuchowski NA, Hoffmeister JW, Hinshaw JL, Frew MI, Winter TC, et al. Effect of computer-aided detection for CT colonography. Radiology. 2010;256: 827–835. pmid:20663975
  24. Ambrosini RD, Wang P, O’Dell WG. Computer-aided detection of metastatic brain tumors using automated three-dimensional template matching. J Magn Reson Imaging. 2010;31: 85–93. pmid:20027576
  25. Farjam R, Parmar HA, Noll DC, Tsien CI, Cao Y. An approach for computer-aided detection of brain metastases in post-Gd T1-W MRI. Magn Reson Imaging. 2012;30: 824–836. pmid:22521993
  26. Pérez-Ramírez Ú, Arana E, Moratal D. Brain metastases detection on MR by means of three-dimensional tumor-appearance template matching. J Magn Reson Imaging. 2016;0: 1–11. pmid:26934581
  27. Yang S, Nam Y, Kim M-O, Kim EY, Park J, Kim D-H. Computer-aided detection of metastatic brain tumors using magnetic resonance black-blood imaging. Invest Radiol. 2013;48: 113–9. pmid:23211553
  28. Pohle R, Toennies KD. Segmentation of medical images using adaptive region growing. SPIE 4322, Medical Imaging 2001. 2001. pp. 1337–1346. 10.1117/12.431013
  29. Revol-Muller C, Peyrin F, Carrillon Y, Odet C. Automated 3D region growing algorithm based on an assessment function. Pattern Recognit Lett. 2002;23: 137–150.
  30. Jagannathan J, Sherman JH, Mehta GU, Chin LS. Radiobiology of brain metastasis: applications in stereotactic radiosurgery. Neurosurg Focus. 2007;22: E4.
  31. Ranasinghe MG, Sheehan JM. Surgical management of brain metastases. Neurosurg Clin N Am. 2007;22: E2. pmid:21109149
  32. Pérez-Ramírez Ú, Arana E, Moratal D. Computer-aided detection of brain metastases using a three-dimensional template-based matching algorithm. Conf Proc. Annu Int Conf IEEE Eng Med Biol Soc IEEE Eng Med Biol Soc Annu Conf. 2014;2014: 2384–2387. 10.1109/EMBC.2014.6944101
  33. Sarvaiya JN, Patnaik S, Bombaywala S. Image registration by template matching using normalized cross-correlation. International Conference on Advances in Computing, Control, and Telecommunication Technologies. 2009. pp. 819–822. 10.1109/ACT.2009.207
  34. Juang L-HH, Wu M-NN. MRI brain lesion image detection based on color-converted K-means clustering segmentation. Measurement. 2010;43: 941–949.
  35. Lee GN, Fujita H. K-means clustering for classifying unlabelled MRI data. Proceedings - Digital Image Computing Techniques and Applications: 9th Biennial Conference of the Australian Pattern Recognition Society, DICTA 2007. 2007. pp. 92–98. 10.1109/DICTA.2007.4426781
  36. Bollschweiler EH, Mönig SP, Hensler K, Baldus SE, Maruyama K, Hölscher AH. Artificial neural network for prediction of lymph node metastases in gastric cancer: a phase II diagnostic study. Ann Surg Oncol. 2004;11: 506–11. pmid:15123460
  37. Castellano G, Bonilha L, Li LM, Cendes F. Texture analysis of medical images. Clin Radiol. 2004;59: 1061–9. pmid:15556588
  38. Sergyan S. Color histogram features based image classification in content-based image retrieval systems. 2008 6th International Symposium on Applied Machine Intelligence and Informatics. 2008. pp. 221–224. 10.1109/SAMI.2008.4469170
  39. Korchiyne R, Farssi SM, Sbihi A, Touahni R, Tahiri Alaoui M. A combined method of fractal and GLCM features for MRI and CT scan images classification. Signal Image Process An Int J. 2014;5: 85–97.
  40. Montavon G, Orr GGB, Müller K-R, LeCun Y, Bottou L, Orr GGB, et al. Neural networks: Tricks of the trade [Internet]. Springer Lecture Notes in Computer Sciences. 1998.
  41. Chakraborty DP, Berbaum KS. Observer studies involving detection and localization: modeling, analysis, and validation. Med Phys. 2004;31: 2313–2330. pmid:15377098
  42. Chakraborty DP. Analysis of location specific observer performance data: validated extensions of the jackknife free-response (JAFROC) method. Acad Radiol. 2006;13: 1187–1193. pmid:16979067
  43. Zacharaki EI, Wang S, Chawla S, Yoo DS, Wolf R, Melhem ER, et al. Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magn Reson Med. 2009;62: 1609–1618. pmid:19859947
  44. Ramoni M, Sebastiani P. Robust bayes classifiers. Artif Intell. 2001;125: 209–226.
  45. Dettling M, Bühlmann P. Boosting for tumor classification with gene expression data. Bioinformatics. 2003;19: 1061–1069. pmid:12801866