Published in: Journal of Digital Imaging 4/2016

01-08-2016

A Comparison of Lung Nodule Segmentation Algorithms: Methods and Results from a Multi-institutional Study

Authors: Jayashree Kalpathy-Cramer, Binsheng Zhao, Dmitry Goldgof, Yuhua Gu, Xingwei Wang, Hao Yang, Yongqiang Tan, Robert Gillies, Sandy Napel

Abstract

Tumor volume estimation and accurate, reproducible border segmentation in medical images are important in the diagnosis, staging, and assessment of response to cancer therapy. The goal of this study was to demonstrate the feasibility of a multi-institutional effort to assess the repeatability and reproducibility of nodule borders and the volume estimate bias of computerized segmentation algorithms in CT images of lung cancer, and to provide results from such a study. The dataset used for this evaluation consisted of 52 tumors in 41 CT volumes (40 patient datasets and 1 dataset containing scans of 12 phantom nodules of known volume) from five collections available in The Cancer Imaging Archive. Three academic institutions developing lung nodule segmentation algorithms submitted results for three repeat runs for each of the nodules. We compared the performance of lung nodule segmentation algorithms by assessing several measurements of spatial overlap and volume measurement. Nodule sizes varied from 29 μl to 66 ml and demonstrated a diversity of shapes. Agreement in spatial overlap of segmentations was significantly higher for multiple runs of the same algorithm than between segmentations generated by different algorithms (p < 0.05) and was significantly higher on the phantom dataset compared to the other datasets (p < 0.05). Algorithms differed significantly in the bias of the measured volumes of the phantom nodules (p < 0.05), underscoring the need for assessing performance on clinical data in addition to phantoms. Algorithms that most accurately estimated nodule volumes were not the most repeatable, emphasizing the need to evaluate both their accuracy and precision. There were considerable differences between algorithms, especially in a subset of heterogeneous nodules, underscoring the recommendation that the same software be used at all time points in longitudinal studies.
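The abstract does not say which spatial overlap measures were computed, so the sketch below is only an illustration of the kind of comparison described. It assumes the Dice coefficient as the overlap measure and a simple percent bias against a known phantom volume. The function names, the voxel size, and the toy masks are all hypothetical and are not taken from the study.

```python
# Illustrative sketch only: the Dice coefficient is assumed as the overlap
# measure, and all names and numbers below are hypothetical.

def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two segmentations given as sets of voxel indices."""
    if not mask_a and not mask_b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def volume_ml(mask, voxel_volume_mm3):
    """Segmented volume in millilitres (1 ml = 1000 mm^3)."""
    return len(mask) * voxel_volume_mm3 / 1000.0


def percent_bias(measured_ml, true_ml):
    """Signed volume error relative to a known (e.g. phantom) volume, in percent."""
    return 100.0 * (measured_ml - true_ml) / true_ml


# Toy single-slice masks standing in for two segmentations of one nodule.
run_1 = {(0, y, x) for y in range(10) for x in range(10)}
run_2 = {(0, y, x) for y in range(10) for x in range(2, 12)}  # shifted 2 voxels

overlap = dice_coefficient(run_1, run_2)
measured = volume_ml(run_1, voxel_volume_mm3=0.5)  # assumed voxel volume
bias = percent_bias(measured, true_ml=0.04)        # assumed true volume

print(f"Dice overlap: {overlap:.2f}")
print(f"Measured volume: {measured:.3f} ml, bias: {bias:.1f}%")
```

Under these assumptions, comparing repeat runs of one algorithm would speak to repeatability, comparing runs from different algorithms would speak to reproducibility, and the bias term only applies where the true volume is known, as with the phantom nodules.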
Metadata
Title
A Comparison of Lung Nodule Segmentation Algorithms: Methods and Results from a Multi-institutional Study
Authors
Jayashree Kalpathy-Cramer
Binsheng Zhao
Dmitry Goldgof
Yuhua Gu
Xingwei Wang
Hao Yang
Yongqiang Tan
Robert Gillies
Sandy Napel
Publication date
01-08-2016
Publisher
Springer International Publishing
Published in
Journal of Imaging Informatics in Medicine / Issue 4/2016
Print ISSN: 2948-2925
Electronic ISSN: 2948-2933
DOI
https://doi.org/10.1007/s10278-016-9859-z
