Published in: Journal of Medical Systems 9/2016

01-09-2016 | Systems-Level Quality Improvement

Applying Data Mining Techniques to Improve Breast Cancer Diagnosis

Authors: Joana Diz, Goreti Marreiros, Alberto Freitas


Abstract

In breast cancer research, new computer-aided diagnosis systems are increasingly being developed with the aim of reducing false positives in diagnostic tests. In this work, we present a data mining approach that may support oncologists in breast cancer classification and diagnosis. The study compares two breast cancer datasets and seeks the best methods for predicting benign/malignant lesions, classifying breast density, and identifying findings (mass/microcalcification distinction). To carry out these tasks, two texture feature extraction matrices were implemented in Matlab, and the resulting features were classified with data mining algorithms in WEKA. The reported accuracies for each task were: 89.3 to 64.7 % for benign/malignant; 75.8 to 78.3 % for dense/fatty tissue; 71.0 to 83.1 % for finding identification. Among the classifiers tested, Naive Bayes was the best at identifying the texture of masses, and Random Forests was the best or second-best classifier for the majority of the tested groups.
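To illustrate the kind of texture descriptor such a pipeline might feed to a classifier, here is a minimal Python sketch. It is not the authors' Matlab implementation. It assumes a gray-level co-occurrence matrix (GLCM), which the abstract does not name, and the function names `glcm` and `texture_features`, the toy 4x4 patch, and the choice of three features are all hypothetical.

```python
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Normalized co-occurrence probabilities of gray levels for pixel
    pairs separated by the offset (dx, dy). Assumed technique, not
    confirmed by the abstract."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose second pixel lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """Three common GLCM summary statistics (illustrative selection)."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    energy = sum(prob ** 2 for prob in p.values())
    homogeneity = sum(prob / (1 + abs(i - j)) for (i, j), prob in p.items())
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Hypothetical 4x4 patch with four gray levels (0 to 3).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(patch))
```

Each region of interest would yield one such feature vector, and a table of these vectors with class labels (for example benign/malignant) is the kind of input a Naive Bayes or Random Forests classifier consumes.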
Metadata
Title
Applying Data Mining Techniques to Improve Breast Cancer Diagnosis
Authors
Joana Diz
Goreti Marreiros
Alberto Freitas
Publication date
01-09-2016
Publisher
Springer US
Published in
Journal of Medical Systems / Issue 9/2016
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-016-0561-y
