Published in: Journal of Medical Systems 4/2007

01-08-2007

Predicting Metastasis in Breast Cancer: Comparing a Decision Tree with Domain Experts

Authors: Amir R. Razavi, Hans Gill, Hans Åhlfeldt, Nosrat Shahsavar

Abstract

Breast malignancy is the second most common cause of cancer death among women in Western countries. Identifying high-risk patients is vital in order to provide them with specialized treatment. In some situations, such as when experienced oncologists are not accessible, decision support methods can help predict the recurrence of cancer. A total of 3,699 breast cancer patients admitted in south-east Sweden from 1986 to 1995 were studied. A decision tree was trained on all patients except 100 cases, which were held out and used for testing. Two domain experts were asked to estimate the probability of recurrence for each of these 100 patients. ROC curves, the area under the ROC curves, and the calibration of the predictions were computed and compared. No significant differences were found between the predictions of the model built by data mining and those made by the two domain experts. In situations where experienced oncologists are not available, predictive models created with data mining techniques can support physicians in decision making with acceptable accuracy.
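The evaluation outlined in the abstract (a 100-case hold-out set, the area under the ROC curve, and calibration) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `hold_out_split`, `auc`, and `calibration_table`, the equal-width binning, and the toy labels and scores are all assumptions made for illustration. The abstract does not specify how the test cases were selected or how calibration was measured.

```python
import random


def hold_out_split(records, n_test, seed=0):
    """Shuffle a copy of the records and reserve n_test of them for testing.

    Random selection is an assumption; the abstract only states that
    100 cases were excluded from training.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    positive case receives a higher score than a negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_table(labels, scores, n_bins=5):
    """Group cases into equal-width probability bins and return, for each
    non-empty bin, (case count, mean predicted probability, observed rate)."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    rows = []
    for group in bins:
        if group:
            mean_pred = sum(s for _, s in group) / len(group)
            observed = sum(y for y, _ in group) / len(group)
            rows.append((len(group), mean_pred, observed))
    return rows


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(auc(toy_labels, toy_scores))
    print(calibration_table(toy_labels, toy_scores, n_bins=2))
```

The same `auc` and `calibration_table` functions could be applied to the probabilities given by a model and to those given by each expert for the same test cases, which is the kind of side-by-side comparison the abstract describes.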
Metadata
Title
Predicting Metastasis in Breast Cancer: Comparing a Decision Tree with Domain Experts
Authors
Amir R. Razavi
Hans Gill
Hans Åhlfeldt
Nosrat Shahsavar
Publication date
01-08-2007
Published in
Journal of Medical Systems / Issue 4/2007
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-007-9064-1
