Published in: BMC Medical Informatics and Decision Making 1/2023

Open Access 01-12-2023 | Research

Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases

Author: Peera Liewlom

Abstract

Background

A decision tree is a crucial tool for describing the factors related to cardiovascular disease (CVD) risk and for predicting and explaining that risk to patients. However, a single decision tree must be simplified, because different patients may have different primary topics, or factors, related to their CVD risk. Data collected from multiple heart disease risk datasets from different environments can instead be described by many decision trees, that is, a forest in which each tree describes the CVD risk for one primary topic.

Methods

We demonstrate the presence of multiple trees, or a forest, in an integrated CVD dataset built from multiple datasets. We then apply a novel method to an association-rule tree to discover each primary topic hidden within a dataset. To generalize the tree structure for descriptive tasks, each primary topic is treated as a boundary node that acts as the root node of a C4.5 tree with the least prodigality for the tree structure (PTS). All of these trees are assigned to a descriptive forest that describes the CVD risks in the dataset, and the descriptive forest is used to describe each CVD patient's primary risk topics and related factors. We describe eight primary topics in a descriptive forest acquired from the 918 records, with 11 features, of a heart failure prediction dataset obtained from five datasets. We also apply the proposed method to the 253,680 records, with 22 features and imbalanced classes, of a heart disease health indicators dataset.
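The abstract does not spell out how each tree is grown, but C4.5 chooses its splits by gain ratio: information gain divided by split information. The sketch below, assuming categorical attributes only, computes that criterion and shows one way a subdataset for a single primary topic might be selected before a tree is grown from it. The topic format (a dict of attribute/value items), the helper names, the attribute names, and the toy records are all hypothetical; the sketch omits C4.5's continuous-attribute thresholds and pruning, and it does not implement the paper's association-rule tree or the PTS measure.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio for splitting on one categorical attribute."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    # Information gain: parent entropy minus size-weighted child entropy.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes that fragment the data.
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def topic_subset(rows, labels, topic):
    """Keep records matching every (attribute, value) item of a hypothetical topic."""
    keep = [i for i, r in enumerate(rows)
            if all(r.get(a) == v for a, v in topic.items())]
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy records (1 = CVD present, 0 = absent).
rows = [
    {"chest_pain": "ASY", "angina": "Y"},
    {"chest_pain": "ASY", "angina": "N"},
    {"chest_pain": "ATA", "angina": "N"},
    {"chest_pain": "ATA", "angina": "N"},
]
labels = [1, 1, 0, 0]

best = max(["chest_pain", "angina"], key=lambda a: gain_ratio(rows, labels, a))
print(best)  # chest_pain
sub_rows, sub_labels = topic_subset(rows, labels, {"angina": "N"})
print(sub_labels)  # [1, 0, 0]
```

Here `chest_pain` separates the toy classes perfectly (gain ratio 1.0), so it would be chosen as the first split; in a descriptive forest, the same criterion would be applied separately within each topic's subdataset.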

Results

The usability of the descriptive forest is demonstrated by a comparative study, on qualitative and quantitative CVD-risk explanation tasks, against a C4.5 tree generated from the same dataset with the least PTS. The qualitative descriptive task confirms that, compared with a single C4.5 tree, the descriptive forest is more flexible and better describes the CVD risk. The quantitative descriptive task confirms that it achieves higher coverage (recall) and correctness (accuracy and precision) and provides more detailed explanations. Additionally, for these tasks, the descriptive forest still outperforms the C4.5 tree. To reduce the problem of imbalanced classes, the class ratio in each subdataset used to generate each tree is investigated.
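The abstract equates coverage with recall and correctness with accuracy and precision, and it handles class imbalance by examining each subdataset's class ratio. A minimal sketch of these standard binary-classification measures follows, assuming label 1 means CVD is present. How the paper aggregates explanations across the trees of a forest is not stated here, so the function names and toy labels are illustrative only.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def descriptive_scores(y_true, y_pred, positive=1):
    """Coverage (recall) and correctness (accuracy, precision); 0.0 when undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


def class_ratio(labels):
    """Fraction of each class in a (sub)dataset, keyed by label."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in sorted(counts)}


# Hypothetical labels for one subdataset (1 = CVD present).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(descriptive_scores(y_true, y_pred))
print(class_ratio(y_true))  # {0: 0.625, 1: 0.375}
```

On imbalanced data, accuracy can stay high even when most minority-class records are missed, which is why recall, precision, and the per-subdataset class ratio are worth reporting alongside it.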

Conclusion

The results provide confidence in using the descriptive forest.
Metadata
Title
Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases
Author
Peera Liewlom
Publication date
01-12-2023
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2023
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-023-02228-x
