Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Stroke | Research article

Using machine learning models to improve stroke risk level classification methods of China national stroke screening

Authors: Xuemeng Li, Di Bian, Jinghui Yu, Mei Li, Dongsheng Zhao


Abstract

Background

Stroke has a high incidence, prevalence and mortality in China and places a heavy burden on families and society. In 2009, the Ministry of Health of China launched the China national stroke screening and intervention program, which screens for stroke and its risk factors and delivers interventions to the high-risk population among people aged over 40 across China. In this program, the stroke risk factors are hypertension, diabetes, dyslipidemia, smoking, lack of exercise, being apparently overweight, and a family history of stroke. People with more than two risk factors, or with a history of stroke or transient ischemic attack (TIA), are considered high-risk. However, this criterion cannot classify the stroke risk level of people with unknown values in the risk factor fields. The missing risk levels reduce the efficiency of stroke interventions and introduce inaccuracies into national-level statistics. In this paper, we use the 2017 national stroke screening data to develop stroke risk classification models based on machine learning algorithms, with the aim of improving classification efficiency.
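The rule-based criterion described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual implementation: the field names, the `"high"` / `"not high"` labels, and the choice to return `None` whenever any field is unknown are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical field names for the seven risk factors listed in the abstract.
RISK_FACTORS = (
    "hypertension", "diabetes", "dyslipidemia", "smoking",
    "lack_of_exercise", "overweight", "family_history_of_stroke",
)
HISTORY_FIELD = "history_of_stroke_or_tia"  # hypothetical field name


def rule_based_risk(record: dict) -> Optional[str]:
    """Apply the screening rule; return None if any field is unknown.

    Unknown values are represented as None (or a missing key). Treating
    any unknown as "cannot classify" is a strict reading of the abstract.
    """
    fields = RISK_FACTORS + (HISTORY_FIELD,)
    if any(record.get(name) is None for name in fields):
        return None  # the rule cannot be evaluated
    if record[HISTORY_FIELD]:
        return "high"
    n_factors = sum(1 for name in RISK_FACTORS if record[name])
    # "More than two risk factors" means at least three.
    return "high" if n_factors > 2 else "not high"


# Illustrative records (invented for this example).
complete = {name: False for name in RISK_FACTORS}
complete[HISTORY_FIELD] = False
complete.update(hypertension=True, diabetes=True, smoking=True)

incomplete = dict(complete, dyslipidemia=None)  # one unknown value

print(rule_based_risk(complete))    # high
print(rule_based_risk(incomplete))  # None
```

The second record shows the gap the paper targets: a single unknown field leaves the person without a risk level under this reading of the rule.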

Method

First, we construct training and test sets and handle the imbalanced training set using oversampling and undersampling. Then, we develop a logistic regression model, a naïve Bayes model, a Bayesian network model, a decision tree model, a neural network model, a random forest model, a bagged decision tree model, a voting model and a boosting model with decision trees to classify stroke risk levels.
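As a rough illustration of how an imbalanced training set might be rebalanced, the sketch below combines random undersampling of large classes with random oversampling of small ones. The abstract does not specify the resampling procedure, so plain random resampling, the function name `rebalance`, and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def rebalance(rows, labels, target_per_class, seed=0):
    """Return a class-balanced copy of (rows, labels).

    Classes with at least target_per_class members are randomly
    undersampled (without replacement); smaller classes keep all their
    members and are randomly oversampled (with replacement).
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            chosen = rng.sample(members, target_per_class)
        else:
            extra = rng.choices(members, k=target_per_class - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Toy data: 8 "low" records and 2 "high" records (invented).
rows = [[i] for i in range(10)]
labels = ["low"] * 8 + ["high"] * 2
balanced_rows, balanced_labels = rebalance(rows, labels, target_per_class=5)
print(Counter(balanced_labels))
```

Resampling should be applied to the training set only, so that the test set keeps the class proportions of the original data.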

Result

The boosting model with decision trees achieves the highest recall (99.94%), and the random forest model achieves the highest precision (97.33%). With the random forest model (recall: 98.44%), recall increases by about 2.8% compared with the method currently used, and several thousand more people at high risk of stroke can be identified each year.
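For readers unfamiliar with the two metrics, the sketch below shows the standard definitions for the high-risk class: precision is the share of predicted high-risk cases that are truly high-risk, and recall is the share of truly high-risk cases that the model finds. The counts are invented for illustration and are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts.

    tp: high-risk people correctly flagged as high-risk
    fp: people flagged as high-risk who are not
    fn: high-risk people the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented counts, for illustration only.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=90.00% recall=75.00%
```

In a screening setting, higher recall means fewer high-risk people are missed, while higher precision means fewer people are sent for unnecessary follow-up.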

Conclusion

Models developed in this paper can improve the current screening method in the way that it can avoid the impact of unknown values, and avoid unnecessary rescreening and intervention expenditures. The national stroke screening program can choose classification models according to the practice need.
Footnotes
1. 1,000,000 × 3% × 19.7% × 95.82% ≈ 6,000
2. 1,000,000 × 3% × (1 − 19.7%) × (1 − 36.35%) ≈ 15,000
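The two footnote expressions can be checked by evaluating them directly. The short sketch below does only that; it does not interpret the individual factors, since this excerpt does not define them, and the variable names are placeholders.

```python
# Footnote 1: 1,000,000 * 3% * 19.7% * 95.82%
footnote_1 = 1_000_000 * 0.03 * 0.197 * 0.9582

# Footnote 2: 1,000,000 * 3% * (1 - 19.7%) * (1 - 36.35%)
footnote_2 = 1_000_000 * 0.03 * (1 - 0.197) * (1 - 0.3635)

print(round(footnote_1), round(footnote_2))  # 5663 15333
```

The exact products are about 5,663 and 15,333, which the footnotes round to the nearest thousand and five thousand respectively.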
 
Metadata
Publisher
BioMed Central
Keyword
Stroke
Published in
BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-019-0998-2
