Summary
Type 2 diabetes mellitus (T2DM) has become a prevalent health problem in China, especially in urban areas. Early prevention strategies are needed to reduce the associated morbidity and mortality. We combined rules with several machine learning techniques to assess the risk of developing T2DM in an urban Chinese adult population. A retrospective analysis was performed on 8000 people without diabetes and 3845 people with T2DM in Nanjing. Five machine learning techniques, Multilayer Perceptron (MLP), AdaBoost (AD), Trees Random Forest (TRF), Support Vector Machine (SVM), and Gradient Tree Boosting (GTB), were used with 10-fold cross-validation in the proposed model to predict the risk of developing T2DM. The performance of these models was evaluated by accuracy, precision, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC). The prediction accuracy of the five models was 0.87, 0.86, 0.86, 0.86, and 0.86, respectively. A combined model that gave each component the same voting weight was then built for T2DM and performed better than the individual models. The findings indicate that combining machine learning models could provide an accurate assessment model for T2DM risk prediction.
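To make the equal-weight combination and the evaluation metrics concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes soft voting (averaging each model's predicted probability of T2DM with equal weights and thresholding at 0.5); the summary does not say whether the study used soft or hard voting. The function names `equal_weight_vote` and `binary_metrics`, the threshold, and the toy numbers are all hypothetical.

```python
from typing import Dict, List, Sequence


def equal_weight_vote(probabilities: Sequence[Sequence[float]],
                      threshold: float = 0.5) -> List[int]:
    """Combine per-model probabilities of T2DM using equal weights.

    probabilities[m][i] is model m's predicted probability for sample i.
    Assumption: soft voting (mean probability, then a fixed threshold).
    """
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    averaged = [
        sum(model[i] for model in probabilities) / n_models
        for i in range(n_samples)
    ]
    return [1 if p >= threshold else 0 for p in averaged]


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity, and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
    }


# Toy example: three hypothetical models scoring four hypothetical people.
toy_probabilities = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.7],
    [0.7, 0.3, 0.7, 0.3],
]
toy_labels = [1, 0, 1, 1]  # 1 = T2DM, 0 = no diabetes

toy_predictions = equal_weight_vote(toy_probabilities)
toy_metrics = binary_metrics(toy_labels, toy_predictions)
```

In a cross-validated setting, the same two functions would be applied to the held-out fold of each split and the resulting metrics averaged across folds; AUC is left out here because it is computed from the averaged probabilities and not from the thresholded labels.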
Additional information
This work was supported by grants from the National Natural Science Foundation of China (No. 81570737, No. 81370947, No. 81570736, No. 81770819, No. 81500612, No. 81400832, No. 81600637, No. 81600632, and No. 81703294), the National Key Research and Development Program of China (No. 2016YFC1304804 and No. 2017YFC1309605), the Jiangsu Provincial Key Medical Discipline (No. ZDXKB2016012), the Key Project of Nanjing Clinical Medical Science, the Key Research and Development Program of Jiangsu Province of China (No. BE2015604 and No. BE2016606), the Jiangsu Provincial Medical Talent (No. ZDRCA2016062), and the Nanjing Science and Technology Development Project (No. 201605019).
Conflict of Interest Statement
The authors declare that they have no conflicts of interest.
Cite this article
Xiong, Xl., Zhang, Rx., Bi, Y. et al. Machine Learning Models in Type 2 Diabetes Risk Prediction: Results from a Cross-sectional Retrospective Study in Chinese Adults. CURR MED SCI 39, 582–588 (2019). https://doi.org/10.1007/s11596-019-2077-4
DOI: https://doi.org/10.1007/s11596-019-2077-4