Machine Learning Models in Type 2 Diabetes Risk Prediction: Results from a Cross-sectional Retrospective Study in Chinese Adults

Current Medical Science

Summary

Type 2 diabetes mellitus (T2DM) has become a prevalent health problem in China, especially in urban areas. Early prevention strategies are needed to reduce the associated mortality and morbidity. We applied a combination of rules and several machine learning techniques to assess the risk of developing T2DM in an urban Chinese adult population. A retrospective analysis was performed on 8000 people without diabetes and 3845 people with T2DM in Nanjing. Five machine learning techniques, Multilayer Perceptron (MLP), AdaBoost (AD), Trees Random Forest (TRF), Support Vector Machine (SVM), and Gradient Tree Boosting (GTB), were used with 10-fold cross-validation in the proposed model to predict the risk of developing T2DM. The performance of these models was evaluated by accuracy, precision, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC). The prediction accuracy of the five models was 0.87, 0.86, 0.86, 0.86, and 0.86, respectively. A combination model that gave each component the same voting weight was then built, and it performed better than the individual models. The findings indicate that combining machine learning models could provide an accurate assessment model for T2DM risk prediction.
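As a rough illustration of the equal-weight combination and the evaluation metrics named in the summary, the following minimal Python sketch averages the predicted probabilities of five component models and scores the result. It is not the authors' implementation. The probabilities and labels are made-up toy values, the function names are hypothetical, and averaging probabilities (rather than taking a majority vote over predicted labels) is an assumption, since the summary does not say which voting scheme was used.

```python
from typing import Dict, List, Sequence


def equal_weight_vote(prob_by_model: Dict[str, Sequence[float]]) -> List[float]:
    """Average each person's predicted T2DM probability over all models."""
    columns = list(prob_by_model.values())
    return [sum(person) / len(columns) for person in zip(*columns)]


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, sensitivity, and specificity at a cutoff.

    Assumes both classes occur and at least one positive is predicted,
    so no denominator is zero.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = T2DM, 0 = no diabetes, six people.
y_true = [1, 1, 1, 0, 0, 0]
prob_by_model = {
    "MLP": [0.9, 0.6, 0.4, 0.2, 0.7, 0.1],
    "AD":  [0.8, 0.7, 0.3, 0.3, 0.6, 0.2],
    "TRF": [0.7, 0.4, 0.6, 0.1, 0.6, 0.3],
    "SVM": [0.9, 0.5, 0.2, 0.4, 0.9, 0.1],
    "GTB": [0.7, 0.8, 0.5, 0.0, 0.7, 0.3],
}

ensemble_prob = equal_weight_vote(prob_by_model)
metrics = classification_metrics(y_true, ensemble_prob)
ensemble_auc = auc(y_true, ensemble_prob)
print(metrics, ensemble_auc)
```

In a 10-fold cross-validation setting, the same metrics would be computed on each held-out fold and then averaged. The sketch scores a single toy set only to keep the arithmetic easy to follow.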

Author information

Correspondence to Wei-hong Zhou, Yun Yu or Da-long Zhu.

Additional information

This work was supported by grants from the National Natural Science Foundation of China (No. 81570737, No. 81370947, No. 81570736, No. 81770819, No. 81500612, No. 81400832, No. 81600637, No. 81600632, and No. 81703294), the National Key Research and Development Program of China (No. 2016YFC1304804 and No. 2017YFC1309605), the Jiangsu Provincial Key Medical Discipline (No. ZDXKB2016012), the Key Project of Nanjing Clinical Medical Science, the Key Research and Development Program of Jiangsu Province of China (No. BE2015604 and No. BE2016606), the Jiangsu Provincial Medical Talent (No. ZDRCA2016062), and the Nanjing Science and Technology Development Project (No. 201605019).

Conflict of Interest Statement

The authors declare that they have no conflicts of interest.

Cite this article

Xiong, Xl., Zhang, Rx., Bi, Y. et al. Machine Learning Models in Type 2 Diabetes Risk Prediction: Results from a Cross-sectional Retrospective Study in Chinese Adults. CURR MED SCI 39, 582–588 (2019). https://doi.org/10.1007/s11596-019-2077-4

  • DOI: https://doi.org/10.1007/s11596-019-2077-4
