Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Research article

Predicting patient-reported outcomes following hip and knee replacement surgery using supervised machine learning

Authors: Manuel Huber, Christoph Kurz, Reiner Leidl

Abstract

Background

Machine-learning classifiers often offer good predictive performance and are increasingly used to support shared decision-making in clinical practice. Focusing on performance and practicability, this study evaluates how well eight supervised classifiers, including a linear model, predict patient-reported outcomes (PROs) following hip and knee replacement surgery.

Methods

NHS PRO data (130,945 observations) from April 2015 to April 2017 were used to train and test eight classifiers to predict binary postoperative improvement, defined by minimal important differences. The area under the receiver operating characteristic curve (AUC), the J-statistic and several other metrics were calculated. The dependent outcomes were generic and disease-specific improvement, based on the EQ-5D-3L visual analogue scale (VAS) and the Oxford Hip and Knee Score (Q score), respectively.
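
As an illustration of the outcome definition and the two headline metrics named above, the following minimal sketch labels a patient as "improved" when the score gain reaches a minimal important difference (MID), then computes the AUC and the J-statistic (Youden's index: sensitivity + specificity - 1). It is not the authors' implementation. The function names, the toy scores, the MID of 10 points, the use of "greater than or equal" for the cut-off and the 0.5 classification threshold are all assumptions made for the example.

```python
from typing import Sequence


def improved(pre: float, post: float, mid: float) -> int:
    """Return 1 if the postoperative gain reaches the MID, else 0 (assumed rule)."""
    return int(post - pre >= mid)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_j(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """J-statistic = sensitivity + specificity - 1 at a given probability threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Toy data (hypothetical): preoperative and postoperative VAS for six patients.
pre_vas = [40, 55, 70, 30, 80, 60]
post_vas = [65, 58, 90, 35, 82, 85]
MID = 10  # hypothetical threshold, not the value used in the study

labels = [improved(a, b, MID) for a, b in zip(pre_vas, post_vas)]

# Hypothetical predicted probabilities of improvement from some classifier.
predicted = [0.9, 0.4, 0.6, 0.7, 0.2, 0.8]

model_auc = auc(labels, predicted)
model_j = youden_j(labels, predicted, threshold=0.5)
print(labels, round(model_auc, 3), round(model_j, 3))
```

The pairwise AUC is quadratic in the number of observations, which is fine for a toy example; for data on the scale of the NHS PRO set a rank-based or library implementation would be the more practical choice.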

Results

The area under the receiver operating characteristic curve of the best training models was around 0.87 (VAS) and 0.78 (Q score) for hip replacement, and around 0.86 (VAS) and 0.70 (Q score) for knee replacement surgery. Extreme gradient boosting, random forests, multistep elastic net and the linear model provided the highest overall J-statistics. Based on variable importance, the most important predictors of postoperative outcomes were preoperative VAS, Q score and single Q score dimensions. A sensitivity analysis for hip replacement VAS evaluated the influence of the minimal important difference, patient selection criteria and additional data years. Together with a small benchmark against the NHS prediction model, these analyses confirmed the robustness of our results.

Conclusions

Supervised machine-learning implementations, such as extreme gradient boosting, can provide better performance than linear models and should be considered when high predictive performance is needed. Preoperative VAS, Q score and specific dimensions such as limping are the most important predictors of postoperative hip and knee patient-reported outcome measures (PROMs).

Metadata
Title
Predicting patient-reported outcomes following hip and knee replacement surgery using supervised machine learning
Authors
Manuel Huber
Christoph Kurz
Reiner Leidl
Publication date
01-12-2019
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-018-0731-6
