Skip to main content
Top
Published in: BMC Medical Research Methodology 1/2021

Open Access 01-12-2021 | Research

Predicting postoperative surgical site infection with administrative data: a random forests algorithm

Authors: Yelena Petrosyan, Kednapa Thavorn, Glenys Smith, Malcolm Maclure, Roanne Preston, Carl van Walravan, Alan J. Forster

Published in: BMC Medical Research Methodology | Issue 1/2021

Login to get access

Abstract

Background

Since primary data collection can be time-consuming and expensive, surgical site infections (SSIs) could ideally be monitored using routinely collected administrative data. We derived and internally validated efficient algorithms to identify SSIs within 30 days after surgery with health administrative data, using Machine Learning algorithms.

Methods

All patients enrolled in the National Surgical Quality Improvement Program from the Ottawa Hospital were linked to administrative datasets in Ontario, Canada. Machine Learning approaches, including a Random Forests algorithm and the high-performance logistic regression, were used to derive parsimonious models to predict SSI status. Finally, a risk score methodology was used to transform the final models into the risk score system. The SSI risk models were validated in the validation datasets.

Results

Of 14,351 patients, 795 (5.5%) had an SSI. First, separate predictive models were built for three distinct administrative datasets. The final model, including hospitalization diagnostic, physician diagnostic and procedure codes, demonstrated excellent discrimination (C statistics, 0.91, 95% CI, 0.90–0.92) and calibration (Hosmer-Lemeshow χ2 statistics, 4.531, p = 0.402).

Conclusion

We demonstrated that health administrative data can be effectively used to identify SSIs. Machine learning algorithms have shown a high degree of accuracy in predicting postoperative SSIs and can integrate and utilize a large amount of administrative data. External validation of this model is required before it can be routinely used to identify SSIs.
Appendix
Available only for authorised users
Literature
1.
go back to reference Pittet D, Harbarth S, Ruef C, Francioli P, Sudre P, Petignat C, et al. Prevalence and risk factors for nosocomial infections in four university hospitals in Switzerland. Infect Control Hosp Epidemiol. 1999;20(1):37–42. Pittet D, Harbarth S, Ruef C, Francioli P, Sudre P, Petignat C, et al. Prevalence and risk factors for nosocomial infections in four university hospitals in Switzerland. Infect Control Hosp Epidemiol. 1999;20(1):37–42.
2.
go back to reference Petrosyan Y, Thavorn K, Maclure M, Smith G, McIsaac DI, Schramm D, et al. Long-term health outcomes and health system costs associated with surgical site infections: a retrospective cohort study. Ann Surg. 2019. Petrosyan Y, Thavorn K, Maclure M, Smith G, McIsaac DI, Schramm D, et al. Long-term health outcomes and health system costs associated with surgical site infections: a retrospective cohort study. Ann Surg. 2019.
3.
go back to reference Jenks PJ, Laurent M, McQuarry S, Watkins R. Clinical and economic burden of surgical site infection (SSI) and predicted financial consequences of elimination of SSI from an English hospital. J Hosp Infect. 2014;86(1):24–33.CrossRef Jenks PJ, Laurent M, McQuarry S, Watkins R. Clinical and economic burden of surgical site infection (SSI) and predicted financial consequences of elimination of SSI from an English hospital. J Hosp Infect. 2014;86(1):24–33.CrossRef
4.
go back to reference Whitehouse JD, Friedman ND, Kirkland KB, Richardson WJ, Sexton DJ. The impact of surgical-site infections following orthopedic surgery at a community hospital and a university hospital: adverse quality of life, excess length of stay, and extra cost. Infect Control Hosp Epidemiol. 2002;23(4):183–9.CrossRef Whitehouse JD, Friedman ND, Kirkland KB, Richardson WJ, Sexton DJ. The impact of surgical-site infections following orthopedic surgery at a community hospital and a university hospital: adverse quality of life, excess length of stay, and extra cost. Infect Control Hosp Epidemiol. 2002;23(4):183–9.CrossRef
5.
go back to reference Badia JM, Casey AL, Petrosillo N, Hudson PM, Mitchell SA, Crosby C. Impact of surgical site infection on healthcare costs and patient outcomes: a systematic review in six European countries. J Hosp Infect. 2017;96(1):1–15.CrossRef Badia JM, Casey AL, Petrosillo N, Hudson PM, Mitchell SA, Crosby C. Impact of surgical site infection on healthcare costs and patient outcomes: a systematic review in six European countries. J Hosp Infect. 2017;96(1):1–15.CrossRef
6.
go back to reference van Walraven C, Jackson TD, Daneman N. Derivation and validation of the surgical site infections risk model using health administrative data. Infect Control Hosp Epidemiol. 2016;37(4):455–65.CrossRef van Walraven C, Jackson TD, Daneman N. Derivation and validation of the surgical site infections risk model using health administrative data. Infect Control Hosp Epidemiol. 2016;37(4):455–65.CrossRef
7.
go back to reference Grammatico-Guillon L, Baron S, Gaborit C, Rusch E, Astagneau P. Quality assessment of hospital discharge database for routine surveillance of hip and knee arthroplasty-related infections. Infect Control Hosp Epidemiol. 2014;35(6):646–51.CrossRef Grammatico-Guillon L, Baron S, Gaborit C, Rusch E, Astagneau P. Quality assessment of hospital discharge database for routine surveillance of hip and knee arthroplasty-related infections. Infect Control Hosp Epidemiol. 2014;35(6):646–51.CrossRef
8.
go back to reference Rennert-May E, Manns B, Smith S, Puloski S, Henderson E, Au F, et al. Validity of administrative data in identifying complex surgical site infections from a population-based cohort after primary hip and knee arthroplasty in Alberta, Canada. Am J Infect Control. 2018;46(10):1123–6.CrossRef Rennert-May E, Manns B, Smith S, Puloski S, Henderson E, Au F, et al. Validity of administrative data in identifying complex surgical site infections from a population-based cohort after primary hip and knee arthroplasty in Alberta, Canada. Am J Infect Control. 2018;46(10):1123–6.CrossRef
9.
go back to reference Sands K, Vineyard G, Livingston J, Christiansen C, Platt R. Efficient identification of postdischarge surgical site infections: use of automated pharmacy dispensing information, administrative data, and medical record information. J Infect Dis. 1999;179(2):434–41.CrossRef Sands K, Vineyard G, Livingston J, Christiansen C, Platt R. Efficient identification of postdischarge surgical site infections: use of automated pharmacy dispensing information, administrative data, and medical record information. J Infect Dis. 1999;179(2):434–41.CrossRef
10.
go back to reference van Walraven C, Jackson TD, Daneman N. Administrative data measured surgical site infection probability within 30 days of surgery in elderly patients. J Clin Epidemiol. 2016;77:112–7.CrossRef van Walraven C, Jackson TD, Daneman N. Administrative data measured surgical site infection probability within 30 days of surgery in elderly patients. J Clin Epidemiol. 2016;77:112–7.CrossRef
11.
go back to reference Song X, Cosgrove S, Pass M, Perl T. Using hospital claim data to monitor surgical site infections for inpatient procedures. Am J Infect Control. 2008;36(3). Song X, Cosgrove S, Pass M, Perl T. Using hospital claim data to monitor surgical site infections for inpatient procedures. Am J Infect Control. 2008;36(3).
12.
go back to reference Cohen AM, Ambert K, McDonagh M. A prospective evaluation of an automated classification system to support evidence-based medicine and systematic review. AMIA Annu Symp Proc. 2010;2010:121–5.PubMedPubMedCentral Cohen AM, Ambert K, McDonagh M. A prospective evaluation of an automated classification system to support evidence-based medicine and systematic review. AMIA Annu Symp Proc. 2010;2010:121–5.PubMedPubMedCentral
13.
go back to reference Szlosek DA, Ferrett J. Using Machine Learning and Natural Language Processing Algorithms to Automate the Evaluation of Clinical Decision Support in Electronic Medical Record Systems. EGEMS (Wash DC). 2016;4(3):1222. Szlosek DA, Ferrett J. Using Machine Learning and Natural Language Processing Algorithms to Automate the Evaluation of Clinical Decision Support in Electronic Medical Record Systems. EGEMS (Wash DC). 2016;4(3):1222.
14.
go back to reference Maroco J, Silva D, Rodrigues A, Guerreiro M, Santana I, de Mendonca A. Data mining methods in the prediction of dementia: a real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC Res Notes. 2011;4:299.CrossRef Maroco J, Silva D, Rodrigues A, Guerreiro M, Santana I, de Mendonca A. Data mining methods in the prediction of dementia: a real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC Res Notes. 2011;4:299.CrossRef
15.
go back to reference Douglas PK, Harris S, Yuille A, Cohen MS. Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief. Neuroimage. 2011;56(2):544–53.CrossRef Douglas PK, Harris S, Yuille A, Cohen MS. Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief. Neuroimage. 2011;56(2):544–53.CrossRef
16.
go back to reference Li J, Alvarez B, Siwabessy J, Tran M, Huang Z, Przeslawski R, et al. Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: predicting sponge species richness. Environ Model Softw. 2017;97:112–29.CrossRef Li J, Alvarez B, Siwabessy J, Tran M, Huang Z, Przeslawski R, et al. Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: predicting sponge species richness. Environ Model Softw. 2017;97:112–29.CrossRef
17.
go back to reference Li J, Tran M, Siwabessy J. Selecting optimal random Forest predictive models: a case study on predicting the spatial distribution of seabed hardness. PLoS One. 2016;11(2):e0149089.CrossRef Li J, Tran M, Siwabessy J. Selecting optimal random Forest predictive models: a case study on predicting the spatial distribution of seabed hardness. PLoS One. 2016;11(2):e0149089.CrossRef
18.
go back to reference Bartz-Kurycki MA, Green C, Anderson KT, Alder AC, Bucher BT, Cina RA, et al. Enhanced neonatal surgical site infection prediction model utilizing statistically and clinically significant variables in combination with a machine learning algorithm. Am J Surg. 2018;216(4):764–77.CrossRef Bartz-Kurycki MA, Green C, Anderson KT, Alder AC, Bucher BT, Cina RA, et al. Enhanced neonatal surgical site infection prediction model utilizing statistically and clinically significant variables in combination with a machine learning algorithm. Am J Surg. 2018;216(4):764–77.CrossRef
19.
go back to reference Breiman L. Random Forests Machine Learning. 2001;45:5–32. Breiman L. Random Forests Machine Learning. 2001;45:5–32.
20.
go back to reference Touw WG, Bayjanov JR, Overmars L, Backus L, Boekhorst J, Wels M, et al. Data mining in the life sciences with random Forest: a walk in the park or lost in the jungle? Brief Bioinform. 2013;14(3):315–26.CrossRef Touw WG, Bayjanov JR, Overmars L, Backus L, Boekhorst J, Wels M, et al. Data mining in the life sciences with random Forest: a walk in the park or lost in the jungle? Brief Bioinform. 2013;14(3):315–26.CrossRef
21.
go back to reference Liam A, Wiener M. Classification and regression by random forest. R News. 2002;2(3):18–22. Liam A, Wiener M. Classification and regression by random forest. R News. 2002;2(3):18–22.
22.
go back to reference Ozcift A. Enhanced cancer recognition system based on random forests feature elimination algorithm. J Med Syst. 2012;36(4):2577–85.CrossRef Ozcift A. Enhanced cancer recognition system based on random forests feature elimination algorithm. J Med Syst. 2012;36(4):2577–85.CrossRef
23.
go back to reference Diaz-Uriarte R, Alvarez de Andres S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006;7:3.CrossRef Diaz-Uriarte R, Alvarez de Andres S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006;7:3.CrossRef
24.
go back to reference Doerken S, Avalos M, Lagarde E, Schumacher M. Penalized logistic regression with low prevalence exposures beyond high dimensional settings. PLoS One. 2019;14(5):e0217057.CrossRef Doerken S, Avalos M, Lagarde E, Schumacher M. Penalized logistic regression with low prevalence exposures beyond high dimensional settings. PLoS One. 2019;14(5):e0217057.CrossRef
25.
go back to reference Yao D, Yang J, Zhan X. A novel method for disease prediction: hybrid of random Forest and multivariate adaptive regression splines. J Comput. 2013;8(1):170–7.CrossRef Yao D, Yang J, Zhan X. A novel method for disease prediction: hybrid of random Forest and multivariate adaptive regression splines. J Comput. 2013;8(1):170–7.CrossRef
26.
go back to reference Cohen R. “SAS Meets Big Iron: High Performance Computing in SAS Analytical Procedures,” in Proceedings of the Twenty-Seventh Annual SAS Users Group International Conference. Cary, NC: SASInstitute Inc.; 2002. Cohen R. “SAS Meets Big Iron: High Performance Computing in SAS Analytical Procedures,” in Proceedings of the Twenty-Seventh Annual SAS Users Group International Conference. Cary, NC: SASInstitute Inc.; 2002.
27.
go back to reference Sullivan LM, Massaro JM, D'Agostino RB Sr. Presentation of multivariate data for clinical use: the Framingham study risk score functions. Stat Med. 2004;23(10):1631–60.CrossRef Sullivan LM, Massaro JM, D'Agostino RB Sr. Presentation of multivariate data for clinical use: the Framingham study risk score functions. Stat Med. 2004;23(10):1631–60.CrossRef
28.
go back to reference Wu C, Hannan EL, Walford G, Ambrose JA, Holmes DR Jr, King SB 3rd, et al. A risk score to predict in-hospital mortality for percutaneous coronary interventions. J Am Coll Cardiol. 2006;47(3):654–60.CrossRef Wu C, Hannan EL, Walford G, Ambrose JA, Holmes DR Jr, King SB 3rd, et al. A risk score to predict in-hospital mortality for percutaneous coronary interventions. J Am Coll Cardiol. 2006;47(3):654–60.CrossRef
29.
go back to reference Hong W, Lillemoe KD, Pan S, Zimmer V, Kontopantelis E, Stock S, et al. Development and validation of a risk prediction score for severe acute pancreatitis. J Transl Med. 2019;17(1):146.CrossRef Hong W, Lillemoe KD, Pan S, Zimmer V, Kontopantelis E, Stock S, et al. Development and validation of a risk prediction score for severe acute pancreatitis. J Transl Med. 2019;17(1):146.CrossRef
30.
go back to reference Austin PC, Lee DS, D'Agostino RB, Fine JP. Developing points-based risk-scoring systems in the presence of competing risks. Stat Med. 2016;35(22):4056–72.CrossRef Austin PC, Lee DS, D'Agostino RB, Fine JP. Developing points-based risk-scoring systems in the presence of competing risks. Stat Med. 2016;35(22):4056–72.CrossRef
31.
go back to reference Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J Med Internet Res. 2016;18(12):e323-e.CrossRef Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View. J Med Internet Res. 2016;18(12):e323-e.CrossRef
32.
go back to reference Streiner DL, Cairney J. What's under the ROC? An introduction to receiver operating characteristics curves. Can J Psychiatr. 2007;52(2):121–8.CrossRef Streiner DL, Cairney J. What's under the ROC? An introduction to receiver operating characteristics curves. Can J Psychiatr. 2007;52(2):121–8.CrossRef
Metadata
Title
Predicting postoperative surgical site infection with administrative data: a random forests algorithm
Authors
Yelena Petrosyan
Kednapa Thavorn
Glenys Smith
Malcolm Maclure
Roanne Preston
Carl van Walravan
Alan J. Forster
Publication date
01-12-2021
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2021
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-021-01369-9

Other articles of this Issue 1/2021

BMC Medical Research Methodology 1/2021 Go to the issue