Published in: BMC Medical Informatics and Decision Making 1/2017

Open Access 01-12-2017 | Research article

Supervised learning for infection risk inference using pathology data

Authors: Bernard Hernandez, Pau Herrero, Timothy Miles Rawson, Luke S. P. Moore, Benjamin Evans, Christofer Toumazou, Alison H. Holmes, Pantelis Georgiou

Abstract

Background

Antimicrobial resistance is threatening our ability to treat common infectious diseases, and the overuse of antimicrobials to treat human infections in hospitals is accelerating this process. Clinical Decision Support Systems (CDSSs) have been shown to enhance the quality of care by promoting changes in prescription practices through antimicrobial selection advice. However, bypassing an initial assessment of whether an underlying disease justifies antimicrobial therapy might lead to indiscriminate and often unnecessary prescriptions.

Methods

Six biochemical markers were selected from pathology laboratory tests and combined with microbiology outcomes from susceptibility tests to create a unique dataset of over one and a half million daily profiles for infection risk inference. Outliers were discarded using the inter-quartile range (IQR) rule, and several sampling techniques were studied to tackle the class imbalance problem. The analysis had two phases. The first phase selected the most effective and robust model during training using ten-fold stratified cross-validation. The second phase evaluated the final model, after isotonic calibration, in scenarios with missing inputs and imbalanced class distributions.
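The IQR rule and the isotonic calibration step named above can be illustrated with a short sketch that uses only the Python standard library. It is not the authors' implementation: the Tukey multiplier `k = 1.5`, the `"inclusive"` quartile method, and the helper names `iqr_bounds`, `iqr_filter`, `isotonic_fit` and `isotonic_predict` are assumptions chosen for illustration. Calibration here uses the pool-adjacent-violators algorithm (PAVA), which produces the non-decreasing least-squares fit of binary labels against classifier scores; prediction is a plain step lookup without interpolation.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for one biomarker.

    k=1.5 and method="inclusive" are assumptions, not taken from the paper.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_filter(values, k=1.5):
    """Keep only values inside the fences (hypothetical helper)."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def isotonic_fit(scores, labels):
    """Fit a non-decreasing score -> probability map with PAVA.

    Returns (upper_score, probability) pairs sorted by score; each pair is
    one pooled block of the step function.
    """
    blocks = []  # each block: [sum_of_labels, count, highest_score]
    for score, label in sorted(zip(scores, labels)):
        if blocks and blocks[-1][2] == score:
            # Pool tied scores so every score maps to a single value.
            blocks[-1][0] += label
            blocks[-1][1] += 1
        else:
            blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def isotonic_predict(model, score):
    """Step lookup: probability of the first block whose upper score >= score."""
    for top, prob in model:
        if score <= top:
            return prob
    return model[-1][1]


raw = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_bounds(raw))   # (-3.0, 13.0)
print(iqr_filter(raw))   # [1, 2, 3, 4, 5, 6, 7, 8]

model = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(model)             # [(0.1, 0.0), (0.3, 0.5), (0.4, 1.0)]
print(isotonic_predict(model, 0.25))  # 0.5
```

Calibration maps are typically fitted on held-out scores (for example, validation folds) rather than on the data the classifier was trained on, so that the calibrated probabilities are not overly optimistic.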

Results

More than 50% of infected profiles had daily requested laboratory tests for the six biochemical markers, and infection inference results were very promising: area under the receiver operating characteristic curve (AUC) 0.80-0.83, sensitivity 0.64-0.75, and specificity 0.92-0.97. Standardization consistently outperformed normalization, and sensitivity was enhanced by using the SMOTE sampling technique. Furthermore, models operated without noticeable loss in performance if at least four biomarkers were available.
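The preprocessing and evaluation terms used in these results can be made concrete with a second standard-library sketch. It assumes that "standardization" means z-scoring with the population standard deviation and "normalization" means min-max scaling to [0, 1]; the abstract does not define either, and the function names are hypothetical. Sensitivity and specificity are computed from thresholded predictions, while AUC is computed from raw scores as the probability that a randomly chosen positive profile is scored above a randomly chosen negative one.

```python
import statistics


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale linearly to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as P(random positive scores above random negative); ties count 1/2.

    This pairwise form is O(P * N) and only meant for illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Because sensitivity and specificity depend on the chosen decision threshold, they can trade off against each other, whereas AUC summarizes ranking quality across all thresholds.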

Conclusion

The selected biomarkers contain enough information to perform infection risk inference with a high degree of confidence, even in the presence of incomplete and imbalanced data. Since these biomarkers are commonly available in hospitals, Clinical Decision Support Systems could use these findings to assist clinicians in deciding whether to initiate antimicrobial therapy, thereby improving prescription practices.
Metadata
Publisher
BioMed Central
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-017-0550-1
