ALL Metrics
-
Views
-
Downloads
Get PDF
Get XML
Cite
Export
Track
Systematic Review
Revised

Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review

[version 2; peer review: 2 approved]
PUBLISHED 27 Nov 2019
Author details Author details
OPEN PEER REVIEW
REVIEWER STATUS

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background: Clinical decision support (CDS) systems have emerged as tools providing intelligent decision making to address challenges of critical care. CDS systems can be based on existing guidelines or best practices; and can also utilize machine learning to provide a diagnosis, recommendation, or therapy course.
Methods: This research aimed to identify evidence-based study designs and outcome measures to determine the clinical effectiveness of clinical decision support systems in the detection and prediction of hemodynamic instability, respiratory distress, and infection within critical care settings. PubMed, ClinicalTrials.gov and Cochrane Database of Systematic Reviews were systematically searched to identify primary research published in English between 2013 and 2018. Studies conducted in the USA, Canada, UK, Germany and France with more than 10 participants per arm were included.
Results: In studies on hemodynamic instability, the prediction and management of septic shock were the most researched topics followed by the early prediction of heart failure. For respiratory distress, the most popular topics were pneumonia detection and prediction followed by pulmonary embolisms. Given the importance of imaging and clinical notes, this area combined Machine Learning with image analysis and natural language processing. In studies on infection, the most researched areas were the detection, prediction, and management of sepsis, surgical site infections, as well as acute kidney injury. Overall, a variety of Machine Learning algorithms were utilized frequently, particularly support vector machines, boosting techniques, random forest classifiers and neural networks. Sensitivity, specificity, and ROC AUC were the most frequently reported performance measures.
Conclusion: This review showed an increasing use of Machine Learning for CDS in all three areas. Large datasets are required for training these algorithms; making it imperative to appropriately address, challenges such as class imbalance, correct labelling of data and missing data. Recommendations are formulated for the development and successful adoption of CDS systems.

Keywords

sepsis, hemodynamic instability, respiratory distress, infection, machine learning, clinical trials, critical care.

Revised Amendments from Version 1

All comments from the Reviewers were addressed in the updated version. We could not address the layout issue that Reviewer 1 made as this is the Journal's decision how tables are made in the PDF. 
The question of Reviewer 2 regarding the rationale for including the studies predicting AKI within the Infection/sepsis results section is addressed here: 
Severe infection is a major cause of AKI in ICU patients, while conversely, AKI patients are at increased risk for infection [1]. Sepsis is an important cause of AKI, and AKI is a common complication of sepsis [2]. We felt that given this relationship, CDS for AKI fits well under this section. The reviewer is correct to propose the link between AKI and shock, however, not all AKI cases lead to shock- so we felt it matched this section more.
 
[1] Vandijck DM, Reynvoet E, Blot SI, Vandecasteele E, Hoste EA. Severe infection, sepsis and acute kidney injury. Acta Clin Belg. 2007;62 Suppl 2:332-6.
[2] Steven J. Skube, Stephen A. Katz, Jeffrey G. Chipman, and Christopher J. Tignanelli.Surgical Infections.http://doi.org/10.1089/sur.2017.261 Volume: 19 Issue 2: February 1, 2018

To read any peer review reports and author responses for this article, follow the "read" links in the Open Peer Review table.

Introduction

Critical care, including intensive and emergency care, is the most expensive and human resource intensive area of in-hospital care. Despite having the most technologically advanced devices, it is the area associated with the highest morbidity and mortality rates1. Decision-making for clinical teams in this area is complex due to variability in procedures and data-overload from the plethora of existing devices. In fact, misdiagnosis in the intensive care unit (ICU) is 50% more common than other areas2, and errors, especially medication errors which account for 78% of serious medication errors3, can have a long lasting effect even after patients are discharged.

Computerized decision support (CDS) systems have emerged as tools providing intelligent decision making based on patient data to address many of the challenges of critical care. CDS systems can be based on existing guidelines or best practices; and can also utilize machine learning as a means of compiling several data inputs to provide a diagnosis, recommendation, or therapy course. CDS systems can improve medication safety by providing recommendations relating to dosing46, administration frequencies5, medication discontinuation6 and medication avoidance5. Moreover, these novel systems can improve the quality of prescribing decisions by triggering alerts or warning messages on drug duplication, contraindications, drug interaction errors7, side-effects and inappropriate medication orders5. CDS system notifications can be applied during the prescribing, administering or monitoring stages to detect and prevent medication errors8. These systems can also target patients to facilitate shared decision-making to empower as well as to motivate them911. The need for such systems stems from hospitals having to deal with strict guidelines to improve outcomes, document care cycles (raising the need for administrative tasks) and reduce readmissions. This is combined with the need to cope with financial constraints, such as staff shortages and increased pressure to reduce the length of stay12,13.

Strategies for bringing CDS to clinics have been the topic of several workshops, conferences and focus groups14. Factors for success in designing CDS include providing measurable value, producing actionable insights, delivering information to the user at the right time, and demonstrating good usability principles14.

Early warning systems (EWS) are CDS systems designed for initial assessment and identification of patients at risk of deterioration in in-patient ward areas1517. These systems have shown that they can enable caregivers and rapid response teams to respond earlier – in time to make a difference18. By alerting clinicians to higher risk patients, treatments can be administered early or harmful medications can be stopped, potentially leading to improved outcomes. Early recognition and timely intervention are also critical steps for the successful management of shock19, cardiorespiratory instability20 and severe sepsis. In sepsis management, adequate timing of administration of antibiotics is directly associated with survival rates21, and incidence, severity and duration of infections.

According to the Society of Critical Care Medicine (SCCM)22, the five primary ICU admission diagnoses for adults are respiratory insufficiency/failure with ventilator support, acute myocardial infarction, intracranial hemorrhage or cerebral infarction, percutaneous cardiovascular procedures, and septicemia or severe sepsis without mechanical ventilation. SCCM also highlights other conditions involving high ICU demand such as poisoning and toxic effects of drugs, pulmonary edema and respiratory failure, heart failure and shock, cardiac arrhythmia and renal failure. Given the above, three high-impact areas were selected for the current research where early detection and treatment could impact outcomes for patients in the ICU. The first is that of hemodynamic instability, where early detection could help patients prevent deterioration into shock. The second is that of respiratory distress, affecting many ventilated patients (up to 40% are ventilated according to SCCM)22. The third area selected is that of infection, with a focus on sepsis. Sepsis is the most common cause of death among critically ill patients, with occurrence rates varying from 13.6% to 39.3%23,24. All three areas are major areas of concern with relatively high prevalence in critical care having long term effects on patients.

The study focuses on both detection, which alerts the clinician to the presence of these specific conditions, as well as prediction of deterioration by alerting the clinician in advance that a patient will deteriorate into one of these disease states. The aims of this study were to perform and report a systematic review of the utilization of CDS systems in the three selected disease areas and summarize the methodological aspects of identified studies.

Methods

Search strategy

A systematic literature review was carried out to identify evidence-based study designs, methods and outcome measures that have been used to determine the clinical effectiveness of CDS systems in the detection and prediction of three populations representing the variety and majority of morbid conditions in a critical care setting: Shock (hemodynamic (in-)stability), respiratory distress/failure and infection/sepsis. The search strategy combined ‘intervention terms’ and ‘disease terms’ to identify primary research evaluating the diagnostic performance of CDS systems and other machine learning algorithms in three different populations of any age, sex, and race. Systematic literature reviews were also included for locating further relevant primary research. The search was conducted in MEDLINE (PubMed), ClinicalTrials.gov and Cochrane Database of Systematic Reviews (CDSR); and limited to studies published or registered between January 1, 2013 and November 8, 2018 and reported in English. Publication dates were limited to focus results on the most recent developments in this fast-evolving research domain. Another method to ensure up-to-date results was to include conference abstracts from 2017 onwards regardless of whether or not they were followed up with a detailed publication. Ongoing studies identified in the clinical trials register were also kept in the review. Study protocols identified from bibliographic databases were, however, excluded assuming that final study results would be available and identified elsewhere. The strategy employed in PubMed is provided as Extended data, Table 1–Table 32527.

Studies conducted in US, Canada, UK, Germany or France with more than 10 subjects per arm were included. These countries were selected because they are known to be active in CDS development. The inclusion and exclusion criteria for selecting abstracts and subsequent full-text publications were based on the population, interventions, comparators, outcomes, and study design (PICOS). These criteria are listed in Table 1.

Table 1. Study selection criteria for the systematic literature review.

CriteriaInclusionExclusion
STUDY DESIGNAbstract
selection
Randomized controlled trials (RCT)
Observational (retrospective and prospective)
studies
In-hospital settings: Acute care, Intensive care
unit (ICU), Emergency department (ED), Medical
Surgery, General ward
Geography: US, Canada, Europe
Systematic Literature Reviews or meta-
analyses*
Review papers, newsletters and opinion
papers where treatments of interest are only
discussed
Methodology studies or protocols
Case studies (sample size of 1 patient)
Studies with less than 10 patients per arm;
Conference abstracts published only as
abstracts in 2013, 2014, 2015 and 2016
Geography**: All countries and regions
except: US, Canada, UK, Germany, France
Publications without an abstract
Full-text
selection
Randomized controlled trials (RCT)
Observational (retrospective and prospective)
studies
In-hospital settings: Acute care, Intensive care
unit (ICU), Emergency department (ED), Medical
Surgery, General ward
Geography**: US, Canada, UK, Germany, France
Conference abstracts published only as abstracts in
2017 and 2018
Systematic Literature Reviews or meta-
analyses*
Review papers, newsletters and opinion
papers where treatments of interest are only
discussed
Methodology studies or protocols
Case studies (sample size of 1 patient)
Studies with less than 10 patients per arm;
Geography**: All countries and regions
except: US, Canada, UK, Germany, France
Publications published only as abstracts in
2013, 2014, 2015 and 2016 (which were not
superseded by full-text publication).
POPULATIONAbstract
and full-text
selection
Studies that include humans only – adults, children
and neonates (or (electronic) medical records)
Both sexes are included Patients with or at risk of
developing shock (hemodynamic (in-stability)
Patients with or at risk of developing respiratory
distress/failure
Patients with or at risk of developing infection or
sepsis
Healthy people only; Healthy people and patients
In-vitro studies
Animal studies
TREATMENT /
INTERVENTION
Abstract
and full-text
selection
Artificial intelligence
Machine learning (i.e. Deep learning models)
Clinical decision support
Computer aided detection
Early Warning System
Automatic diagnosis systems (i.e. ELISA
tests)
Screening tests (i.e. Automated analysis of
portable oximetry)
Sequencing tests
Mathematical models*** - which model
the predictability of disease or treatment/
intervention (i.e. Modelling studies have been
widely used to inform human papillomavirus
vaccination policy decisions)
Multivariable hierarchal logistic regression
models*** (models which are based only on
statistics - but there is no machine learning)
COMPARATORAbstract
and full-text
selection
All comparatorsNo selection will be made regarding
comparator
OUTCOMESAbstract
and full-text
selection
Detection and/or prediction outcomes, such as:
        •    Sensitivity (SD) (%)
        •    Specificity (SD) (%)
        •    NPV (%)
        •    PPV (%)
        •    Likelihood ratio
        •    Accuracy (SD) (%)
        •    Prevalence of disease (%)
        •    OR; 95% CI; p-value
        •    HR; 95% CI; p-value
        •    Median (IQR); p-value
        •    ROC AUC
        For all outcomes (if reported): Measure
of variability (i.e. Standard error of mean
(SE), Standard deviation (SD)); measure of
uncertainty (i.e. 95% CI)

The outcomes should be reported in the following
manner:

        •    per arm (study group vs. control group)
individually;
        •    difference between 2 arms.
Studies not reporting detection and/or
prediction outcomes
Studies discussing interventions of interest,
but no outcomes are reported

* Systematic Literature Reviews and (network) meta-analysis are excluded from data extraction since the pooled results cannot be used in our analysis. However, good quality (network) meta-analysis and systematic literature reviews (i.e. Cochrane reviews) will be used for cross-checking of references if the search did not omit any articles.

** If studies are conducted in multiple countries and at least 1 of the included countries is included – the study will be included in the selection.

*** Mathematical and logistic regression models – can be used to validate and evaluate Interventions of interest (that are listed as included intervention), but the texts discussing these models without any “learning potential” or artificial intelligence potential will be excluded. Therefore, these models can be the foundation of the included listed interventions but will not be included in the Data Extraction Files unless they have also machine learning or artificial intelligence or some other form of “learning potential” on top of the statistical mathematical model. Researchers will pay special attention and caution when screening these abstracts and/or full-text articles.

AUC = Area under the curve; ED = Emergency department; ELISA = Enzyme-linked immunosorbent assay; HR = Hazard ratio; ICU = Intensive care unit; IQR = interquartile range; NPV = Negative predictive value; OR = Odds ratio; PPV = Positive predictive value; RCT = Randomized controlled trial; ROC = Receiver Operating Characteristic; SD = Standard deviation; SE = Standard error; UK = United Kingdom; US = United States.

Study selection and data extraction

Study selection and data extraction was carried out by a single reviewer (MKK or SP). In cases of uncertainty, a second, or even third reviewer, was consulted. Data extraction was performed using a standard data extraction form (DEF). Key data from each additional eligible study were extracted by recording data from original reports into the DEF. The DEF included information on study design, inclusion/exclusion criteria, sample size and characteristics, interventions, outcome measures (measures of predictability like: sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), likelihood ratio, accuracy (percentage of correctly identified cases in relation to the whole sample), odds ratio (OR), hazard ratio (HR), median, receiver operating characteristic (ROC) area under the curve (AUC); and length of hospitalization among others).

Studies identified from the ClinicalTrials.gov registry that did not report results were also included in the extraction to give some indication of the outcomes being collected.

Study quality appraisal

This research was not aimed at summarizing study results and assessing the relative effectiveness of CDS systems. Therefore, an appraisal of study quality was not deemed necessary.

Results

Shock (hemodynamic (in-)stability)

The search yielded 1588 hits. Screening the titles and abstracts led to 1502 being excluded. The full texts of the remaining 86 titles were obtained and assessed against the PICOS criteria. Studies were excluded due to irrelevant study design (n=22), population (n=1), intervention (n=5), and outcomes (n=38). A total of 20 studies were finally included in this systematic literature review. This included 5 trials identified from ClinicalTrials.gov. The study selection process is depicted in Figure 1.

9e584307-c538-437a-ae3e-769efb66c031_figure1.gif

Figure 1. Study selection – Shock.

Pop. = Population.

Study characteristics. Of the 15 published studies, five were conducted by research groups outside the USA2832. Ten studies were conducted in the US19,3341, Thirteen studies were retrospective19,2833,35,3741 and only two were prospective34,36. Nine studies were single-center28,30,31,33,3741 and six studies were multi-center19,29,32,3436. Five studies were time-series28,3032,40 and nine were case-series19,29,3335,3739,41.

Across all studies, three had sample sizes ≤10029,30,36; three had sample sizes of 101–100028,31,32; four studies had sample sizes of 1001–10,00019,33,34,37,42; and another five studies, four retrospective single-center studies and one multi-center, had sample sizes larger than 10,00035,3841. The three largest studies included patients admitted to various wards of a specified hospital. The majority of the studies did not restrict their sample to a specific in-patient hospital setting. Five studies reported on patients in the ICU19,28,32,40,41 and one study reported on patients admitted to the surgical ward33.

The characteristics of the published studies are summarized in Table 2.

Table 2. Design aspects of published studies on shock.

StudyStudy DesignCountry and
institution(s)
Number
of
patients
(records)
Population/disease
definition
In-
patient
setting
Collected data
Ghosh 2017Retrospective time
series
single center
Australia
University of
Technology Sydney
& The University of
Melbourne
209Sepsis or severe
sepsis
ICU(mean arterial pressure),
heart rate,
respiratory rate
Hu 2016Retrospective case
series
single center
USA, Minnesota
University of Minnesota
NR (8909)NRSurgeryEHRs
Li 2014Retrospective case
series
multi-centric (3
centers)
UK, Oxford
University of Oxford &
Mindray
NR (67)Ventricular flutter,
fibrillation and
tachycardia
NRElectrocardiography
Mahajan 2014Prospective case
series
multi-centric (4
centers)
USA
University of Southern
California, Mayo Clinic-
Rochester, University of
North Carolina, Sanger
Heart & Vascular
Institute & Boston
Scientific
410 (908)Ventricular
fibrillation, ventricular
tachycardia and other
arrhythmias
NRElectrograms
Mao 2018Retrospective case
series
multi-centric (5
centers)
USA
University of California,
Stanford Medical
Centre, Oroville
Hospital, Bakersfield
Heart Hospital, Cape
Regional Medical
Centre, Beth Israel
Deaconess Medical
Center
359,390NRvariousVital signs
Reljin 2018Prospective case-
control
multi-centric (2
centers)
USA
University of
Connecticut, Campbell
University School of
Medicine, University
of Massachusetts
Medical School,Yale
University School of
Medicine & Worcester
Polytechnic Institute
36 (94)Traumatic injury,
healthy controls
NRPhotoplethysmographic
signals
Sideris 2016Retrospective case
series
single center
USA, Los Angeles
University of California
1948Primarily heart failurevariousEHRs
Blecker 2016Retrospective case
series
single center
USA, New York
NewYork-Presbyterian
Hospital & New York
University
NR
(47,119)
NRvariousEHRs
Blecker 2018Retrospective case
series
single center
USA, New York
New York University
NR (37229)NRvariousEHRs
Calvert 2016Retrospective time
series
single center
USA, California
Dascena Inc. &
University of California
29083NRICUvital signs
Donald 2018Retrospective
time series +
Prospective time
series
multi-centric (22
centers)
Europe173Traumatic brain injuryICUDemographic, clinical
and physiological data
Ebrahimzadeh
2018
Retrospective time
series
single center
Iran
University of Tehran,
Iran University
of Science and
Technology, University
of Sheikhbahaee
& Payame Noor
University of North
Tehran
53 (106)Paroxysmal atrial
fibrillation
NRElectrocardiography
Potes 2017Retrospective case
series
multi-centric (2
centers)
USA, California & UK,
London
Children`s Hospital Los
Angeles, St. Mary`s
Hospital, London &
Philips
8022NRICUVital signs, laboratory
values, and ventilator
parameters.
Henry 2015Retrospective case
series
single center
USA, Maryland
John Hopkins
University
16234NRICUEHRs
Strodthoff
2018
Retrospective time
series
single center
Germany, Berlin
Fraunhofer Heinrich
Hertz Institute &
University Medical
Center Schleswig-
Holstein, Kiel
200 (228)Myocardial infarction
and healthy controls
NRElectrocardiography

USA: United States of America. UK: United Kingdom. NR: Not reported. ICU: Intensive care unit. EHR: Electronic health records.

CDS systems. Machine learning algorithms were developed to detect or predict septic shock28,33,35,40,41, various heart arrhythmias29,30,34, heart failure3739, hemodynamic instability and hypovolemia19,36, myocardial infarction31, as well as hypotension32.

All studies, except one, trained a single algorithm. Ebrahimzadeh et al. 201830 trained and compared support vector machine (SVM), instance-based and neural network models to predict paroxysmal atrial fibrillation. SVMs were the most frequently used algorithms, followed by least absolute shrinkage and selection operator (LASSO) regularization. In one study, the SVM was trained using sequential minimal optimization37.

Machine learning models were trained and validated in 14 studies and subsequently tested in an independent dataset in 3 studies19,35,37. In one study an algorithm trained to classify arrythmias was not validated but compared to physician`s manual classifications34.

An overview of the investigated machine learning algorithms is presented in Table 3.

Table 3. Overview of the algorithms developed to detect shock.

StudyPredicted diseaseLearning algorithm
CHMMDecision treesLR, LASSO
regularisation
LR, not
specified
SVMkNNRFgradient tree
boosting
Adaptive
boosting
Bayesian
neural network
convolutional
neural network
Multilayer
perceptron
mixture of
expert
Ebrahimzadeh
2018
paroxysmal atrial
fibrillation
Li 2014Ventricular fibrillation
and tachycardia
Mahajan 2014heart arrhythmias
Strodthoff
2018
myocardial
infarction
Sideris 2016heart failure
Blecker 2016heart failure
Blecker 2018heart failure
Reljin 2018Hypovolemia
Potes 2017hemodynamic
instability
Donald 2018Hypotension
Ghosh 2017septic shock
Hu 2016septic shock
Mao 2018septic shock
Calvert 2016septic shock
Henry 2015septic shock

CHMM: clustered hidden Markov model. LR: Logistic regression. SVM: Support vector machine. kNN: k nearest neighbor. RF: Random forest. Conv.: Convolutional.

Outcome measures. Three of the 15 papers measured a single outcome of model performance. In two studies the preferred measure was accuracy28,34; whereas in another study this was the ROC AUC. This study was large and based their algorithm on EHRs33. Across all studies, accuracy was reported in about half of the instances and the ROC AUC was one of the most frequently reported outcomes.

Sensitivity and specificity were reported together in 10 studies. Blecker et al. 201638 reported sensitivity together with PPV. Sensitivity and specificity were not measured in the study by Sideris et al. 201637, instead model accuracy and the ROC AUC were preferred. This study was concerned with developing an alternative `comorbidity` framework based on disease and symptom diagnostic codes to cluster individuals at low to high risk of developing chronic heart failure.

PPVs were reported in six studies and accompanied with negative predictive values in two studies. These studies developed and validated machine-learning algorithms for the early detection of less investigated health conditions, these being hemodynamic instability in children19 and acute decompensated heart failure39. The highest number of outcome measures, including likelihood ratios, was observed in Calvert et al. 201640 who investigated an under-represented population of patients with Alcohol Use Disorder.

The outcomes measured are summarized in Table 4.

Table 4. Overview of measured outcomes in studies on shock.

StudySensitivitySpecificityNPVPPVNegative
LR
Positive
LR
AccuracyPrevalenceORRRROC AUC
Ghosh 2017
Hu 2016
Li 2014
Mahajan 2014
Mao 2018
Reljin 2018
Sideris 2016
Blecker 2016
Blecker 2018
Calvert 2016
Donald 2018
Ebrahimzadeh
2018
Potes 2017
Henry 2015
Strodthoff 2018

NPV: Negative predictive value. PPV: Positive predictive value. LR: Likelihood ratio. OR: Odds ratio. RR: Risk ratio. ROC AUC: Receiver operating characteristic area under the curve.

Ongoing studies. Five studies are currently ongoing, one in Germany43 and the others in the USA4447. Two studies are prospective case series44,47, two studies are prospective cohort studies43,45 and one is a RCT46. Two of the studies are concerned with developing prediction models, and the others are concerned with implementing machine learning algorithms into clinical practice as early warning systems.

The details of these trials are summarized in Table 5.

Table 5. Overview of ongoing studies on shock.

Identifier codeStudy DesignCountries
and study
centers
Hospital
setting
InterventionSample
characteristics
Outcome(s)
NCT03582501Prospective
case series
Year of study:
2019–20
Duration: 12
months
USA
Mayo Clinic
Arizona,
Florida &
Rochester
NRLower body
negative pressure
to simulate
hypovolemia
Estimated: 24
Age: 18–55
Definition: Healthy
non-smoker,
no history of
hypertension,
diabetes, CAD
and neurologic
diseases
Primary outcome
Blood pressure
Secondary outcome
Heart rate
NCT02934971Prospective
cohort study
Year of study:
2017–19
Duration: 24
months (up
to 6 months
follow-up)
Germany,
Aachen
Aachen
University
Hospital
Out-patientChemotherapy or
no chemotherapy
Estimated: 400
Age: ≥ 18
Definition: Patients
scheduled for
chemotherapy
at increased risk
of cardiotoxicity
and age-matched
controls
Primary outcome
change in left ventricular
ejection fraction
NCT03235193Prospective
cohort study
Year of study:
2017
Duration: 3
months
USA, West
Virginia
Dascena
Inc.&
University of
California
ED, ICUThe InSight
algorithm used
as an EWS to
detect sepsis and
severe sepsis
detection from
EHRs compared
to severe sepsis
detection from
EHRs alone
Estimated: 1241
Age: ≥ 18
Definition: All
admitted patients
Primary outcome
in-hospital mortality
Secondary outcomes
length of stay in hospital
and ICU, hospital
readmission
NCT03644940RCT
Year of study:
2020–21
Duration:
6 months
USA,
California
Dascena
Inc.&
University of
California
Cardiology,
GI, ICU,
Medicine,
Oncology,
Surgery,
Transplant
and ED
subpopulation-
optimized
version of InSight
compared to the
original version
used as an early
warning system to
identify patients at
high risk of severe
sepsis; followed
by physician
assessment of
sepsis
Estimated n: 51645
Age: >18
Definition: NR
Primary outcomes
in-hospital SIRS-based
mortality
Secondary outcomes
in-hospital severe sepsis/
shock-coded mortality;
SIRS-based hospital
length of stay; Severe
sepsis/shock-coded
hospital length of stay
NCT03655626Single-arm
trial up to
Year of study:
2018–19 up to
Duration:
6 months
USA, North
Carolina
Duke
University
Hospital
EDmachine learning
algorithm to
predict sepsis,
custom dashboard
and monitoring
Estimated n: 3200
Age: >18
Definition: NR
Primary outcome
rate of CMS bundle
completion for patients
with sepsis
Secondary outcomes
time to sepsis diagnosis;
number of patients
developing sepsis;
number of patients
developing sepsis and
not treated; length of
stay in ED and hospital;
inpatient mortality; ICU
requirement rate; time
from sepsis onset to
blood culture, antibiotics,
IV fluids, lactate, CMS
bundle completion; rate
of lactate complete;
number of sepsis
diagnostic codes per
month

USA: United States of America. NR: Not reported. ED: Emergency department. ICU: Intensive care unit. GI: Gastroenterology.

Respiratory distress/failure

The search yielded 1279 hits. Screening the titles and abstracts lead to 1142 being excluded. The full texts of the remaining 137 titles were obtained and assessed against the PICOS criteria. Studies were excluded due to irrelevant study design (n=42), population (n=6); intervention (n=18) and outcomes (n=47), and conference proceeding from before 2017 (n=2). A total of 22 studies were finally included in this systematic literature review. None of the trials retrieved from ClinicalTrials.gov were included. The study selection process is depicted in Figure 2.

9e584307-c538-437a-ae3e-769efb66c031_figure2.gif

Figure 2. Study selection - Respiratory distress-failure.

Pop. = Population.

Study characteristics. Of the included studies, 17 were conducted in the US33,4863. Five studies were conducted outside the US; two in Canada64,65 by the same research group, two in France66,67 and one in the UK68. In total, 17 studies were retrospective33,4850,5255,5866 and five were prospective51,56,57,67,68. Of these studies, 12 were single-center33,48,49,51,52,54,55,58,59,6466 and 10 studies were multi-center50,53,56,57,6063,67,68. Five studies were time-series48,52,55,56,64, 14 studies were case-series33,49,51,53,54,5762,65,66,68, one was case-control50 and one was case/time series study63.

The smallest sample of 100 patients came from two single-center retrospective studies48,66. Ten studies had sample sizes of 101–100033,4953,57,63,67,68; seven studies had sample sizes of 1001–10,00054,55,59,60,62,64,65; and three had sample sizes larger than 10,00056,58,61. The largest study included more than 50,000 patients admitted to the ED of two centers over a 3-year period61. Several published studies did not report their in-patient setting. When reported, some evaluated data from different wards56,59,64,65,68, and some included patients admitted only to the ED53,54,61,63, the ICU48,60,67 and the surgical ward33,51,55.

The characteristics of all published studies are given in Table 6.

Table 6. Design aspects of published studies on respiratory distress or failure.

StudyStudy DesignCountries and institution(s)Number of
patients
(records)
Population/disease definitionIn-patient
setting
Bejan 2013Retrospective time
series
single center
USA, Washington
University of Washington
100NRICU
Kumamaru
2016
Retrospective case
series
single center
USA, Massachusetts
Brigham and Women’s Hospital
125acute pulmonary embolismNR
Bodduluri
2013
Retrospective
case-control
multi-center
(national data)
USA, Iowa
The University of Iowa
153smokers with or without COPD
and non-smokers
NR
Biesiada 2014Prospective case
series
single center
USA, Cincinnati
Children's Hospital Medical
Center & University of Cincinnati
347current tonsillitis, adenotonsillar
hypertrophy or obstructive sleep
apnea
Surgery
Reamaroon
2018
Retrospective time
series
single-center
USA, Michigan
University of Michigan
401mild hypoxia and acute hypoxic
respiratory failure
NR
Vinson 2015Retrospective case series
multi-center (4
centers)
USA, California
the Kaisers Permanente CREST
Network
593acute pulmonary embolismED
Huesch 2018Retrospective case
series
single center
USA, Pennsylvania
Milton S. Hershey Medical Center
1133individuals suspected of
pulmonary embolism
ED
Mortazavi
2017
Retrospective time
series
single center
USA, Connecticut
Yale University
5214patients undergoing
cardiovascular procedures:
CABG, PCI and ICD procedures
Surgery
Pham 2014Retrospective case
series
single center
France
CHU de Caen, Caen & Hôpital
Européen Georges-Pompidou,
Paris
NR (100)individuals suspected of having
Venous thromboembolism
NR
Rochefort
2015
Retrospective time
series
single center
Canada, Quebec
McGill University
1649 (2000)individuals suspected of having
Venous thromboembolism
various
Silva 2017Prospective
before-after
multi-center (3
centers)
France
University Teaching Hospital
of Purpan, Toulouse; Hopital Dieu
Hospital, Narbonne; Saint Eloi
Hospital, Montpellier
136hemodynamic instability,
respiratory failure, multiple
trauma, nontraumatic coma, and
postoperative complication of
abdominal surgery
ICU
Gonzalez
2018
Prospective time
series
multi-center, multi-
national
USA
Binham and Women`s Hospital
(on behalf of the COPD and
ECLIPSE Study investigators)
11655smokers with or without COPDvarious
Tian 2017Retrospective case
series
single center
Canada, Quebec
Mcgill University
2819
(4000)
individuals suspected of having
Venous thromboembolism
various
Choi 2018Prospective case
series
multi-center (3
centers)
USA
Mayo Clinic, Scottsdale; National
Jewish Health, Denve; University
of Washington Medical Center,
Seattle & Veracyte Inc.
139 (403)suspected interstitial lung diseaseNR
Yu 2014Retrospective case
series
single center
USA, Massachusetts
Brigham, and Women’s Hospital &
Harvard Medical School,
NR
(10,330)
individuals suspected of
pulmonary embolism
NR
Swartz 2017Retrospective case
series
single center
USA, New York
New York University & Mount Sinai
St. Luke`s Hospital
NR (2400)individuals suspected of having
Venous thromboembolism
various
Liu 2013Retrospective case
series
multi-center (21
centers)
USA, California
Kaiser Permanente
NR (2466)NRICU
Haug 2013Retrospective case
series
multi-center(2
centers)
USA, Utah
LDS Hospital and Intermountain
Medical Centre
NR
(362,924)
NRED
Dublin 2013Retrospective case
series
multi-center
(regional data)
USA, Seattle
Group Health Research Institute &
University of Washington
NR (5000)NRNR
Phillips 2014Prospective case
series
multi-center
UK, Llaneli
Swansea University, Aberystwyth
University & Hywel Dda University
Health Board
181with and without COPDvarious
Hu 2016Retrospective case
series
single center
USA, Minnesota
University of Minnesota
NR (8909)NRSurgery
Jones 2018Retrospective
case/time series
multi-center
(number of centers unknown)
USA, Utah & Washington
VA Salt Lake City Health Care
System, University of Utah &
George Washington University
NR (911)individuals suspected of
pneumonia
ED

NA: Not applicable. NR: Not reported. USA: United States of America. COPD: Chronic obstructive pulmonary disease. ECLIPSE: Evaluations of COPD Longitudinally to Identify Predictive Surrogate Endpoints. UK: United Kingdom. CABG: Coronary artery bypass grafting. PCI: Percutaneous coronary intervention. ICD: Implantable cardioverter defibrillator. ICU: Intensive care unit. ED: Emergency department.

CDS systems. About half of the studies developed machine-learning algorithms, whereas the other half focused on natural language processing (NLP) algorithms. One study differed from the rest by developing a computer-aided detection (CAD) system to measure the axial diameter of the right and left pulmonary ventricles, aiding in the diagnosis of pulmonary embolisms49. Many learning algorithms were concerned with detecting pulmonary embolisms and deep vein thrombosis53,54,58,59,6467 as well as pneumonia33,48,57,6063. Three studies developed machine-learning algorithms to detect COPD50,56,69. One study developed a machine learning algorithm to detect acute respiratory distress syndrome52; while other studies developed machine learning algorithms to detect respiratory distress or failure following a pressure support ventilation trial67, cardiovascular surgery55 and pediatric tonsillectomy51.

The classifiers used in the NLP-based studies were various. However, some commonalities emerged between the studies developing machine-learning algorithms. Multiple studies applied SVM, logistic regression, random forests, K- nearest neighbor (kNN), gradient boosting and neural network models. Various classifiers were explored in 5 studies.

Machine learning and NLP-based algorithms were trained and validated in 20 studies and subsequently tested in an independent dataset in 6 studies52,56,6062,67. The CAD system mentioned above and an electronic pulmonary embolism severity index were trained and compared to a reference dataset classified by physicians49,53.

An overview of the developed learning algorithms is provided in Table 7.

Table 7. Overview of the algorithms developed to detect respiratory distress or failure.

Learning algorithm
StudyPredicted diseaseNLPassertion
classification
symbolic
classifiers
rule or probability
based
kNNONYXRFLR, LASSO
penalized
LR, LASSO
regularization
LR, not specifiedgradient (descent)
boosting
Maximum EntropySVMPartial least-
squares regression
NegEXhierarchical
classification
Bayesian networkneural networkJ48JRIPPART
Reamaroon 2018ARDS
Gonzalez 2018COPD, ARDE
Bodduluri 2013COPD
Phillips 2014COPD
Bejan 2013Pneumonia
Dublin 2013Pneumonia
Haug 2013Pneumonia
Hu 2016Pneumonia
Liu 2013Pneumonia
Choi 2018Pneumonia
Jones 2018Pneumonia
Silva 2017Postintubation distress
Mortazavi 2017Postoperative
respiratory failure
Vinson 2015Pulmonary embolism
Yu 2014Pulmonary embolism
Huesch 2018Pulmonary embolism
Kumamaru 2016Pulmonary embolism*
Pham 2014Pulmonary embolism,
DVT
Rochefort 2015Pulmonary embolism,
DVT
Swartz 2017Pulmonary embolism,
DVT
Tian 2017Pulmonary embolism,
DVT
Biesiada 2014Respiratory depression

*A computer aided detection system was developed for measuring the right ventricular/left ventricular axial diameter ratio and detecting pulmonary embolism. ARDS: Acute respiratory distress syndrome. ARDE: Acute respiratory disease events. COPD: Chronic obstructive pulmonary disease. DVT: Deep vein thrombosis.

One study, Reamoroon et al. 201852, used a novel sampling technique to accommodate for inter-dependency in longitudinal data. Model accuracy and ROC AUC with this method was <5% better than random sampling and 4–11% better than no sampling.

Outcome measures. The majority of the studies reported multiple outcome measures of model performance. The most frequently reported outcome measure was sensitivity, followed by specificity and ROC AUC. Likelihood ratios, on the other hand, were only reported in one study: Silva et al. 201767 reported eight outcome measures of their novel machine learning model to predict post extubation distress. The outcomes measured across all studies are summarized in Table 8.

Table 8. Overview of measured outcomes in studies predicting respiratory distress or failure.

StudyAlgorithmSensitivitySpecificityNPVPPVnegative
LR
positive
LR
AccuracyPrevalenceORRRROC AUCDiagnostic
yield
Kumamaru 2016CAD
Bodduluri 2013ML
Hu 2016ML
Mortazavi 2017ML
Rochefort 2015ML
Silva 2017ML
Vinson 2015ML
Biesiada 2014ML
Choi 2018ML
Gonzalez 2018ML
Phillips 2014ML
Reamaroon 2018ML
Bejan 2013NLP
Dublin 2013NLP
Haug 2013NLP
Liu 2013NLP
Pham 2014NLP
Swartz 2017NLP
Tian 2017NLP
Yu 2014NLP
Huesch 2018NLP
Jones 2018NLP

NLP: Natural language processing. ML: Machine learning. CAD: Computer aided detection. NPV: Negative predictive value. PPV: Positive predictive value. LR: Likelihood ratio. OR: Odds ratio. RR: Risk ratio. ROC AUC: Receiver operating characteristic area under the curve.

Many of the studies that developed NLP-based algorithms reported negative and positive predictive values, as well as sensitivity and specificity. In contrast, the ROC AUC was the most frequently reported outcome measure of machine learning algorithm performance. It was also the single preferred outcome in three studies33,50,55. About half of the studies additionally reported sensitivity, specificity, and accuracy. One study reported specificity with sensitivity set at 90% and 95% to ensure that few disease positive cases were missed52. The single study that developed a CAD system measured the ROC AUC and model accuracy49.

Infection or sepsis

The search yielded 2659 hits. Screening the titles and abstracts lead to 2562 being excluded. The full texts of the remaining 97 titles were obtained and assessed against the PICOS criteria. Studies were excluded due to irrelevant study design (n=41), population (n=4); intervention (n=6) and outcomes (n=14). A total of 31 studies were finally included in this systematic literature review. Four of these were ongoing trials. The study selection process is depicted in Figure 3.

9e584307-c538-437a-ae3e-769efb66c031_figure3.gif

Figure 3. Study selection - infection or sepsis.

Pop. = Population.

Study characteristics. Of the included studies, 24 were conducted in the US. Three studies were conducted outside the US; one in France; one in the Netherlands and one in the UK. In total, 21 studies were retrospective33,35,7088 and six were prospective8994. There were 21 single-center studies33,7075,7783,8688,9092,94 and six multi-center studies35,76,84,85,89,93. Seven studies were time series71,78,82,8486,92, 18 studies were case series33,35,70,7276,80,81,83,8791,93,94, one was a case-control77 and one was a matched-controlled study79.

The smallest studies included patients with leukemia89 and combat casualty patients90. Four studies had a sample size below 100070,72,73,79, three had a sample size between 1001–10,00033,71,87 and 12 had a sample size larger than 10,00035,74,7778,8082,8487,88. Eight studies had samples even larger than 50,00035,74,77,78,82,84,85,88. Large samples were achieved by less restrictive inclusion criteria where all patients admitted to specific ward(s) or hospital(s) over a given time were defined.

Majority of the published studies evaluated data from different wards; several studies included patients admitted only to the ICU70,72,81,8486,93 and surgical ward73,76,78,87,91,92, less often the General ward33 and Emergency Department74. Of these, 23 studies included data collected at their own hospital; and four utilized previously collated databases76,81,84,86.

The characteristics of all published studies are given in Table 9.

Table 9. Design aspects of published studies on infection or sepsis.

StudyStudy DesignCountry and institution(s)Number of
patients
(records)
Population/disease
definition
In-patient
setting
Ahmed 2015Retrospective case
series
single center
USA, Minnesota
Mayo Clinic Rochester
944NRICU
Brasier, 2015Prospective case
series
multi-center (3
sites)
USA, Texas
Aspergillus Technology
Consortium & University of Texas
57LeukemiaNR
Dente, 2017Prospective case
series
single center
USA, Maryland
Emory University, Walter Reed
National Military Medical Centre
73Combat casualty patientsNR
Hu, 2016Retrospective case
series
single center
USA, Minnesota
University of Minnesota
NR (8,909)NRGeneral
Konerman, 2017Retrospective time
series
single center
USA, Michigan
University of Michigan
1,233Chronic hepatitis cNR
Legrand, 2013Prospective case
series
single center
France, Paris
Hôpital Européen Georges
Pompidou Assistance Publique-
Hopitaux de Paris
202Infective endocarditisSurgery
Mani, 2014Retrospective case
series
single center
USA, New Mexico
University of New Mexico
299SepsisICU
Mao 2018Retrospective case
series
multi-center (5
centers)
USA
University of California, Stanford
Medical Centre, Oroville Hospital,
Bakersfield Heart Hospital, Cape
Regional Medical Centre, Beth
Israel Deaconess Medical Center
359,390NRvarious
Sanger, 2016Prospective time
series
single center
USA, Washington
University of Washington
851Open-abdominal surgery
patients
Surgery
Scicluna, 2017Prospective case
series
multi-center (2
sites + national
database)
Netherlands & UK Amsterdam
Academic Medical Center, Utrecht
University Medical Center & UK
Genomic Advances in Sepsis
study
787SepsisICU
Sohn, 2016Retrospective case
series
single center
USA, Minnesota
Mayo Clinic Rochester
751Colorectal surgery patientsSurgery
Taylor, 2018Retrospective case
series
single center
USA, Connecticut
Yale University School of
Medicine,
55,365
(80,387)
Suspected urine tract
infection
ED
Hernandez 2017Retrospective case
series
single center
UK, London
Imperial College Healthcare NHS
Trust
> 500,000NRNR
Bartz-Kurycki
2018
Retrospective case
series
multi-center
(national database)
USA, Texas
University of Texas
13,589NRSurgery
Beeler 2018Retrospective
case-control
single center
USA, Indiana
Indiana University Health
Academic Health Center
NR (70,218)Central venous line with
or without central line-
associated bloodstream
infections
NR
Bihorac 2018Retrospective time
series
single center
USA, Florida
University of Florida Health
51,457NRSurgery
Chen 2018Retrospective
matched pairs (1:1
case matching)
single center
USA, Kansas
University of Kansas Health
System
358Stage 3 AKI and non-AKI
controls
NR
Cheng 2017Retrospective case
series
single center
USA, Kansas
University of Kansas Medical
Center
33,703
(48,955)
NRNR
Desautels 2016Retrospective case
series
single center
USA, California
Dascena Inc.& University of
California
NR (21,176)NRICU
Koyner 2015Retrospective time
series
single center
USA, Chicago University of
Chicago
NR (121,158)NRNR
LaBarbera 2015Retrospective case
series
single center
USA, Pennsylvania
Pinnacle Health Hospital,
Harrisburg
198Clostridium difficile infectionNR
Mohamadlou
2018
Retrospective time
series
multi-center (2
sites)
USA
Dascena Inc., University of
California & Stanford University
68,319NRICU
Nemati 2018Retrospective time
series
multi-center (3
sites)
USA, Georgia
Emory University School of
Medicine & Georgia Institute of
Technology
69,938NRICU
Parreco 2018Retrospective time
series
single center
USA, Florida
University of Miami
NA (22,201)NAICU
Taneja 2017Prospective case
series
single center
USA, Illinois
University of Illinois
444Suspected sepsisNR
Weller 2018Retrospective case
series
single center
USA, Minnesota
Mayo Clinic Rochester
1,283Colorectal surgery patientsSurgery
Wiens 2014Retrospective case
series
single center
USA
single center not specified
NR (69,568)NRvarious

NA: Not applicable. NR: Not reported. USA: United States of America. UK: United Kingdom. ICU: Intensive care unit. ED: Emergency department. AKI: Acute kidney injury.

CDS systems. The machine learning algorithms evaluated in the studies were developed to predict a range of diseases. These included sepsis33,35,72,78,81,85,93,94, acute kidney injury70,7880,82,84,91, surgical site infections33,73,76,87,92, central line-associated bloodstream infections77,86, Clostridium difficile83,88, pulmonary aspergillosis89, bacteremia90, fibrosis71, urine tract infection33,74 and infections in general75.

Almost half of the studies compared different machine learning algorithms, while the others focused only on Bayesian algorithms73,92, decision tree algorithms84, ensemble algorithms35,71,82,83,90,93, regression algorithms33,78,85, regularization algorithms81,88 and rule learning70. The most frequently applied model was random forest (15 studies) followed by logistic regression (10 studies), support vector machines (5 studies), naïve Bayes (5 studies) and gradient tree boosting (5 studies).

One study compared three different sampling methods for handling class imbalance; under-sampling the majority class (RANDu), over-sampling the minority class (RANDo) and synthetic minority over-sampling (SMOTE). This was a very large study including more than 500,000 patients to predict the onset of infections75. The authors found that SMOTE outperformed the other techniques and improved model sensitivity. Two other very large studies used the RANDu method80 and mini-batch stochastic gradient descent with backpropagation85. No other studies were concerned with imbalance in disease positive and negative classification.

Machine learning models were trained and validated in 26 studies and subsequently tested in an independent dataset in four studies35,72,75,77.

The machine learning algorithms used are illustrated in Table 10.

Table 10. Overview of machine learning algorithms evaluated in studies on infection or sepsis.

Machine learning algorithm
StudyPredicted
disease
Rule learningNBtree augmented NBAODElazy Bayesian rulesBayesian GLMBayesian network
analysis
CARTdecision tree
classifier
neural networkRF(extreme) gradient
boosting
adaptive boostingensemble classifierk nearest neighborMARSGPSLaaso penalized
LR
LR, not specifiedSVMgeneralized
additive model
GLMstepwise
regression
polynomial linear
model
ploynomial spline
regression
Weibull PH modelL2-regularised LRelastic net
regularization
Ahmed 2015AKI
Legrand,
2013
AKI
Cheng 2017AKI
Koyner 2015AKI
Bihorac 2018AKI, sepsis
Mohamadlou
2018
AKI, Stage 2/3
Chen 2018AKI, Stage 3
Dente, 2017bacteremia
Beeler 2018CLABSI
Parreco 2018CLABSI
LaBarbera
2015
clostridium
difficile
Wiens 2014clostridium
difficile
Konerman,
2017
fibrosis
Hernandez
2017
infection
Brasier, 2015pulmonary
aspergillosis
Mani, 2014sepsis
Mao, 2018sepsis
Scicluna,
2017
sepsis
Desautels
2016
sepsis
Nemati 2018sepsis
Taneja 2017sepsis
Sanger, 2016SSI
Sohn, 2016SSI
Bartz-Kurycki
2018
SSI
Weller 2018SSI
Hu 2016SSI, UTI,
pneumonia,
sepsis
Taylor, 2018UTI

AKI: Acute kidney injury. SSI: Surgical site infection. UTI: Urinary tract infections. CLABSI: Central line-associated bloodstream infections. NB: Naive Bayes. AODE: Averaged one dependence estimators. CART: Classification and regression tree. RF: Random forest. MARS: Multivariate Adaptive Regression Splines GPS: Generalized path seeker algorithm. LR: Logistic regression. SVM: Support vector machine. GLM: Generalized linear model. PH: Proportional hazards.

Outcome measures. The most frequently reported outcome measure was the ROC AUC. Three studies did not report this measure: Ahmed et al. 201570 developed an algorithm based on decision rules; Legrand et al. 201391 was primarily interested in identifying risk factors of AKI after cardiac surgery; and Scicluna et al. 201793 was primarily concerned with identifying genetic biomarkers of sepsis.

Sensitivity and specificity were reported together in 14 studies35,7072,74,75,78,8184,87,90,92. When specificity was not reported, sensitivity was reported together with PPV; and when sensitivity was not reported, this was due to sensitivity being set at a fixed value to report other diagnostic performance measures. In relation to the prior observation, more studies reported PPV than NPV. Four studies reporting likelihood ratios reported both negative and positive likelihood ratios70,74,81,84.

An overview of measured outcomes is illustrated in Table 11.

Table 11. Overview of measured outcomes in studies predicting sepsis or infection.

StudySensitivitySpecificityNPVPPVnegative
LR
positive
LR
AccuracyPrevalenceORRRROC AUC
Ahmed 2015
Brasier, 2015
Dente, 2017
Hu, 2016
Konerman,
2017
Legrand, 2013
Mani, 2014
Mao 2018
Sanger, 2016
Scicluna, 2017
Sohn, 2016
Taylor, 2018
Hernandez
2017
Bartz-Kurycki
2018
Beeler 2018
Bihorac 2018
Chen 2018
Cheng 2017
Desautels
2016
Koyner 2015
LaBarbera
2015
Mohamadlou
2018
Nemati 2018
Parreco 2018
Taneja 2017
Weller 2018
Wiens 2014

NPV: Negative predictive value. PPV: Positive predictive value. LR: Likelihood ratio. OR: Odds ratio, RR: Risks ratio. ROC AUC: Receiver operator curve area under the curve.

Ongoing studies. Four trials are currently ongoing, one in Germany and the others in the USA, all concerned with the prediction of sepsis. Three of them are prospective studies and one is retrospective. The retrospective study aims to develop a prediction algorithm based on claims data, EHRs, risk factors and survey data of an estimated 50,000 adult patients admitted to the ED. The German study NCT0366145095 is a single-arm trial evaluating the utility of a CDS system to identify SIRS or sepsis from EHRs in a pediatric ICU population. Another single-arm trial NCT0365562647 is concerned with implementing a sepsis prediction algorithm in clinical practice as an early warning system. NCT0364494046 is comparing two versions of InSight introduced into clinical practice as an early warning system.

Discussion and conclusions

This systematic literature review shows that over the last 2 decades, there has been an increased interest in CDS as means of supporting clinicians in acute care. CDS has been investigated for several applications ranging from the detection of health conditions60,61, to the prediction of deterioration or adverse events40,55,76,81,83,84. Applications also include therapy guidance, as well as updating clinicians on new or changed recommendations96. CDS can also provide guidance by predicting clinical trajectories for different patient profiles over time97.

From rule-based algorithms and simple regression models, CDS has evolved to encompass a multitude of techniques in Machine-Learning98. These techniques can be dependent on the problem selected and the data types used. Across the three disease areas investigated, the frequent use of random forest classifiers (28.1%), support vector machines (21.9%), boosting techniques (20.3%), LASSO regression (18.8%) and unspecified logistic regression models (10.9%) were observed. The use of more complex modeling such as maximum entropy, Hidden Markov Models (for temporal data analysis) as well as Convolutional Neural Networks has also emerged over the last few years. In the respiratory distress area, the use of NLP models is more common as radiology reports and clinical notes are the main source of input. Different image analysis techniques have been developed to aid in the prediction and diagnosis of respiratory events from radiology images.

Typical measures of NLP model performance include sensitivity, specificity and predictive values. In measuring ML algorithm performance, sensitivity, specificity and ROC AUC are more common. A wide range of outcome measure were reported in research on less-investigated health conditions40,67; and also when uncommon, more complex algorithms were compared to basic algorithms74,78,81,84. This is not surprising given the novelty of these applications.

Many of the ML algorithms and all of the NLP models covered in this work were based on medical data collected in certain clinical sites rather than publicly available data. Datasets from national audits, completed studies or other online sources can additionally play a role, particularly in model validation and testing. This could aid in the adoption and wider use of CDS systems. In this SLR, publicly available datasets were mainly utilized for developing prediction models of heart arrhythmias2931, hypotension32, septic shock28,33,40,41, COPD50, pneumonia33 and a range of infections33,76,78,81,84,86. In only three cases were they used for testing model performance in sepsis and septic shock prediction; this included the Insight algorithm35,85,93.

Most of the studies identified in this SLR were retrospective and originated in the USA where electronic health records (EHR) are commonly used. This makes it easier to access and compile large amounts of patient-level information. Many of the studies on shock and infection/sepsis based their models on data extracted from EHRs and utilized large sample sizes. The diversity in the identified CDS systems makes it challenging to draw conclusions on methodology. The lack of comparisons between different classifiers within studies, especially for the indication of shock, adds to this challenge. To assess the effectiveness of ML algorithms, future research should evaluate multiple algorithms on standard well-labeled datasets.

Class imbalance can be an important issue when training classifiers on datasets for the conditions highlighted in this work. Unequal distributions can arise naturally between disease negative and positive classes when forming validation sets, particularly when disease prevalence is low75. We refer the reader to several machine learning reviews that have addressed this issue99101. Another important issue in forming disease positive classes relates to the analysis of repeated-measures within subjects, for example, when clinical records are available for each hospitalization day. Several studies have approached this by selecting the first record indicating positive for a health condition. Few researchers have utilized all records and corrected for within-subject variation. An example is the selection of cases depending on observed correlation decay52.

In all three areas investigated, the number of retrospective studies exceeded by far the number of prospective studies conducted in a clinical setting. This highlights the challenges in substantiating clinical performance while bringing new clinical decision tools to routine in-hospital patientcare. Examples of algorithms that can be integrated in clinical practice include InSight45,46 and Sepsis Watch47 which are intended for predicting sepsis and septic shock.

The current systematic literature review did not search multiple bibliographic databases or clinical trial registers; and focused on diagnostic performance rather than other outcomes. In fact, during study screening, trials that evaluated the impact of early warning systems on measures of clinical workflow, rate of re-admissions and/or mortality were discarded as they are somehow out of the focus of this work. This implies that there may be more CDS systems used in practice for the three populations investigated within this research, where the outcomes measured are different. Limiting the search to publications in English and to studies conducted in particular countries; and the exclusion of study protocols identified from the bibliographic database search without checking for later publications from the same authors may have further limited the studies selected. Nevertheless, studies identified within each population represented a diverse range of models applied in different hospital settings trained to predict a range of health conditions. The most widely researched conditions were sepsis and septic shock, venous thromboembolisms, acute kidney injury and surgical site infections.

Specific challenges were identified in collecting sufficient data for training CDS systems on hemodynamic instability. Patients who are, for example, at risk of hemorrhage due to a traumatic injury need to be carefully monitored; and the speed by which they reach a critical state may influence data and study management. It may also be difficult to find healthy volunteers who are willing to undergo procedures like lower body negative pressure which can be unpleasant36. Identification of cases in need of hemodynamic interventions can lend towards larger sample size19. Other conditions that need further attention are clostridium difficile and CLABSI. Prediction models were driven by almost perfect specificity and very low (<10%) sensitivity77,83,86,88. Considering that these studies used a wide range of features from the EHRs and a large number of patients, except LaBarbera, Nikiforov83, there is a need to better understand the risk factors to improve sensitivity.

Based on the literature reviewed in this work, as well as several recent surveys and workshops, we would recommend the following points to be addressed when bringing a new CDS tool to critical care14,102104:

  • Integrating CDS in clinical workflows without adding unnecessary extra work to busy clinical teams. The CDS101 toolbox by HIMMS highlights the “CDS five rights”, which are certainly applicable to critical care105: Providing the right information in the right intervention format, to the right person at the right point in their workflow, and through the right channel.

  • Developing tools and concrete proof-points able to assess CDS efficacy in the clinic. This also highlights the importance of providing continuous feedback to clinicians.

  • The importance of easy to use user interfaces and focusing on human-computer interaction during deployment.

  • Efficient training that is available when needed.

  • Being aware of alert or alarm fatigue and not overloading clinicians with alerts due to CDS. The intensive care unit is already plagued with alarms, and if anything, CDS should help in reducing alarms by bundling alerts according to underlying conditions.

  • Displaying the rationale for decisions as well as the underlying data to clinical users would lead to improved adoption.

  • Understanding ethical challenges for CDS, as well as a careful risk assessment in every site before deployment106.

  • Being able to repeat/standardize implementation across organizations – most prospective studies reviewed in this work covered single centers. Only a few were multi-center studies.

Data availability

Underlying data

All data underlying the results are available as part of the article and no additional source data are required

Extended data

Figshare: Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review. Extended data - Table 1-Search strategy for shock (hemodynamic (in-stability) in MEDLINE.docx. https://doi.org/10.6084/m9.figshare.9892109.v125.

Figshare: Working title: Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review. Extended data - Table 2-Search strategy for respiratory distress or respiratory failure in MEDLINE.docx. https://doi.org/10.6084/m9.figshare.9892112.v126.

Figshare: Working title: Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review. Extended data - Table 3-Search strategy for infection or sepsis in MEDLINE.docx. https://doi.org/10.6084/m9.figshare.9892115.v127.

Reporting guidelines

Figshare: PRISMA checklist for ‘Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review’. https://doi.org/10.6084/m9.figshare.9894107.v1107.

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

Comments on this article Comments (0)

Version 2
VERSION 2 PUBLISHED 08 Oct 2019
Comment
Author details Author details
Competing interests
Grant information
Copyright
Download
 
Export To
metrics
Views Downloads
F1000Research - -
PubMed Central
Data from PMC are received and updated monthly.
- -
Citations
CITE
how to cite this article
Medic G, Kosaner Kließ M, Atallah L et al. Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review [version 2; peer review: 2 approved] F1000Research 2019, 8:1728 (https://doi.org/10.12688/f1000research.20498.2)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
track
receive updates on this article
Track an article to receive email alerts on any updates to this article.

Open Peer Review

Current Reviewer Status: ?
Key to Reviewer Statuses VIEW
ApprovedThe paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approvedFundamental flaws in the paper seriously undermine the findings and conclusions
Version 2
VERSION 2
PUBLISHED 27 Nov 2019
Revised
Views
6
Cite
Reviewer Report 04 Dec 2019
Stavros Nikolakopoulos, Department of Biostatistics & Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands 
Approved
VIEWS 6
My comments were adequately addressed ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Nikolakopoulos S. Reviewer Report For: Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review [version 2; peer review: 2 approved]. F1000Research 2019, 8:1728 (https://doi.org/10.5256/f1000research.23644.r57156)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Views
5
Cite
Reviewer Report 28 Nov 2019
Milena Kovacevic, Department of Pharmacokinetics and Clinical Pharmacy, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia 
Approved
VIEWS 5
Thank you for addressing my comments. ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Kovacevic M. Reviewer Report For: Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review [version 2; peer review: 2 approved]. F1000Research 2019, 8:1728 (https://doi.org/10.5256/f1000research.23644.r57155)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Version 1
VERSION 1
PUBLISHED 08 Oct 2019
Views
9
Cite
Reviewer Report 18 Nov 2019
Milena Kovacevic, Department of Pharmacokinetics and Clinical Pharmacy, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia 
Approved
VIEWS 9
The review summarizes the utilization of clinical decision support (CDS) systems in three selected states in critical care – shock/hemodynamic (in-)stability; respiratory distress/failure; and infection/sepsis. The background of the study has a strong rationale.

The study comprised ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Kovacevic M. Reviewer Report For: Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review [version 2; peer review: 2 approved]. F1000Research 2019, 8:1728 (https://doi.org/10.5256/f1000research.22530.r56200)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Views
53
Cite
Reviewer Report 06 Nov 2019
Stavros Nikolakopoulos, Department of Biostatistics & Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands 
Approved with Reservations
VIEWS 53
The authors report on a systematic review in order to assess the state-of-the -art in the field of Clinical Decision Support (CDS) systems in the last 5 years (2013-2018). They review and report on study designs, outcomes and methods employed ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Nikolakopoulos S. Reviewer Report For: Evidence-based Clinical Decision Support Systems for the prediction and detection of three disease states in critical care: A systematic literature review [version 2; peer review: 2 approved]. F1000Research 2019, 8:1728 (https://doi.org/10.5256/f1000research.22530.r56199)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Comments on this article Comments (0)

Version 2
VERSION 2 PUBLISHED 08 Oct 2019
Comment
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
Sign In
If you've forgotten your password, please enter your email address below and we'll send you instructions on how to reset your password.

The email address should be the one you originally registered with F1000.

Email address not valid, please try again

You registered with F1000 via Google, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Google account password, please click here.

You registered with F1000 via Facebook, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Facebook account password, please click here.

Code not correct, please try again
Email us for further assistance.
Server error, please try again.