
Open Access 01-07-2020 | Care | Research

Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units

Authors: Chao Yu, Guoqi Ren, Yinzhao Dong

Published in: BMC Medical Informatics and Decision Making | Special Issue 3/2020


Abstract

Background

Reinforcement learning (RL) provides a promising technique for solving complex sequential decision-making problems in healthcare. Recent years have seen great progress in applying RL to decision-making problems in Intensive Care Units (ICUs). However, because traditional RL algorithms aim only to maximize a long-term reward function, exploration during the learning process may have a fatal impact on the patient. A short-term goal should therefore also be considered to keep the patient stable throughout treatment.
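For reference, the long-term objective referred to above is the standard expected discounted return; the formulation below follows the textbook definition rather than notation quoted from the paper:

J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \right], \qquad 0 \le \gamma < 1,

where r_t is the reward received at step t and \gamma is the discount factor. Because this objective scores only long-run outcomes, a policy learned purely by exploring it can take actions that look acceptable on average but are harmful at individual steps, which motivates adding a short-term, supervised objective.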

Methods

We use a Supervised-Actor-Critic (SAC) RL algorithm to address this problem by combining the long-term, goal-oriented characteristics of RL with the short-term goal of supervised learning. We evaluate the differences between SAC and the traditional Actor-Critic (AC) algorithm in addressing the decision-making problems of mechanical ventilation and sedative dosing in ICUs.
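To make the combination concrete, the following is a minimal sketch of one supervised-actor-critic update step in PyTorch. It assumes a deterministic actor for a continuous dose, a Q-value critic, and a fixed trade-off weight lambda_sl between the RL and supervised losses; the network sizes, variable names, and weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 1                  # illustrative dimensions
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))       # Q(s, a)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, lambda_sl = 0.99, 0.5                  # discount factor, supervised-loss weight

def sac_update(s, a, r, s_next, done, a_clinician):
    # Critic: one-step TD target for Q(s, a) from the observed reward
    with torch.no_grad():
        q_next = critic(torch.cat([s_next, actor(s_next)], dim=-1))
        target = r + gamma * (1.0 - done) * q_next
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: the RL term maximizes the critic's long-term value estimate,
    # the supervised term keeps the action close to the clinician's dose.
    a_pred = actor(s)
    rl_loss = -critic(torch.cat([s, a_pred], dim=-1)).mean()
    sl_loss = nn.functional.mse_loss(a_pred, a_clinician)
    actor_loss = (1.0 - lambda_sl) * rl_loss + lambda_sl * sl_loss
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Illustrative call with a random batch of 32 transitions
B = 32
sac_update(torch.randn(B, state_dim), torch.randn(B, action_dim),
           torch.randn(B, 1), torch.randn(B, state_dim),
           torch.zeros(B, 1), torch.randn(B, action_dim))
```

In this sketch the supervised term pulls the learned action toward the clinician's recorded dose, providing the short-term stability signal, while the critic-driven term continues to optimize the long-term return.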

Results

Results show that SAC is much more efficient than the traditional AC algorithm in terms of convergence rate and data utilization.

Conclusions

The SAC algorithm not only aims to cure patients in the long term, but also reduces deviation from the strategies applied by clinicians, thereby improving the therapeutic effect.
Metadata
Title
Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
Authors
Chao Yu
Guoqi Ren
Yinzhao Dong
Publication date
01-07-2020
Publisher
BioMed Central
Keyword
Care
DOI
https://doi.org/10.1186/s12911-020-1120-5
