
Open Access 01-07-2020 | Care | Research

Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units

Authors: Chao Yu, Guoqi Ren, Yinzhao Dong

Published in: BMC Medical Informatics and Decision Making | Special Issue 3/2020


Abstract

Background

Reinforcement learning (RL) provides a promising technique for solving complex sequential decision-making problems in healthcare. Recent years have seen great progress in applying RL to decision-making problems in Intensive Care Units (ICUs). However, because traditional RL algorithms aim only to maximize a long-term reward function, exploration during the learning process may have a fatal impact on the patient. A short-term goal should therefore also be considered to keep the patient stable throughout treatment.
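For reference, the long-term objective referred to above is the standard expected discounted return; the formulation below follows the textbook definition rather than notation quoted from the paper:

J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \right], \qquad 0 \le \gamma < 1,

where r_t is the reward received at step t and \gamma is the discount factor. Because this objective scores only long-run outcomes, a policy learned purely by exploring it can take actions that look acceptable on average but are harmful at individual steps, which motivates adding a short-term, supervised objective.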

Methods

We use a Supervised-Actor-Critic (SAC) RL algorithm to address this problem by combining the long-term, goal-oriented characteristics of RL with the short-term goal of supervised learning. We evaluate the differences between SAC and the traditional Actor-Critic (AC) algorithm in addressing the decision-making problems of mechanical ventilation and sedative dosing in ICUs.
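To make the combination concrete, the following is a minimal sketch of one supervised-actor-critic update step in PyTorch. It assumes a deterministic actor for a continuous dose, a Q-value critic, and a fixed trade-off weight lambda_sl between the RL and supervised losses; the network sizes, variable names, and weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 1                  # illustrative dimensions
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))       # Q(s, a)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, lambda_sl = 0.99, 0.5                  # discount factor, supervised-loss weight

def sac_update(s, a, r, s_next, done, a_clinician):
    # Critic: one-step TD target for Q(s, a) from the observed reward
    with torch.no_grad():
        q_next = critic(torch.cat([s_next, actor(s_next)], dim=-1))
        target = r + gamma * (1.0 - done) * q_next
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: the RL term maximizes the critic's long-term value estimate,
    # the supervised term keeps the action close to the clinician's dose.
    a_pred = actor(s)
    rl_loss = -critic(torch.cat([s, a_pred], dim=-1)).mean()
    sl_loss = nn.functional.mse_loss(a_pred, a_clinician)
    actor_loss = (1.0 - lambda_sl) * rl_loss + lambda_sl * sl_loss
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Illustrative call with a random batch of 32 transitions
B = 32
sac_update(torch.randn(B, state_dim), torch.randn(B, action_dim),
           torch.randn(B, 1), torch.randn(B, state_dim),
           torch.zeros(B, 1), torch.randn(B, action_dim))
```

In this sketch the supervised term pulls the learned action toward the clinician's recorded dose, providing the short-term stability signal, while the critic-driven term continues to optimize the long-term return.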

Results

Results show that SAC is much more efficient than the traditional AC algorithm in terms of convergence rate and data utilization.

Conclusions

The SAC algorithm not only aims to cure patients in the long term, but also reduces deviation from the strategies applied by clinicians, thereby improving the therapeutic effect.
Metadata
Title
Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
Authors
Chao Yu
Guoqi Ren
Yinzhao Dong
Publication date
01-07-2020
Publisher
BioMed Central
Keyword
Care
DOI
https://doi.org/10.1186/s12911-020-1120-5
