
Open Access 01-12-2023 | Artificial Intelligence | Original Paper

“ChatGPT, Can You Help Me Save My Child’s Life?” - Diagnostic Accuracy and Supportive Capabilities to Lay Rescuers by ChatGPT in Prehospital Basic Life Support and Paediatric Advanced Life Support Cases – An In-silico Analysis

Authors: Stefan Bushuven, Michael Bentele, Stefanie Bentele, Bianka Gerber, Joachim Bansbach, Julian Ganter, Milena Trifunovic-Koenig, Robert Ranisch

Published in: Journal of Medical Systems | Issue 1/2023


Abstract

Background

Paediatric emergencies are challenging for healthcare workers, first aiders, and parents waiting for emergency medical services to arrive. With the expected rise of virtual assistants, people will likely seek help from such digital AI tools, especially in regions lacking emergency medical services. Large language models like ChatGPT have proved effective in providing health-related information and perform competently on medical exams, but their patient safety remains in question. Currently, there is no information on ChatGPT's performance in supporting parents in paediatric emergencies that require help from emergency medical services. This study aimed to test the performance and safety of ChatGPT and GPT-4 on 20 paediatric and two basic life support case vignettes.

Methods

We presented each case three times to both models, ChatGPT and GPT-4, and assessed diagnostic accuracy, the advice to call emergency services, and the validity of the first aid advice given to parents.
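In practice, this design amounts to repeatedly prompting two chat models with fixed case vignettes and then rating the answers manually. The following is a minimal sketch of such a querying loop, assuming the OpenAI Python SDK (v1.x); the model identifiers, the example vignette wording, and the storage layout are illustrative assumptions, not the authors' actual prompts or protocol.

```python
# Illustrative sketch only; the paper does not publish its querying code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4"]  # stand-ins for "ChatGPT" and "GPT-4"
N_REPEATS = 3                        # each vignette was presented three times

vignettes = [
    # Hypothetical example phrased as a parent asking for help:
    "My 3-year-old was eating peanuts, suddenly started coughing, and now "
    "struggles to breathe. What should I do?",
    # ... the remaining paediatric and basic life support vignettes
]

def ask(model: str, vignette: str) -> str:
    """Send one vignette to one model and return its free-text answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": vignette}],
    )
    return response.choices[0].message.content

# Collect three independent answers per model and vignette, stored for
# later manual rating of diagnosis, emergency call advice, and first aid.
answers = {
    (model, idx, run): ask(model, text)
    for model in MODELS
    for idx, text in enumerate(vignettes)
    for run in range(N_REPEATS)
}
```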

Results

Both models recognized the emergency in the cases, except for septic shock and pulmonary embolism, and identified the correct diagnosis in 94% of cases. However, ChatGPT/GPT-4 reliably advised calling emergency services in only 12 of 22 cases (54%), gave correct first aid instructions in 9 cases (45%), and incorrectly advised advanced life support techniques to parents in 3 of 22 cases (13.6%).

Conclusion

Considering these results for the recent ChatGPT versions, the validity, reliability, and thus safety of ChatGPT/GPT-4 as an emergency support tool are questionable. However, whether humans would perform better in the same situation is uncertain. Moreover, other studies have shown that human emergency call operators are also inaccurate, sometimes performing worse than ChatGPT/GPT-4 did in our study. A main limitation of the study is the use of prototypical cases; management may differ between urban and rural areas and between countries, indicating the need to evaluate the model's context sensitivity and adaptability further. Nevertheless, ChatGPT and the new versions under development may be promising tools for assisting lay first responders, operators, and professionals in diagnosing paediatric emergencies.

Trial registration

Not applicable.
Metadata
Publication date: 01-12-2023
Publisher: Springer US
Published in: Journal of Medical Systems, Issue 1/2023
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI: https://doi.org/10.1007/s10916-023-02019-x
