Published in: Surgical Endoscopy 5/2024

17-04-2024 | Gastroesophageal Reflux Disease | SAGES/EAES Official Publication

The performance of artificial intelligence large language model-linked chatbots in surgical decision-making for gastroesophageal reflux disease

Authors: Bright Huo, Elisa Calabrese, Patricia Sylla, Sunjay Kumar, Romeo C. Ignacio, Rodolfo Oviedo, Imran Hassan, Bethany J. Slater, Andreas Kaiser, Danielle S. Walsh, Wesley Vosburg



Abstract

Background

Large language model (LLM)-linked chatbots may be an efficient source of clinical recommendations for healthcare providers and patients. This study evaluated the performance of LLM-linked chatbots in providing recommendations for the surgical management of gastroesophageal reflux disease (GERD).

Methods

Nine patient cases were created based on key questions addressed by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) guidelines for the surgical treatment of GERD. ChatGPT-3.5, ChatGPT-4, Copilot, Google Bard, and Perplexity AI were queried on November 16th, 2023, for recommendations regarding the surgical management of GERD. Chatbot accuracy was defined as the number of responses that aligned with SAGES guideline recommendations. Outcomes were reported as counts and percentages.

Results

For an adult patient, surgeons were given accurate recommendations for the surgical management of GERD, as judged against the SAGES guidelines, for 5/7 (71.4%) key questions (KQs) by ChatGPT-4, 3/7 (42.9%) KQs by Copilot, 6/7 (85.7%) KQs by Google Bard, and 3/7 (42.9%) KQs by Perplexity. Patients were given accurate recommendations for 3/5 (60.0%) KQs by ChatGPT-4, 2/5 (40.0%) KQs by Copilot, 4/5 (80.0%) KQs by Google Bard, and 1/5 (20.0%) KQs by Perplexity. For a pediatric patient, surgeons were given accurate recommendations for 2/3 (66.7%) KQs by ChatGPT-4, 3/3 (100.0%) KQs by Copilot, 3/3 (100.0%) KQs by Google Bard, and 2/3 (66.7%) KQs by Perplexity. Patients were given appropriate guidance for 2/2 (100.0%) KQs by ChatGPT-4, 2/2 (100.0%) KQs by Copilot, 1/2 (50.0%) KQs by Google Bard, and 1/2 (50.0%) KQs by Perplexity.
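The percentages above follow directly from the reported counts. As a minimal sketch of that arithmetic (not the authors' analysis code), the snippet below transcribes the counts from this paragraph and recomputes each percentage to one decimal place. The names `RESULTS`, `percent_aligned`, and `summarize` are hypothetical and chosen only for illustration.

```python
# Counts transcribed from the Results paragraph:
# (responses aligned with SAGES guidelines, key questions asked).
# The dictionary layout and all names here are illustrative assumptions.
RESULTS = {
    ("adult", "surgeon"): {
        "ChatGPT-4": (5, 7), "Copilot": (3, 7),
        "Google Bard": (6, 7), "Perplexity": (3, 7),
    },
    ("adult", "patient"): {
        "ChatGPT-4": (3, 5), "Copilot": (2, 5),
        "Google Bard": (4, 5), "Perplexity": (1, 5),
    },
    ("pediatric", "surgeon"): {
        "ChatGPT-4": (2, 3), "Copilot": (3, 3),
        "Google Bard": (3, 3), "Perplexity": (2, 3),
    },
    ("pediatric", "patient"): {
        "ChatGPT-4": (2, 2), "Copilot": (2, 2),
        "Google Bard": (1, 2), "Perplexity": (1, 2),
    },
}


def percent_aligned(aligned: int, total: int) -> float:
    """Return the share of aligned responses as a percentage, one decimal place."""
    return round(100 * aligned / total, 1)


def summarize(results):
    """Flatten the nested counts into (case, audience, chatbot, count, percent) rows."""
    rows = []
    for (case, audience), by_chatbot in results.items():
        for chatbot, (aligned, total) in by_chatbot.items():
            rows.append(
                (case, audience, chatbot, f"{aligned}/{total}",
                 percent_aligned(aligned, total))
            )
    return rows


if __name__ == "__main__":
    for case, audience, chatbot, count, pct in summarize(RESULTS):
        print(f"{case:9} {audience:8} {chatbot:12} {count:>4} {pct:5.1f}%")
```

Keeping the raw counts next to the percentages matters here because the denominators are small (2 to 7 key questions), so a single response shifts a percentage by 14 to 50 points.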

Conclusions

Gastrointestinal surgeons, gastroenterologists, and patients should recognize both the promise and the pitfalls of LLMs when they are used for advice on the surgical management of GERD. Additional training of LLMs using evidence-based health information is needed.
Metadata
Publication date
17-04-2024
Publisher
Springer US
Published in
Surgical Endoscopy / Issue 5/2024
Print ISSN: 0930-2794
Electronic ISSN: 1432-2218
DOI
https://doi.org/10.1007/s00464-024-10807-w
