Published in: Surgical Endoscopy 5/2024

Open Access 12-03-2024 | Bariatric Surgery

Large language models and bariatric surgery patient education: a comparative readability analysis of GPT-3.5, GPT-4, Bard, and online institutional resources

Authors: Nitin Srinivasan, Jamil S. Samaan, Nithya D. Rajeev, Mmerobasi U. Kanu, Yee Hui Yeo, Kamran Samakar


Abstract

Background

The readability of online bariatric surgery patient education materials (PEMs) often surpasses the recommended 6th grade level. Large language models (LLMs), like ChatGPT and Bard, have the potential to revolutionize PEM delivery. We aimed to evaluate the readability of PEMs produced by U.S. medical institutions compared to LLMs, as well as the ability of LLMs to simplify their responses.

Methods

Responses to frequently asked questions (FAQs) related to bariatric surgery were gathered from top-ranked health institutions. FAQ responses were also generated from GPT-3.5, GPT-4, and Bard. The LLMs were then prompted to improve the readability of their initial responses. The readability of institutional responses, initial LLM responses, and simplified LLM responses was graded using validated readability formulas. The accuracy and comprehensiveness of initial and simplified LLM responses were also compared.
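As an illustration of the kind of validated readability formula the study applied (the authors' exact implementation is not shown here), the sketch below computes the Automated Readability Index (ARI; Smith & Senter, 1967), which estimates a U.S. grade level from character, word, and sentence counts. The tokenization rules are simplified assumptions for demonstration only.

```python
import re


def automated_readability_index(text: str) -> float:
    """Estimate a U.S. grade level using the ARI formula:
    ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

    Tokenization here is deliberately naive (split on whitespace and
    sentence-ending punctuation); production tools use more robust rules.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    # Count letters/digits only, ignoring attached punctuation.
    chars = sum(len(w.strip(".,!?;:\"'()")) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43
```

Longer words and sentences raise the score, so dense clinical prose grades far above the recommended 6th grade level, while short declarative sentences grade low.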

Results

Responses to 66 FAQs were included. All institutional and initial LLM responses had poor readability, with average reading levels ranging from 9th grade to college graduate. Simplified responses from the LLMs had significantly improved readability, with reading levels ranging from 6th grade to college freshman. Among the simplified responses, GPT-4's demonstrated the highest readability, with reading levels ranging from 6th to 9th grade. Accuracy was similar between initial and simplified responses from all LLMs. Comprehensiveness was similar between initial and simplified responses from GPT-3.5 and GPT-4. However, 34.8% of Bard's simplified responses were graded as less comprehensive than its initial responses.

Conclusion

Our study highlights the efficacy of LLMs in enhancing the readability of bariatric surgery PEMs. GPT-4 outperformed the other models, generating simplified PEMs at 6th to 9th grade reading levels. Unlike those of GPT-3.5 and GPT-4, a portion of Bard's simplified responses were graded as less comprehensive than its initial responses. We advocate for future studies examining the potential role of LLMs as dynamic and personalized sources of PEMs for diverse patient populations of all literacy levels.
Metadata
Title
Large language models and bariatric surgery patient education: a comparative readability analysis of GPT-3.5, GPT-4, Bard, and online institutional resources
Authors
Nitin Srinivasan
Jamil S. Samaan
Nithya D. Rajeev
Mmerobasi U. Kanu
Yee Hui Yeo
Kamran Samakar
Publication date
12-03-2024
Publisher
Springer US
Published in
Surgical Endoscopy / Issue 5/2024
Print ISSN: 0930-2794
Electronic ISSN: 1432-2218
DOI
https://doi.org/10.1007/s00464-024-10720-2
