Open Access | 04-04-2024 | Glaucoma

Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison

Authors: Matteo Mario Carlà, Gloria Gambini, Antonio Baldascino, Francesco Boselli, Federico Giannuzzi, Fabio Margollicci, Stanislao Rizzo

Published in: Graefe's Archive for Clinical and Experimental Ophthalmology

Abstract

Purpose

The aim of this study was to assess the capability of ChatGPT-4 and Google Gemini to analyze detailed glaucoma case descriptions and suggest an accurate surgical plan.

Methods

In this retrospective analysis, 60 medical records of surgical glaucoma cases were divided into “ordinary” (n = 40) and “challenging” (n = 20) scenarios. Case descriptions were entered into the ChatGPT and Gemini (formerly Bard) interfaces with the question “What kind of surgery would you perform?”, and each query was repeated three times to assess the consistency of the answers. We then assessed the level of agreement with the unified opinion of three glaucoma surgeons. We also graded the quality of the responses from 1 (poor quality) to 5 (excellent quality) according to the Global Quality Score (GQS) and compared the results.
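The scoring procedure described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CaseResult` structure, the normalized procedure labels, the example cases, and in particular the rule of taking the most frequent of the three answers as the chatbot's choice, since the abstract does not state how the three repetitions were combined.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseResult:
    """Hypothetical record of one case: three chatbot answers and the surgeons' consensus."""
    case_id: str
    answers: List[Optional[str]]  # None means the chatbot gave no surgical plan
    specialist_choice: str


def is_consistent(result: CaseResult) -> bool:
    """True when all repetitions produced the same answer (or the same lack of one)."""
    return len(set(result.answers)) == 1


def modal_answer(result: CaseResult) -> Optional[str]:
    """Most frequent non-missing answer; on a tie the earliest answer wins (assumed rule)."""
    given = [a for a in result.answers if a is not None]
    if not given:
        return None
    return Counter(given).most_common(1)[0][0]


def agrees(result: CaseResult) -> bool:
    """True when the modal answer matches the specialists' unified opinion."""
    return modal_answer(result) == result.specialist_choice


# Invented example cases, not data from the study.
cases = [
    CaseResult("case-01", ["trabeculectomy"] * 3, "trabeculectomy"),
    CaseResult("case-02", ["trabeculectomy", "tube shunt", "trabeculectomy"], "tube shunt"),
    CaseResult("case-03", [None, None, None], "trabeculectomy"),
]

agreement_count = sum(agrees(c) for c in cases)
consistent_count = sum(is_consistent(c) for c in cases)
print(f"agreement: {agreement_count}/{len(cases)}, consistent: {consistent_count}/{len(cases)}")
```

A stricter rule, such as requiring all three answers to match the specialists, would be an equally plausible reading and would only change `agrees`.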

Results

ChatGPT's surgical choice was consistent with that of the glaucoma specialists in 35/60 cases (58%), compared to 19/60 (32%) for Gemini (p = 0.0001). Gemini was not able to complete the task in 16 cases (27%). Trabeculectomy was the most frequent choice for both chatbots (53% and 50% for ChatGPT and Gemini, respectively). In “challenging” cases, ChatGPT agreed with the specialists in 9/20 choices (45%), outperforming Google Gemini (4/20, 20%). Overall, GQS scores were 3.5 ± 1.2 and 2.1 ± 1.5 for ChatGPT and Gemini, respectively (p = 0.002). This difference was even more marked when focusing only on “challenging” cases (1.5 ± 1.4 vs. 3.0 ± 1.5, p = 0.001).
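As a quick arithmetic check, the reported percentages can be recomputed from the counts given above. The short Python sketch below uses only those counts; the dictionary labels and the `percent` helper are illustrative names, and no statistical test is reproduced because the abstract does not say which one was used.

```python
# Counts as reported in the Results paragraph: (numerator, denominator).
reported = {
    "ChatGPT agreement, all cases": (35, 60),
    "Gemini agreement, all cases": (19, 60),
    "Gemini task not completed": (16, 60),
    "ChatGPT agreement, challenging cases": (9, 20),
    "Gemini agreement, challenging cases": (4, 20),
}


def percent(numerator: int, denominator: int) -> int:
    """Return the proportion as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


percentages = {label: percent(n, d) for label, (n, d) in reported.items()}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The rounded values match the percentages stated in the text (58%, 32%, 27%, 45%, and 20%).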

Conclusion

ChatGPT-4 showed good analytical performance for glaucoma surgical cases, both ordinary and challenging. By contrast, Google Gemini showed strong limitations in this setting, with high rates of imprecise or missing answers.
Metadata
Publisher: Springer Berlin Heidelberg
Published in: Graefe's Archive for Clinical and Experimental Ophthalmology
Print ISSN: 0721-832X
Electronic ISSN: 1435-702X
DOI: https://doi.org/10.1007/s00417-024-06470-5