
22-04-2024 | Pediatrics | Short Communication

ChatGPT’s adherence to otolaryngology clinical practice guidelines

Authors: Idit Tessler, Amit Wolfovitz, Eran E. Alon, Nir A. Gecel, Nir Livneh, Eyal Zimlichman, Eyal Klang

Published in: European Archives of Oto-Rhino-Laryngology

Abstract

Objectives

Large language models, including ChatGPT, have the potential to transform the way we approach medical knowledge, yet accuracy on clinical topics is critical. Here we assessed ChatGPT’s performance in adhering to the American Academy of Otolaryngology-Head and Neck Surgery guidelines.

Methods

We presented ChatGPT with 24 clinical otolaryngology questions based on the guidelines of the American Academy of Otolaryngology-Head and Neck Surgery. Each question was posed three times (N = 72) to test the model’s consistency. Two otolaryngologists evaluated the responses for accuracy and relevance to the guidelines. Cohen’s kappa was used to measure evaluator agreement, and Cronbach’s alpha assessed the consistency of ChatGPT’s responses across repetitions.
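
For readers less familiar with these reliability measures, the sketch below shows how Cohen’s kappa (agreement between the two evaluators) and Cronbach’s alpha (consistency across the three repeated runs) can be computed. This is a minimal illustration, not the authors’ code: the 0–2 grading scale and all scores are hypothetical stand-ins.

```python
# Minimal sketch of the two reliability statistics named in Methods,
# computed on hypothetical grades (NOT the study's data).
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical grades from two evaluators for the same 72 responses
# (2 = accurate, 1 = partially accurate, 0 = contradicts guideline).
rater_a = rng.integers(0, 3, size=72)
rater_b = rater_a.copy()
rater_b[::9] = (rater_b[::9] + 1) % 3  # inject occasional disagreement

kappa = cohen_kappa_score(rater_a, rater_b)  # inter-rater agreement

def cronbach_alpha(runs: np.ndarray) -> float:
    """Cronbach's alpha; rows = repeated runs (items), cols = questions."""
    k = runs.shape[0]
    item_vars = runs.var(axis=1, ddof=1).sum()  # variance of each run
    total_var = runs.sum(axis=0).var(ddof=1)    # variance of summed scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical grades for 3 repeated runs over the 24 questions:
# a shared per-question difficulty plus small per-run noise.
base = rng.integers(0, 3, size=24)
runs = np.clip(base + rng.integers(-1, 2, size=(3, 24)), 0, 2).astype(float)

print(f"Cohen's kappa: {kappa:.2f}")
print(f"Cronbach's alpha: {cronbach_alpha(runs):.2f}")
```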

Results

The study revealed mixed results: 59.7% (43/72) of ChatGPT’s responses were highly accurate, while only 2.8% (2/72) directly contradicted the guidelines. The model showed 100% accuracy in Head and Neck, but lower accuracy in Rhinology and Otology/Neurotology (66%), Laryngology (50%), and Pediatrics (8%). Responses were consistent across the three repetitions for 17/24 questions (70.8%), with a Cronbach’s alpha of 0.87, indicating reasonable consistency across tests.
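
As an illustration only, the snippet below shows how such an overall and per-subspecialty accuracy breakdown can be tabulated. The subspecialty labels come from the paper, but the question assignments and accuracy flags are randomly generated stand-ins, not the study’s data.

```python
# Hypothetical tabulation of overall and per-subspecialty accuracy.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
groups = ["Head and Neck", "Rhinology", "Otology/Neurotology",
          "Laryngology", "Pediatrics"]

# One row per graded response (24 questions x 3 runs = 72 responses).
df = pd.DataFrame({
    "subspecialty": rng.choice(groups, size=72),
    "highly_accurate": rng.random(72) < 0.6,  # ~60% overall, as reported
})

print(f"Overall highly accurate: {df['highly_accurate'].mean():.1%}")
print(df.groupby("subspecialty")["highly_accurate"].mean()
        .sort_values(ascending=False))
```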

Conclusions

Using a guideline-based set of structured questions, ChatGPT demonstrates consistency but variable accuracy in otolaryngology. Its lower performance in some areas, especially Pediatrics, suggests that further rigorous evaluation is needed before considering real-world clinical use.
Metadata
Title
ChatGPT’s adherence to otolaryngology clinical practice guidelines
Authors
Idit Tessler
Amit Wolfovitz
Eran E. Alon
Nir A. Gecel
Nir Livneh
Eyal Zimlichman
Eyal Klang
Publication date
22-04-2024
Publisher
Springer Berlin Heidelberg
Keyword
Pediatrics
Published in
European Archives of Oto-Rhino-Laryngology
Print ISSN: 0937-4477
Electronic ISSN: 1434-4726
DOI
https://doi.org/10.1007/s00405-024-08634-9