
22-04-2024 | Pediatrics | Short Communication

ChatGPT’s adherence to otolaryngology clinical practice guidelines

Authors: Idit Tessler, Amit Wolfovitz, Eran E. Alon, Nir A. Gecel, Nir Livneh, Eyal Zimlichman, Eyal Klang

Published in: European Archives of Oto-Rhino-Laryngology

Abstract

Objectives

Large language models, including ChatGPT, have the potential to transform the way we approach medical knowledge, yet accuracy on clinical topics is critical. Here we assessed ChatGPT’s performance in adhering to the American Academy of Otolaryngology-Head and Neck Surgery guidelines.

Methods

We presented ChatGPT with 24 clinical otolaryngology questions based on the guidelines of the American Academy of Otolaryngology-Head and Neck Surgery. Each question was posed three times (N = 72) to test the model’s consistency. Two otolaryngologists evaluated the responses for accuracy and relevance to the guidelines. Cohen’s kappa was used to measure evaluator agreement, and Cronbach’s alpha assessed the consistency of ChatGPT’s responses across repetitions.
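
For readers less familiar with these reliability measures, the sketch below shows how Cohen’s kappa (agreement between the two evaluators) and Cronbach’s alpha (consistency across the three repeated runs) can be computed. This is a minimal illustration, not the authors’ code: the 0–2 grading scale and all scores are hypothetical stand-ins.

```python
# Minimal sketch of the two reliability statistics named in Methods,
# computed on hypothetical grades (NOT the study's data).
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical grades from two evaluators for the same 72 responses
# (2 = accurate, 1 = partially accurate, 0 = contradicts guideline).
rater_a = rng.integers(0, 3, size=72)
rater_b = rater_a.copy()
rater_b[::9] = (rater_b[::9] + 1) % 3  # inject occasional disagreement

kappa = cohen_kappa_score(rater_a, rater_b)  # inter-rater agreement

def cronbach_alpha(runs: np.ndarray) -> float:
    """Cronbach's alpha; rows = repeated runs (items), cols = questions."""
    k = runs.shape[0]
    item_vars = runs.var(axis=1, ddof=1).sum()  # variance of each run
    total_var = runs.sum(axis=0).var(ddof=1)    # variance of summed scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical grades for 3 repeated runs over the 24 questions:
# a shared per-question difficulty plus small per-run noise.
base = rng.integers(0, 3, size=24)
runs = np.clip(base + rng.integers(-1, 2, size=(3, 24)), 0, 2).astype(float)

print(f"Cohen's kappa: {kappa:.2f}")
print(f"Cronbach's alpha: {cronbach_alpha(runs):.2f}")
```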

Results

The study revealed mixed results: 59.7% (43/72) of ChatGPT’s responses were highly accurate, while only 2.8% (2/72) directly contradicted the guidelines. The model showed 100% accuracy in Head and Neck, but lower accuracy in Rhinology and Otology/Neurotology (66%), Laryngology (50%), and Pediatrics (8%). Responses were consistent across the three repetitions for 17/24 questions (70.8%), with a Cronbach’s alpha of 0.87, indicating reasonable consistency across tests.
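
As an illustration only, the snippet below shows how such an overall and per-subspecialty accuracy breakdown can be tabulated. The subspecialty labels come from the paper, but the question assignments and accuracy flags are randomly generated stand-ins, not the study’s data.

```python
# Hypothetical tabulation of overall and per-subspecialty accuracy.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
groups = ["Head and Neck", "Rhinology", "Otology/Neurotology",
          "Laryngology", "Pediatrics"]

# One row per graded response (24 questions x 3 runs = 72 responses).
df = pd.DataFrame({
    "subspecialty": rng.choice(groups, size=72),
    "highly_accurate": rng.random(72) < 0.6,  # ~60% overall, as reported
})

print(f"Overall highly accurate: {df['highly_accurate'].mean():.1%}")
print(df.groupby("subspecialty")["highly_accurate"].mean()
        .sort_values(ascending=False))
```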

Conclusions

Using a guideline-based set of structured questions, ChatGPT demonstrates consistency but variable accuracy in otolaryngology. Its lower performance in some areas, especially Pediatrics, suggests that further rigorous evaluation is needed before considering real-world clinical use.
Metadata
Title
ChatGPT’s adherence to otolaryngology clinical practice guidelines
Authors
Idit Tessler
Amit Wolfovitz
Eran E. Alon
Nir A. Gecel
Nir Livneh
Eyal Zimlichman
Eyal Klang
Publication date
22-04-2024
Publisher
Springer Berlin Heidelberg
Keyword
Pediatrics
Published in
European Archives of Oto-Rhino-Laryngology
Print ISSN: 0937-4477
Electronic ISSN: 1434-4726
DOI
https://doi.org/10.1007/s00405-024-08634-9