28-09-2024 | Artificial Intelligence | ORIGINAL ARTICLE

Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o

Authors: Enes Efe Is, Ahmet Kivanc Menekseoglu

Published in: Clinical Rheumatology | Issue 11/2024

Abstract

Objectives

This study evaluates the performance of AI models, ChatGPT-4o and Google Gemini, in answering rheumatology board-level questions, comparing their effectiveness, reliability, and applicability in clinical practice.

Method

A cross-sectional study was conducted using 420 rheumatology questions from the BoardVitals question bank, excluding 27 questions that contained visual data. Both artificial intelligence models categorized the questions by difficulty (easy, medium, hard) and answered them. The reliability of the answers was assessed by asking each question a second time. The accuracy, reliability, and difficulty categorization of the AI models’ responses were then analyzed.

Results

ChatGPT-4o answered 86.9% of the questions correctly, significantly outperforming Google Gemini’s 60.2% accuracy (p < 0.001). When the questions were asked a second time, the success rate was 86.7% for ChatGPT-4o and 60.5% for Google Gemini. Both models mainly categorized questions as medium difficulty. ChatGPT-4o showed higher accuracy in various rheumatology subfields, notably in Basic and Clinical Science (p = 0.028), Osteoarthritis (p = 0.023), and Rheumatoid Arthritis (p < 0.001).
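
As a rough check on the headline comparison, the reported percentages can be converted back to counts and tested. The sketch below assumes both models were scored on all 420 questions, so the counts of correct answers are back-calculated from 86.9% and 60.2% rather than taken from the abstract. It uses an unpaired Pearson chi-square test, which is an assumption: the abstract does not state which test the authors used, and because both models answered the same questions, a paired test such as McNemar's would be more appropriate if per-question results were available. The function names are illustrative.

```python
import math

N_QUESTIONS = 420  # assumption: same denominator for both models


def correct_count(accuracy_pct, n=N_QUESTIONS):
    """Back-calculate the number of correct answers from a reported percentage."""
    return round(accuracy_pct / 100 * n)


def pearson_chi2_2x2(correct_a, correct_b, n_a, n_b):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    Rows are the two models, columns are correct / incorrect answers.
    Returns the chi-square statistic and its p-value.
    """
    observed = [
        [correct_a, n_a - correct_a],
        [correct_b, n_b - correct_b],
    ]
    row_totals = [n_a, n_b]
    col_totals = [correct_a + correct_b, (n_a - correct_a) + (n_b - correct_b)]
    total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    gpt_correct = correct_count(86.9)     # back-calculated, not reported
    gemini_correct = correct_count(60.2)  # back-calculated, not reported
    chi2, p = pearson_chi2_2x2(gpt_correct, gemini_correct,
                               N_QUESTIONS, N_QUESTIONS)
    print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

Under these assumptions the statistic lies far above 10.83, the 1-degree-of-freedom critical value for p = 0.001, which is consistent with the reported p < 0.001.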

Conclusions

ChatGPT-4o significantly outperformed Google Gemini in rheumatology board-level questions. This demonstrates the success of ChatGPT-4o in situations requiring complex and specialized knowledge related to rheumatological diseases. The performance of both AI models decreased as the question difficulty increased. This study demonstrates the potential of AI in clinical applications and suggests that its use as a tool to assist clinicians may improve healthcare efficiency in the future. Future studies using real clinical scenarios and real board questions are recommended.

Key Points

  • ChatGPT-4o significantly outperformed Google Gemini in answering rheumatology board-level questions, achieving 86.9% accuracy compared to Google Gemini’s 60.2%.
  • For both AI models, the correct answer rate decreased as the question difficulty increased.
  • The study demonstrates the potential for AI models to be used in clinical practice as a tool to assist clinicians and improve healthcare efficiency.

Metadata
Title
Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o
Authors
Enes Efe Is
Ahmet Kivanc Menekseoglu
Publication date
28-09-2024
Publisher
Springer International Publishing
Published in
Clinical Rheumatology / Issue 11/2024
Print ISSN: 0770-3198
Electronic ISSN: 1434-9949
DOI
https://doi.org/10.1007/s10067-024-07154-5
