22-04-2025 | Artificial Intelligence | ORIGINAL ARTICLE
Assessment of ChatGPT’s adherence to EULAR diagnostic criteria and therapeutic protocols for rheumatoid arthritis at two distinct time points, 14 days apart, utilizing binary and multiple-choice inquiries
Authors: Neşe Çabuk Çelik, Elif Altunel Kılınç
Published in: Clinical Rheumatology
Abstract
Objectives
Artificial intelligence (AI) holds considerable promise in healthcare as a source of decision support in specific domains, including rheumatoid arthritis (RA). This study assesses how closely the advanced AI model ChatGPT-v4 adheres to the European League Against Rheumatism (EULAR) recommendations.
Methods
The study employed a 100-item questionnaire of true/false and multiple-choice questions, accompanied by real-world clinical scenarios developed in accordance with the EULAR recommendations for the treatment of RA. The questions addressed diagnostic criteria, therapeutic options, and follow-up procedures. Two rheumatologists rated ChatGPT's responses for accuracy, consistency, and comprehensiveness on a 6-point Likert scale.
Results
Responses were evaluated at baseline and again on day 14. In the paired binary questions, the AI corrected most of its baseline errors, although it did not improve on every response: of the two initially incongruent responses, one remained unchanged while the other was corrected, and the 48 initially congruent responses rose to 49 on day 14. The AI was more consistent on binary questions than on multiple-choice questions. At baseline, 43 (86%) of the multiple-choice items were answered correctly; upon reevaluation, 42 (84%) were accurate, and one response was erroneous on day 14. Of the seven initially erroneous responses, three remained unchanged and four were subsequently corrected.
Conclusion
ChatGPT performed well on binary and multiple-choice questions formulated according to the EULAR guidelines for RA. The findings support the use of AI as a clinical decision-support tool in RA and show that its performance can improve over time: the model was accurate on objective information and promptly corrected earlier errors.
Key Points
• AI in healthcare: The integration of artificial intelligence, specifically ChatGPT-v4, in clinical practice aims to enhance decision-making in RA by adhering to EULAR recommendations for diagnosis, treatment, and follow-up.
• Inter-rater reliability: High agreement was noted between the evaluators, with Cohen’s kappa coefficients of 0.92 for binary questions and 0.94 for multiple-choice questions (a brief computation sketch follows these key points).
• AI learning dynamics: The study reveals that ChatGPT showed improvement in understanding and answering more complex questions over time, unlike findings in previous studies where AI struggled with consistency.
• Implications for clinical practice: The findings support the growing role of AI as a reliable tool in rheumatology, suggesting potential for personalized, evidence-based patient care.
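For readers unfamiliar with the statistic, the sketch below illustrates how a Cohen's kappa coefficient of the kind reported above can be computed from two raters' categorical judgments. The ratings shown are hypothetical and are not the study's data; this is only a minimal illustration of the formula κ = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same set of items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two raters scoring 10 answers as correct (1) or incorrect (0).
rater_1 = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
rater_2 = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
print(round(cohen_kappa(rater_1, rater_2), 2))  # ≈ 0.74
```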