Dear Editor, we would like to comment on “Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o” [
1], a study assessing the efficacy of ChatGPT-4o and Google Gemini in answering board-level rheumatology questions, in which we found notable concerns regarding both content and methodology. While the comparative analysis yielded interesting insights into the models’ accuracy, the absence of image-based questions may have limited the study’s scope. Rheumatology frequently requires the interpretation of photographs and other visual data, and evaluating the models’ capacity to handle such material would have allowed a more comprehensive assessment. Furthermore, the criteria for classifying question difficulty were not adequately described, raising the question of whether the two models were evaluated against the same standardized criteria for this categorization. …