
Open Access 01-12-2024 | Research article

Assessing ChatGPT’s orthopedic in-service training exam performance and applicability in the field

Authors: Neil Jain, Caleb Gottlich, John Fisher, Dominic Campano, Travis Winston

Published in: Journal of Orthopaedic Surgery and Research | Issue 1/2024


Abstract

Background

ChatGPT has gained widespread attention for its ability to understand and provide human-like responses to inputs. However, few works have focused on its use in Orthopedics. This study assessed ChatGPT’s performance on the Orthopedic In-Service Training Exam (OITE) and evaluated its decision-making process to determine whether adoption as a resource in the field is practical.

Methods

ChatGPT’s performance on three OITE exams was evaluated by inputting multiple-choice questions. Questions were classified by their orthopedic subject area. Yearly OITE technical reports were used to compare scores against those of resident physicians. ChatGPT’s rationales were compared with testmaker explanations and classified into six groups denoting answer accuracy and logic consistency. Variables were analyzed using contingency table construction and Chi-squared analyses.
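For readers less familiar with this type of analysis, the sketch below illustrates its general form: a topic-by-outcome contingency table, a Chi-squared test of association, and adjusted residuals (values beyond ±1.96 flag cells contributing disproportionately at the 0.05 level). This is a minimal illustration only; the counts, topic rows, and outcome columns are hypothetical placeholders, not the study's data.

# Minimal sketch of a contingency-table Chi-squared analysis with adjusted
# residuals (illustrative counts only; not the authors' code or data).
import numpy as np
from scipy.stats import chi2_contingency

# Rows: hypothetical topics (e.g., Basic Science, Sports, other pooled)
# Columns: hypothetical answer/logic categories
observed = np.array([
    [12,  3,  8,  2],
    [15,  4,  6,  1],
    [20, 10, 18,  5],
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# Adjusted (standardized) residuals: (O - E) / sqrt(E * (1 - row_prop) * (1 - col_prop));
# |residual| > 1.96 corresponds to a two-tailed 0.05 threshold for a single cell.
n = observed.sum()
row_prop = observed.sum(axis=1, keepdims=True) / n
col_prop = observed.sum(axis=0, keepdims=True) / n
adj_resid = (observed - expected) / np.sqrt(expected * (1 - row_prop) * (1 - col_prop))
print(np.round(adj_resid, 2))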

Results

Of 635 questions, 360 (56.7%) were usable as inputs. ChatGPT-3.5 scored 55.8%, 47.7%, and 54.0% on the 2020, 2021, and 2022 exams, respectively. Of 190 correct outputs, 179 (94.2%) provided consistent logic. Of 170 incorrect outputs, 133 (78.2%) provided inconsistent logic. Significant associations were found between tested topic and correct answer (p = 0.011) and between tested topic and type of logic used (p < 0.001). Basic Science and Sports had adjusted residuals greater than 1.96, as did the cells pairing Basic Science with correct, no logic; Basic Science with incorrect, inconsistent logic; Sports with correct, no logic; and Sports with incorrect, inconsistent logic.

Conclusions

Based on annual OITE technical reports for resident physicians, ChatGPT-3.5 performed at approximately the PGY-1 level. When answering correctly, it displayed reasoning congruent with that of the testmakers. When answering incorrectly, it still exhibited some understanding of the correct answer. It performed comparatively better in Basic Science and Sports, likely due to its ability to output rote facts. These findings suggest that, in its current form, ChatGPT lacks the fundamental capabilities to serve as a comprehensive tool in Orthopedic Surgery.
Level of Evidence: II.
Metadata
Title
Assessing ChatGPT’s orthopedic in-service training exam performance and applicability in the field
Authors
Neil Jain
Caleb Gottlich
John Fisher
Dominic Campano
Travis Winston
Publication date
01-12-2024
Publisher
BioMed Central
Published in
Journal of Orthopaedic Surgery and Research / Issue 1/2024
Electronic ISSN: 1749-799X
DOI
https://doi.org/10.1186/s13018-023-04467-0
