28-09-2024 | Artificial Intelligence | Original Article

Artificial intelligence in reproductive endocrinology: an in-depth longitudinal analysis of ChatGPTv4’s month-by-month interpretation and adherence to clinical guidelines for diminished ovarian reserve

Authors: Tugba Gurbuz, Oya Gokmen, Belgin Devranoglu, Arzu Yurci, Asena Ayar Madenli

Published in: Endocrine | Issue 3/2024

Abstract

Objective

To quantitatively assess the performance of ChatGPTv4, an Artificial Intelligence Language Model, in adhering to clinical guidelines for Diminished Ovarian Reserve (DOR) over two months, evaluating the model’s consistency in providing guideline-based responses.

Design

A longitudinal study design was employed to evaluate ChatGPTv4’s response accuracy and completeness using a structured questionnaire at baseline and at a two-month follow-up.

Setting

ChatGPTv4 was tasked with interpreting DOR questionnaires based on standardized clinical guidelines.

Participants

The study did not involve human participants; the questionnaire was exclusively administered to the ChatGPT model to generate responses about DOR.

Methods

A guideline-based questionnaire comprising 176 open-ended, 166 multiple-choice, and 153 true/false questions was used to assess ChatGPTv4’s ability to provide accurate medical advice aligned with current DOR clinical guidelines. AI-generated responses were rated on a 6-point Likert scale for accuracy and a 3-point scale for completeness. The two-phase design assessed the stability and consistency of AI-generated answers over two months.
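The rating scheme described above can be sketched as a small data structure with a per-phase summary. This is an illustrative sketch only, not the authors' analysis code: the record layout (`Rating`), the helper `summarize`, the phase labels, and the assumption that both scales start at 1 are all hypothetical, and the toy ratings are invented for demonstration and are not the study's data.

```python
from dataclasses import dataclass
from statistics import mean, stdev

ACCURACY_MAX = 6      # 6-point Likert scale for accuracy (from the Methods)
COMPLETENESS_MAX = 3  # 3-point scale for completeness (from the Methods)
PHASES = ("baseline", "follow_up")  # hypothetical labels for the two phases


@dataclass(frozen=True)
class Rating:
    """One rated open-ended response in one phase (hypothetical layout)."""
    question_id: int
    phase: str
    accuracy: int
    completeness: int

    def __post_init__(self) -> None:
        # Assumption: both scales start at 1; the abstract gives only the maxima.
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if not 1 <= self.accuracy <= ACCURACY_MAX:
            raise ValueError(f"accuracy out of range: {self.accuracy}")
        if not 1 <= self.completeness <= COMPLETENESS_MAX:
            raise ValueError(f"completeness out of range: {self.completeness}")


def summarize(ratings: list[Rating], phase: str) -> dict[str, tuple[float, float]]:
    """Return (mean, sample SD) of accuracy and completeness for one phase."""
    rows = [r for r in ratings if r.phase == phase]
    acc = [r.accuracy for r in rows]
    comp = [r.completeness for r in rows]
    return {
        "accuracy": (mean(acc), stdev(acc)),
        "completeness": (mean(comp), stdev(comp)),
    }


# Toy ratings for illustration only; NOT the study's data.
toy = [
    Rating(1, "baseline", 5, 2), Rating(1, "follow_up", 6, 3),
    Rating(2, "baseline", 6, 3), Rating(2, "follow_up", 6, 3),
    Rating(3, "baseline", 4, 2), Rating(3, "follow_up", 5, 3),
]
baseline_summary = summarize(toy, "baseline")
follow_up_summary = summarize(toy, "follow_up")
```

Keeping the question identifier on each record means baseline and follow-up ratings for the same item can be paired later, which a two-phase comparison would need.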

Results

ChatGPTv4 achieved near-perfect scores across all question types, with true/false questions consistently answered with 100% accuracy. Multiple-choice accuracy improved from 98.2% to 100% at the two-month follow-up. For open-ended questions, mean accuracy scores increased from 5.38 ± 0.71 to 5.74 ± 0.51 (max: 6.0) and mean completeness scores from 2.57 ± 0.52 to 2.85 ± 0.36 (max: 3.0). These improvements were statistically significant (p < 0.001), and initial and follow-up scores were positively correlated for both accuracy (r = 0.597) and completeness (r = 0.381).
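As a quick arithmetic check on the multiple-choice figures, the sketch below back-calculates which correct-answer counts out of 166 items are consistent with the reported percentages. The abstract does not report the raw counts, so the result is an inference that assumes the percentages were rounded to one decimal place; `counts_matching_percentage` is a hypothetical helper name.

```python
def counts_matching_percentage(total: int, reported_pct: float,
                               decimals: int = 1) -> list[int]:
    """Return every correct-answer count whose percentage rounds to reported_pct."""
    target = round(reported_pct, decimals)
    return [
        k for k in range(total + 1)
        if round(100 * k / total, decimals) == target
    ]


MC_TOTAL = 166  # number of multiple-choice questions, from the Methods

baseline_counts = counts_matching_percentage(MC_TOTAL, 98.2)
follow_up_counts = counts_matching_percentage(MC_TOTAL, 100.0)
print(baseline_counts, follow_up_counts)
```

Under this rounding assumption, 163 of 166 is the only count that matches 98.2%, which would correspond to three multiple-choice items missed at baseline and none at follow-up.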

Limitations

The study was limited by the reliance on a controlled, albeit simulated, setting that may not perfectly mirror real-world clinical interactions.

Conclusion

ChatGPTv4 demonstrated exceptional and improving accuracy and completeness in handling DOR-related guideline queries over the studied period. These findings highlight ChatGPTv4’s potential as a reliable, adaptable AI tool in reproductive endocrinology, capable of augmenting clinical decision-making and guideline development.
Metadata
Title
Artificial intelligence in reproductive endocrinology: an in-depth longitudinal analysis of ChatGPTv4’s month-by-month interpretation and adherence to clinical guidelines for diminished ovarian reserve
Authors
Tugba Gurbuz
Oya Gokmen
Belgin Devranoglu
Arzu Yurci
Asena Ayar Madenli
Publication date
28-09-2024
Publisher
Springer US
Published in
Endocrine / Issue 3/2024
Print ISSN: 1355-008X
Electronic ISSN: 1559-0100
DOI
https://doi.org/10.1007/s12020-024-04031-8
